Title: | Scale Functions for Visualization |
---|---|
Description: | Graphical scales map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends. |
Authors: | Hadley Wickham [aut], Thomas Lin Pedersen [cre, aut], Dana Seidel [aut], Posit, PBC [cph, fnd] |
Maintainer: | Thomas Lin Pedersen <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.3.0.9000 |
Built: | 2024-03-27 05:58:01 UTC |
Source: | https://github.com/r-lib/scales |
Vectorised in both colour and alpha.
alpha(colour, alpha = NA)
colour |
colour |
alpha |
new alpha level in [0,1]. If alpha is |
alpha("red", 0.1)
alpha(colours(), 0.5)
alpha("red", seq(0, 1, length.out = 10))
alpha(c("first" = "gold", "second" = "lightgray", "third" = "#cd7f32"), .5)
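As a rough illustration of what changing the alpha channel involves, the following base R sketch rebuilds a colour string with a new transparency value. The helper name set_alpha_sketch is hypothetical and is not part of the package; it assumes colour names or hex strings as input and does not handle alpha = NA.

```r
# Hypothetical helper, not the package implementation.
set_alpha_sketch <- function(colour, alpha) {
  # col2rgb() returns a 3 x n matrix of 0-255 values; scale it to [0, 1]
  rgb_mat <- grDevices::col2rgb(colour) / 255
  # rgb() recycles its arguments, so colour and alpha can both be vectors
  grDevices::rgb(rgb_mat[1, ], rgb_mat[2, ], rgb_mat[3, ], alpha = alpha)
}

one <- set_alpha_sketch("red", 0.5)
many <- set_alpha_sketch("red", seq(0, 1, length.out = 10))
```

The result is a "#RRGGBBAA" string for each colour/alpha pair.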
Uses Wilkinson's extended breaks algorithm as implemented in the labeling package.
breaks_extended(n = 5, ...)
n |
Desired number of breaks. You may get slightly more or fewer breaks than requested. |
... |
other arguments passed on to |
Talbot, J., Lin, S., Hanrahan, P. (2010) An Extension of Wilkinson's Algorithm for Positioning Tick Labels on Axes, InfoVis 2010 http://vis.stanford.edu/files/2010-TickLabels-InfoVis.pdf.
demo_continuous(c(0, 10))
demo_continuous(c(0, 10), breaks = breaks_extended(3))
demo_continuous(c(0, 10), breaks = breaks_extended(10))
This algorithm starts by looking for integer powers of base. If that doesn't provide enough breaks, it then looks for additional intermediate breaks which are integer multiples of integer powers of base. If that fails (which it can for very small ranges), we fall back to extended_breaks().
breaks_log(n = 5, base = 10)
n |
desired number of breaks |
base |
base of logarithm to use |
The algorithm starts by looking for a set of integer powers of base that cover the range of the data. If that does not generate at least n - 2 breaks, we look for an integer between 1 and base that splits the interval approximately in half. For example, in the case of base = 10, this integer is 3 because log10(3) = 0.477. This leaves 2 intervals: c(1, 3) and c(3, 10). If we still need more breaks, we look for another integer that splits the largest remaining interval (on the log-scale) approximately in half. For base = 10, this is 5 because log10(5) = 0.699.
The generic algorithm starts with a set of integers steps containing only 1 and a set of candidate integers containing all integers larger than 1 and smaller than base. Then for each remaining candidate integer x, the smallest interval (on the log-scale) in the vector sort(c(x, steps, base)) is calculated. The candidate x which yields the largest minimal interval is added to steps and removed from the candidate set. This is repeated until either a sufficient number of breaks, >= n-2, are returned or all candidates have been used.
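The greedy selection described above can be sketched in base R. This is a minimal illustration, not the package's code: pick_log_steps and its n_extra argument (how many intermediate integers to add) are hypothetical, the stopping rule is simplified to a fixed count rather than the n - 2 check, and near-ties between candidates are assumed to be resolved in favour of the smaller integer.

```r
# Hypothetical helper illustrating the greedy step selection.
pick_log_steps <- function(base = 10, n_extra = 2) {
  steps <- 1
  # Candidate integers strictly between 1 and base
  candidates <- setdiff(seq_len(base - 1), 1)
  # Smallest gap on the log scale if candidate x were added to steps
  min_gap <- function(x) min(diff(log(sort(c(x, steps, base)), base = base)))
  while (length(candidates) > 0 && length(steps) - 1 < n_extra) {
    gaps <- vapply(candidates, min_gap, numeric(1))
    # Assumption: near-ties go to the smaller candidate
    best <- candidates[which(gaps >= max(gaps) - 1e-12)[1]]
    steps <- c(steps, best)
    candidates <- setdiff(candidates, best)
  }
  sort(steps)
}

pick_log_steps(10, 1)  # 1 3
pick_log_steps(10, 2)  # 1 3 5
```

With base = 10 the first integer chosen is 3 and the second is 5, matching the worked example in the text.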
demo_log10(c(1, 1e5))
demo_log10(c(1, 1e6))
# Request more breaks by setting n
demo_log10(c(1, 1e6), breaks = breaks_log(6))
# Some tricky ranges
demo_log10(c(2000, 9000))
demo_log10(c(2000, 14000))
demo_log10(c(2000, 85000), expand = c(0, 0))
# An even smaller range that requires falling back to linear breaks
demo_log10(c(1800, 2000))
Uses default R break algorithm as implemented in pretty(). This is primarily useful for date/times, as extended_breaks() should do a slightly better job for numeric scales.
breaks_pretty(n = 5, ...)
n |
Desired number of breaks. You may get slightly more or fewer breaks than requested. |
... |
other arguments passed on to |
one_month <- as.POSIXct(c("2020-05-01", "2020-06-01"))
demo_datetime(one_month)
demo_datetime(one_month, breaks = breaks_pretty(2))
demo_datetime(one_month, breaks = breaks_pretty(4))
# Tightly spaced date breaks often need custom labels too
demo_datetime(one_month, breaks = breaks_pretty(12))
demo_datetime(one_month,
breaks = breaks_pretty(12),
labels = label_date_short()
)
As timespan units span a variety of bases (1000 below seconds, 60 for seconds and minutes, 24 for hours, and 7 for days), the range of the input data determines the base used for calculating breaks.
breaks_timespan(unit = c("secs", "mins", "hours", "days", "weeks"), n = 5)
unit |
The unit used to interpret numeric data input |
n |
Desired number of breaks. You may get slightly more or fewer breaks than requested. |
demo_timespan(seq(0, 100), breaks = breaks_timespan())
Useful for numeric, date, and date-time scales.
breaks_width(width, offset = 0)
width |
Distance between each break. Either a number, or for
date/times, a single string of the form |
offset |
Use if you don't want breaks to start at zero, or on a
conventional date or time boundary such as the 1st of January or midnight.
Either a number, or for date/times, a single string of the form
|
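For numeric input, the effect of width and offset can be approximated with a few lines of base R. This is only a sketch under the assumption that breaks are laid out on multiples of width covering the limits and then shifted by offset; width_breaks_sketch is a hypothetical name, and the package may differ in detail (for example for dates and times).

```r
# Hypothetical helper: fixed-width breaks covering `limits`, shifted by `offset`.
width_breaks_sketch <- function(limits, width, offset = 0) {
  lo <- floor(limits[1] / width) * width    # last multiple of width at or below the minimum
  hi <- ceiling(limits[2] / width) * width  # first multiple of width at or above the maximum
  seq(lo, hi, by = width) + offset
}

width_breaks_sketch(c(0, 100), 20)      # 0 20 40 60 80 100
width_breaks_sketch(c(0, 100), 20, -4)  # -4 16 36 56 76 96
```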
demo_continuous(c(0, 100))
demo_continuous(c(0, 100), breaks = breaks_width(10))
demo_continuous(c(0, 100), breaks = breaks_width(20, -4))
demo_continuous(c(0, 100), breaks = breaks_width(20, 4))
# This is also useful for dates
one_month <- as.POSIXct(c("2020-05-01", "2020-06-01"))
demo_datetime(one_month)
demo_datetime(one_month, breaks = breaks_width("1 week"))
demo_datetime(one_month, breaks = breaks_width("5 days"))
# This is so useful that scale_x_datetime() has a shorthand:
demo_datetime(one_month, date_breaks = "5 days")
# hms times also work
one_hour <- hms::hms(hours = 0:1)
demo_time(one_hour)
demo_time(one_hour, breaks = breaks_width("15 min"))
demo_time(one_hour, breaks = breaks_width("600 sec"))
# Offsets are useful for years that begin on dates other than the 1st of
# January, such as the UK financial year, which begins on the 1st of April.
three_years <- as.POSIXct(c("2020-01-01", "2021-01-01", "2022-01-01"))
demo_datetime(
three_years,
breaks = breaks_width("1 year", offset = "3 months")
)
# The offset can be a vector, to create offsets that have compound units,
# such as the UK fiscal (tax) year, which begins on the 6th of April.
demo_datetime(
three_years,
breaks = breaks_width("1 year", offset = c("3 months", "5 days"))
)
Conveniently maps data values (numeric or factor/character) to colours according to a given palette, which can be provided in a variety of formats.
col_numeric(
palette,
domain,
na.color = "#808080",
alpha = FALSE,
reverse = FALSE
)
col_bin(
palette,
domain,
bins = 7,
pretty = TRUE,
na.color = "#808080",
alpha = FALSE,
reverse = FALSE,
right = FALSE
)
col_quantile(
palette,
domain,
n = 4,
probs = seq(0, 1, length.out = n + 1),
na.color = "#808080",
alpha = FALSE,
reverse = FALSE,
right = FALSE
)
col_factor(
palette,
domain,
levels = NULL,
ordered = FALSE,
na.color = "#808080",
alpha = FALSE,
reverse = FALSE
)
palette |
The colours or colour function that values will be mapped to |
domain |
The possible values that can be mapped. For If |
na.color |
The colour to return for |
alpha |
Whether alpha channels should be respected or ignored. If |
reverse |
Whether the colors (or color function) in |
bins |
Either a numeric vector of two or more unique cut points or a single number (greater than or equal to 2) giving the number of intervals into which the domain values are to be cut. |
pretty |
Whether to use the function |
right |
parameter supplied to |
n |
Number of equal-size quantiles desired. For more precise control,
use the |
probs |
See |
levels |
An alternate way of specifying levels; if specified, domain is ignored |
ordered |
If |
col_numeric is a simple linear mapping from continuous numeric data to an interpolated palette.
col_bin also maps continuous numeric data, but performs binning based on value (see the base::cut() function). col_bin defaults for the cut function are include.lowest = TRUE and right = FALSE.
col_quantile similarly bins numeric data, but via the stats::quantile() function.
col_factor maps factors to colours. If the palette is discrete and has a different number of colours than the number of factor levels, interpolation is used.
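The difference between binning by value and binning by quantile can be seen with the two base functions mentioned above. The sketch below does not call the package; it only shows how base::cut() and stats::quantile() assign a small, skewed, made-up vector to four bins, using the include.lowest = TRUE and right = FALSE settings described for col_bin.

```r
x <- c(1, 2, 3, 4, 100)

# Four equal-width intervals over the range of x
by_value <- cut(x, breaks = 4, include.lowest = TRUE, right = FALSE)

# Four intervals whose edges are the quartiles of x
edges <- stats::quantile(x, probs = seq(0, 1, length.out = 5))
by_quantile <- cut(x, breaks = edges, include.lowest = TRUE, right = FALSE)

as.integer(by_value)     # 1 1 1 1 4
as.integer(by_quantile)  # 1 2 3 4 4
```

With skewed data, equal-width bins put most values in the same bin, while quantile bins spread them out.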
The palette argument can be any of the following:
A character vector of RGB or named colours. Examples: palette(), c("#000000", "#0000FF", "#FFFFFF"), topo.colors(10)
The name of an RColorBrewer palette, e.g. "BuPu" or "Greens".
The full name of a viridis palette: "viridis", "magma", "inferno", or "plasma".
A function that receives a single value between 0 and 1 and returns a colour. Examples: colorRamp(c("#000000", "#FFFFFF"), interpolate="spline").
A function that takes a single parameter x; when called with a vector of numbers (except for col_factor, which expects factors/characters), #RRGGBB colour strings are returned (unless alpha = TRUE in which case #RRGGBBAA may also be possible).
pal <- col_bin("Greens", domain = 0:100)
show_col(pal(sort(runif(10, 60, 100))))
# Exponential distribution, mapped continuously
show_col(col_numeric("Blues", domain = NULL)(sort(rexp(16))))
# Exponential distribution, mapped by interval
show_col(col_bin("Blues", domain = NULL, bins = 4)(sort(rexp(16))))
# Exponential distribution, mapped by quantile
show_col(col_quantile("Blues", domain = NULL)(sort(rexp(16))))
# Categorical data; by default, the values being coloured span the gamut...
show_col(col_factor("RdYlBu", domain = NULL)(LETTERS[1:5]))
# ...unless the data is a factor, without droplevels...
show_col(col_factor("RdYlBu", domain = NULL)(factor(LETTERS[1:5], levels = LETTERS)))
# ...or the domain is stated explicitly.
show_col(col_factor("RdYlBu", levels = LETTERS)(LETTERS[1:5]))
Transforms rgb to hcl, sets non-missing arguments and then backtransforms to rgb.
col2hcl(colour, h = NULL, c = NULL, l = NULL, alpha = NULL)
colour |
character vector of colours to be modified |
h |
Hue, |
c |
Chroma, |
l |
Luminance, |
alpha |
Alpha, |
reds <- rep("red", 6)
show_col(col2hcl(reds, h = seq(0, 180, length = 6)))
show_col(col2hcl(reds, c = seq(0, 80, length = 6)))
show_col(col2hcl(reds, l = seq(0, 100, length = 6)))
show_col(col2hcl(reds, alpha = seq(0, 1, length = 6)))
Returns a function that maps the interval [0,1] to a set of colours. Interpolation is performed in the CIELAB colour space. Similar to colorRamp(space = 'Lab'), but hundreds of times faster, and provides results in "#RRGGBB" (or "#RRGGBBAA") character form instead of RGB colour matrices.
colour_ramp(colors, na.color = NA, alpha = TRUE)
colors |
Colours to interpolate; must be a valid argument to
|
na.color |
The colour to map to |
alpha |
Whether to include alpha transparency channels in interpolation.
If |
A function that takes a numeric vector and returns a character vector of the same length with RGB or RGBA hex colours.
ramp <- colour_ramp(c("red", "green", "blue"))
show_col(ramp(seq(0, 1, length = 12)))
Continuous scale
cscale(x, palette, na.value = NA_real_, trans = transform_identity())
x |
vector of continuous values to scale |
palette |
palette to use. Built in palettes:
|
na.value |
value to use for missing values |
trans |
transformation object describing how to transform the raw data prior to scaling. Defaults to the identity transformation which leaves the data unchanged. Built in transformations:
|
with(mtcars, plot(disp, mpg, cex = cscale(hp, pal_rescale())))
with(mtcars, plot(disp, mpg, cex = cscale(hp, pal_rescale(),
trans = transform_sqrt()
)))
with(mtcars, plot(disp, mpg, cex = cscale(hp, pal_area())))
with(mtcars, plot(disp, mpg,
pch = 20, cex = 5,
col = cscale(hp, pal_seq_gradient("grey80", "black"))
))
Discrete scale
dscale(x, palette, na.value = NA)
x |
vector of discrete values to scale |
palette |
aesthetic palette to use |
na.value |
aesthetic to use for missing values |
with(mtcars, plot(disp, mpg,
pch = 20, cex = 3,
col = dscale(factor(cyl), pal_brewer())
))
Expand a range with a multiplicative or additive constant
expand_range(range, mul = 0, add = 0, zero_width = 1)
range |
range of data, numeric vector of length 2 |
mul |
multiplicative constant |
add |
additive constant |
zero_width |
distance to use if range has zero width |
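The arithmetic behind the four arguments can be sketched in a few lines of base R. This is an illustration only: expand_range_sketch is a hypothetical name, and it assumes a zero-width range can be detected by exact equality, which a real implementation would likely handle with a tolerance.

```r
# Hypothetical helper showing how mul, add and zero_width combine.
expand_range_sketch <- function(range, mul = 0, add = 0, zero_width = 1) {
  if (diff(range) == 0) {
    # Zero-width range: centre an interval of width zero_width on the value
    c(range[1] - zero_width / 2, range[2] + zero_width / 2)
  } else {
    # Widen both ends by a fraction of the width plus a constant
    range + c(-1, 1) * (diff(range) * mul + add)
  }
}

expand_range_sketch(c(0, 10), mul = 0.05)  # -0.5 10.5
expand_range_sketch(c(0, 10), add = 1)     # -1 11
expand_range_sketch(c(5, 5))               # 4.5 5.5
```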
Scale bytes into human friendly units. Can use either SI units (e.g. kB = 1000 bytes) or binary units (e.g. kiB = 1024 bytes). See Units of Information on Wikipedia for more details.
label_bytes(units = "auto_si", accuracy = 1, scale = 1, ...)
units |
Unit to use. Should be one of:
|
accuracy |
A number to round to. Use (e.g.) Applied to rescaled data. |
scale |
A scaling factor: |
... |
Arguments passed on to
|
A labeller function that takes a numeric vector of breaks and returns a character vector of labels.
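To make the SI versus binary distinction concrete, here is a small base R sketch of automatic unit selection. It is not the package's implementation: bytes_sketch is a hypothetical helper, it only covers units up to GB/GiB, and it rounds to whole numbers.

```r
# Hypothetical helper: pick the largest unit that keeps the value at or above 1.
bytes_sketch <- function(x, binary = FALSE) {
  base <- if (binary) 1024 else 1000
  units <- if (binary) c("B", "kiB", "MiB", "GiB") else c("B", "kB", "MB", "GB")
  # Number of unit thresholds (base^1, base^2, base^3) that x reaches
  power <- findInterval(x, base^(1:3))
  paste(round(x / base^power), units[power + 1])
}

bytes_sketch(c(1, 1000, 1e6))        # "1 B" "1 kB" "1 MB"
bytes_sketch(1024^2, binary = TRUE)  # "1 MiB"
bytes_sketch(1e6, binary = TRUE)     # "977 kiB"
```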
Other labels for continuous scales: label_currency(), label_number_auto(), label_number_si(), label_ordinal(), label_parse(), label_percent(), label_pvalue(), label_scientific()
Other labels for log scales: label_log(), label_number_si(), label_scientific()
demo_continuous(c(1, 1e6))
demo_continuous(c(1, 1e6), labels = label_bytes())
# Auto units are particularly nice on log scales
demo_log10(c(1, 1e7), labels = label_bytes())
# You can also set the units
demo_continuous(c(1, 1e6), labels = label_bytes("kB"))
# You can also use binary units where a megabyte is defined as
# (1024) ^ 2 bytes rather than (1000) ^ 2. You'll need to override
# the default breaks to make this more informative.
demo_continuous(c(1, 1024^2),
breaks = breaks_width(250 * 1024),
labels = label_bytes("auto_binary")
)
Format numbers as currency, rounding values to monetary or fractional monetary units using a convenient heuristic.
label_currency(
accuracy = NULL,
scale = 1,
prefix = "$",
suffix = "",
big.mark = ",",
decimal.mark = ".",
trim = TRUE,
largest_with_fractional = 1e+05,
...
)
accuracy , largest_with_fractional
|
Number to round
to. If |
scale |
A scaling factor: |
prefix , suffix
|
Symbols to display before and after value. |
big.mark |
Character used between every 3 digits to separate thousands. |
decimal.mark |
The character to be used to indicate the numeric decimal point. |
trim |
Logical, if |
... |
Arguments passed on to
|
All label_() functions return a "labelling" function, i.e. a function that takes a vector x and returns a character vector of length(x) giving a label for each input value.
Labelling functions are designed to be used with the labels argument of ggplot2 scales. The examples demonstrate their use with x scales, but they work similarly for all scales, including those that generate legends rather than axes.
Other labels for continuous scales: label_bytes(), label_number_auto(), label_number_si(), label_ordinal(), label_parse(), label_percent(), label_pvalue(), label_scientific()
demo_continuous(c(0, 1), labels = label_currency())
demo_continuous(c(1, 100), labels = label_currency())
# Customise currency display with prefix and suffix
demo_continuous(c(1, 100), labels = label_currency(prefix = "USD "))
yen <- label_currency(
prefix = "¥",
suffix = "",
big.mark = ".",
decimal.mark = ","
)
demo_continuous(c(1000, 1100), labels = yen)
# Use style_negative = "parens" for finance style display
demo_continuous(c(-100, 100), labels = label_currency(style_negative = "parens"))
# Use scale_cut to use K/M/B where appropriate
demo_log10(c(1, 1e16),
breaks = log_breaks(7, 1e3),
labels = label_currency(scale_cut = cut_short_scale())
)
# cut_short_scale() uses B = one thousand million
# cut_long_scale() uses B = one million million
demo_log10(c(1, 1e16),
breaks = log_breaks(7, 1e3),
labels = label_currency(scale_cut = cut_long_scale())
)
# You can also define your own breaks
gbp <- label_currency(
prefix = "\u00a3",
scale_cut = c(0, k = 1e3, m = 1e6, bn = 1e9, tn = 1e12)
)
demo_log10(c(1, 1e12), breaks = log_breaks(5, 1e3), labels = gbp)
label_date() and label_time() label date/times using date/time format strings. label_date_short() automatically constructs a short format string sufficient to uniquely identify labels. It's inspired by matplotlib's ConciseDateFormatter, but uses a slightly different approach: ConciseDateFormatter formats "firsts" (e.g. first day of month, first hour of day) specially; label_date_short() formats changes (e.g. new month, new year) specially.
label_timespan() is intended to show time passed and adds a common time unit suffix to the input (ns, us, ms, s, m, h, d, w).
label_date(format = "%Y-%m-%d", tz = "UTC", locale = NULL)
label_date_short(format = c("%Y", "%b", "%d", "%H:%M"), sep = "\n")
label_time(format = "%H:%M:%S", tz = "UTC", locale = NULL)
label_timespan(
unit = c("secs", "mins", "hours", "days", "weeks"),
space = FALSE,
...
)
format |
For For |
tz |
a time zone name, see |
locale |
Locale to use for day and month names. The default
uses the current locale. Setting this argument requires stringi, and you
can see a complete list of supported locales with
|
sep |
Separator to use when combining date formats into a single string. |
unit |
The unit used to interpret numeric input |
space |
Add a space before the time unit? |
... |
Arguments passed on to
|
All label_() functions return a "labelling" function, i.e. a function that takes a vector x and returns a character vector of length(x) giving a label for each input value.
Labelling functions are designed to be used with the labels argument of ggplot2 scales. The examples demonstrate their use with x scales, but they work similarly for all scales, including those that generate legends rather than axes.
date_range <- function(start, days) {
start <- as.POSIXct(start)
c(start, start + days * 24 * 60 * 60)
}
two_months <- date_range("2020-05-01", 60)
demo_datetime(two_months)
demo_datetime(two_months, labels = date_format("%m/%d"))
demo_datetime(two_months, labels = date_format("%e %b", locale = "fr"))
demo_datetime(two_months, labels = date_format("%e %B", locale = "es"))
# ggplot2 provides a short-hand:
demo_datetime(two_months, date_labels = "%m/%d")
# An alternative labelling system is label_date_short()
demo_datetime(two_months, date_breaks = "7 days", labels = label_date_short())
# This is particularly effective for dense labels
one_year <- date_range("2020-05-01", 365)
demo_datetime(one_year, date_breaks = "month")
demo_datetime(one_year, date_breaks = "month", labels = label_date_short())
label_log()
displays numbers as base^exponent, using superscript formatting.
label_log(base = 10, digits = 3)
base |
Base of logarithm to use |
digits |
Number of significant digits to show for the exponent. Argument
is passed on to |
All label_() functions return a "labelling" function, i.e. a function that takes a vector x and returns a character vector of length(x) giving a label for each input value.
Labelling functions are designed to be used with the labels argument of ggplot2 scales. The examples demonstrate their use with x scales, but they work similarly for all scales, including those that generate legends rather than axes.
breaks_log() for the related breaks algorithm.
Other labels for log scales: label_bytes(), label_number_si(), label_scientific()
demo_log10(c(1, 1e5), labels = label_log())
demo_log10(c(1, 1e5), breaks = breaks_log(base = 2), labels = label_log(base = 2))
Use label_number() to force decimal display of numbers (i.e. don't use scientific notation). label_comma() is a special case that inserts a comma every three digits.
label_number(
accuracy = NULL,
scale = 1,
prefix = "",
suffix = "",
big.mark = " ",
decimal.mark = ".",
style_positive = c("none", "plus", "space"),
style_negative = c("hyphen", "minus", "parens"),
scale_cut = NULL,
trim = TRUE,
...
)
label_comma(
accuracy = NULL,
scale = 1,
prefix = "",
suffix = "",
big.mark = ",",
decimal.mark = ".",
trim = TRUE,
digits,
...
)
accuracy |
A number to round to. Use (e.g.) Applied to rescaled data. |
scale |
A scaling factor: |
prefix |
Additional text to display before the number. The suffix is
applied to absolute value before |
suffix |
Additional text to display after the number. |
big.mark |
Character used between every 3 digits to separate thousands. |
decimal.mark |
The character to be used to indicate the numeric decimal point. |
style_positive |
A string that determines the style of positive numbers:
|
style_negative |
A string that determines the style of negative numbers:
|
scale_cut |
Named numeric vector that allows you to rescale large (or small) numbers and add a prefix. Built-in helpers include:
If you supply a vector |
trim |
Logical, if |
... |
Other arguments passed on to |
digits |
All label_() functions return a "labelling" function, i.e. a function that takes a vector x and returns a character vector of length(x) giving a label for each input value.
Labelling functions are designed to be used with the labels argument of ggplot2 scales. The examples demonstrate their use with x scales, but they work similarly for all scales, including those that generate legends rather than axes.
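How scale, accuracy, big.mark, prefix and suffix interact can be sketched with base R's format(). This is an approximation for illustration only: number_sketch is a hypothetical helper, it assumes rounding is applied after rescaling (as the accuracy description states), and it ignores the style, scale_cut and trim arguments.

```r
# Hypothetical helper: rescale, round to `accuracy`, then decorate.
number_sketch <- function(x, accuracy = 1, scale = 1, prefix = "", suffix = "",
                          big.mark = " ") {
  x <- round(x * scale / accuracy) * accuracy
  body <- format(x, big.mark = big.mark, scientific = FALSE, trim = TRUE)
  paste0(prefix, body, suffix)
}

number_sketch(c(1234567, 2500), big.mark = ",")                 # "1,234,567" "2,500"
number_sketch(0.256, accuracy = 0.1, scale = 100, suffix = "%") # "25.6%"
```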
demo_continuous(c(-1e6, 1e6))
demo_continuous(c(-1e6, 1e6), labels = label_number())
demo_continuous(c(-1e6, 1e6), labels = label_comma())
# Use scale to rescale very small or large numbers to generate
# more readable labels
demo_continuous(c(0, 1e6), labels = label_number())
demo_continuous(c(0, 1e6), labels = label_number(scale = 1 / 1e3))
demo_continuous(c(0, 1e-6), labels = label_number())
demo_continuous(c(0, 1e-6), labels = label_number(scale = 1e6))
# Use scale_cut to automatically add prefixes for large/small numbers
demo_log10(
c(1, 1e9),
breaks = log_breaks(10),
labels = label_number(scale_cut = cut_short_scale())
)
demo_log10(
c(1, 1e9),
breaks = log_breaks(10),
labels = label_number(scale_cut = cut_si("m"))
)
demo_log10(
c(1e-9, 1),
breaks = log_breaks(10),
labels = label_number(scale_cut = cut_si("g"))
)
# use scale and scale_cut when data already uses SI prefix
# for example, if data was stored in kg
demo_log10(
c(1e-9, 1),
breaks = log_breaks(10),
labels = label_number(scale_cut = cut_si("g"), scale = 1e3)
)
# Use style arguments to vary the appearance of positive and negative numbers
demo_continuous(c(-1e3, 1e3), labels = label_number(
style_positive = "plus",
style_negative = "minus"
))
demo_continuous(c(-1e3, 1e3), labels = label_number(style_negative = "parens"))
# You can use prefix and suffix for other types of display
demo_continuous(c(32, 212), labels = label_number(suffix = "\u00b0F"))
demo_continuous(c(0, 100), labels = label_number(suffix = "\u00b0C"))
Switches between number_format() and scientific_format() based on a set of heuristics designed to automatically generate useful labels across a wide range of inputs.
label_number_auto()
Other labels for continuous scales: label_bytes(), label_currency(), label_number_si(), label_ordinal(), label_parse(), label_percent(), label_pvalue(), label_scientific()
# Very small and very large numbers get scientific notation
demo_continuous(c(0, 1e-6), labels = label_number_auto())
demo_continuous(c(0, 1e9), labels = label_number_auto())
# Other ranges get the numbers printed in full
demo_continuous(c(0, 1e-3), labels = label_number_auto())
demo_continuous(c(0, 1), labels = label_number_auto())
demo_continuous(c(0, 1e3), labels = label_number_auto())
demo_continuous(c(0, 1e6), labels = label_number_auto())
# Transformation is applied individually so you get as little
# scientific notation as possible
demo_log10(c(1, 1e7), labels = label_number_auto())
Round values to integers and then display as ordinal values (e.g. 1st, 2nd, 3rd). Built-in rules are provided for English, French, and Spanish.
label_ordinal(
prefix = "",
suffix = "",
big.mark = " ",
rules = ordinal_english(),
...
)
ordinal_english()
ordinal_french(gender = c("masculin", "feminin"), plural = FALSE)
ordinal_spanish()
prefix , suffix
|
Symbols to display before and after value. |
big.mark |
Character used between every 3 digits to separate thousands. |
rules |
Named list of regular expressions, matched in order. Name gives suffix, and value specifies which numbers to match. |
... |
Arguments passed on to
|
gender |
Masculin or feminin gender for French ordinal. |
plural |
Plural or singular for French ordinal. |
All label_() functions return a "labelling" function, i.e. a function that takes a vector x and returns a character vector of length(x) giving a label for each input value.
Labelling functions are designed to be used with the labels argument of ggplot2 scales. The examples demonstrate their use with x scales, but they work similarly for all scales, including those that generate legends rather than axes.
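The rules argument is described as a named list of regular expressions matched in order, where the name is the suffix. A base R sketch of that matching follows. Both the rule set and the apply_ordinal_sketch helper are illustrative assumptions, not the package's own definitions; call ordinal_english() to see the real rules.

```r
# Illustrative English-like rules: name = suffix, value = pattern to match.
rules <- list(
  st = "(?<!1)1$",      # ends in 1, but not 11
  nd = "(?<!1)2$",      # ends in 2, but not 12
  rd = "(?<!1)3$",      # ends in 3, but not 13
  th = "(?<=1)[123]$",  # 11, 12, 13
  th = "[0456789]$"     # everything else
)

# Hypothetical helper: the first matching rule supplies the suffix.
apply_ordinal_sketch <- function(x, rules) {
  x <- as.character(round(x))
  suffix <- rep("", length(x))
  for (i in seq_along(rules)) {
    hit <- suffix == "" & grepl(rules[[i]], x, perl = TRUE)
    suffix[hit] <- names(rules)[i]
  }
  paste0(x, suffix)
}

apply_ordinal_sketch(c(1, 2, 3, 4, 11, 12, 13, 21, 22), rules)
# "1st" "2nd" "3rd" "4th" "11th" "12th" "13th" "21st" "22nd"
```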
Other labels for continuous scales: label_bytes(), label_currency(), label_number_auto(), label_number_si(), label_parse(), label_percent(), label_pvalue(), label_scientific()
demo_continuous(c(1, 5))
demo_continuous(c(1, 5), labels = label_ordinal())
demo_continuous(c(1, 5), labels = label_ordinal(rules = ordinal_french()))
# The rules are just a set of regular expressions that are applied in turn
ordinal_french()
ordinal_english()
# Note that ordinal rounds values, so you may need to adjust the breaks too
demo_continuous(c(1, 10))
demo_continuous(c(1, 10), labels = label_ordinal())
demo_continuous(c(1, 10),
labels = label_ordinal(),
breaks = breaks_width(2)
)
label_parse() produces expressions from strings by parsing them; label_math() constructs expressions by replacing the pronoun .x with each string.
label_parse()
label_math(expr = 10^.x, format = force)
expr |
expression to use |
format |
another format function to apply prior to mathematical transformation - this makes it easier to use floating point numbers in mathematical expressions. |
All label_() functions return a "labelling" function, i.e. a function that takes a vector x and returns a character vector of length(x) giving a label for each input value.
Labelling functions are designed to be used with the labels argument of ggplot2 scales. The examples demonstrate their use with x scales, but they work similarly for all scales, including those that generate legends rather than axes.
plotmath for the details of mathematical formatting in R.
Other labels for continuous scales: label_bytes(), label_currency(), label_number_auto(), label_number_si(), label_ordinal(), label_percent(), label_pvalue(), label_scientific()
Other labels for discrete scales: label_wrap()
# Use label_parse() with discrete scales
greek <- c("alpha", "beta", "gamma")
demo_discrete(greek)
demo_discrete(greek, labels = label_parse())
# Use label_math() with continuous scales
demo_continuous(c(1, 5))
demo_continuous(c(1, 5), labels = label_math(alpha[.x]))
demo_continuous(c(1, 5), labels = label_math())
Label percentages (2.5%, 50%, etc)
label_percent(
accuracy = NULL,
scale = 100,
prefix = "",
suffix = "%",
big.mark = " ",
decimal.mark = ".",
trim = TRUE,
...
)
accuracy |
A number to round to. Use (e.g.) Applied to rescaled data. |
scale |
A scaling factor: |
prefix |
Additional text to display before the number. The suffix is
applied to absolute value before |
suffix |
Additional text to display after the number. |
big.mark |
Character used between every 3 digits to separate thousands. |
decimal.mark |
The character to be used to indicate the numeric decimal point. |
trim |
Logical, if |
... |
Arguments passed on to
|
All label_() functions return a "labelling" function, i.e. a function that takes a vector x and returns a character vector of length(x) giving a label for each input value.
Labelling functions are designed to be used with the labels argument of ggplot2 scales. The examples demonstrate their use with x scales, but they work similarly for all scales, including those that generate legends rather than axes.
Other labels for continuous scales: label_bytes(), label_currency(), label_number_auto(), label_number_si(), label_ordinal(), label_parse(), label_pvalue(), label_scientific()
demo_continuous(c(0, 1))
demo_continuous(c(0, 1), labels = label_percent())
# Use prefix and suffix to create your own variants
french_percent <- label_percent(
decimal.mark = ",",
suffix = " %"
)
demo_continuous(c(0, .01), labels = french_percent)
Formatter for p-values, using "<" and ">" for p-values close to 0 and 1.
label_pvalue(
accuracy = 0.001,
decimal.mark = ".",
prefix = NULL,
add_p = FALSE
)
accuracy |
A number to round to. Use (e.g.) Applied to rescaled data. |
decimal.mark |
The character to be used to indicate the numeric decimal point. |
prefix |
A character vector of length 3 giving the prefixes to
put in front of numbers. The default values are |
add_p |
Add "p=" before the value? |
All label_() functions return a "labelling" function, i.e. a function that takes a vector x and returns a character vector of length(x) giving a label for each input value.
Labelling functions are designed to be used with the labels argument of ggplot2 scales. The examples demonstrate their use with x scales, but they work similarly for all scales, including those that generate legends rather than axes.
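The "<" and ">" behaviour for values close to 0 and 1 can be sketched in base R as follows. This is a simplified illustration, not the package's code: pvalue_sketch is a hypothetical helper, and it takes a digits argument (so that accuracy is 10^-digits) instead of accuracy itself.

```r
# Hypothetical helper: digits = 3 corresponds to accuracy = 0.001.
pvalue_sketch <- function(p, digits = 3) {
  accuracy <- 10^-digits
  fmt <- function(v) formatC(v, format = "f", digits = digits)
  out <- fmt(round(p / accuracy) * accuracy)
  # Values too close to 0 or 1 to show at this accuracy get "<" or ">"
  out[p < accuracy] <- paste0("<", fmt(accuracy))
  out[p > 1 - accuracy] <- paste0(">", fmt(1 - accuracy))
  out
}

pvalue_sketch(c(0.00001, 0.0123, 0.5, 0.99999))
# "<0.001" "0.012" "0.500" ">0.999"
```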
Other labels for continuous scales: label_bytes(), label_currency(), label_number_auto(), label_number_si(), label_ordinal(), label_parse(), label_percent(), label_scientific()
demo_continuous(c(0, 1))
demo_continuous(c(0, 1), labels = label_pvalue())
demo_continuous(c(0, 1), labels = label_pvalue(accuracy = 0.1))
demo_continuous(c(0, 1), labels = label_pvalue(add_p = TRUE))
# Or provide your own prefixes
prefix <- c("p < ", "p = ", "p > ")
demo_continuous(c(0, 1), labels = label_pvalue(prefix = prefix))
Label numbers with scientific notation (e.g. 1e05, 1.5e-02)
label_scientific(
digits = 3,
scale = 1,
prefix = "",
suffix = "",
decimal.mark = ".",
trim = TRUE,
...
)
digits |
Number of digits to show before exponent. |
scale |
A scaling factor: |
prefix , suffix
|
Symbols to display before and after value. |
decimal.mark |
The character to be used to indicate the numeric decimal point. |
trim |
Logical, if |
... |
Other arguments passed on to |
All label_() functions return a "labelling" function, i.e. a function that takes a vector x and returns a character vector of length(x) giving a label for each input value.
Labelling functions are designed to be used with the labels argument of ggplot2 scales. The examples demonstrate their use with x scales, but they work similarly for all scales, including those that generate legends rather than axes.
Other labels for continuous scales: label_bytes(), label_currency(), label_number_auto(), label_number_si(), label_ordinal(), label_parse(), label_percent(), label_pvalue()
Other labels for log scales: label_bytes(), label_log(), label_number_si()
demo_continuous(c(1, 10))
demo_continuous(c(1, 10), labels = label_scientific())
demo_continuous(c(1, 10), labels = label_scientific(digits = 3))
demo_log10(c(1, 1e9))
Uses strwrap() to split long labels across multiple lines.
label_wrap(width)
width |
Number of characters per line. |
All label_() functions return a "labelling" function, i.e. a function that takes a vector x and returns a character vector of length(x) giving a label for each input value.
Labelling functions are designed to be used with the labels argument of ggplot2 scales. The examples demonstrate their use with x scales, but they work similarly for all scales, including those that generate legends rather than axes.
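Since the wrapping is based on strwrap(), its effect on a single label can be previewed without a plot. The sketch below is an assumption about how the pieces fit together, not the package's code; wrap_sketch is a hypothetical helper that joins the wrapped pieces with newlines.

```r
# Hypothetical helper: wrap each label and join the pieces with newlines.
wrap_sketch <- function(x, width) {
  vapply(
    x,
    function(s) paste(strwrap(s, width = width), collapse = "\n"),
    character(1),
    USE.NAMES = FALSE
  )
}

label <- "this is a long label"
wrapped <- wrap_sketch(label, 10)
cat(wrapped)
```

Replacing the newlines with spaces recovers the original label, because strwrap() only breaks at whitespace.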
Other labels for discrete scales: label_parse()
x <- c(
"this is a long label",
"this is another long label",
"this a label this is even longer"
)
demo_discrete(x)
demo_discrete(x, labels = label_wrap(10))
demo_discrete(x, labels = label_wrap(20))
Generate minor breaks between major breaks either spaced with a fixed width, or having a fixed number.
minor_breaks_width(width, offset)
minor_breaks_n(n)
width |
Distance between each break. Either a number, or for
date/times, a single string of the form |
offset |
Use if you don't want breaks to start at zero, or on a
conventional date or time boundary such as the 1st of January or midnight.
Either a number, or for date/times, a single string of the form
|
n |
number of breaks |
demo_log10(c(1, 1e6))
if (FALSE) {
# Requires https://github.com/tidyverse/ggplot2/pull/3591
demo_log10(c(1, 1e6), minor_breaks = minor_breaks_n(10))
}
Mute standard colour
muted(colour, l = 30, c = 70)
colour |
character vector of colours to modify |
l |
new luminance |
c |
new chroma |
muted("red")
muted("blue")
show_col(c("red", "blue", muted("red"), muted("blue")))
These functions modify data values outside a given range.
The oob_*() functions are designed to be passed as the oob argument of ggplot2 continuous and binned scales, with oob_discard() being an exception.
These functions affect out of bounds values in the following ways:
oob_censor() replaces out of bounds values with NAs. This is the default oob argument for continuous scales.
oob_censor_any() acts like oob_censor(), but also replaces infinite values with NAs.
oob_squish() replaces out of bounds values with the nearest limit. This is the default oob argument for binned scales.
oob_squish_any() acts like oob_squish(), but also replaces infinite values with the nearest limit.
oob_squish_infinite() only replaces infinite values by the nearest limit.
oob_keep() does not adjust out of bounds values. In position scales, behaves as zooming limits without data removal.
oob_discard() removes out of bounds values from the input. Not suitable for ggplot2 scales.
oob_censor(x, range = c(0, 1), only.finite = TRUE)
oob_censor_any(x, range = c(0, 1))
oob_discard(x, range = c(0, 1))
oob_squish(x, range = c(0, 1), only.finite = TRUE)
oob_squish_any(x, range = c(0, 1))
oob_squish_infinite(x, range = c(0, 1))
oob_keep(x, range = c(0, 1))
censor(x, range = c(0, 1), only.finite = TRUE)
discard(x, range = c(0, 1))
squish(x, range = c(0, 1), only.finite = TRUE)
squish_infinite(x, range = c(0, 1))
x |
A numeric vector of values to modify. |
range |
A numeric vector of length two giving the minimum and maximum limit of the desired output range respectively. |
only.finite |
A logical of length one. When |
The oob_censor_any() and oob_squish_any() functions are the same as oob_censor() and oob_squish() with the only.finite argument set to FALSE.
Replacing position values with NAs, as oob_censor() does, will typically lead to removal of those datapoints in ggplot.
Setting ggplot coordinate limits is equivalent to using oob_keep() in position scales.
Most oob_*() functions return a vector of numerical values of the same length as the x argument, wherein out of bounds values have been modified. Only oob_discard() returns a vector whose length is less than or equal to that of the x argument.
censor(), squish(), squish_infinite() and discard() are no longer recommended; please use oob_censor(), oob_squish(), oob_squish_infinite() and oob_discard() instead.
oob_squish(): Homer Strong
# Censoring replaces out of bounds values with NAs
oob_censor(c(-Inf, -1, 0.5, 1, 2, NA, Inf))
oob_censor_any(c(-Inf, -1, 0.5, 1, 2, NA, Inf))
# Squishing replaces out of bounds values with the nearest range limit
oob_squish(c(-Inf, -1, 0.5, 1, 2, NA, Inf))
oob_squish_any(c(-Inf, -1, 0.5, 1, 2, NA, Inf))
oob_squish_infinite(c(-Inf, -1, 0.5, 1, 2, NA, Inf))
# Keeping does not alter values
oob_keep(c(-Inf, -1, 0.5, 1, 2, NA, Inf))
# Discarding will remove out of bounds values
oob_discard(c(-Inf, -1, 0.5, 1, 2, NA, Inf))
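To make the difference between censoring and squishing concrete, here is a minimal base-R sketch of the two behaviours, including the effect of only.finite. The function names censor_sketch and squish_sketch are hypothetical and the code is an illustration under the semantics described above, not the package implementation.

```r
# Hypothetical base-R sketches of censoring and squishing.
censor_sketch <- function(x, range = c(0, 1), only.finite = TRUE) {
  # With only.finite = TRUE, infinite values are never touched.
  candidate <- if (only.finite) is.finite(x) else !is.na(x)
  x[which(candidate & (x < range[1] | x > range[2]))] <- NA_real_
  x
}

squish_sketch <- function(x, range = c(0, 1), only.finite = TRUE) {
  candidate <- if (only.finite) is.finite(x) else !is.na(x)
  x[which(candidate & x < range[1])] <- range[1]
  x[which(candidate & x > range[2])] <- range[2]
  x
}

x <- c(-Inf, -1, 0.5, 1, 2, NA, Inf)
censor_sketch(x)                       # -Inf NA 0.5 1 NA NA Inf
squish_sketch(x)                       # -Inf 0 0.5 1 1 NA Inf
squish_sketch(x, only.finite = FALSE)  # 0 0 0.5 1 1 NA 1
```

Calling either sketch with only.finite = FALSE corresponds to the *_any() variants described above.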
Area palettes (continuous)
pal_area(range = c(1, 6))
area_pal(range = c(1, 6))
abs_area(max)
range |
Numeric vector of length two, giving range of possible sizes. Should be greater than 0. |
max |
A number representing the maximum size. |
Colour Brewer palette (discrete)
pal_brewer(type = "seq", palette = 1, direction = 1)
brewer_pal(type = "seq", palette = 1, direction = 1)
type |
One of "seq" (sequential), "div" (diverging) or "qual" (qualitative) |
palette |
If a string, will use that named palette. If a number, will
index into the list of palettes of appropriate |
direction |
Sets the order of colours in the scale. If 1, the default,
colours are as output by |
show_col(pal_brewer()(10))
show_col(pal_brewer("div")(5))
show_col(pal_brewer(palette = "Greens")(5))
# Can use with gradient_n to create a continuous gradient
cols <- pal_brewer("div")(5)
show_col(pal_gradient_n(cols)(seq(0, 1, length.out = 30)))
Dichromat (colour-blind) palette (discrete)
pal_dichromat(name)
dichromat_pal(name)
name |
Name of colour palette. One of:
|
if (requireNamespace("dichromat", quietly = TRUE)) {
show_col(pal_dichromat("BluetoOrange.10")(10))
show_col(pal_dichromat("BluetoOrange.10")(5))
# Can use with gradient_n to create a continuous gradient
cols <- pal_dichromat("DarkRedtoBlue.12")(12)
show_col(pal_gradient_n(cols)(seq(0, 1, length.out = 30)))
}
Diverging colour gradient (continuous).
pal_div_gradient(
low = mnsl("10B 4/6"),
mid = mnsl("N 8/0"),
high = mnsl("10R 4/6"),
space = "Lab"
)
div_gradient_pal(
low = mnsl("10B 4/6"),
mid = mnsl("N 8/0"),
high = mnsl("10R 4/6"),
space = "Lab"
)
low |
colour for low end of gradient. |
mid |
colour for mid point |
high |
colour for high end of gradient. |
space |
colour space in which to calculate gradient. Must be "Lab" - other values are deprecated. |
x <- seq(-1, 1, length.out = 100)
r <- sqrt(outer(x^2, x^2, "+"))
image(r, col = pal_div_gradient()(seq(0, 1, length.out = 12)))
image(r, col = pal_div_gradient()(seq(0, 1, length.out = 30)))
image(r, col = pal_div_gradient()(seq(0, 1, length.out = 100)))
library(munsell)
pal <- pal_div_gradient(low = mnsl(complement("10R 4/6"), fix = TRUE))
image(r, col = pal(seq(0, 1, length.out = 100)))
Arbitrary colour gradient palette (continuous)
pal_gradient_n(colours, values = NULL, space = "Lab")
gradient_n_pal(colours, values = NULL, space = "Lab")
colours |
vector of colours |
values |
if colours should not be evenly positioned along the gradient
this vector gives the position (between 0 and 1) for each colour in the
|
space |
colour space in which to calculate gradient. Must be "Lab" - other values are deprecated. |
Grey scale palette (discrete)
pal_grey(start = 0.2, end = 0.8)
grey_pal(start = 0.2, end = 0.8)
start |
grey value at low end of palette |
end |
grey value at high end of palette |
pal_seq_gradient() for a continuous version
show_col(pal_grey()(25))
show_col(pal_grey(0, 1)(25))
Hue palette (discrete)
pal_hue(h = c(0, 360) + 15, c = 100, l = 65, h.start = 0, direction = 1)
hue_pal(h = c(0, 360) + 15, c = 100, l = 65, h.start = 0, direction = 1)
h |
range of hues to use, in [0, 360] |
c |
chroma (intensity of colour), maximum value varies depending on combination of hue and luminance. |
l |
luminance (lightness), in [0, 100] |
h.start |
hue to start at |
direction |
direction to travel around the colour wheel, 1 = clockwise, -1 = counter-clockwise |
show_col(pal_hue()(4))
show_col(pal_hue()(9))
show_col(pal_hue(l = 90)(9))
show_col(pal_hue(l = 30)(9))
show_col(pal_hue()(9))
show_col(pal_hue(direction = -1)(9))
show_col(pal_hue(h.start = 30)(9))
show_col(pal_hue(h.start = 90)(9))
show_col(pal_hue()(9))
show_col(pal_hue(h = c(0, 90))(9))
show_col(pal_hue(h = c(90, 180))(9))
show_col(pal_hue(h = c(180, 270))(9))
show_col(pal_hue(h = c(270, 360))(9))
Leaves values unchanged - useful when the data is already scaled.
pal_identity()
identity_pal()
Based on a set supplied by Richard Pearson, University of Manchester
pal_linetype()
linetype_pal()
Manual palette (discrete)
pal_manual(values)
manual_pal(values)
values |
vector of values to be used as a palette. |
Just rescales the input to the specified output range. Useful for alpha, size, and continuous position.
pal_rescale(range = c(0.1, 1))
rescale_pal(range = c(0.1, 1))
range |
Numeric vector of length two, giving range of possible values. Should be between 0 and 1. |
Sequential colour gradient palette (continuous)
pal_seq_gradient(low = mnsl("10B 4/6"), high = mnsl("10R 4/6"), space = "Lab")
seq_gradient_pal(low = mnsl("10B 4/6"), high = mnsl("10R 4/6"), space = "Lab")
low |
colour for low end of gradient. |
high |
colour for high end of gradient. |
space |
colour space in which to calculate gradient. Must be "Lab" - other values are deprecated. |
x <- seq(0, 1, length.out = 25)
show_col(pal_seq_gradient()(x))
show_col(pal_seq_gradient("white", "black")(x))
library(munsell)
show_col(pal_seq_gradient("white", mnsl("10R 4/6"))(x))
Shape palette (discrete)
pal_shape(solid = TRUE)
shape_pal(solid = TRUE)
solid |
should shapes be solid or not? |
Viridis palette
pal_viridis(alpha = 1, begin = 0, end = 1, direction = 1, option = "D")
viridis_pal(alpha = 1, begin = 0, end = 1, direction = 1, option = "D")
alpha |
The alpha transparency, a number in [0,1], see argument alpha in
|
begin , end
|
The (corrected) hue in |
direction |
Sets the order of colors in the scale. If 1, the default, colors are ordered from darkest to lightest. If -1, the order of colors is reversed. |
option |
A character string indicating the color map option to use. Eight options are available:
|
https://bids.github.io/colormap/
show_col(pal_viridis()(10))
show_col(pal_viridis(direction = -1)(6))
show_col(pal_viridis(begin = 0.2, end = 0.8)(4))
show_col(pal_viridis(option = "plasma")(6))
Mutable ranges have two methods (train and reset), and make it possible to build up complete ranges with multiple passes.
Rescale continuous vector to have specified minimum and maximum
rescale(x, to, from, ...)
## S3 method for class 'numeric'
rescale(x, to = c(0, 1), from = range(x, na.rm = TRUE, finite = TRUE), ...)
## S3 method for class 'dist'
rescale(x, to = c(0, 1), from = range(x, na.rm = TRUE, finite = TRUE), ...)
## S3 method for class 'logical'
rescale(x, to = c(0, 1), from = range(x, na.rm = TRUE, finite = TRUE), ...)
## S3 method for class 'POSIXt'
rescale(x, to = c(0, 1), from = range(x, na.rm = TRUE, finite = TRUE), ...)
## S3 method for class 'Date'
rescale(x, to = c(0, 1), from = range(x, na.rm = TRUE, finite = TRUE), ...)
## S3 method for class 'integer64'
rescale(x, to = c(0, 1), from = range(x, na.rm = TRUE), ...)
## S3 method for class 'difftime'
rescale(x, to = c(0, 1), from = range(x, na.rm = TRUE, finite = TRUE), ...)
## S3 method for class 'AsIs'
rescale(x, to, from, ...)
x |
continuous vector of values to manipulate. |
to |
output range (numeric vector of length two) |
from |
input range (vector of length two). If not given, is
calculated from the range of |
... |
other arguments passed on to methods |
Objects of class <AsIs> are returned unaltered.
rescale(1:100)
rescale(runif(50))
rescale(1)
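The linear mapping from the input range to the output range can be sketched in base R as follows. The name rescale_sketch is hypothetical, and the handling of a zero-width input range (mapping to the midpoint of `to`) is an assumption made for this illustration.

```r
# Hypothetical base-R sketch of linear rescaling.
rescale_sketch <- function(x, to = c(0, 1),
                           from = range(x, na.rm = TRUE, finite = TRUE)) {
  # Assumption: a zero-width input range maps every value to the middle of `to`.
  if (diff(from) == 0) {
    return(ifelse(is.na(x), NA_real_, mean(to)))
  }
  # Shift to start at zero, scale to unit width, then stretch into `to`.
  (x - from[1]) / diff(from) * diff(to) + to[1]
}

rescale_sketch(c(0, 5, 10))                  # 0.0 0.5 1.0
rescale_sketch(c(0, 5, 10), to = c(0, 100))  # 0 50 100
rescale_sketch(1)                            # 0.5
```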
Rescale numeric vector to have specified maximum
rescale_max(x, to = c(0, 1), from = range(x, na.rm = TRUE))
x |
numeric vector of values to manipulate. |
to |
output range (numeric vector of length two) |
from |
input range (numeric vector of length two). If not given, is
calculated from the range of |
rescale_max(1:100)
rescale_max(runif(50))
rescale_max(1)
Rescale vector to have specified minimum, midpoint, and maximum
rescale_mid(x, to, from, mid, ...)
## S3 method for class 'numeric'
rescale_mid(x, to = c(0, 1), from = range(x, na.rm = TRUE), mid = 0, ...)
## S3 method for class 'logical'
rescale_mid(x, to = c(0, 1), from = range(x, na.rm = TRUE), mid = 0, ...)
## S3 method for class 'dist'
rescale_mid(x, to = c(0, 1), from = range(x, na.rm = TRUE), mid = 0, ...)
## S3 method for class 'POSIXt'
rescale_mid(x, to = c(0, 1), from = range(x, na.rm = TRUE), mid, ...)
## S3 method for class 'Date'
rescale_mid(x, to = c(0, 1), from = range(x, na.rm = TRUE), mid, ...)
## S3 method for class 'integer64'
rescale_mid(x, to = c(0, 1), from = range(x, na.rm = TRUE), mid = 0, ...)
## S3 method for class 'AsIs'
rescale_mid(x, to, from, ...)
x |
vector of values to manipulate. |
to |
output range (numeric vector of length two) |
from |
input range (vector of length two). If not given, is
calculated from the range of |
mid |
mid-point of input range |
... |
other arguments passed on to methods |
Objects of class <AsIs> are returned unaltered.
rescale_mid(1:100, mid = 50.5)
rescale_mid(runif(50), mid = 0.5)
rescale_mid(1)
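One way to picture midpoint rescaling is that the input is centred on mid and then stretched symmetrically so that mid lands in the middle of the output range. The base-R sketch below follows that idea; rescale_mid_sketch is a hypothetical name, and the sketch does not handle zero-width ranges.

```r
# Hypothetical base-R sketch of rescaling around a midpoint.
rescale_mid_sketch <- function(x, to = c(0, 1),
                               from = range(x, na.rm = TRUE), mid = 0) {
  # The extent is twice the larger distance from `mid` to either end of
  # `from`, so `mid` always maps to the centre of `to`.
  extent <- 2 * max(abs(from - mid))
  (x - mid) / extent * diff(to) + mean(to)
}

rescale_mid_sketch(c(-1, 0, 3))        # mid = 0 maps to 0.5, 3 maps to 1
rescale_mid_sketch(1:100, mid = 50.5)  # runs from 0 to 1
```

Because the stretch is symmetric, the side of the data that lies closer to mid does not reach the end of the output range.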
Don't perform rescaling
rescale_none(x, ...)
x |
numeric vector of values to manipulate. |
... |
all other arguments ignored |
rescale_none(1:100)
Strips attributes and always returns a numeric vector
train_continuous(new, existing = NULL)
new |
New data to add to scale |
existing |
Optional existing scale to update |
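To show how a continuous range can be built up over several passes, the following base-R sketch folds successive batches of data into a running range. The name train_continuous_sketch is hypothetical, and dropping non-finite values is an assumption of this illustration.

```r
# Hypothetical base-R sketch of training a continuous range in several passes.
train_continuous_sketch <- function(new, existing = NULL) {
  if (is.null(new)) {
    return(existing)
  }
  # range() ignores a NULL `existing`, so the first pass just uses `new`.
  # Assumption: missing and infinite values do not widen the range.
  range(existing, new, na.rm = TRUE, finite = TRUE)
}

batches <- list(c(2, 5), c(-3, 4, NA), c(10, Inf))
rng <- NULL
for (batch in batches) {
  rng <- train_continuous_sketch(batch, rng)
}
rng  # -3 10
```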
Train (update) a discrete scale
train_discrete(new, existing = NULL, drop = FALSE, na.rm = FALSE, fct = NA)
new |
New data to add to scale |
existing |
Optional existing scale to update |
drop |
|
na.rm |
If |
fct |
Treat |
Inverse Hyperbolic Sine transformation
transform_asinh()
asinh_trans()
plot(transform_asinh(), xlim = c(-1e2, 1e2))
This is the variance stabilising transformation for the binomial distribution.
transform_asn()
asn_trans()
plot(transform_asn(), xlim = c(0, 1))
Inverse hyperbolic tangent transformation
transform_atanh()
atanh_trans()
plot(transform_atanh(), xlim = c(-1, 1))
The Box-Cox transformation is a flexible transformation, often used to transform data towards normality. The modulus transformation generalises Box-Cox to also work with negative values.
transform_boxcox(p, offset = 0)
boxcox_trans(p, offset = 0)
transform_modulus(p, offset = 1)
modulus_trans(p, offset = 1)
p |
Transformation exponent, |
offset |
Constant offset. 0 for Box-Cox type 1,
otherwise any non-negative constant (Box-Cox type 2). |
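The Box-Cox power transformation (type 1) requires strictly positive values and takes the following form for y > 0 and λ ≠ 0, where λ denotes the exponent p:
$$y^{(\lambda)} = \frac{y^{\lambda} - 1}{\lambda}$$
When λ = 0, the natural log transform, ln(y), is used.
The modulus transformation implements a generalisation of the Box-Cox transformation that works for data with both positive and negative values. The equation takes the following form when λ ≠ 0:
$$y^{(\lambda)} = \operatorname{sign}(y) \cdot \frac{(|y| + 1)^{\lambda} - 1}{\lambda}$$
and when λ = 0:
$$y^{(\lambda)} = \operatorname{sign}(y) \cdot \ln(|y| + 1)$$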
Box, G. E., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological), 211-252. https://www.jstor.org/stable/2984418
John, J. A., & Draper, N. R. (1980). An alternative family of transformations. Applied Statistics, 190-197. https://www.jstor.org/stable/2986305
plot(transform_boxcox(-1), xlim = c(0, 10))
plot(transform_boxcox(0), xlim = c(0, 10))
plot(transform_boxcox(1), xlim = c(0, 10))
plot(transform_boxcox(2), xlim = c(0, 10))
plot(transform_modulus(-1), xlim = c(-10, 10))
plot(transform_modulus(0), xlim = c(-10, 10))
plot(transform_modulus(1), xlim = c(-10, 10))
plot(transform_modulus(2), xlim = c(-10, 10))
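The two forward transformations can be written out in a few lines of base R. This is a sketch of the textbook formulas only: boxcox_sketch and modulus_sketch are hypothetical names, and the way offset is added to the input before the power is taken is an assumption of this illustration.

```r
# Hypothetical base-R sketches of the Box-Cox and modulus forward transforms.
boxcox_sketch <- function(y, p, offset = 0) {
  # Box-Cox is only defined for strictly positive (shifted) input.
  stopifnot(all(y + offset > 0))
  if (p == 0) {
    log(y + offset)
  } else {
    ((y + offset)^p - 1) / p
  }
}

modulus_sketch <- function(y, p, offset = 1) {
  # The sign is factored out so that negative input is handled symmetrically.
  if (p == 0) {
    sign(y) * log(abs(y) + offset)
  } else {
    sign(y) * ((abs(y) + offset)^p - 1) / p
  }
}

boxcox_sketch(c(1, 2, 4), p = 1)    # 0 1 3
boxcox_sketch(exp(1), p = 0)        # 1
modulus_sketch(c(-3, 0, 3), p = 1)  # -3 0 3
```

With p = 1 both transforms reduce to a shift of the input, which makes them easy to check by hand.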
This transformer provides a general mechanism for composing two or more transformers together. The most important use case is to combine reverse with other transformations.
transform_compose(...)
compose_trans(...)
... |
One or more transformers, specified either as strings or as individual transformer objects. |
demo_continuous(10^c(-2:4), trans = "log10", labels = label_log())
demo_continuous(10^c(-2:4), trans = c("log10", "reverse"), labels = label_log())
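Conceptually, composing transformers means applying the forward functions left to right and the inverses right to left. The base-R sketch below illustrates this with plain lists standing in for transformer objects; compose_sketch, log10_t and reverse_t are hypothetical names, not part of the package.

```r
# Hypothetical stand-ins for transformer objects: lists with a forward
# function and its inverse.
log10_t <- list(transform = log10, inverse = function(x) 10^x)
reverse_t <- list(transform = function(x) -x, inverse = function(x) -x)

compose_sketch <- function(...) {
  parts <- list(...)
  list(
    # Forward: apply each transform in the order given.
    transform = function(x) {
      Reduce(function(acc, tr) tr$transform(acc), parts, x)
    },
    # Inverse: undo the transforms in the opposite order.
    inverse = function(x) {
      Reduce(function(acc, tr) tr$inverse(acc), rev(parts), x)
    }
  )
}

log10_reversed <- compose_sketch(log10_t, reverse_t)
log10_reversed$transform(c(1, 10, 100))
log10_reversed$inverse(log10_reversed$transform(c(1, 10, 100)))

# Reversing first feeds negative numbers to log10(), which yields NaN.
reversed_log10 <- compose_sketch(reverse_t, log10_t)
suppressWarnings(reversed_log10$transform(c(1, 10, 100)))
```

The last call shows why the order matters when one of the steps requires positive input.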
Transformation for dates (class Date)
transform_date()
date_trans()
years <- seq(as.Date("1910/1/1"), as.Date("1999/1/1"), "years")
t <- transform_date()
t$transform(years)
t$inverse(t$transform(years))
t$format(t$breaks(range(years)))
Exponential transformation (inverse of log transformation)
transform_exp(base = exp(1))
exp_trans(base = exp(1))
base |
Base of logarithm |
plot(transform_exp(0.5), xlim = c(-2, 2))
plot(transform_exp(1), xlim = c(-2, 2))
plot(transform_exp(2), xlim = c(-2, 2))
plot(transform_exp(), xlim = c(-2, 2))
Identity transformation (do nothing)
transform_identity()
identity_trans()
plot(transform_identity(), xlim = c(-1, 1))
transform_log(): log(x)
transform_log1p(): log(x + 1)
transform_pseudo_log(): smoothly transition to linear scale around 0.
transform_log(base = exp(1))
transform_log10()
transform_log2()
transform_log1p()
log_trans(base = exp(1))
log10_trans()
log2_trans()
log1p_trans()
transform_pseudo_log(sigma = 1, base = exp(1))
pseudo_log_trans(sigma = 1, base = exp(1))
base |
base of logarithm |
sigma |
Scaling factor for the linear part of pseudo-log transformation. |
plot(transform_log2(), xlim = c(0, 5))
plot(transform_log(), xlim = c(0, 5))
plot(transform_log10(), xlim = c(0, 5))
plot(transform_log(), xlim = c(0, 2))
plot(transform_log1p(), xlim = c(-1, 1))
# The pseudo-log is defined for all real numbers
plot(transform_pseudo_log(), xlim = c(-5, 5))
lines(transform_log(), xlim = c(0, 5), col = "red")
# For large positive numbers it's very close to log
plot(transform_pseudo_log(), xlim = c(1, 20))
lines(transform_log(), xlim = c(1, 20), col = "red")
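A function with the properties described for the pseudo-log (defined for all real numbers, roughly linear near 0, close to log for large positive values) can be built from the inverse hyperbolic sine. The sketch below uses that formulation as an assumption; pseudo_log_sketch and pseudo_log_inverse_sketch are hypothetical names and the code is not taken from the package.

```r
# Hypothetical base-R sketch of a pseudo-log built on asinh().
pseudo_log_sketch <- function(x, sigma = 1, base = exp(1)) {
  # asinh(z) is close to z near 0 and close to log(2 * z) for large z,
  # so x / (2 * sigma) gives a curve that approaches log(x / sigma).
  asinh(x / (2 * sigma)) / log(base)
}

pseudo_log_inverse_sketch <- function(x, sigma = 1, base = exp(1)) {
  2 * sigma * sinh(x * log(base))
}

x <- c(-5, 0, 5, 1000)
pseudo_log_sketch(x)
# For large positive input the gap to log(x) is tiny.
pseudo_log_sketch(1000) - log(1000)
# The inverse recovers the input.
pseudo_log_inverse_sketch(pseudo_log_sketch(x))
```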
Probability transformation
transform_probability(distribution, ...)
transform_logit()
transform_probit()
probability_trans(distribution, ...)
logit_trans()
probit_trans()
distribution |
probability distribution. Should be standard R abbreviation so that "p" + distribution is a valid cumulative distribution function, "q" + distribution is a valid quantile function, and "d" + distribution is a valid probability density function. |
... |
other arguments passed on to distribution and quantile functions |
plot(transform_logit(), xlim = c(0, 1))
plot(transform_probit(), xlim = c(0, 1))
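The naming rule for distribution ("q" + distribution for the quantile function, "p" + distribution for the cumulative distribution function) can be sketched in base R as below. The assumption here is that the forward transform is the quantile function and the inverse is the cumulative distribution function; probability_sketch is a hypothetical name.

```r
# Hypothetical base-R sketch: look up the quantile and distribution functions
# by name and pair them as transform and inverse.
probability_sketch <- function(distribution, ...) {
  qfun <- match.fun(paste0("q", distribution))
  pfun <- match.fun(paste0("p", distribution))
  list(
    transform = function(x) qfun(x, ...),
    inverse = function(x) pfun(x, ...)
  )
}

logit_sketch <- probability_sketch("logis")   # qlogis() / plogis()
probit_sketch <- probability_sketch("norm")   # qnorm() / pnorm()

p <- c(0.1, 0.5, 0.75)
logit_sketch$transform(p)
probit_sketch$transform(p)
logit_sketch$inverse(logit_sketch$transform(p))
```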
Reciprocal transformation
transform_reciprocal()
reciprocal_trans()
plot(transform_reciprocal(), xlim = c(0, 1))
The reversing transformation works by multiplying the input by -1. This means that the reverse transformation cannot easily be composed with transformations that require positive input unless the reversing is done as a final step.
transform_reverse()
reverse_trans()
plot(transform_reverse(), xlim = c(-1, 1))
This is the variance stabilising transformation for the Poisson distribution.
transform_sqrt()
sqrt_trans()
plot(transform_sqrt(), xlim = c(0, 5))
Transformation for date-times (class POSIXt)
transform_time(tz = NULL)
time_trans(tz = NULL)
tz |
Optionally supply the time zone. If |
hours <- seq(ISOdate(2000, 3, 20, tz = ""), by = "hour", length.out = 10)
t <- transform_time()
t$transform(hours)
t$inverse(t$transform(hours))
t$format(t$breaks(range(hours)))
transform_timespan() provides transformations for data encoding time passed, along with breaks and label formatting showing the standard unit of time fitting the range of the data. transform_hms() provides the same but using standard hms idioms and formatting.
transform_timespan(unit = c("secs", "mins", "hours", "days", "weeks"))
timespan_trans(unit = c("secs", "mins", "hours", "days", "weeks"))
transform_hms()
hms_trans()
unit |
The unit used to interpret numeric input |
# transform_timespan allows you to specify the time unit numeric data is
# interpreted in
trans_min <- transform_timespan("mins")
demo_timespan(seq(0, 100), trans = trans_min)
# Input already in difftime format is interpreted correctly
demo_timespan(as.difftime(seq(0, 100), units = "secs"), trans = trans_min)
if (require("hms")) {
# transform_hms always assumes seconds
hms <- round(runif(10) * 86400)
t <- transform_hms()
t$transform(hms)
t$inverse(t$transform(hms))
t$breaks(hms)
# The break labels also follow the hms format
demo_timespan(hms, trans = t)
}
The Yeo-Johnson transformation is a flexible transformation that is similar to Box-Cox, transform_boxcox(), but does not require input values to be greater than zero.
transform_yj(p)
yj_trans(p)
p |
Transformation exponent, |
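The transformation takes one of four forms depending on the values of y and λ, where λ denotes the exponent p.
y ≥ 0 and λ ≠ 0: $y^{(\lambda)} = \frac{(y + 1)^{\lambda} - 1}{\lambda}$
y ≥ 0 and λ = 0: $y^{(\lambda)} = \ln(y + 1)$
y < 0 and λ ≠ 2: $y^{(\lambda)} = -\frac{(-y + 1)^{2 - \lambda} - 1}{2 - \lambda}$
y < 0 and λ = 2: $y^{(\lambda)} = -\ln(-y + 1)$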
Yeo, I., & Johnson, R. (2000). A New Family of Power Transformations to Improve Normality or Symmetry. Biometrika, 87(4), 954-959. https://www.jstor.org/stable/2673623
plot(transform_yj(-1), xlim = c(-10, 10))
plot(transform_yj(0), xlim = c(-10, 10))
plot(transform_yj(1), xlim = c(-10, 10))
plot(transform_yj(2), xlim = c(-10, 10))
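The four cases of the Yeo-Johnson transformation (Yeo & Johnson, 2000) can be spelled out in base R as a sketch. The name yj_sketch is hypothetical, the code assumes the input contains no missing values, and it is an illustration of the published formula rather than the package implementation.

```r
# Hypothetical base-R sketch of the Yeo-Johnson forward transform.
# Assumes `y` has no NA values.
yj_sketch <- function(y, p) {
  out <- numeric(length(y))
  pos <- y >= 0

  # Non-negative input: Box-Cox applied to y + 1.
  out[pos] <- if (p != 0) {
    ((y[pos] + 1)^p - 1) / p
  } else {
    log(y[pos] + 1)
  }

  # Negative input: mirrored Box-Cox with exponent 2 - p applied to -y + 1.
  out[!pos] <- if (p != 2) {
    -((-y[!pos] + 1)^(2 - p) - 1) / (2 - p)
  } else {
    -log(-y[!pos] + 1)
  }

  out
}

yj_sketch(c(-2, 0, 3), p = 1)  # p = 1 leaves the data unchanged: -2 0 3
yj_sketch(c(-2, 0, 3), p = 0)
yj_sketch(c(-2, 0, 3), p = 2)
```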
The machine epsilon is the difference between 1.0 and the next number that can be represented by the machine. By default, this function uses epsilon * 1000 as the tolerance. First it scales the values so that they have a mean of 1, and then it checks if the difference between them is larger than the tolerance.
zero_range(x, tol = 1000 * .Machine$double.eps)
x |
numeric range: vector of length 2 |
tol |
A value specifying the tolerance. |
logical TRUE if the relative difference of the endpoints of the range is not distinguishable from 0.
eps <- .Machine$double.eps
zero_range(c(1, 1 + eps))
zero_range(c(1, 1 + 99 * eps))
zero_range(c(1, 1 + 1001 * eps))
zero_range(c(1, 1 + 2 * eps), tol = eps)
# Scaling up or down all the values has no effect since the values
# are rescaled to 1 before checking against tol
zero_range(100000 * c(1, 1 + eps))
zero_range(100000 * c(1, 1 + 1001 * eps))
zero_range(.00001 * c(1, 1 + eps))
zero_range(.00001 * c(1, 1 + 1001 * eps))
# NA values
zero_range(c(1, NA)) # NA
zero_range(c(1, NaN)) # NA
# Infinite values
zero_range(c(1, Inf)) # FALSE
zero_range(c(-Inf, Inf)) # FALSE
zero_range(c(Inf, Inf)) # TRUE
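The relative-tolerance check can be sketched in base R as follows. The name zero_range_sketch is hypothetical, and normalising by the smaller absolute endpoint (rather than some other scale) is an assumption chosen so that the special cases listed in the examples come out as commented there.

```r
# Hypothetical base-R sketch of a relative-tolerance zero-range check.
zero_range_sketch <- function(x, tol = 1000 * .Machine$double.eps) {
  stopifnot(length(x) == 2)
  if (anyNA(x)) return(NA)                 # NA or NaN endpoint
  if (x[1] == x[2]) return(TRUE)           # also covers c(Inf, Inf)
  if (all(is.infinite(x))) return(FALSE)   # c(-Inf, Inf)
  # Assumption: scale the difference by the smaller magnitude.
  m <- min(abs(x))
  if (m == 0) return(FALSE)
  abs((x[1] - x[2]) / m) < tol
}

eps <- .Machine$double.eps
zero_range_sketch(c(1, 1 + eps))         # TRUE
zero_range_sketch(c(1, 1 + 1001 * eps))  # FALSE
zero_range_sketch(c(1, NA))              # NA
zero_range_sketch(c(1, Inf))             # FALSE
zero_range_sketch(c(Inf, Inf))           # TRUE
```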