Title: | High Precision Timing of R Expressions |
---|---|
Description: | Tools to accurately benchmark and analyze execution times for R expressions. |
Authors: | Jim Hester [aut], Davis Vaughan [aut, cre], Drew Schmidt [ctb] (read_proc_file implementation), Posit Software, PBC [cph, fnd] |
Maintainer: | Davis Vaughan |
License: | MIT + file LICENSE |
Version: | 1.1.3.9000 |
Built: | 2024-12-04 04:11:29 UTC |
Source: | https://github.com/r-lib/bench |
Coerce an object to a bench_mark object. This is typically needed only if you are performing additional manipulations after calling mark().
as_bench_mark(x)
Argument | Description |
---|---|
x | Object to be coerced. |
Construct, manipulate and display vectors of elapsed times in seconds. These are numeric vectors, so you can compare them numerically, but they can also be compared to human readable values such as '10ms'.
as_bench_time(x)
Argument | Description |
---|---|
x | A numeric or character vector. Character representations can use shorthand units (see examples). |
as_bench_time("1ns")
as_bench_time("1")
as_bench_time("1us")
as_bench_time("1ms")
as_bench_time("1s")

as_bench_time("100ns") < "1ms"

sum(as_bench_time(c("1ms", "5ms", "500us")))
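The shorthand strings accepted by as_bench_time() pair a number with an optional unit suffix. As a rough illustration of that idea (not bench's own parser), the hypothetical helper parse_time_sketch() below converts such strings to seconds in base R. The supported suffixes (ns, us, ms, s, m, h, d, w, with a bare number meaning seconds) are an assumption of this sketch.

```r
# Hypothetical sketch: convert strings such as "100ns" or "1.5ms" to seconds.
# Assumed suffixes: ns, us, ms, s, m, h, d, w; a bare number means seconds.
time_multipliers <- c(
  ns = 1e-9, us = 1e-6, ms = 1e-3, s = 1,
  m = 60, h = 60 * 60, d = 60 * 60 * 24, w = 60 * 60 * 24 * 7
)

parse_time_sketch <- function(x) {
  x <- trimws(x)
  # Group 1: the number (optionally with an exponent); group 2: the unit suffix.
  pattern <- "^([0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?)\\s*([a-z]*)$"
  ok <- grepl(pattern, x, perl = TRUE)
  num <- rep(NA_real_, length(x))
  unit <- rep(NA_character_, length(x))
  num[ok] <- as.numeric(sub(pattern, "\\1", x[ok], perl = TRUE))
  unit[ok] <- sub(pattern, "\\2", x[ok], perl = TRUE)
  unit[ok & unit == ""] <- "s"
  # Unknown suffixes (e.g. "MB") give NA rather than a wrong number.
  num * unname(time_multipliers[unit])
}

parse_time_sketch(c("1ns", "1", "1us", "1ms", "1s"))
parse_time_sketch("100ns") < parse_time_sketch("1ms")
sum(parse_time_sketch(c("1ms", "5ms", "500us")))
```

Returning NA for an unrecognised suffix such as "MB" makes a mistyped unit visible instead of silently producing a wrong duration.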
Autoplot method for bench_mark objects
autoplot.bench_mark(
  object,
  type = c("beeswarm", "jitter", "ridge", "boxplot", "violin"),
  ...
)

## S3 method for class 'bench_mark'
plot(x, ..., type = c("beeswarm", "jitter", "ridge", "boxplot", "violin"), y)
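Argument | Description |
---|---|
object | A bench_mark object. |
type | The type of plot. Plotting geoms used for each type are: beeswarm, ggbeeswarm::geom_quasirandom(); jitter, ggplot2::geom_jitter(); ridge, ggridges::geom_density_ridges(); boxplot, ggplot2::geom_boxplot(); violin, ggplot2::geom_violin(). |
... | Additional arguments passed to the plotting geom. |
x | A bench_mark object. |
y | Ignored; required for compatibility with the plot() generic. |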
This function requires some optional dependencies: ggplot2, tidyr and, depending on the plot type, ggbeeswarm or ggridges.
For the beeswarm and jitter types, the points are colored by the highest level of garbage collection performed during each iteration.
For plots with two parameters, ggplot2::facet_grid() is used to construct a 2D facet. For other numbers of parameters, ggplot2::facet_wrap() is used instead.
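To make the coloring rule concrete: each iteration is labeled with the highest garbage-collection level that ran during it. The base-R sketch below assumes a data frame of per-iteration collection counts with hypothetical columns level0, level1 and level2 (one row per iteration); highest_gc_level() is an illustrative helper, not part of bench.

```r
# Hypothetical sketch: label each iteration with the highest GC level it ran.
# Assumes one row per iteration and count columns level0, level1 and level2.
highest_gc_level <- function(gc_counts) {
  lvls <- c("level0", "level1", "level2")
  hits <- as.matrix(gc_counts[, lvls]) > 0
  # Index of the highest level with a non-zero count, or 0 if no gc ran.
  idx <- apply(hits, 1, function(row) if (any(row)) max(which(row)) else 0L)
  factor(c("none", lvls)[idx + 1L], levels = c("none", lvls))
}

gc_counts <- data.frame(
  level0 = c(0, 1, 2, 0),
  level1 = c(0, 0, 1, 0),
  level2 = c(0, 0, 0, 1)
)
highest_gc_level(gc_counts)
```

Using a factor with a fixed level order keeps the color legend stable even when some levels never occur.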
dat <- data.frame(x = runif(10000, 1, 1000), y = runif(10000, 1, 1000))
res <- bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 500), ],
  subset(dat, x > 500)
)

if (require(ggplot2) && require(tidyr) && require(ggbeeswarm)) {
  # Beeswarm plot (print() is needed to draw plots created inside a block)
  print(autoplot(res))

  # Ridge (joyplot); this type also needs ggridges
  if (require(ggridges)) {
    print(autoplot(res, "ridge"))
  }

  # To order the plots by execution time, reorder the factor levels of the
  # expressions.
  if (require(dplyr) && require(forcats)) {
    res %>%
      mutate(expression = forcats::fct_reorder(as.character(expression), min, .desc = TRUE)) %>%
      as_bench_mark() %>%
      autoplot("violin") %>%
      print()
  }
}
Construct, manipulate and display vectors of byte sizes. These are numeric vectors, so you can compare them numerically, but they can also be compared to human readable values such as '10MB'.
as_bench_bytes(x)

bench_bytes(x)
Argument | Description |
---|---|
x | A numeric or character vector. Character representations can use shorthand sizes (see examples). |
These memory sizes are always assumed to be base 1024, rather than 1000.
bench_bytes("1")
bench_bytes("1K")
bench_bytes("1Kb")
bench_bytes("1KiB")
bench_bytes("1MB")

bench_bytes("1KB") < "1MB"

sum(bench_bytes(c("1MB", "5MB", "500KB")))
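Because these sizes are base 1024, "1K", "1Kb" and "1KiB" all mean 1024 bytes. Below is a minimal base-R sketch of that conversion using a hypothetical parse_bytes_sketch() helper. It assumes an upper-case prefix letter (K, M, G, ...) with an optional "i" and an optional "B" or "b". bench's own parser may accept more or fewer spellings.

```r
# Hypothetical sketch: convert size strings to bytes using base 1024.
# Assumed syntax: number, optional prefix K/M/G/T/P/E/Z/Y, optional "i",
# optional "B" or "b" (treated as bytes here, matching the examples above).
byte_powers <- c(K = 1, M = 2, G = 3, T = 4, P = 5, E = 6, Z = 7, Y = 8)

parse_bytes_sketch <- function(x) {
  x <- trimws(x)
  pattern <- "^([0-9]*\\.?[0-9]+)\\s*([KMGTPEZY]?)(i?)([Bb]?)$"
  ok <- grepl(pattern, x, perl = TRUE)
  out <- rep(NA_real_, length(x))
  num <- as.numeric(sub(pattern, "\\1", x[ok], perl = TRUE))
  prefix <- sub(pattern, "\\2", x[ok], perl = TRUE)
  # No prefix means plain bytes; otherwise multiply by 1024^power.
  power <- ifelse(prefix == "", 0, unname(byte_powers[prefix]))
  out[ok] <- num * 1024^power
  out
}

parse_bytes_sketch(c("1", "1K", "1Kb", "1KiB", "1MB"))
sum(parse_bytes_sketch(c("1MB", "5MB", "500KB")))
```

With base 1024, "500KB" is 512,000 bytes rather than 500,000, which is why the sum above is not a round decimal number.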
Uses OS system APIs to return the load average for the past 1, 5 and 15 minutes.
bench_load_average()
Measure memory that an expression used.
bench_memory(expr)
Argument | Description |
---|---|
expr | An expression to be measured. |
A tibble with two columns:
- The total amount of memory allocated.
- The raw memory allocations as parsed by profmem::readRprofmem().
if (capabilities("profmem")) {
  bench_memory(1 + 1:10000)
}
Retrieve the current and maximum memory used by the R process. The memory reported here will likely differ from that reported by gc(), as this includes all memory from the R process, including any child processes and memory allocated outside R's garbage collector heap.
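On Linux, a process's current and peak resident memory can be read from /proc/self/status (the VmRSS and VmHWM fields, reported in kB of 1024 bytes). The sketch below is a hypothetical, Linux-only illustration of this kind of OS query. It is not stated here whether bench_process_memory() reads these exact fields. On other platforms the sketch simply returns NA.

```r
# Hypothetical, Linux-only sketch: current and peak resident memory of this R
# process, read from /proc/self/status. Returns NA values elsewhere.
process_memory_sketch <- function(path = "/proc/self/status") {
  out <- c(current = NA_real_, max = NA_real_)
  if (!file.exists(path)) return(out)
  status <- readLines(path) # one snapshot, so both fields are consistent

  field_bytes <- function(field) {
    line <- grep(paste0("^", field, ":"), status, value = TRUE)
    if (length(line) == 0) return(NA_real_)
    # Lines look like "VmRSS:     123456 kB"; the kernel's kB is 1024 bytes.
    kb <- sub("^[^:]+:\\s*([0-9]+)\\s*kB.*$", "\\1", line[1], perl = TRUE)
    suppressWarnings(as.numeric(kb)) * 1024
  }

  out[["current"]] <- field_bytes("VmRSS")
  out[["max"]] <- field_bytes("VmHWM")
  out
}

process_memory_sketch()
```

Resident memory counts everything the process touches, including memory outside R's heap, so it is typically larger than what gc() reports.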
bench_process_memory()
The OS APIs used to query process memory differ by operating system.
Measure the process CPU time and real time that an expression used.
bench_time(expr)
Argument | Description |
---|---|
expr | An expression to be timed. |
On some systems (such as macOS) the process clock has lower precision than the realtime clock; as a result, there may be cases where the process time is larger than the real time for fast expressions.
A bench_time object with two values:
- process: The process CPU usage of the expression evaluation.
- real: The wallclock time of the expression evaluation.

See also: bench_memory() to measure memory allocations for a given expression.
# This will use ~.5 seconds of real time, but very little process time.
bench_time(Sys.sleep(.5))
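The two values can be approximated in base R with proc.time(). Process time is the user plus system CPU time of the R process, and real time is the elapsed wall-clock time. The helper time_sketch() is hypothetical and has much coarser resolution than bench's timers. It still shows why Sys.sleep() takes real time without using much process time.

```r
# Hypothetical sketch using base R's proc.time(); bench uses its own clocks.
time_sketch <- function(expr) {
  start <- proc.time()
  force(expr) # the lazily passed expression is evaluated here, between readings
  d <- proc.time() - start
  c(
    process = d[["user.self"]] + d[["sys.self"]], # CPU time of this R process
    real = d[["elapsed"]]                         # wall-clock time
  )
}

# Sleeping takes real time but almost no CPU time.
time_sketch(Sys.sleep(0.2))
```

Passing the expression as a lazily evaluated argument is what lets the function place its evaluation between the two clock readings.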
Time is expressed as seconds since some arbitrary time in the past; it is not correlated in any way to the time of day, and thus is not subject to resetting or drifting. The hi-res timer is ideally suited to performance measurement tasks, where cheap, accurate interval timing is required.
hires_time()
hires_time()

# R prints doubles with 7 significant digits by default; to see greater
# precision, set the digits argument when printing
print(hires_time(), digits = 20)

# Generally used by recording two times and then subtracting them
start <- hires_time()
end <- hires_time()
elapsed <- end - start
elapsed
Print bench_mark objects in knitr documents. By default, data columns (result, memory, time, gc) are omitted when printing in knitr. If you would like to include these columns, set the knitr chunk option bench.all_columns = TRUE.
knit_print.bench_mark(x, ..., options)
Argument | Description |
---|---|
x | An R object to be printed. |
... | Additional arguments passed to the S3 method. Currently ignored, except two optional arguments, options and inline. |
options | A list of knitr chunk options set in the currently evaluated chunk. |
You can set bench.all_columns = TRUE to show all columns of the bench_mark object.
```{r, bench.all_columns = TRUE}
bench::mark(
  subset(mtcars, cyl == 3),
  mtcars[mtcars$cyl == 3, ]
)
```
Benchmark a list of quoted expressions. Each expression will always run at least twice: once to measure the memory allocation and store results, and one or more times to measure timing.
mark( ..., min_time = 0.5, iterations = NULL, min_iterations = 1, max_iterations = 10000, check = TRUE, memory = capabilities("profmem"), filter_gc = TRUE, relative = FALSE, time_unit = NULL, exprs = NULL, env = parent.frame() )
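Argument | Description |
---|---|
... | Expressions to benchmark. If named, the expression column will be the name; otherwise it will be the deparsed expression. |
min_time | The minimum number of seconds to run each expression; set to Inf to always run max_iterations times instead. |
iterations | If not NULL (NULL is the default), run each expression for exactly this number of iterations. This overrides both min_iterations and max_iterations. |
min_iterations | Each expression will be evaluated a minimum of min_iterations times. |
max_iterations | Each expression will be evaluated a maximum of max_iterations times. |
check | Check if results are consistent. If TRUE, checking is done with all.equal(); if FALSE, checking is disabled and results are not stored. If check is a function, that function will be called with each pair of results to determine consistency. |
memory | If TRUE (the default when R is compiled with memory profiling), track memory allocations using utils::Rprofmem(). If FALSE, disable memory tracking. |
filter_gc | If TRUE, remove iterations that contained at least one garbage collection before summarizing. If TRUE but an expression had a garbage collection in every iteration, filtering is disabled, with a warning. |
relative | If TRUE, all summaries are computed relative to the minimum execution time rather than absolute time. |
time_unit | If NULL, the times are reported in a human readable fashion depending on each value. If one of 'ns', 'us', 'ms', 's', 'm', 'h', 'd', 'w', the time units are instead expressed as nanoseconds, microseconds, milliseconds, seconds, minutes, hours, days or weeks respectively. |
exprs | A list of quoted expressions. If supplied, it overrides expressions given in the ... argument. |
env | The environment in which to evaluate the expressions. |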
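The interplay of min_time, min_iterations, max_iterations and iterations can be sketched as a two-phase loop. One evaluation captures the result, then timed evaluations continue until the stopping rules are met. mark_sketch() is hypothetical. It takes a zero-argument function instead of quoted expressions, uses proc.time() rather than a high-resolution clock, and measures no memory. It also checks min_time against wall-clock time since the timing loop started, which is an assumption about how the time budget is counted.

```r
# Hypothetical sketch of mark()'s run loop; not bench's implementation.
# expr_fun is a zero-argument function standing in for one expression.
mark_sketch <- function(expr_fun, min_time = 0.5, iterations = NULL,
                        min_iterations = 1, max_iterations = 10000) {
  # A fixed iteration count overrides both bounds.
  if (!is.null(iterations)) {
    min_iterations <- iterations
    max_iterations <- iterations
  }

  # Phase 1: one evaluation to store the result (bench also measures memory here).
  result <- expr_fun()

  # Phase 2: timed evaluations. Continue while below max_iterations and either
  # min_iterations has not been reached or min_time has not yet elapsed.
  times <- numeric(max_iterations)
  n <- 0
  loop_start <- proc.time()[["elapsed"]]
  while (n < max_iterations &&
         (n < min_iterations || proc.time()[["elapsed"]] - loop_start < min_time)) {
    start <- proc.time()[["elapsed"]]
    expr_fun()
    n <- n + 1
    times[n] <- proc.time()[["elapsed"]] - start
  }

  list(result = result, times = times[seq_len(n)])
}

run <- mark_sketch(function() sum(runif(1000)), min_time = 0.05)
length(run$times)
```

With min_time = Inf the loop always runs max_iterations times, and a non-NULL iterations pins the count exactly.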
A tibble with additional summary columns. The following summary columns are computed:
- expression (bench_expr): The deparsed expression that was evaluated (or its name if one was provided).
- min (bench_time): The minimum execution time.
- median (bench_time): The sample median of execution time.
- itr/sec (double): The estimated number of executions performed per second.
- mem_alloc (bench_bytes): Total amount of memory allocated by R while running the expression. Memory allocated outside the R heap, e.g. by malloc() or new directly, is not tracked; take care to avoid misinterpreting the results if running code that may do this.
- gc/sec (double): The number of garbage collections per second.
- n_itr (integer): Total number of iterations after filtering garbage collections (if filter_gc == TRUE).
- n_gc (double): Total number of garbage collections performed over all iterations. This is a pseudo-measure of the pressure on the garbage collector; if it varies greatly between two alternatives, generally the one with fewer collections will cause fewer allocations in real usage.
- total_time (bench_time): The total time to perform the benchmarks.
- result (list): A list column of the object(s) returned by the evaluated expression(s).
- memory (list): A list column with results from Rprofmem().
- time (list): A list column of bench_time vectors for each evaluated expression.
- gc (list): A list column with tibbles containing the level of garbage collection (0-2, columns) for each iteration (rows).
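Several of the columns above are simple functions of the per-iteration timings. Below is a hypothetical base-R sketch. It assumes that itr/sec is the iteration count divided by the total time and that gc/sec is the collection count divided by the same total. bench's exact formulas are not given here.

```r
# Hypothetical sketch: summary statistics from per-iteration times (seconds).
# Assumes itr/sec and gc/sec are counts divided by the total measured time.
summarize_times_sketch <- function(times, n_gc = 0) {
  total_time <- sum(times)
  data.frame(
    min = min(times),
    median = stats::median(times),
    `itr/sec` = length(times) / total_time,
    `gc/sec` = n_gc / total_time,
    n_itr = length(times),
    n_gc = n_gc,
    total_time = total_time,
    check.names = FALSE # keep the "itr/sec" and "gc/sec" names as written
  )
}

summarize_times_sketch(c(0.002, 0.001, 0.004, 0.003), n_gc = 1)
```

The median is reported alongside the minimum because it is less sensitive to a few unusually fast or slow iterations.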
See also: press() to run benchmarks across a grid of parameters.
dat <- data.frame(x = runif(100, 1, 1000), y = runif(10, 1, 1000))
mark(
  min_time = .1,
  dat[dat$x > 500, ],
  dat[which(dat$x > 500), ],
  subset(dat, x > 500)
)
press() is used to run mark() across a grid of parameters and then press the results together.
The parameters you want to set are given as named arguments, and a grid of all possible combinations is automatically created.
The code to set up and benchmark is given by one unnamed expression (often delimited by braces, { }).
If replicates are desired, a dummy variable can be used, e.g. rep = 1:5 for five replicates.
press(..., .grid = NULL)
Argument | Description |
---|---|
... | If named, parameters to define; if unnamed, the expression to run. Only one unnamed expression is permitted. |
.grid | A pre-built grid of values to use, typically a data.frame or tibble. This is useful if you only want to benchmark a subset of all possible combinations. |
# Helper function to create a simple data.frame of the specified dimensions
create_df <- function(rows, cols) {
  as.data.frame(setNames(
    replicate(cols, runif(rows, 1, 1000), simplify = FALSE),
    rep_len(c("x", letters), cols)
  ))
}

# Run 4 data sizes across 3 expressions with 2 replicates (24 total benchmarks)
press(
  rows = c(1000, 10000),
  cols = c(10, 100),
  rep = 1:2,
  {
    dat <- create_df(rows, cols)
    bench::mark(
      min_time = .05,
      bracket = dat[dat$x > 500, ],
      which = dat[which(dat$x > 500), ],
      subset = subset(dat, x > 500)
    )
  }
)
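The grid-then-bind behaviour of press() can be illustrated with base R's expand.grid(). press_sketch() is hypothetical. It takes the benchmark body as a function whose arguments are the parameter names, rather than an unnamed expression, which avoids non-standard evaluation in the sketch.

```r
# Hypothetical sketch: run a benchmark body once per parameter combination and
# bind the results, with the parameter values as leading columns.
press_sketch <- function(body_fun, ...) {
  # Every combination of the named parameters, one row each.
  grid <- expand.grid(..., KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)

  pieces <- lapply(seq_len(nrow(grid)), function(i) {
    params <- as.list(grid[i, , drop = FALSE])
    res <- as.data.frame(do.call(body_fun, params))
    # Repeat this row's parameters once per result row, then attach them.
    out <- cbind(grid[rep(i, nrow(res)), , drop = FALSE], res)
    rownames(out) <- NULL
    out
  })

  combined <- do.call(rbind, pieces)
  rownames(combined) <- NULL
  combined
}

# A dummy `rep` parameter gives replicates, as described above.
press_sketch(
  function(rows, cols, rep) data.frame(cells = rows * cols),
  rows = c(1000, 10000),
  cols = c(10, 100),
  rep = 1:2
)
```

Keeping the parameter values as columns in the combined result is what makes it easy to facet or group the timings by parameter afterwards.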
Summarize mark results.
## S3 method for class 'bench_mark'
summary(object, filter_gc = TRUE, relative = FALSE, time_unit = NULL, ...)
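Argument | Description |
---|---|
object | A bench_mark object to summarize. |
filter_gc | If TRUE, remove iterations that contained at least one garbage collection before summarizing. If TRUE but an expression had a garbage collection in every iteration, filtering is disabled, with a warning. |
relative | If TRUE, all summaries are computed relative to the minimum execution time rather than absolute time. |
time_unit | If NULL, the times are reported in a human readable fashion depending on each value. If one of 'ns', 'us', 'ms', 's', 'm', 'h', 'd', 'w', the time units are instead expressed as nanoseconds, microseconds, milliseconds, seconds, minutes, hours, days or weeks respectively. |
... | Additional arguments (ignored). |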
If filter_gc == TRUE (the default), runs that contain a garbage collection will be removed before summarizing. This is most useful for fast expressions when the majority of runs do not contain a gc. Call summary(filter_gc = FALSE) if you would like to compute summaries with these times, such as for expressions with lots of allocations, where all or most runs contain a gc.
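The filtering rule and the relative = TRUE output can both be sketched in base R, and both helpers below are hypothetical. filter_gc_sketch() drops iterations with a non-zero collection count. As an assumption, it falls back to keeping every time, with a warning, when every iteration had a gc. relative_sketch() divides each summary column by its smallest value, so the smallest entry in every column reads as 1.

```r
# Hypothetical sketch of gc filtering: keep only iterations without a collection.
filter_gc_sketch <- function(times, gc_per_iteration) {
  keep <- gc_per_iteration == 0
  # Assumed fallback: if every iteration had a gc, filtering would leave nothing,
  # so keep all times and warn instead.
  if (!any(keep)) {
    warning("Every iteration had a garbage collection; times were not filtered.")
    return(times)
  }
  times[keep]
}

# Hypothetical sketch of relative output: divide each summary column by its
# smallest value, so the smallest entry in every column becomes 1.
relative_sketch <- function(stats) {
  stats[] <- lapply(stats, function(col) col / min(col))
  stats
}

filter_gc_sketch(c(0.010, 0.050, 0.012), gc_per_iteration = c(0, 2, 0))
relative_sketch(data.frame(min = c(0.002, 0.004), median = c(0.003, 0.009)))
```

Assigning through stats[] keeps the data frame's original column names, including non-syntactic ones such as "itr/sec".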
A tibble with the same summary columns as returned by mark(): expression, min, median, itr/sec, mem_alloc, gc/sec, n_itr, n_gc, total_time, result, memory, time and gc.
dat <- data.frame(x = runif(10000, 1, 1000), y = runif(10000, 1, 1000))

# bench::mark() calls summary() automatically
results <- bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 500), ],
  subset(dat, x > 500)
)

# However, you can also call it explicitly to filter gc differently
summary(results, filter_gc = FALSE)

# Or to output relative times
summary(results, relative = TRUE)
Given a block of expressions in {}, workout() individually times each expression in the group. workout_expressions() is a lower-level function, most useful when reading lists of calls from a file.
workout(expr, description = NULL)

workout_expressions(exprs, env = parent.frame(), description = NULL)
Argument | Description |
---|---|
expr | One or more expressions to workout; use {} to pass multiple expressions. |
description | A name to label each expression; if not supplied, the deparsed expression will be used. |
exprs | A list of calls to measure. |
env | The environment in which the expressions should be evaluated. |
workout({
  x <- 1:1000
  evens <- x %% 2 == 0
  y <- x[evens]
  length(y)
  length(which(evens))
  sum(evens)
})

# The equivalent to the above, reading the code from a file
workout_expressions(as.list(parse(system.file("examples/exprs.R", package = "bench"))))
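Timing each top-level expression of a {} block separately amounts to splitting the block into its calls and evaluating them one by one in a shared environment. workout_sketch() below is a hypothetical base-R illustration of that idea. It uses proc.time(), so its resolution is far coarser than bench's.

```r
# Hypothetical sketch: time each top-level call in a `{` block separately.
workout_sketch <- function(expr, env = parent.frame()) {
  force(env)
  block <- substitute(expr)
  # Split `{ a; b; c }` into its calls; a lone call is treated as a block of one.
  calls <- if (is.call(block) && identical(block[[1]], as.name("{"))) {
    as.list(block)[-1]
  } else {
    list(block)
  }

  timings <- lapply(calls, function(call) {
    start <- proc.time()
    eval(call, env) # evaluated in order, so later calls see earlier assignments
    d <- proc.time() - start
    c(process = d[["user.self"]] + d[["sys.self"]], real = d[["elapsed"]])
  })

  data.frame(
    expression = vapply(calls, function(call) paste(deparse(call), collapse = " "), character(1)),
    process = vapply(timings, function(tm) tm[["process"]], numeric(1)),
    real = vapply(timings, function(tm) tm[["real"]], numeric(1))
  )
}

workout_sketch({
  x <- 1:1000
  evens <- x %% 2 == 0
  sum(evens)
}, env = new.env())
```

Evaluating every call in the same environment is what allows a later line such as sum(evens) to use a variable created by an earlier one.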