Performance of the bit package


A performance example

Before we measure performance of the main functionality of the package, note that something simple as ‘(a:b)[-i]’ can and has been accelerated in this package:

a <- 1L
b <- 1e7L
i <- sample(a:b, 1e3)
x <- c(
  R = median(microbenchmark((a:b)[-i], times=times)$time),
  bit = median(microbenchmark(bit_rangediff(c(a, b), i), times=times)$time),
  merge = median(microbenchmark(merge_rangediff(c(a, b), bit_sort(i)), times=times)$time)
)
knitr::kable(as.data.frame(as.list(x / x["R"] * 100)), caption="% of time relative to R", digits=1)
% of time relative to R
R bit merge
100 24.9 25

The vignette is compiled with the following performance settings: 5 replications with domain size small 1000 and big 10^{6}, sample size small 1000 and big 10^{6}.

Boolean data types

“A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.” “Il semble que la perfection soit atteinte non quand il n’y a plus rien à ajouter, mais quand il n’y a plus rien à retrancher” (Antoine de St. Exupery, Terre des Hommes (Gallimard, 1939), p. 60.)

We compare memory consumption (n=1e+06) and runtime (median of 5 replications) of the different booltypes for the following filter scenarios:

knitr::kable(
  data.frame(
    coin="random 50%",
    often="random 99%",
    rare="random 1%",
    chunk="contiguous chunk of 5%"
  ),
  caption="selection characteristic"
)
selection characteristic
coin often rare chunk
random 50% random 99% random 1% contiguous chunk of 5%
# nolint end: strings_as_factors_linter.

There are substantial savings in skewed filter situations:

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'rare' scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘rare’ scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'often' scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘often’ scenario

Even in non-skewed situations the new booltypes are competitive:

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'coin' scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘coin’ scenario

Detailed tables follow.

% memory consumption of filter

% bytes of logical
coin often rare chunk
logical 100.0 100.0 100.0 100.0
bit 3.2 3.2 3.2 3.2
bitwhich 50.0 1.0 1.0 5.0
which 50.1 99.0 1.0 5.0
ri NA NA NA 0.0

% time extracting

% time of logical
coin often rare chunk
logical 8.1 6.2 6.8 NA
bit 18.6 18.8 19.0 NA
bitwhich 177.9 51.9 7.1 NA
which NA NA NA NA
ri NA NA NA NA

% time assigning

% time of logical
coin often rare chunk
logical 45.9 45.4 45.2 NA
bit 83.2 16.4 10.8 NA
bitwhich 188.1 49.8 45.9 NA
which NA NA NA NA
ri NA NA NA NA

% time subscripting with ‘which’

% time of logical
coin often rare chunk
logical 32.7 34.8 1.2 NA
bit 21.9 43.3 0.9 NA
bitwhich 32.5 59.9 1.7 NA
which NA NA NA NA
ri NA NA NA NA

% time assigning with ‘which’

% time of logical
coin often rare chunk
logical 19.5 38.2 0.6 NA
bit 17.9 40.4 0.5 NA
bitwhich 144.0 22.9 2.5 NA
which NA NA NA NA
ri NA NA NA NA

% time Boolean NOT

% time for Boolean NOT
coin often rare chunk
logical 18.5 19.5 23.8 18.8
bit 0.8 0.7 0.7 0.9
bitwhich 8.6 0.7 0.8 1.6
which NA NA NA NA
ri NA NA NA NA

% time Boolean AND

% time for Boolean &
coin often rare chunk
logical 99.7 28.2 27.1 25.5
bit 4.4 5.1 4.5 4.6
bitwhich 32.9 5.2 4.6 6.0
which NA NA NA NA
ri NA NA NA NA

% time Boolean OR

% time for Boolean |
coin often rare chunk
logical 100.0 33.3 25.1 25.0
bit 4.5 4.7 4.9 4.3
bitwhich 18.1 4.3 5.3 7.5
which NA NA NA NA
ri NA NA NA NA

% time Boolean EQUALITY

% time for Boolean ==
coin often rare chunk
logical 26.1 26.2 26.2 26.1
bit 4.5 4.5 4.2 4.5
bitwhich 17.6 4.5 4.7 5.6
which NA NA NA NA
ri NA NA NA NA

% time Boolean XOR

% time for Boolean !=
coin often rare chunk
logical 24.8 25.0 24.8 24.8
bit 4.4 4.3 4.3 4.4
bitwhich 17.9 4.5 4.5 5.9
which NA NA NA NA
ri NA NA NA NA

% time Boolean SUMMARY

% time for Boolean summary
coin often
logical 88.4 21.2
bit 4.3 4.5

Fast methods for integer set operations

“The space-efficient structure of bitmaps dramatically reduced the run time of sorting” (Jon Bently, Programming Pearls, Cracking the oyster, p. 7)

Execution time for R (R) and bit (b)

Execution time for R (R) and bit (b)

Execution time for R, bit and merge relative to most expensive R in 'unsorted bigbig' scenario

Execution time for R, bit and merge relative to most expensive R in ‘unsorted bigbig’ scenario

Execution time for R, bit and merge in 'sorted bigbig' scenario

Execution time for R, bit and merge in ‘sorted bigbig’ scenario

% time for sorting

sorted data relative to R’s sort
small big
sort 130.3 172.8
sortunique 83.9 34.9
unsorted data relative to R’s sort
small big
sort 24.2 29.6
sortunique 16.9 12.8

% time for unique

sorted data relative to R
small big
bit 123.9 15.3
merge 26.3 6.2
sort 0.0 0.0
unsorted data relative to R
small big
bit 108.4 15.5
merge 116.0 58.3
sort 91.5 50.9

% time for duplicated

sorted data relative to R
small big
bit 264.0 15.2
merge 24.7 4.7
sort 0.0 0.0
unsorted data relative to R
small big
bit 228.8 15.4
merge 140.3 59.8
sort 115.2 53.8

% time for anyDuplicated

sorted data relative to R
small big
bit 145.1 18.8
merge 18.6 2.4
sort 0.0 0.0
unsorted data relative to R
small big
bit 145.3 17.4
merge 150.6 64.9
sort 130.5 62.1

% time for sumDuplicated

sorted data relative to R
small big
bit 116.6 15.1
merge 15.0 3.2
sort 0.0 0.0
unsorted data relative to R
small big
bit 104.6 15.1
merge 106.9 58.4
sort 92.9 54.4

% time for match

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit NA NA NA NA
merge 29 0 22.9 3.1
sort 0 0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit NA NA NA NA
merge 215.2 53 71.2 60.5
sort 185.7 53 62.6 56.8

% time for in

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 167.6 4.2 13.9 12.4
merge 20.6 0.0 16.3 3.3
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 171.3 4.7 6.6 10.8
merge 188.3 56.7 61.9 54.8
sort 166.5 56.7 54.3 51.4

% time for notin

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 222.9 4.2 12.8 10.8
merge 19.4 0.0 7.7 2.5
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 269.0 4.8 6.1 10.7
merge 164.3 57.6 52.7 51.5
sort 144.4 57.6 49.1 48.6

% time for union

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 83.0 27.7 24.3 24.3
merge 51.9 16.5 15.0 18.5
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 77.3 28.5 30.4 9.7
merge 131.1 61.0 65.3 27.8
sort 81.0 42.7 45.1 21.5

% time for intersect

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 48.5 7.7 5.7 8.8
merge 13.5 0.0 0.0 4.5
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 44.7 8.8 5.0 5.9
merge 71.7 56.6 26.4 30.3
sort 58.5 56.5 26.4 26.4

% time for setdiff

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 57.2 4.1 14.3 7.5
merge 16.8 0.0 11.7 4.6
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 45.1 4.7 11.8 6.4
merge 88.5 60.0 35.4 37.3
sort 72.0 59.9 24.7 32.5

% time for symdiff

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 40.2 11.2 7.7 6.6
merge 8.8 5.1 4.5 2.1
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 41.6 9.3 7.2 6.2
merge 43.8 16.4 16.5 18.4
sort 34.5 11.5 11.5 16.1

% time for setequal

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 88.7 90.2 10.3 9.3
merge 22.9 19.1 4.9 4.4
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 94.5 83.1 6.5 7.7
merge 110.1 35780.5 17.6 36.6
sort 86.3 35762.5 13.6 31.9

% time for setearly

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 89.7 3.2 13.2 6.9
merge 25.4 0.0 0.2 3.3
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 91.2 2.2 4.8 7.8
merge 108.1 29.5 64.0 37.1
sort 82.9 29.5 63.9 32.4