Performance of the bit package


A performance example

Before we measure performance of the main functionality of the package, note that something simple as ‘(a:b)[-i]’ can and has been accelerated in this package:

a <- 1L
b <- 1e7L
i <- sample(a:b, 1e3)
x <- c(
  R = median(microbenchmark((a:b)[-i], times=times)$time),
  bit = median(microbenchmark(bit_rangediff(c(a, b), i), times=times)$time),
  merge = median(microbenchmark(merge_rangediff(c(a, b), bit_sort(i)), times=times)$time)
)
knitr::kable(as.data.frame(as.list(x / x["R"] * 100)), caption="% of time relative to R", digits=1)
% of time relative to R
R bit merge
100 28.2 28.4

The vignette is compiled with the following performance settings: 5 replications with domain size small 1000 and big 10^{6}, sample size small 1000 and big 10^{6}.

Boolean data types

“A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.” “Il semble que la perfection soit atteinte non quand il n’y a plus rien à ajouter, mais quand il n’y a plus rien à retrancher” (Antoine de St. Exupery, Terre des Hommes (Gallimard, 1939), p. 60.)

We compare memory consumption (n=1e+06) and runtime (median of 5 replications) of the different booltypes for the following filter scenarios:

knitr::kable(
  data.frame(
    coin="random 50%",
    often="random 99%",
    rare="random 1%",
    chunk="contiguous chunk of 5%"
  ),
  caption="selection characteristic"
)
selection characteristic
coin often rare chunk
random 50% random 99% random 1% contiguous chunk of 5%
# nolint end: strings_as_factors_linter.

There are substantial savings in skewed filter situations:

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'rare' scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘rare’ scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'often' scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘often’ scenario

Even in non-skewed situations the new booltypes are competitive:

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'coin' scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘coin’ scenario

Detailed tables follow.

% memory consumption of filter

% bytes of logical
coin often rare chunk
logical 100.0 100.0 100.0 100.0
bit 3.2 3.2 3.2 3.2
bitwhich 50.0 1.0 1.0 5.0
which 50.0 99.0 1.0 5.0
ri NA NA NA 0.0

% time extracting

% time of logical
coin often rare chunk
logical 20.4 5.9 6.0 NA
bit 17.9 18.2 18.1 NA
bitwhich 27.3 50.2 6.4 NA
which NA NA NA NA
ri NA NA NA NA

% time assigning

% time of logical
coin often rare chunk
logical 39.1 39.3 39.0 NA
bit 82.9 16.3 10.5 NA
bitwhich 185.3 46.2 39.6 NA
which NA NA NA NA
ri NA NA NA NA

% time subscripting with ‘which’

% time of logical
coin often rare chunk
logical 20.3 35.3 0.9 NA
bit 23.3 44.9 0.9 NA
bitwhich 31.0 59.9 1.7 NA
which NA NA NA NA
ri NA NA NA NA

% time assigning with ‘which’

% time of logical
coin often rare chunk
logical 19.8 39.0 0.6 NA
bit 18.8 40.8 0.5 NA
bitwhich 35.2 29.1 2.0 NA
which NA NA NA NA
ri NA NA NA NA

% time Boolean NOT

% time for Boolean NOT
coin often rare chunk
logical 18.2 18.4 18.6 18.3
bit 0.8 0.8 0.8 0.7
bitwhich 9.4 0.7 0.7 1.5
which NA NA NA NA
ri NA NA NA NA

% time Boolean AND

% time for Boolean &
coin often rare chunk
logical 98.9 27.3 27.1 25.5
bit 4.1 4.4 4.2 4.1
bitwhich 17.7 4.9 4.2 5.5
which NA NA NA NA
ri NA NA NA NA

% time Boolean OR

% time for Boolean |
coin often rare chunk
logical 100.0 33.5 25.2 25.1
bit 4.3 4.2 4.2 4.2
bitwhich 35.7 4.2 4.8 7.0
which NA NA NA NA
ri NA NA NA NA

% time Boolean EQUALITY

% time for Boolean ==
coin often rare chunk
logical 26.2 26.1 26.1 26.2
bit 4.2 4.4 4.3 4.2
bitwhich 13.8 4.2 4.1 4.9
which NA NA NA NA
ri NA NA NA NA

% time Boolean XOR

% time for Boolean !=
coin often rare chunk
logical 24.9 24.8 24.8 24.8
bit 4.2 4.1 4.3 4.3
bitwhich 17.4 4.1 4.2 5.1
which NA NA NA NA
ri NA NA NA NA

% time Boolean SUMMARY

% time for Boolean summary
coin often
logical 87.1 20.5
bit 3.4 3.4

Fast methods for integer set operations

“The space-efficient structure of bitmaps dramatically reduced the run time of sorting” (Jon Bently, Programming Pearls, Cracking the oyster, p. 7)

Execution time for R (R) and bit (b)

Execution time for R (R) and bit (b)

Execution time for R, bit and merge relative to most expensive R in 'unsorted bigbig' scenario

Execution time for R, bit and merge relative to most expensive R in ‘unsorted bigbig’ scenario

Execution time for R, bit and merge in 'sorted bigbig' scenario

Execution time for R, bit and merge in ‘sorted bigbig’ scenario

% time for sorting

sorted data relative to R’s sort
small big
sort 129.9 177.8
sortunique 83.8 37.6
unsorted data relative to R’s sort
small big
sort 25.5 33.9
sortunique 17.4 12.0

% time for unique

sorted data relative to R
small big
bit 122.4 24.2
merge 22.2 9.8
sort 0.0 0.0
unsorted data relative to R
small big
bit 121.2 16.8
merge 124.4 69.4
sort 100.5 61.5

% time for duplicated

sorted data relative to R
small big
bit 203.0 28.1
merge 23.5 8.7
sort 0.0 0.0
unsorted data relative to R
small big
bit 165.7 17.9
merge 107.7 78.2
sort 88.1 71.3

% time for anyDuplicated

sorted data relative to R
small big
bit 155.6 32.3
merge 18.9 3.7
sort 0.0 0.0
unsorted data relative to R
small big
bit 144.9 18.8
merge 134.9 80.0
sort 117.3 77.3

% time for sumDuplicated

sorted data relative to R
small big
bit 111.8 25.7
merge 15.2 5.5
sort 0.0 0.0
unsorted data relative to R
small big
bit 96.0 17.5
merge 99.8 77.2
sort 85.7 72.5

% time for match

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit NA NA NA NA
merge 26.1 0 11.3 7.7
sort 0.0 0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit NA NA NA NA
merge 198.4 80.1 84.3 78.5
sort 171.1 80.1 81.1 73.8

% time for in

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 164.1 6.5 14.8 23.2
merge 22.4 0.0 13.3 5.5
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 167.5 5.6 5.3 14.3
merge 183.9 80.2 77.8 74.0
sort 161.1 80.2 73.1 70.3

% time for notin

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 220.9 6.8 10.4 22.0
merge 17.3 0.0 5.0 5.1
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 408.7 5.7 4.9 13.0
merge 161.3 80.6 64.2 66.8
sort 143.0 80.6 61.9 63.4

% time for union

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 80.8 30.1 30.2 26.3
merge 48.7 13.3 13.3 19.4
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 82.7 23.8 23.6 17.2
merge 136.8 69.6 69.7 67.8
sort 84.6 58.1 58.2 54.2

% time for intersect

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 49.3 13.3 5.3 14.2
merge 16.1 0.0 0.0 8.1
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 54.2 10.3 4.2 6.8
merge 78.3 76.9 31.3 41.7
sort 61.7 76.9 31.3 36.4

% time for setdiff

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 41.8 6.5 20.3 12.2
merge 15.1 0.0 8.5 8.4
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 49.1 4.9 12.4 7.5
merge 92.6 72.3 35.3 50.4
sort 74.6 72.3 29.5 44.0

% time for symdiff

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 36.4 12.9 12.3 10.3
merge 8.3 3.9 3.7 3.3
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 37.0 8.1 8.5 8.0
merge 40.1 17.9 19.0 25.9
sort 31.3 15.0 15.9 23.0

% time for setequal

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 83.8 91.1 14.5 14.9
merge 21.0 17.7 6.0 6.2
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 86.4 71.8 8.5 9.0
merge 101.4 37622.5 24.8 48.0
sort 80.1 37606.7 20.2 43.2

% time for setearly

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 78.8 4.1 18.0 17.3
merge 19.9 0.0 0.1 7.2
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 81.4 2.4 5.4 8.0
merge 97.9 36.4 80.8 42.4
sort 77.1 36.4 80.8 38.1