Performance of the bit package


A performance example

Before we measure performance of the main functionality of the package, note that something simple as ‘(a:b)[-i]’ can and has been accelerated in this package:

a <- 1L
b <- 1e7L
i <- sample(a:b, 1e3)
x <- c(
  R = median(microbenchmark((a:b)[-i], times=times)$time),
  bit = median(microbenchmark(bit_rangediff(c(a, b), i), times=times)$time),
  merge = median(microbenchmark(merge_rangediff(c(a, b), bit_sort(i)), times=times)$time)
)
knitr::kable(as.data.frame(as.list(x / x["R"] * 100)), caption="% of time relative to R", digits=1)
% of time relative to R
R bit merge
100 27.8 28

The vignette is compiled with the following performance settings: 5 replications with domain size small 1000 and big 10^{6}, sample size small 1000 and big 10^{6}.

Boolean data types

“A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.” “Il semble que la perfection soit atteinte non quand il n’y a plus rien à ajouter, mais quand il n’y a plus rien à retrancher” (Antoine de St. Exupery, Terre des Hommes (Gallimard, 1939), p. 60.)

We compare memory consumption (n=1e+06) and runtime (median of 5 replications) of the different booltypes for the following filter scenarios:

knitr::kable(
  data.frame(
    coin="random 50%",
    often="random 99%",
    rare="random 1%",
    chunk="contiguous chunk of 5%"
  ),
  caption="selection characteristic"
)
selection characteristic
coin often rare chunk
random 50% random 99% random 1% contiguous chunk of 5%
# nolint end: strings_as_factors_linter.

There are substantial savings in skewed filter situations:

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'rare' scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘rare’ scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'often' scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘often’ scenario

Even in non-skewed situations the new booltypes are competitive:

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'coin' scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘coin’ scenario

Detailed tables follow.

% memory consumption of filter

% bytes of logical
coin often rare chunk
logical 100.0 100.0 100.0 100.0
bit 3.2 3.2 3.2 3.2
bitwhich 50.0 1.0 1.0 5.0
which 50.0 99.0 1.0 5.0
ri NA NA NA 0.0

% time extracting

% time of logical
coin often rare chunk
logical 22.3 5.9 5.7 NA
bit 18.4 18.5 18.5 NA
bitwhich 24.9 50.9 6.4 NA
which NA NA NA NA
ri NA NA NA NA

% time assigning

% time of logical
coin often rare chunk
logical 45.7 45.6 45.9 NA
bit 83.9 16.5 11.3 NA
bitwhich 187.0 47.5 41.0 NA
which NA NA NA NA
ri NA NA NA NA

% time subscripting with ‘which’

% time of logical
coin often rare chunk
logical 25.1 35.4 1.0 NA
bit 23.6 46.2 0.8 NA
bitwhich 32.4 62.9 1.8 NA
which NA NA NA NA
ri NA NA NA NA

% time assigning with ‘which’

% time of logical
coin often rare chunk
logical 19.9 38.7 0.7 NA
bit 18.2 38.4 0.5 NA
bitwhich 39.4 29.5 2.6 NA
which NA NA NA NA
ri NA NA NA NA

% time Boolean NOT

% time for Boolean NOT
coin often rare chunk
logical 18.5 18.6 18.6 18.4
bit 0.9 1.0 0.9 0.9
bitwhich 8.7 0.9 0.9 1.8
which NA NA NA NA
ri NA NA NA NA

% time Boolean AND

% time for Boolean &
coin often rare chunk
logical 100.0 28.7 27.4 25.9
bit 4.4 4.5 4.6 4.5
bitwhich 18.2 5.4 4.7 5.8
which NA NA NA NA
ri NA NA NA NA

% time Boolean OR

% time for Boolean |
coin often rare chunk
logical 98.4 27.4 25.6 25.1
bit 4.8 4.8 4.8 4.7
bitwhich 38.3 4.8 5.6 7.5
which NA NA NA NA
ri NA NA NA NA

% time Boolean EQUALITY

% time for Boolean ==
coin often rare chunk
logical 26.6 26.5 26.6 26.6
bit 4.5 5.2 4.9 4.7
bitwhich 14.3 4.5 4.4 5.4
which NA NA NA NA
ri NA NA NA NA

% time Boolean XOR

% time for Boolean !=
coin often rare chunk
logical 25.2 25.5 25.3 25.3
bit 4.6 4.4 4.6 4.5
bitwhich 17.5 4.5 4.4 5.4
which NA NA NA NA
ri NA NA NA NA

% time Boolean SUMMARY

% time for Boolean summary
coin often
logical 88.2 20.9
bit 3.5 3.6

Fast methods for integer set operations

“The space-efficient structure of bitmaps dramatically reduced the run time of sorting” (Jon Bently, Programming Pearls, Cracking the oyster, p. 7)

Execution time for R (R) and bit (b)

Execution time for R (R) and bit (b)

Execution time for R, bit and merge relative to most expensive R in 'unsorted bigbig' scenario

Execution time for R, bit and merge relative to most expensive R in ‘unsorted bigbig’ scenario

Execution time for R, bit and merge in 'sorted bigbig' scenario

Execution time for R, bit and merge in ‘sorted bigbig’ scenario

% time for sorting

sorted data relative to R’s sort
small big
sort 140.6 174.3
sortunique 78.2 37.9
unsorted data relative to R’s sort
small big
sort 24.0 32.0
sortunique 18.5 11.6

% time for unique

sorted data relative to R
small big
bit 112.7 13.1
merge 24.0 5.4
sort 0.0 0.0
unsorted data relative to R
small big
bit 130.8 15.4
merge 108.5 63.5
sort 85.2 56.2

% time for duplicated

sorted data relative to R
small big
bit 178.9 16.6
merge 22.0 5.2
sort 0.0 0.0
unsorted data relative to R
small big
bit 192.3 16.2
merge 130.7 70.5
sort 106.4 64.2

% time for anyDuplicated

sorted data relative to R
small big
bit 156.2 24.0
merge 19.0 3.4
sort 0.0 0.0
unsorted data relative to R
small big
bit 127.3 17.8
merge 124.5 76.2
sort 108.1 73.0

% time for sumDuplicated

sorted data relative to R
small big
bit 112.2 18.3
merge 12.9 3.9
sort 0.0 0.0
unsorted data relative to R
small big
bit 107.9 16.2
merge 104.7 71.4
sort 91.7 67.0

% time for match

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit NA NA NA NA
merge 31.1 0 10.8 4.7
sort 0.0 0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit NA NA NA NA
merge 191.3 79.3 84.2 78.2
sort 161.1 79.2 81.0 73.4

% time for in

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 158.3 5.8 15.1 17.1
merge 25.0 0.0 13.0 4.0
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 148.5 5.0 5.5 12.6
merge 169.6 71.9 77.7 72.8
sort 146.4 71.9 73.0 69.1

% time for notin

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 242.1 4 6.5 13.6
merge 25.0 0 4.1 3.4
sort 0.0 0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 227.0 5.1 4.9 13.3
merge 148.6 71.4 64.7 68.4
sort 126.1 71.4 61.8 64.9

% time for union

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 88.0 21.0 22.4 26.1
merge 52.8 9.5 9.7 19.3
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 80.8 23.3 21.9 10.5
merge 124.1 68.1 64.7 41.4
sort 74.4 56.5 53.9 32.9

% time for intersect

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 56.7 8.4 4.5 9.9
merge 17.1 0.0 0.0 5.7
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 59.0 9.4 4.0 6.2
merge 79.3 69.9 29.6 37.4
sort 60.9 69.9 29.5 32.7

% time for setdiff

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 61.9 5.4 15.3 10.5
merge 19.8 0.0 6.3 7.3
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 55.3 4.8 11.8 6.6
merge 86.6 69.9 33.1 44.4
sort 67.5 69.9 27.6 38.8

% time for symdiff

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 37.3 8.3 11.2 9.8
merge 8.1 2.5 3.5 3.2
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 35.8 7.7 7.7 7.5
merge 39.8 16.9 17.3 24.0
sort 31.5 14.1 14.4 21.2

% time for setequal

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 79.6 78.8 12 14.3
merge 19.9 16.3 5 6.0
sort 0.0 0.0 0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 80.1 79.7 8.0 8.6
merge 97.4 39276.3 23.1 45.6
sort 77.6 39258.7 18.8 41.0

% time for setearly

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 79.0 3.3 18.3 12.6
merge 21.9 0.0 0.1 5.3
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 77.3 2.3 5.3 7.4
merge 94.3 33.6 80.9 38.4
sort 73.4 33.6 80.9 34.5