Performance of the bit package


A performance example

Before we measure performance of the main functionality of the package, note that something simple as ‘(a:b)[-i]’ can and has been accelerated in this package:

a <- 1L
b <- 1e7L
i <- sample(a:b, 1e3)
x <- c(
  R = median(microbenchmark((a:b)[-i], times=times)$time),
  bit = median(microbenchmark(bit_rangediff(c(a, b), i), times=times)$time),
  merge = median(microbenchmark(merge_rangediff(c(a, b), bit_sort(i)), times=times)$time)
)
knitr::kable(as.data.frame(as.list(x / x["R"] * 100)), caption="% of time relative to R", digits=1)
% of time relative to R
R bit merge
100 26.2 26.2

The vignette is compiled with the following performance settings: 5 replications with domain size small 1000 and big 10^{6}, sample size small 1000 and big 10^{6}.

Boolean data types

“A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.” “Il semble que la perfection soit atteinte non quand il n’y a plus rien à ajouter, mais quand il n’y a plus rien à retrancher” (Antoine de St. Exupery, Terre des Hommes (Gallimard, 1939), p. 60.)

We compare memory consumption (n=1e+06) and runtime (median of 5 replications) of the different booltypes for the following filter scenarios:

knitr::kable(
  data.frame(
    coin="random 50%",
    often="random 99%",
    rare="random 1%",
    chunk="contiguous chunk of 5%"
  ),
  caption="selection characteristic"
)
selection characteristic
coin often rare chunk
random 50% random 99% random 1% contiguous chunk of 5%
# nolint end: strings_as_factors_linter.

There are substantial savings in skewed filter situations:

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'rare' scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘rare’ scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'often' scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘often’ scenario

Even in non-skewed situations the new booltypes are competitive:

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'coin' scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘coin’ scenario

Detailed tables follow.

% memory consumption of filter

% bytes of logical
coin often rare chunk
logical 100.0 100.0 100.0 100.0
bit 3.2 3.2 3.2 3.2
bitwhich 50.0 1.0 1.0 5.0
which 50.0 99.0 1.0 5.0
ri NA NA NA 0.0

% time extracting

% time of logical
coin often rare chunk
logical 6.3 6.2 7.1 NA
bit 18.1 18.2 18.7 NA
bitwhich 173.1 64.2 6.7 NA
which NA NA NA NA
ri NA NA NA NA

% time assigning

% time of logical
coin often rare chunk
logical 39.4 40.3 40.2 NA
bit 80.2 12.9 13.4 NA
bitwhich 182.3 42.5 43.1 NA
which NA NA NA NA
ri NA NA NA NA

% time subscripting with ‘which’

% time of logical
coin often rare chunk
logical 34.2 35.1 1.0 NA
bit 22.4 43.4 0.7 NA
bitwhich 31.5 59.4 1.7 NA
which NA NA NA NA
ri NA NA NA NA

% time assigning with ‘which’

% time of logical
coin often rare chunk
logical 20.9 41.3 0.6 NA
bit 15.9 38.3 0.5 NA
bitwhich 147.0 28.8 2.4 NA
which NA NA NA NA
ri NA NA NA NA

% time Boolean NOT

% time for Boolean NOT
coin often rare chunk
logical 18.9 19.6 18.9 19.1
bit 0.7 1.1 0.7 0.6
bitwhich 8.5 0.7 0.9 1.7
which NA NA NA NA
ri NA NA NA NA

% time Boolean AND

% time for Boolean &
coin often rare chunk
logical 96.9 28.7 27.0 25.6
bit 4.5 4.2 4.6 4.5
bitwhich 30.4 5.4 4.8 6.1
which NA NA NA NA
ri NA NA NA NA

% time Boolean OR

% time for Boolean |
coin often rare chunk
logical 100.0 33.3 25.0 25.0
bit 4.4 4.6 4.4 4.5
bitwhich 18.2 4.3 5.2 7.4
which NA NA NA NA
ri NA NA NA NA

% time Boolean EQUALITY

% time for Boolean ==
coin often rare chunk
logical 26.2 26.2 26.1 26.1
bit 4.5 4.6 4.5 4.4
bitwhich 15.1 4.2 4.3 5.3
which NA NA NA NA
ri NA NA NA NA

% time Boolean XOR

% time for Boolean !=
coin often rare chunk
logical 24.8 24.9 24.8 24.9
bit 4.4 4.4 4.5 4.6
bitwhich 14.0 4.5 4.4 5.3
which NA NA NA NA
ri NA NA NA NA

% time Boolean SUMMARY

% time for Boolean summary
coin often
logical 84.4 17.4
bit 3.5 3.5

Fast methods for integer set operations

“The space-efficient structure of bitmaps dramatically reduced the run time of sorting” (Jon Bently, Programming Pearls, Cracking the oyster, p. 7)

Execution time for R (R) and bit (b)

Execution time for R (R) and bit (b)

Execution time for R, bit and merge relative to most expensive R in 'unsorted bigbig' scenario

Execution time for R, bit and merge relative to most expensive R in ‘unsorted bigbig’ scenario

Execution time for R, bit and merge in 'sorted bigbig' scenario

Execution time for R, bit and merge in ‘sorted bigbig’ scenario

% time for sorting

sorted data relative to R’s sort
small big
sort 115.7 164.2
sortunique 85.7 37.8
unsorted data relative to R’s sort
small big
sort 25.9 32.7
sortunique 16.5 12.6

% time for unique

sorted data relative to R
small big
bit 141.0 23.1
merge 28.7 9.6
sort 0.0 0.0
unsorted data relative to R
small big
bit 117.8 17.6
merge 136.3 69.8
sort 109.8 61.1

% time for duplicated

sorted data relative to R
small big
bit 229.5 25.1
merge 26.8 9.5
sort 0.0 0.0
unsorted data relative to R
small big
bit 227.5 18.5
merge 159.7 78.1
sort 132.4 69.3

% time for anyDuplicated

sorted data relative to R
small big
bit 161.2 31.3
merge 17.2 4.0
sort 0.0 0.0
unsorted data relative to R
small big
bit 154.7 18.8
merge 162.4 73.8
sort 145.2 70.8

% time for sumDuplicated

sorted data relative to R
small big
bit 129.4 24.2
merge 22.5 10.2
sort 0.0 0.0
unsorted data relative to R
small big
bit 103.4 17.3
merge 119.0 74.3
sort 99.2 65.1

% time for match

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit NA NA NA NA
merge 24.7 0 21.5 5.2
sort 0.0 0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit NA NA NA NA
merge 228.5 70.7 73.1 73.8
sort 202.8 70.7 64.9 70.0

% time for in

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 165.2 6.5 14.1 23.5
merge 23.2 0.0 15.9 5.1
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 176.9 5.4 6.6 15.7
merge 209.1 70.9 63.8 73.7
sort 184.6 70.9 56.4 69.9

% time for notin

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 246.1 6.6 13.0 19.4
merge 18.5 0.0 4.1 4.1
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 262.0 5.5 6.1 13.3
merge 177.5 70.7 53.0 62.0
sort 159.2 70.7 51.1 58.8

% time for union

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 36.4 25.5 18.7 17.0
merge 25.7 15.2 11.2 11.6
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 36.5 13.6 20.9 12.1
merge 69.3 31.0 47.3 41.2
sort 43.3 22.3 34.1 32.2

% time for intersect

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 33.2 5.3 6.2 10.5
merge 9.0 0.0 0.0 5.3
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 29.8 5.2 4.5 7.7
merge 50.9 35.4 30.5 34.8
sort 42.0 35.4 30.5 30.5

% time for setdiff

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 30.3 2.7 17.1 9.0
merge 9.2 0.0 13.6 4.9
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 27.7 2.7 12.6 6.2
merge 53.1 35.6 39.9 37.9
sort 44.3 35.6 28.8 33.8

% time for symdiff

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 20.0 10.8 11.3 8.1
merge 3.7 4.8 5.0 2.6
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 21.0 8.6 9.2 6.2
merge 23.0 15.5 16.7 18.7
sort 19.1 11.2 12.0 16.4

% time for setequal

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 56.0 63.7 14.6 15.6
merge 13.7 12.9 7.0 7.4
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 27.7 54.9 8.4 9.6
merge 34.7 25715.0 23.1 47.2
sort 28.1 25702.4 18.0 41.4

% time for setearly

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 54.8 4.2 14.8 14.9
merge 15.0 0.0 0.1 7.1
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 58.9 2.4 5.0 8.9
merge 76.5 33.9 69.0 43.6
sort 60.5 33.8 68.9 38.2