Performance of the bit package


A performance example

Before we measure performance of the main functionality of the package, note that something simple as ‘(a:b)[-i]’ can and has been accelerated in this package:

a <- 1L
b <- 1e7L
i <- sample(a:b, 1e3)
x <- c(
  R = median(microbenchmark((a:b)[-i], times=times)$time),
  bit = median(microbenchmark(bit_rangediff(c(a, b), i), times=times)$time),
  merge = median(microbenchmark(merge_rangediff(c(a, b), bit_sort(i)), times=times)$time)
)
knitr::kable(as.data.frame(as.list(x / x["R"] * 100)), caption="% of time relative to R", digits=1)
% of time relative to R
R bit merge
100 25.6 25.5

The vignette is compiled with the following performance settings: 5 replications with domain size small 1000 and big 10^{6}, sample size small 1000 and big 10^{6}.

Boolean data types

“A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.” “Il semble que la perfection soit atteinte non quand il n’y a plus rien à ajouter, mais quand il n’y a plus rien à retrancher” (Antoine de St. Exupery, Terre des Hommes (Gallimard, 1939), p. 60.)

We compare memory consumption (n=1e+06) and runtime (median of 5 replications) of the different booltypes for the following filter scenarios:

knitr::kable(
  data.frame(
    coin="random 50%",
    often="random 99%",
    rare="random 1%",
    chunk="contiguous chunk of 5%"
  ),
  caption="selection characteristic"
)
selection characteristic
coin often rare chunk
random 50% random 99% random 1% contiguous chunk of 5%
# nolint end: strings_as_factors_linter.

There are substantial savings in skewed filter situations:

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'rare' scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘rare’ scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'often' scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘often’ scenario

Even in non-skewed situations the new booltypes are competitive:

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the 'coin' scenario

% size and execution time for bit (b) and bitwhich (w) relative to logical (R) in the ‘coin’ scenario

Detailed tables follow.

% memory consumption of filter

% bytes of logical
coin often rare chunk
logical 100.0 100.0 100.0 100.0
bit 3.2 3.2 3.2 3.2
bitwhich 50.0 1.0 1.0 5.0
which 50.0 99.0 1.0 5.0
ri NA NA NA 0.0

% time extracting

% time of logical
coin often rare chunk
logical 21.2 5.7 6.5 NA
bit 18.9 19.1 21.3 NA
bitwhich 25.7 51.5 6.8 NA
which NA NA NA NA
ri NA NA NA NA

% time assigning

% time of logical
coin often rare chunk
logical 39.4 39.7 39.8 NA
bit 84.3 16.9 11.4 NA
bitwhich 193.0 51.3 45.4 NA
which NA NA NA NA
ri NA NA NA NA

% time subscripting with ‘which’

% time of logical
coin often rare chunk
logical 27.1 35.1 1.2 NA
bit 22.2 44.0 0.8 NA
bitwhich 31.0 66.8 1.8 NA
which NA NA NA NA
ri NA NA NA NA

% time assigning with ‘which’

% time of logical
coin often rare chunk
logical 19.7 38.5 0.8 NA
bit 16.4 39.4 0.6 NA
bitwhich 38.6 22.5 2.7 NA
which NA NA NA NA
ri NA NA NA NA

% time Boolean NOT

% time for Boolean NOT
coin often rare chunk
logical 18.9 19.5 18.6 18.6
bit 1.0 0.9 1.0 0.9
bitwhich 11.5 0.9 0.9 2.3
which NA NA NA NA
ri NA NA NA NA

% time Boolean AND

% time for Boolean &
coin often rare chunk
logical 100.0 27.7 27.4 25.7
bit 4.9 5.0 5.0 4.9
bitwhich 18.8 5.6 5.1 6.6
which NA NA NA NA
ri NA NA NA NA

% time Boolean OR

% time for Boolean |
coin often rare chunk
logical 98.1 27.5 25.3 24.9
bit 4.8 4.9 4.8 5.0
bitwhich 40.4 4.9 5.5 8.2
which NA NA NA NA
ri NA NA NA NA

% time Boolean EQUALITY

% time for Boolean ==
coin often rare chunk
logical 26.6 26.7 26.4 26.6
bit 4.7 4.6 4.8 4.8
bitwhich 18.0 4.9 4.9 6.4
which NA NA NA NA
ri NA NA NA NA

% time Boolean XOR

% time for Boolean !=
coin often rare chunk
logical 25.1 25.1 25.0 25.2
bit 5.2 5.0 5.1 5.0
bitwhich 15.1 4.9 4.8 5.7
which NA NA NA NA
ri NA NA NA NA

% time Boolean SUMMARY

% time for Boolean summary
coin often
logical 89.1 21.8
bit 5.3 5.5

Fast methods for integer set operations

“The space-efficient structure of bitmaps dramatically reduced the run time of sorting” (Jon Bently, Programming Pearls, Cracking the oyster, p. 7)

Execution time for R (R) and bit (b)

Execution time for R (R) and bit (b)

Execution time for R, bit and merge relative to most expensive R in 'unsorted bigbig' scenario

Execution time for R, bit and merge relative to most expensive R in ‘unsorted bigbig’ scenario

Execution time for R, bit and merge in 'sorted bigbig' scenario

Execution time for R, bit and merge in ‘sorted bigbig’ scenario

% time for sorting

sorted data relative to R’s sort
small big
sort 149 126.7
sortunique 78 38.7
unsorted data relative to R’s sort
small big
sort 27.3 32.0
sortunique 21.6 13.1

% time for unique

sorted data relative to R
small big
bit 111.1 20.7
merge 21.9 8.3
sort 0.0 0.0
unsorted data relative to R
small big
bit 126.9 16.1
merge 107.1 70.0
sort 87.7 62.2

% time for duplicated

sorted data relative to R
small big
bit 198.0 22.9
merge 22.2 8.4
sort 0.0 0.0
unsorted data relative to R
small big
bit 198.3 17.0
merge 141.0 77.3
sort 118.5 69.6

% time for anyDuplicated

sorted data relative to R
small big
bit 125.9 26.8
merge 21.0 4.2
sort 0.0 0.0
unsorted data relative to R
small big
bit 119.3 18.7
merge 142.7 82.1
sort 120.8 78.5

% time for sumDuplicated

sorted data relative to R
small big
bit 100.1 20.8
merge 17.0 6.6
sort 0.0 0.0
unsorted data relative to R
small big
bit 105.8 16.6
merge 127.5 77.0
sort 108.1 70.4

% time for match

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit NA NA NA NA
merge 25.4 0 23.6 5.8
sort 0.0 0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit NA NA NA NA
merge 184.3 74.7 82.6 78.8
sort 162.0 74.7 73.9 74.6

% time for in

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 152.3 6.1 14.3 17.7
merge 20.1 0.0 16.3 4.2
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 141.1 4.7 6.5 13.2
merge 179.5 69.2 71.3 77.6
sort 160.2 69.1 63.9 73.7

% time for notin

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 209.9 6.4 13.0 15.9
merge 20.0 0.0 4.9 4.7
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 226.4 4.9 6.1 12.0
merge 173.6 69.3 59.8 67.2
sort 153.3 69.2 57.6 63.1

% time for union

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 33.8 17.0 20.9 14.3
merge 24.3 13.9 12.3 10.0
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 34.3 14.3 20.2 12.9
merge 62.9 34.0 48.6 42.9
sort 39.2 25.0 35.7 33.3

% time for intersect

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 28.7 5.1 8.1 7.2
merge 8.6 0.0 0.0 3.7
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 26.7 4.6 5.3 7.3
merge 46.0 35.6 32.8 36.5
sort 37.4 35.6 32.8 32.4

% time for setdiff

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 26.4 2.6 18.3 7.2
merge 9.7 0.0 15.0 5.0
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 23.3 2.3 12.8 6.0
merge 50.6 34.9 42.2 40.9
sort 41.3 34.8 30.9 35.9

% time for symdiff

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 19.2 10.4 11 6.5
merge 4.2 4.7 5 2.4
sort 0.0 0.0 0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 21.3 8.5 8.1 5.7
merge 24.9 17.0 16.0 19.8
sort 20.0 12.5 11.7 17.4

% time for setequal

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 48.9 50.7 13.1 9.2
merge 12.0 11.6 7.0 4.9
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 50.6 48.3 8.0 9.2
merge 68.8 24105.7 24.9 51.4
sort 55.5 24094.2 19.5 45.1

% time for setearly

sorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 49.9 3.8 14.5 14.2
merge 12.4 0.0 0.1 7.6
sort 0.0 0.0 0.0 0.0
unsorted data relative to R
smallsmall smallbig bigsmall bigbig
bit 58.0 2.4 5.0 8.1
merge 75.1 37.2 78.1 44.9
sort 61.0 37.1 78.0 39.4