Boolean data types
“A designer knows he has achieved perfection not when there is
nothing left to add, but when there is nothing left to take away.” “Il
semble que la perfection soit atteinte non quand il n’y a plus rien à
ajouter, mais quand il n’y a plus rien à retrancher” (Antoine de
St. Exupery, Terre des Hommes (Gallimard, 1939), p. 60.)
We compare memory consumption (n=1e+06) and runtime (median of 5
replications) of the different booltype
s for the following
filter scenarios:
knitr::kable(
data.frame(
coin="random 50%",
often="random 99%",
rare="random 1%",
chunk="contiguous chunk of 5%"
),
caption="selection characteristic"
)
selection characteristic
random 50% |
random 99% |
random 1% |
contiguous chunk of 5% |
# nolint end: strings_as_factors_linter.
There are substantial savings in skewed filter situations:
Even in non-skewed situations the new booltypes are competitive:
Detailed tables follow.
% memory consumption of filter
% bytes of logical
logical |
100.0 |
100.0 |
100.0 |
100.0 |
bit |
3.2 |
3.2 |
3.2 |
3.2 |
bitwhich |
50.0 |
1.0 |
1.0 |
5.0 |
which |
50.0 |
99.0 |
1.0 |
5.0 |
ri |
NA |
NA |
NA |
0.0 |
% time assigning
% time of logical
logical |
39.1 |
39.3 |
39.0 |
NA |
bit |
82.9 |
16.3 |
10.5 |
NA |
bitwhich |
185.3 |
46.2 |
39.6 |
NA |
which |
NA |
NA |
NA |
NA |
ri |
NA |
NA |
NA |
NA |
% time subscripting with ‘which’
% time of logical
logical |
20.3 |
35.3 |
0.9 |
NA |
bit |
23.3 |
44.9 |
0.9 |
NA |
bitwhich |
31.0 |
59.9 |
1.7 |
NA |
which |
NA |
NA |
NA |
NA |
ri |
NA |
NA |
NA |
NA |
% time assigning with ‘which’
% time of logical
logical |
19.8 |
39.0 |
0.6 |
NA |
bit |
18.8 |
40.8 |
0.5 |
NA |
bitwhich |
35.2 |
29.1 |
2.0 |
NA |
which |
NA |
NA |
NA |
NA |
ri |
NA |
NA |
NA |
NA |
% time Boolean NOT
% time for Boolean NOT
logical |
18.2 |
18.4 |
18.6 |
18.3 |
bit |
0.8 |
0.8 |
0.8 |
0.7 |
bitwhich |
9.4 |
0.7 |
0.7 |
1.5 |
which |
NA |
NA |
NA |
NA |
ri |
NA |
NA |
NA |
NA |
% time Boolean AND
% time for Boolean &
logical |
98.9 |
27.3 |
27.1 |
25.5 |
bit |
4.1 |
4.4 |
4.2 |
4.1 |
bitwhich |
17.7 |
4.9 |
4.2 |
5.5 |
which |
NA |
NA |
NA |
NA |
ri |
NA |
NA |
NA |
NA |
% time Boolean OR
% time for Boolean |
logical |
100.0 |
33.5 |
25.2 |
25.1 |
bit |
4.3 |
4.2 |
4.2 |
4.2 |
bitwhich |
35.7 |
4.2 |
4.8 |
7.0 |
which |
NA |
NA |
NA |
NA |
ri |
NA |
NA |
NA |
NA |
% time Boolean EQUALITY
% time for Boolean ==
logical |
26.2 |
26.1 |
26.1 |
26.2 |
bit |
4.2 |
4.4 |
4.3 |
4.2 |
bitwhich |
13.8 |
4.2 |
4.1 |
4.9 |
which |
NA |
NA |
NA |
NA |
ri |
NA |
NA |
NA |
NA |
% time Boolean XOR
% time for Boolean !=
logical |
24.9 |
24.8 |
24.8 |
24.8 |
bit |
4.2 |
4.1 |
4.3 |
4.3 |
bitwhich |
17.4 |
4.1 |
4.2 |
5.1 |
which |
NA |
NA |
NA |
NA |
ri |
NA |
NA |
NA |
NA |
% time Boolean SUMMARY
% time for Boolean summary
logical |
87.1 |
20.5 |
bit |
3.4 |
3.4 |
Fast methods for integer
set operations
“The space-efficient structure of bitmaps dramatically reduced the
run time of sorting” (Jon Bently, Programming Pearls, Cracking the
oyster, p. 7)
% time for sorting
sorted data relative to R’s sort
sort |
129.9 |
177.8 |
sortunique |
83.8 |
37.6 |
unsorted data relative to R’s sort
sort |
25.5 |
33.9 |
sortunique |
17.4 |
12.0 |
% time for unique
sorted data relative to R
bit |
122.4 |
24.2 |
merge |
22.2 |
9.8 |
sort |
0.0 |
0.0 |
unsorted data relative to R
bit |
121.2 |
16.8 |
merge |
124.4 |
69.4 |
sort |
100.5 |
61.5 |
% time for duplicated
sorted data relative to R
bit |
203.0 |
28.1 |
merge |
23.5 |
8.7 |
sort |
0.0 |
0.0 |
unsorted data relative to R
bit |
165.7 |
17.9 |
merge |
107.7 |
78.2 |
sort |
88.1 |
71.3 |
% time for anyDuplicated
sorted data relative to R
bit |
155.6 |
32.3 |
merge |
18.9 |
3.7 |
sort |
0.0 |
0.0 |
unsorted data relative to R
bit |
144.9 |
18.8 |
merge |
134.9 |
80.0 |
sort |
117.3 |
77.3 |
% time for sumDuplicated
sorted data relative to R
bit |
111.8 |
25.7 |
merge |
15.2 |
5.5 |
sort |
0.0 |
0.0 |
unsorted data relative to R
bit |
96.0 |
17.5 |
merge |
99.8 |
77.2 |
sort |
85.7 |
72.5 |
% time for match
sorted data relative to R
bit |
NA |
NA |
NA |
NA |
merge |
26.1 |
0 |
11.3 |
7.7 |
sort |
0.0 |
0 |
0.0 |
0.0 |
unsorted data relative to R
bit |
NA |
NA |
NA |
NA |
merge |
198.4 |
80.1 |
84.3 |
78.5 |
sort |
171.1 |
80.1 |
81.1 |
73.8 |
% time for in
sorted data relative to R
bit |
164.1 |
6.5 |
14.8 |
23.2 |
merge |
22.4 |
0.0 |
13.3 |
5.5 |
sort |
0.0 |
0.0 |
0.0 |
0.0 |
unsorted data relative to R
bit |
167.5 |
5.6 |
5.3 |
14.3 |
merge |
183.9 |
80.2 |
77.8 |
74.0 |
sort |
161.1 |
80.2 |
73.1 |
70.3 |
% time for notin
sorted data relative to R
bit |
220.9 |
6.8 |
10.4 |
22.0 |
merge |
17.3 |
0.0 |
5.0 |
5.1 |
sort |
0.0 |
0.0 |
0.0 |
0.0 |
unsorted data relative to R
bit |
408.7 |
5.7 |
4.9 |
13.0 |
merge |
161.3 |
80.6 |
64.2 |
66.8 |
sort |
143.0 |
80.6 |
61.9 |
63.4 |
% time for union
sorted data relative to R
bit |
80.8 |
30.1 |
30.2 |
26.3 |
merge |
48.7 |
13.3 |
13.3 |
19.4 |
sort |
0.0 |
0.0 |
0.0 |
0.0 |
unsorted data relative to R
bit |
82.7 |
23.8 |
23.6 |
17.2 |
merge |
136.8 |
69.6 |
69.7 |
67.8 |
sort |
84.6 |
58.1 |
58.2 |
54.2 |
% time for intersect
sorted data relative to R
bit |
49.3 |
13.3 |
5.3 |
14.2 |
merge |
16.1 |
0.0 |
0.0 |
8.1 |
sort |
0.0 |
0.0 |
0.0 |
0.0 |
unsorted data relative to R
bit |
54.2 |
10.3 |
4.2 |
6.8 |
merge |
78.3 |
76.9 |
31.3 |
41.7 |
sort |
61.7 |
76.9 |
31.3 |
36.4 |
% time for setdiff
sorted data relative to R
bit |
41.8 |
6.5 |
20.3 |
12.2 |
merge |
15.1 |
0.0 |
8.5 |
8.4 |
sort |
0.0 |
0.0 |
0.0 |
0.0 |
unsorted data relative to R
bit |
49.1 |
4.9 |
12.4 |
7.5 |
merge |
92.6 |
72.3 |
35.3 |
50.4 |
sort |
74.6 |
72.3 |
29.5 |
44.0 |
% time for symdiff
sorted data relative to R
bit |
36.4 |
12.9 |
12.3 |
10.3 |
merge |
8.3 |
3.9 |
3.7 |
3.3 |
sort |
0.0 |
0.0 |
0.0 |
0.0 |
unsorted data relative to R
bit |
37.0 |
8.1 |
8.5 |
8.0 |
merge |
40.1 |
17.9 |
19.0 |
25.9 |
sort |
31.3 |
15.0 |
15.9 |
23.0 |
% time for setequal
sorted data relative to R
bit |
83.8 |
91.1 |
14.5 |
14.9 |
merge |
21.0 |
17.7 |
6.0 |
6.2 |
sort |
0.0 |
0.0 |
0.0 |
0.0 |
unsorted data relative to R
bit |
86.4 |
71.8 |
8.5 |
9.0 |
merge |
101.4 |
37622.5 |
24.8 |
48.0 |
sort |
80.1 |
37606.7 |
20.2 |
43.2 |
% time for setearly
sorted data relative to R
bit |
78.8 |
4.1 |
18.0 |
17.3 |
merge |
19.9 |
0.0 |
0.1 |
7.2 |
sort |
0.0 |
0.0 |
0.0 |
0.0 |
unsorted data relative to R
bit |
81.4 |
2.4 |
5.4 |
8.0 |
merge |
97.9 |
36.4 |
80.8 |
42.4 |
sort |
77.1 |
36.4 |
80.8 |
38.1 |