--- title: "Demo of the bit package" author: "Dr. Jens Oehlschlägel" date: '`r Sys.Date()`' output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Demo of the bit package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, echo = FALSE, results = "hide", message = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") library(bit) .ff.is.available = requireNamespace("ff", quietly=TRUE) && packageVersion("ff") >= "4.0.0" if (.ff.is.available) library(ff) #tools::buildVignette("vignettes/bit-demo.Rmd") #devtools::build_vignettes() ``` --- ## bit type Create a huge boolean vector (no NAs allowed) ```{r} n <- 1e8 b1 <- bit(n) b1 ``` It costs only one bit per element ```{r} object.size(b1) / n ``` A couple of standard methods work ```{r} b1[10:30] <- TRUE summary(b1) ``` Create a another boolean vector with TRUE in some different positions ```{r} b2 <- bit(n) b2[20:40] <- TRUE b2 ``` fast boolean operations ```{r} b1 & b2 ``` fast boolean operations ```{r} summary(b1 & b2) ``` ## bitwhich type Since we have a very skewed distribution we may coerce to an even sparser representation ```{r} w1 <- as.bitwhich(b1) w2 <- as.bitwhich(b2) object.size(w1) / n ``` and everything ```{r} w1 & w2 ``` works as expected ```{r} summary(w1 & w2) ``` even mixing ```{r} summary(b1 & w2) ``` ## processing chunks Many bit functions support a range restriction, ```{r} summary(b1, range=c(1, 1000)) ``` which is useful ```{r} as.which(b1, range=c(1, 1000)) ``` for filtered chunked looping ```{r} lapply(chunk(from=1, to=n, length=10), function(i) as.which(b1, range=i)) ``` over large ff vectors ```{r, eval=.ff.is.available} options(ffbatchbytes=1024^3) x <- ff(vmode="single", length=n) x[1:1000] <- runif(1000) lapply(chunk(x, length.out = 10), function(i) sum(x[as.hi(b1, range=i)])) ``` and wrap-up ```{r, eval=.ff.is.available} delete(x) rm(x, b1, b2, w1, w2, n) ``` for more info check the usage vignette