The std::vector
container is the cornerstone of the C++ Standard
Library. When used with Rcpp however, there is an inherent cost in using
std::vector
as memory must be copied both from and to R internal data
structures. The RcppStdVector package allows users to write Rcpp code
using the Standard Library std::vector
container without incurring an
extra container copy operation. Consider the following:
std::vector<double>
my_func(std::vector<double>& x) { // copy from REALSXP to std::vector here
...
return x; // additional copy from std::vector to REALSXP here
}
For large vectors, this entails a large overhead. Rcpp native vectors
avoid these copies by holding an SEXP
, which will be only copied on
demand. Recall however that unless an Rcpp vector is duplicated,
functions can modify the vector in-place, which violates the
no-side-effects rule. This can be useful or problematic in cases where
the SEXP
is aliased to several R variables. Most modifying Rcpp
functions will therefore usually implement cloning the input vector.
Similarly, using RcppStdVector, we can write:
RcppStdVector::std_vec_real
my_func(RcppStdVector::std_vec_real& x) { // copy here from REALSXP to std::vector
...
return x; // zero-overhead, no-copy return of REALSXP
}
The zero-overhead return is achieved by using a custom allocator that
allocates an R vector instead of general heap memory. This R vector is
simply retrieved from the allocator at the end. Special consideration is
taken when the capacity of the std::vector
is greater than its size so
that there are no dangling elements. It is furthermore possible to
construct a standard vector in-place over an R vector via placement new.
void my_func(RcppStandardVector::std_ivec_int& x) { // no copy
// modify elements directly
}
The caveat to this is that any operation that causes allocation could lead to undefined behavior as the in-place allocator does not actually allocate and delete. Operations that grow the vector will throw an error. Provided overloads for assignment and copy construction will also throw.
There are several reasons one might wish to use std::vector
over the
native Rcpp vector types. First, as part of the Standard Library,
std::vector
is simple and well-documented. Compare
this to
this.
Second, using std::vector
everywhere minimizes code rewrites when
moving between other C++ codebases and Rcpp. Third, std::vector
supports amortized-constant appending elements. This is a very common
idiom in C++ coding and its absense in Rcpp is a source of frustration.
Consider the following functions:
SEXP test_copy_chr(Rcpp::CharacterVector& x) { // no copy
RcppStdVector::std_vec_chr y; // this IS a std::vector
auto oi = std::back_inserter(y); // amortized-constant insertion
std::copy(x.begin(), x.end(), oi);
return Rcpp::wrap(y); // no copy
}
SEXP test_copy_rcpp_chr(Rcpp::CharacterVector& x) { // no copy
Rcpp::CharacterVector y; // Rcpp type
auto oi = std::back_inserter(y); // copies entire vector on each insertion
std::copy(x.begin(), x.end(), oi);
return Rcpp::wrap(y); // no copy
}
On my system, I get:
> microbenchmark(r1 <- test_copy_chr(x), r2 <- test_copy_rcpp_chr(x))
Unit: microseconds
expr min lq mean median uq max
r1 <- test_copy_chr(x) 94.811 102.117 124.4515 125.6335 129.7775 218.154
r2 <- test_copy_rcpp_chr(x) 935634.546 975367.489 1012093.6482 996092.4005 1023481.6770 1228233.548
where x
contains 1e4 character strings. The sole drawback of this
idiom is that up to 50% of the memory allocated to the std::vector
may
be unused.
RcppStdVector supports list and string-list types. For example the following copies a list of elements and duplicates each.
template <int RTYPE>
void dup_vec(SEXP& s) {
auto y = RcppStdVector::from_sexp<RTYPE>(s);
auto oi = std::back_inserter(y);
y.reserve(2 * y.size());
std::copy(RcppStdVector::begin<RTYPE>(s),
RcppStdVector::end<RTYPE>(s), oi);
s = PROTECT(Rcpp::wrap(y));
}
// [[Rcpp::export]]
SEXP test_double_it(RcppStdVector::std_vec_sxp& x) {
for (auto s = x.begin(); s != x.end(); ++s) {
switch(TYPEOF(*s)) {
case REALSXP: dup_vec<REALSXP>(*s); break;
case INTSXP: dup_vec<INTSXP>(*s); break;
case LGLSXP: dup_vec<LGLSXP>(*s); break;
case STRSXP: dup_vec<STRSXP>(*s); break;
case VECSXP: dup_vec<VECSXP>(*s); break;
default: Rcpp::stop("Unsupported column type");
}
}
UNPROTECT(x.size());
return Rcpp::wrap(x);
}
The reason for calling PROTECT
is because when the vector y
goes out
of scope, its internal SEXP
will be unprotected. I get the following:
> test_double_it(list(1:3, letters[1:3], (1:3) > 2, pi))
[[1]]
[1] 1 2 3 1 2 3
[[2]]
[1] "a" "b" "c" "a" "b" "c"
[[3]]
[1] FALSE FALSE TRUE FALSE FALSE TRUE
[[4]]
[1] 3.141593 3.141593
The following code demonstrates placement new construction of a
std::vector
over the memory owned by an R vector.
void test_inplace(RcppStdVector::std_ivec_int& x) {
std::transform(std::begin(x), std::end(x),
std::begin(x), [](int a){ return 2 * a; });
}
This function doubles each element of an R vector without ever allocating.
> x <- 1:10
> test_inplace(x)
> x
[1] 2 4 6 8 10 12 14 16 18 20
>
The basic idea of this experiment is to use the standard library components by adapting them to work on top of R’s memory system. This is what I always wanted from Rcpp: a thin lightweight layer that enables use of standard C++.