These are loose scripts that are not part of the package itself. The eddington.r file may be useful in that it calculates the Eddington number in pure R; however, it is much slower than the functions in the package available on CRAN. Additionally, the Rcpp version in this folder uses the slower algorithm, whereas the R package uses the optimized Rcpp code and is much faster.
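As a rough illustration of what a pure R calculation can look like, here is a minimal sketch; the function name is made up and it is not necessarily the same approach taken in eddington.r.

```r
# Minimal base R sketch (illustrative only; not necessarily the same code as
# eddington.r). The Eddington number E is the largest integer such that at
# least E rides were at least E miles long.
eddington_sketch <- function(rides) {
  d <- sort(floor(rides), decreasing = TRUE)  # distances, largest first
  sum(d >= seq_along(d))                      # count ranks i with d[i] >= i
}

eddington_sketch(c(2.3, 10, 4, 5, 5))  # 4: four rides of at least 4 miles
```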
Another approach would be to use the `Eddington` R6 class. This has the advantage of maintaining state, so that updates can be applied as new data come in. However, it is still much slower than the optimized Rcpp code included in the package: the Rcpp code could recompute the entire dataset several times in the span of time it would take the R6 object to compute it once.
On the other hand, an experimental, optimized R6 class backed by C++ code is available on the R6 branch of the package. You will need a development environment to build and install it.
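To make the idea of a stateful, incrementally updatable object concrete, here is a minimal sketch of one way such a class could be written with R6. The class name, fields, and methods are invented for this example and do not reflect the actual API of the package's Eddington class.

```r
# Illustrative sketch only -- an invented R6 class, not the package's API.
# It keeps a running Eddington number and updates it as new rides arrive.
library(R6)

EddingtonTracker <- R6::R6Class("EddingtonTracker",
  public = list(
    E = 0,              # current Eddington number
    n_above = 0,        # rides long enough to count toward E + 1
    tally = integer(),  # counts of qualifying rides by floored distance

    update = function(rides) {
      for (r in floor(rides)) {
        if (r > self$E) {
          # This ride could contribute to a higher Eddington number.
          self$n_above <- self$n_above + 1
          key <- as.character(r)
          self$tally[key] <- sum(self$tally[key], 1L, na.rm = TRUE)
          if (self$n_above > self$E) {
            # Enough qualifying rides: bump E, then drop rides that
            # no longer qualify for the next increment.
            self$E <- self$E + 1
            bumped <- as.character(self$E)
            if (!is.na(self$tally[bumped])) {
              self$n_above <- self$n_above - self$tally[bumped]
              self$tally <- self$tally[names(self$tally) != bumped]
            }
          }
        }
      }
      invisible(self)
    }
  )
)

# State is maintained between calls, so new data can be folded in as it arrives.
tracker <- EddingtonTracker$new()
tracker$update(c(5, 12, 3))
tracker$E  # 3
tracker$update(c(4, 6))
tracker$E  # 4
```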
This folder is an R package. To install, use devtools:
```r
if (!require(devtools)) install.packages("devtools")
devtools::install_github("pegeler/eddington2/R/package")
```
NOTE: This package uses Rcpp and therefore requires a development environment to install. If you do not have a development environment and just want to calculate an Eddington number, use the eddington.r script in loose/, which computes the number using base R.
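Once installed, a quick check might look something like this. The library name is an assumption here (the repo is eddington2, but the installed package may be named differently); `E_num()` is the function benchmarked below.

```r
# Quick smoke test after installation. The package name "eddington" is an
# assumption; E_num() is the package function used in the benchmarks below.
library(eddington)

rides <- c(2.3, 10, 4, 5, 5)
E_num(rides)  # expected: 4 (four rides of at least 4 miles)
```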
Here I'm comparing the optimized Rcpp code used in the package (`E_num()`) with the base R code found in the loose scripts folder (`Eddington_number()`) using microbenchmark. The dataset is the mock data found in this repo.
```r
library(microbenchmark)

# E_num() comes from the package; Eddington_number() comes from the loose
# base R script. Both are assumed to already be loaded/sourced here.
rides <- as.numeric(readLines("../mock-data/rides.dat"))
microbenchmark(E_num(rides), Eddington_number(rides))
```
```
## Unit: microseconds
##                     expr     min       lq      mean  median      uq     max neval cld
##             E_num(rides)   4.737   5.3665   6.88629   7.349   7.707  17.760   100   a
##  Eddington_number(rides) 100.293 101.8555 105.97427 103.186 105.504 277.938   100   b
```
You can see that the difference in median times is about 96 microseconds on my machine, or a factor of 14.
I was inspired to create the new algorithm because of my interest in computing a cumulative Eddington number on larger datasets. Let's define a cumulative Eddington number function in base R, called `E_cumR()`, and compare it to the `E_cum()` function included in the package.
```r
# Cumulative Eddington number in base R: recompute the full Eddington number
# for every prefix of the rides vector (quadratic in the number of rides).
E_cumR <- function(rides) {
  sapply(seq_along(rides), function(i) Eddington_number(rides[seq_len(i)]))
}
```
```r
microbenchmark(E_cum(rides), E_cumR(rides))
```
```
## Unit: microseconds
##           expr      min       lq        mean    median        uq       max neval cld
##   E_cum(rides)    4.802    5.742    13.30837   16.8805    19.151    42.317   100   a
##  E_cumR(rides) 9783.453 9861.086 10471.62408 9903.3595 10095.600 19901.871   100   b
```
You can see that the package code is now about 587x faster in median time than our function defined in base R. That said, a 587x speedup still only amounts to a difference of about 9,900 microseconds over 100 rides. So in most cases, it's just as good to use the base R code if there is some convenience associated with it. In fact, the time it took to type this sentence is probably more than all the time savings I'll ever realize by developing the better algorithm!