We are doing a re-write of the API.
fastrank
an R package providing fast ranking for integer, numeric, logical
and complex vectors, as an alternative to calling .Internal(rank(...))
, which
packages cannot do. Its API is a bit more restrictive, in the interests of
speed. You cannot sort character
vectors or vectors containing NA
with
fastrank
. if you need these capabilities, use base rank
or convert your
data to a form accepted by fastrank
.
The package provides a general interface via the fastrank
function, a
replacement for the base R rank
. It accepts any of the above accepted
datatypes and any ties.method
:
fastrank(x, ties.method = c("average", "first", "random", "max", "min"))
There are also direct interfaces for specific data types with specific tie-breaking methods, if you can guarantee the data type of your vectors. These are slightly faster for shorter vectors because setup time is reduced:
fastrank_num_avg(x)
For all functions, the ranks of x
are returned in a vector of the same length
as x
. As with rank
, for ties.method = "average"
the vector returned is
numeric
because ranks can be fractional; for all other methods the vector is
integer
.
fastrank
is designed to handle all ties.method
arguments identically to
base rank
. fastrank
handles "first"
and "random"
in C code, so should
be much faster for these. Unfortunately, the "first"
method is not
currently comparable because the sort routine used does not preserve the
order of equivalent items. In future fastrank
may switch to using a stable
sort if "first"
is requested.
No fastrank
entry handles NA
in data, nor do they accept character
vectors for ranking. The Scollate
internal R routines for comparing
character strings using locales is not part of the R API, and it would probably
be a bigger job to provide this than the rest of fastrank
.
For complete benchmarking and performance details, see BENCHMARKING.md.
We are almost as fast as .Internal(rank(...))
for vectors length 10, and the direct routines are about 10% faster than the general routine for short vectors, about 5% faster for 100 vectors, and essentially no difference for 10000 vectors.
> frc = cmpfun(fastrank, options=list(optimize=3))
> y <- y.rev
> microbenchmark(rank_new(y), fastrank(y), frc(y), fastrank_num_avg(y), frcna(y), times=100000)
Unit: nanoseconds
expr min lq mean median uq max neval
rank_new(y) 697 878 1039.175 937 1026 2436410 1e+05
fastrank(y) 1000 1157 1288.889 1229 1331 785916 1e+05
frc(y) 963 1119 1267.990 1192 1291 755307 1e+05
fastrank_num_avg(y) 886 1044 1181.111 1113 1197 2219413 1e+05
frcna(y) 846 1015 1129.037 1084 1172 764383 1e+05
> y <- yy.rev
> microbenchmark(rank_new(y), fastrank(y), frc(y), fastrank_num_avg(y), frcna(y), times=100000)
Unit: microseconds
expr min lq mean median uq max neval
rank_new(y) 2.597 2.906 3.754443 3.029 3.218 10811.45 1e+05
fastrank(y) 1.754 2.017 2.950814 2.145 2.401 11680.76 1e+05
frc(y) 1.714 1.996 3.359293 2.135 2.394 12105.77 1e+05
fastrank_num_avg(y) 1.634 1.906 2.901634 2.022 2.225 10798.09 1e+05
frcna(y) 1.611 1.888 2.698567 2.015 2.224 11076.25 1e+05
> y <- yyy.rev
> microbenchmark(rank_new(y), fastrank(y), frc(y), fastrank_num_avg(y), frcna(y), times=5000)
Unit: microseconds
expr min lq mean median uq max neval
rank_new(y) 277.910 283.8145 311.0948 284.5125 286.2915 5120.002 5000
fastrank(y) 107.366 116.2625 154.6016 117.4350 119.2650 37400.171 5000
frc(y) 107.994 116.2720 147.2273 117.3980 119.1905 4979.048 5000
fastrank_num_avg(y) 107.891 116.2345 142.0935 117.3810 118.9535 5063.963 5000
frcna(y) 107.962 116.2380 136.6245 117.5415 119.0410 4836.530 5000
The motivation for this comes from my development of the nestedRanksTest
package. A standard run with the default 10,000 bootstrap iterations takes
a few seconds to complete on a test data set.
> library(nestedRanksTest)
> data(woodpecker_multiyear)
> adat <- subset(woodpecker_multiyear, Species == "agrifolia")
> system.time(with(adat, nestedRanksTest(y = Distance, x = Year, groups = Granary)))
user system elapsed
5.252 0.067 5.318
Profiling with library(lineprof)
revealed that the bottleneck was in the base
R function rank
. A stripped-down rank_new
that simply calls the
.Internal(rank(...))
routine is 8-9× faster than the default rank for a
vector of 100 values, about 2× faster for 1000-value vectors, and only about
20-30% for 10,000-value vectors.
> library(microbenchmark)
> rank_new <- function (x) .Internal(rank(x, length(x), "average"))
> yy <- rnorm(100)
> microbenchmark(rank(yy), rank_new(yy), times=100000)
Unit: microseconds
expr min lq mean median uq max neval
rank(yy) 29.148 31.945 36.931556 32.678 33.5165 57192.350 1e+05
rank_new(yy) 3.755 4.300 4.789952 4.542 4.7290 6784.741 1e+05
> yyy <- rnorm(1000)
> microbenchmark(rank(yyy), rank_new(yyy), times=100000)
Unit: microseconds
expr min lq mean median uq max neval
rank(yyy) 114.693 118.766 134.38290 120.071 123.598 18734.74 1e+05
rank_new(yyy) 63.150 64.877 67.71052 65.340 67.062 16023.45 1e+05
> yyy <- rnorm(10000)
> microbenchmark(rank(yyy), rank_new(yyy), times=1000)
Unit: microseconds
expr min lq mean median uq max neval
rank(yyy) 1238.485 1263.3515 1361.22 1279.580 1303.5785 6436.039 1000
rank_new(yyy) 955.013 964.9215 1002.81 967.849 992.6315 6072.219 1000
fastrank
uses my own code, with some guidance for the basics of e.g., sorting
routines from Wikipedia. In the course of writing the package I frequently
consulted R source for guidance in writing the C language interface, and when
benchmarking I implemented R's own shellsort for comparison purposes. The
repository history can be consulted for details, and the src/tst
directory
contains some test files.
The sort routine at the heart is Quicksort, modified to operate on a vector of
indices rather than the array of values, and also modified to shortcut to an
insertion sort of vector length equal to or shorter than
QUICKSORT_INSERTION_CUTOFF
, currently set to 20. See benchmarking results
for much more on sort routine selection.
The main sorting and assigning of ranks is coded in C macros, with concrete types and comparison functions supplied via macro arguments. This means that while there is less duplication of code features in the source, there is some duplication in the final object code. This is however likely quite fast, and datatype-specific optimisations can be applied wherever possible. I have not benchmarked my concrete expansions against a more generic approach, but common sense suggests concrete is faster.
I considered using R's own sorting routines, e.g.,
R_orderVector
, especially considering it can handle any type
of atomic SEXP
, but benchmarking presented below demonstrated that it is
slower than Quicksort and other methods. Perhaps I could handle character
data this way? I would like to open up the API further...
I also considered copy in its entirety the internal R function do_rank
within
src/main/sort.c
that is what we reach when doing the .Internal(rank(...))
call. This ultimately proved to be impossible because of the numerous internal
features used that are not part of the R API.
Finally, I considered using C++ and the Rcpp package for this,
using an STL sorting routine which is probably quite comparable in performance
to what I have implemented. However, my reading indicated that using Rcpp
pulls in some heavyweight object code, and I prefer to avoid that.
I still have some performance tweaking to do, but major decisions based on
benchmarking are now completed and the main structure is in place. Actually
this is no longer true, I am looking at a couple of other sorting options for
general sorting, and am looking for a fast stable option for ties.method = "first"
.
Of course I want to squeeze as much time as I can, so need to explore an
updated fastrank_num_avg
since the direct entries should always be fastest,
but there are a few more general points to explore.
- What does GC Torture mean when it comes to benchmarking?
- Can I modify my Quicksort to be stable, for
"first"
? - What about Dual-Pivot Quicksort or Quicksort?
I have created a large set of tests for all ties.method
values that check whether rank
and fastrank
are absolutely identical in their results. So far this is true for "average"
, "max"
, and "min"
, but not for "first"
, because the sort used is not yet stable. The "random"
method needs to be checked, it is hoped we can duplicate rank
's behaviour if the seed is identical beforehand. These have only been checked for numeric
vectors.
> y <- sample(10,10,repl=T)
> y
[1] 8 6 8 3 3 6 3 9 6 6
> rank(y,ties="first")
[1] 8 4 9 1 2 5 3 10 6 7
> y <- as.numeric(y)
> fastrank(y, "first")
[1] 8 4 9 1 2 5 3 10 6 7
>
GPL 2, just like R itself.