v1.2.3 tsv-sample: Faster sampling algorithms

@jondegenhardt jondegenhardt released this 21 Oct 02:07
· 193 commits to master since this release
v1.2.3
3c827fb

This release adds several new sampling algorithms that improve runtime performance and memory utilization for a number of sampling use cases. No new forms of sampling are introduced, only additional algorithms for the existing ones. The new algorithms are:

  • A skip sampling implementation of Bernoulli sampling.
  • An implementation of reservoir sampling "Algorithm R" used for unweighted random sampling.
  • A line order randomization algorithm based on array shuffling.
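
For readers unfamiliar with these techniques, here is a minimal Python sketch of the general idea behind each one. It is illustrative only and is not the tsv-sample implementation (tsv-sample is written in D). The function names `bernoulli_skip_sample`, `reservoir_sample_r`, and `shuffle_lines` are hypothetical, and the array shuffle shown is the standard Fisher-Yates shuffle, which is an assumption about what "array shuffling" refers to here.

```python
import math
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def bernoulli_skip_sample(lines: Iterable[T], prob: float,
                          rng: Optional[random.Random] = None) -> Iterator[T]:
    """Yield each line independently with probability `prob`.

    Rather than drawing one random number per line, draw the gap to the
    next selected line from a geometric distribution and skip that many.
    """
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    if rng is None:
        rng = random.Random()
    if prob == 1.0:
        yield from lines
        return
    log_q = math.log1p(-prob)  # log(1 - prob), always negative here

    def next_skip() -> int:
        # Count of lines to pass over before the next selected line.
        u = 1.0 - rng.random()  # in (0, 1], so log(u) is defined
        return int(math.log(u) / log_q)

    skip = next_skip()
    for line in lines:
        if skip == 0:
            yield line
            skip = next_skip()
        else:
            skip -= 1


def reservoir_sample_r(lines: Iterable[T], k: int,
                       rng: Optional[random.Random] = None) -> List[T]:
    """Return a uniform random sample of up to k lines in one pass."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, line in enumerate(lines):
        if i < k:
            reservoir.append(line)     # fill the reservoir first
        else:
            j = rng.randrange(i + 1)   # uniform integer in [0, i]
            if j < k:
                reservoir[j] = line    # replace with probability k/(i+1)
    return reservoir


def shuffle_lines(lines: Iterable[T],
                  rng: Optional[random.Random] = None) -> List[T]:
    """Read all lines into an array, then Fisher-Yates shuffle it."""
    if rng is None:
        rng = random.Random()
    items = list(lines)
    for i in range(len(items) - 1, 0, -1):
        j = rng.randrange(i + 1)       # uniform integer in [0, i]
        items[i], items[j] = items[j], items[i]
    return items
```

In this sketch the skip sampler needs one random draw per selected line rather than one per input line, and the reservoir sampler holds only `k` lines in memory. The shuffle reads the full input into memory before reordering it.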

Formal performance benchmarks have not been run. However, tests run on macOS during development show favorable results relative to other available tools, including GNU shuf.

To download and unpack the prebuilt binaries:

$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.2.3/tsv-utils-v1.2.3_linux-x86_64_ldc2.tar.gz | tar xz

$ # macOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.2.3/tsv-utils-v1.2.3_osx-x86_64_ldc2.tar.gz | tar xz