rsampling

Simple reservoir sampling for the command line. Works on lines.

An algorithm for randomly choosing a sample of k items from a list S containing n items, where n is either a very large or unknown number. Typically n is large enough that the list doesn't fit into main memory.

Original paper: Random Sampling with a Reservoir (1985)

Install

$ go install github.com/miku/rsampling/cmd/...@latest

Or some release.

Usage

$ rsampling -h
Usage of rsampling:
  -r int
        random seed (default 945553340)
  -n int
        number of samples to obtain (default 16)

$ seq 100 | rsampling -n 8
71
21
83
39
81
47
89
77

Performance

Sometimes a

sort -R | head -10

is fast enough, but sometimes it is not:

$ time sort -R fixtures/b | head -10
630163
119338
52446
774497
425201
849943
735918
964258
492182
20209

real    0m19.137s
user    1m12.019s
sys     0m0.205s

$ time rsampling -n 10 < fixtures/b
624611
645730
368278
871774
388413
546190
605655
178735
16707
298563

real    0m0.187s
user    0m0.168s
sys     0m0.013s

Comparison to sort and shuf

Options:

$ seq 100 | rsampling -n 16
$ seq 100 | sort -R | head -16
$ seq 100 | shuf -n 16

Sort, shuf and rsampling.

Shuf and rsampling, small subset.

Shuf and rsampling, larger subset.

Visual impression of reservoir sampling:

Misc

Optimization oppertunities.

From bufio.Reader to bufio.Scanner

Scanner can be faster, since it does not need to allocate memory to return a value.

Using a scanner, we get closer to shuf performance.

Sort, shuf and rsampling.

Shuf and rsampling, small subset.

Shuf and rsampling, larger subset.

Name		Name	Last commit message	Last commit date
Latest commit History 47 Commits
cmd		cmd
fixtures		fixtures
images		images
packaging		packaging
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
chart.py		chart.py
go.mod		go.mod

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

rsampling

Install

Usage

Performance

Comparison to sort and shuf

Misc

From bufio.Reader to bufio.Scanner

About

Releases 2

Packages

Languages

License

miku/rsampling

Folders and files

Latest commit

History

Repository files navigation

rsampling

Install

Usage

Performance

Comparison to sort and shuf

Misc

From bufio.Reader to bufio.Scanner

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages