add runtime dataset fetch and parse in-place #186

saik0 · 2022-02-09T05:38:20Z

Closes #129
Closes #171
Closes #185

Here's my go at fetching the datasets at runtime

* Datasets are lazily fetched the first time they're needed (or updated, if local `HEAD != origin/master`).
* The zip files are parsed in place on every benchmark run, to keep the on-disk size down.
* The parsing is also lazy, and happens at most once per run.
* This PR updates any benchmarks that were already using limited data from `wikileaks-noquotes` to use all the datasets.
* A fast follow-up PR will update all the benchmarks.
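
For readers who want a picture of the "lazy, at most once" parsing mentioned above, here is a minimal sketch using only the standard library. It is not the code from this PR: `Dataset`, `parse_dataset`, `datasets`, and `INIT_CALLS` are hypothetical names, the in-memory strings stand in for zip entries, and the use of `std::sync::OnceLock` is an assumption about how one might implement this.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for one parsed dataset.
#[derive(Debug)]
pub struct Dataset {
    pub name: String,
    pub values: Vec<u32>,
}

/// Counts how many times the initialization closure actually ran.
pub static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Parses comma-separated integers; stands in for reading one zip entry.
fn parse_dataset(name: &str, raw: &str) -> Dataset {
    let values = raw
        .split(',')
        .filter_map(|s| s.trim().parse::<u32>().ok())
        .collect();
    Dataset { name: name.to_string(), values }
}

/// Returns the parsed datasets, doing the parsing work on the first call only.
pub fn datasets() -> &'static [Dataset] {
    static DATASETS: OnceLock<Vec<Dataset>> = OnceLock::new();
    DATASETS.get_or_init(|| {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        vec![
            parse_dataset("example-a", "1, 2, 3"),
            parse_dataset("example-b", "10, 20"),
        ]
    })
}

fn main() {
    // Both calls return the same slice; the closure runs once.
    let first = datasets();
    let second = datasets();
    assert!(std::ptr::eq(first, second));
    println!(
        "{} datasets, initialized {} time(s)",
        first.len(),
        INIT_CALLS.load(Ordering::SeqCst)
    );
}
```

Every benchmark that calls `datasets()` shares the same parsed data within a run, while nothing is parsed if no benchmark asks for it.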

@Kerollmops Third time's the charm?

Kerollmops

Thank you very much, that is quite good! Even the little indicatif progress bar ❤️

benchmarks/.gitignore

benchmarks/benches/datasets.rs

benchmarks/build.rs

benchmarks/Cargo.toml

benchmarks/benches/lib.rs

saik0 · 2022-02-09T11:46:42Z

CI is failing in my branch from git. Investigating.

.github/workflows/test.yml

Kerollmops

Is it ready for review? Do you think I can't take a last look at your PR?

.github/workflows/test.yml

saik0 · 2022-02-09T13:41:30Z

> Is it ready for review? Do you think I can't take a last look at your PR?

If CI is ✅ then yes.

Kerollmops

Your PR looks good to me!

.github/workflows/test.yml

benchmarks/Cargo.toml

benchmarks/benches/datasets.rs

Co-authored-by: Clément Renault <renault.cle@gmail.com>

Kerollmops

Ok, so ready to be merged, thank you again for your work and the time spent on this!
bors merge

bors · 2022-02-09T15:10:18Z

Build succeeded:

ci (stable)

186: add runtime dataset fetch and parse in-place r=Kerollmops a=saik0

Closes RoaringBitmap#129
Closes RoaringBitmap#171
Closes RoaringBitmap#185

Here's my go at fetching the datasets at runtime

* Datasets are lazily fetched the first time they're needed (or updated, if local `HEAD != origin/master`).
* The zip files are parsed in place on every benchmark run, to keep the on-disk size down.
* The parsing is also lazy, and happens at most once.
* This PR updates any benchmarks that were already using limited data from `wikileaks-noquotes` to use all the datasets.
* A fast follow PR will update all the benchmarks.

`@Kerollmops` Third time's the charm?

Co-authored-by: saik0 <github@saik0.net>
Co-authored-by: Joel Pedraza <github@saik0.net>

saik0 added 2 commits February 8, 2022 21:35

add runtime dataset fetch and parse in-place

d1fa477

remove build-deps and build.rs

7461720

saik0 force-pushed the bench-datasets-fetch-runtime branch from dffba38 to 7461720 Compare February 9, 2022 08:03

saik0 added 3 commits February 9, 2022 02:05

cleanup benchmark dependencies

37c10a3

fix benchmark warnings

7da34b7

remove datasets_paths.rs file

1f6cc64

Kerollmops requested changes Feb 9, 2022

saik0 added 2 commits February 9, 2022 03:12

add trailing newline to benchmark gitignore

6b420e0

remove benchmark src dir

e846983

add benchmark offline mode. use by CI.

684b4b2

saik0 commented Feb 9, 2022

.github/workflows/test.yml

add trailing newline to test.yml

eb3de40

saik0 commented Feb 9, 2022

.github/workflows/test.yml

Kerollmops approved these changes Feb 9, 2022

.github/workflows/test.yml

saik0 added 3 commits February 9, 2022 05:34

add fetch step to ci

a6e88f4

remove all-targets arg from fetch step

3c63761

add vendored ssl to git deps

b99c445

Kerollmops requested changes Feb 9, 2022

.github/workflows/test.yml

.github/workflows/test.yml

benchmarks/Cargo.toml

benchmarks/benches/datasets.rs

saik0 and others added 2 commits February 9, 2022 06:30

add --benches flag to simd benchmark tests

68de418

Update benchmarks/benches/datasets.rs

dbfe350

Co-authored-by: Clément Renault <renault.cle@gmail.com>

Kerollmops approved these changes Feb 9, 2022

bors bot merged commit 31ed4ca into RoaringBitmap:master Feb 9, 2022
