perf: Offer 'simd' feature for faster folding #146

Merged: 5 commits, Sep 9, 2024

@epage (Contributor) commented Sep 9, 2024

Inspired by https://purplesyringa.moe/blog/i-sped-up-serde-json-strings-by-20-percent/
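The diff itself is not part of this conversation, so the following is only a minimal sketch of how an opt-in Cargo feature can switch a byte search to a SIMD-accelerated one. It assumes the `simd` feature enables an optional dependency on the `memchr` crate; the helper name `find_newline` is hypothetical and not taken from the PR.

```rust
// Hypothetical helper: not the code from this PR.
// Assumes Cargo.toml declares `memchr` as an optional dependency
// that is enabled by the `simd` feature.

/// SIMD-accelerated search, compiled only with `-F simd`.
#[cfg(feature = "simd")]
fn find_newline(haystack: &[u8]) -> Option<usize> {
    memchr::memchr(b'\n', haystack)
}

/// Portable fallback using only the standard library.
#[cfg(not(feature = "simd"))]
fn find_newline(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == b'\n')
}

fn main() {
    let text = b"first line\nsecond line";
    // Byte offset of the first newline, if any.
    println!("{:?}", find_newline(text));
}
```

Both variants share one signature, so callers do not change when the feature is toggled, and users who do not want the extra dependency keep the default build.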

```console
$ cargo bench && cargo bench -F simd
   Compiling annotate-snippets v0.11.2 (/home/epage/src/personal/annotate-snippets-rs)
    Finished `bench` profile [optimized] target(s) in 0.99s
     Running unittests src/lib.rs (target/release/deps/annotate_snippets-b51bb37991a7f496)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/bench.rs (target/release/deps/bench-468ba612503afee1)
Timer precision: 18 ns
bench         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ fold                     │               │               │               │         │
│  ├─ 0       1.911 µs      │ 19.44 µs      │ 1.943 µs      │ 2.146 µs      │ 100     │ 100
│  ├─ 1       1.916 µs      │ 3.158 µs      │ 1.973 µs      │ 1.982 µs      │ 100     │ 100
│  ├─ 10      2.121 µs      │ 6.05 µs       │ 2.225 µs      │ 2.281 µs      │ 100     │ 100
│  ├─ 100     3.706 µs      │ 7.007 µs      │ 3.83 µs       │ 3.876 µs      │ 100     │ 100
│  ├─ 1000    19.42 µs      │ 25.61 µs      │ 19.48 µs      │ 19.64 µs      │ 100     │ 100
│  ├─ 10000   111.2 µs      │ 204.2 µs      │ 127 µs        │ 133.6 µs      │ 100     │ 100
│  ╰─ 100000  1.094 ms      │ 1.747 ms      │ 1.137 ms      │ 1.158 ms      │ 100     │ 100
╰─ simple     10.14 µs      │ 40.27 µs      │ 10.5 µs       │ 11.01 µs      │ 100     │ 100

   Compiling annotate-snippets v0.11.2 (/home/epage/src/personal/annotate-snippets-rs)
    Finished `bench` profile [optimized] target(s) in 0.99s
     Running unittests src/lib.rs (target/release/deps/annotate_snippets-9d4024ac94675e6a)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/bench.rs (target/release/deps/bench-d5470149969acbb8)
Timer precision: 13 ns
bench         fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ fold                     │               │               │               │         │
│  ├─ 0       1.164 µs      │ 13.91 µs      │ 1.208 µs      │ 1.408 µs      │ 100     │ 100
│  ├─ 1       1.188 µs      │ 4.289 µs      │ 1.234 µs      │ 1.277 µs      │ 100     │ 100
│  ├─ 10      1.259 µs      │ 3.822 µs      │ 1.319 µs      │ 1.419 µs      │ 100     │ 100
│  ├─ 100     1.312 µs      │ 2.732 µs      │ 1.412 µs      │ 1.519 µs      │ 100     │ 100
│  ├─ 1000    1.917 µs      │ 5.52 µs       │ 2 µs          │ 2.085 µs      │ 100     │ 100
│  ├─ 10000   7.195 µs      │ 29.55 µs      │ 7.325 µs      │ 7.638 µs      │ 100     │ 100
│  ╰─ 100000  59.08 µs      │ 403 µs        │ 61.1 µs       │ 65.52 µs      │ 100     │ 100
╰─ simple     9.92 µs       │ 19.09 µs      │ 10.33 µs      │ 10.91 µs      │ 100     │ 100
```
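
To make the comparison easier to read, the median columns of the two runs above can be turned into speedup ratios. This is a small sketch that only hard-codes the medians already shown (converted to µs); the `speedup` helper is an illustrative name, not part of the crate.

```rust
/// Ratio of the baseline median to the `simd` median (same unit for both).
fn speedup(baseline_us: f64, simd_us: f64) -> f64 {
    baseline_us / simd_us
}

fn main() {
    // (bench, median without `simd`, median with `simd`), all in µs,
    // copied from the benchmark output above.
    let medians = [
        ("fold/0", 1.943, 1.208),
        ("fold/1", 1.973, 1.234),
        ("fold/10", 2.225, 1.319),
        ("fold/100", 3.83, 1.412),
        ("fold/1000", 19.48, 2.0),
        ("fold/10000", 127.0, 7.325),
        ("fold/100000", 1137.0, 61.1),
        ("simple", 10.5, 10.33),
    ];

    for (name, base, simd) in medians {
        println!("{name:<12} {:>5.1}x", speedup(base, simd));
    }
}
```

By these medians the gain grows with the `fold` input size, from roughly 1.6x at the smallest sizes to about 18.6x at 100000, while `simple` is essentially unchanged.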

The upper bound for this benchmark was taken from https://github.com/crate-ci/typos/blob/786c825f1753f87b02d24770e3f4ec8043d9084a/crates/typos-dict/src/word_codegen.rs

It would be nice to run tests, but divan is getting in the way.
@epage force-pushed the bench branch 2 times, most recently from 2eaf9bb to e8ce092, on September 9, 2024 15:15

@Muscraft (Member) left a comment:

Thanks!

@Muscraft merged commit babde1c into rust-lang:master on Sep 9, 2024
14 of 15 checks passed
@epage deleted the bench branch on September 9, 2024 16:12