Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No performance improvements with -Ctarget-feature=+neon on arm64 #360

Closed
Slixe opened this issue Jun 17, 2024 · 6 comments
Closed

No performance improvements with -Ctarget-feature=+neon on arm64 #360

Slixe opened this issue Jun 17, 2024 · 6 comments

Comments

@Slixe
Copy link

Slixe commented Jun 17, 2024

Hey there, thanks for this amazing repo!

I'm currently using chacha20 crate, unfortunately, on arm64 CPU I'm getting poor performances with and without -Ctarget-feature=+neon.

test chacha12_bench1_16b   ... bench:          34.15 ns/iter (+/- 1.03) = 470 MB/s
test chacha12_bench2_256b  ... bench:         417.99 ns/iter (+/- 16.83) = 613 MB/s
test chacha12_bench3_1kib  ... bench:       1,653.14 ns/iter (+/- 39.99) = 619 MB/s
test chacha12_bench4_16kib ... bench:      26,417.67 ns/iter (+/- 919.50) = 620 MB/s
test chacha20_bench1_16b   ... bench:          48.31 ns/iter (+/- 0.44) = 333 MB/s
test chacha20_bench2_256b  ... bench:         704.38 ns/iter (+/- 53.47) = 363 MB/s
test chacha20_bench3_1kib  ... bench:       2,710.32 ns/iter (+/- 109.84) = 377 MB/s
test chacha20_bench4_16kib ... bench:      43,389.69 ns/iter (+/- 2,237.09) = 377 MB/s
test chacha8_bench1_16b    ... bench:          26.71 ns/iter (+/- 2.61) = 615 MB/s
test chacha8_bench2_256b   ... bench:         284.31 ns/iter (+/- 12.14) = 901 MB/s
test chacha8_bench3_1kib   ... bench:       1,121.32 ns/iter (+/- 50.64) = 913 MB/s
test chacha8_bench4_16kib  ... bench:      17,929.43 ns/iter (+/- 378.40) = 913 MB/s

Am I missing something ? Thanks in advance :)

@newpavlov
Copy link
Member

newpavlov commented Jun 17, 2024

With chacha20 v0.9 you need to also provide the chacha20_force_neon configuration flag described in the docs (note that it's NOT a crate feature). In v0.10 it probably will work as expected without providing this configuration flag (though we haven't implemented it yet).

@Slixe
Copy link
Author

Slixe commented Jun 17, 2024

Mea culpa for missing that! I was using a clone version (so v0.10.0-pre) and expected to work directly.

I just followed the README.md from https://crates.io/crates/chacha20

With chacha20_force_neon I get the following now:

test chacha12_bench1_16b   ... bench:          16.14 ns/iter (+/- 0.73) = 1000 MB/s
test chacha12_bench2_256b  ... bench:         111.41 ns/iter (+/- 3.24) = 2306 MB/s
test chacha12_bench3_1kib  ... bench:         440.84 ns/iter (+/- 11.52) = 2327 MB/s
test chacha12_bench4_16kib ... bench:       7,054.65 ns/iter (+/- 344.24) = 2322 MB/s
test chacha20_bench1_16b   ... bench:          29.74 ns/iter (+/- 1.23) = 551 MB/s
test chacha20_bench2_256b  ... bench:         187.54 ns/iter (+/- 4.40) = 1368 MB/s
test chacha20_bench3_1kib  ... bench:         746.00 ns/iter (+/- 6.39) = 1372 MB/s
test chacha20_bench4_16kib ... bench:      11,934.89 ns/iter (+/- 130.76) = 1372 MB/s
test chacha8_bench1_16b    ... bench:          11.72 ns/iter (+/- 0.46) = 1454 MB/s
test chacha8_bench2_256b   ... bench:          73.39 ns/iter (+/- 2.77) = 3506 MB/s
test chacha8_bench3_1kib   ... bench:         294.11 ns/iter (+/- 9.41) = 3482 MB/s
test chacha8_bench4_16kib  ... bench:       4,701.54 ns/iter (+/- 167.79) = 3485 MB/s

I'm asking maybe too much but, can't the feature be directly detected and enabled ?

@newpavlov
Copy link
Member

Adding compile-time detection should be relatively easy and we should do it for v0.10 (we essentially just need to tweak our cfgs a bit). This should be sufficient in practice for AArch64 as discussed here.

@tarcieri
Copy link
Member

It was previously nightly-only but looks like it works on stable now, so yeah, we should be able to enable it unconditionally on aarch64 targets

@Slixe
Copy link
Author

Slixe commented Jun 17, 2024

Great to hear! Well, I have everything I need, thanks a lot for your hard work guys, I close the issue.

@Slixe Slixe closed this as completed Jun 17, 2024
@tarcieri
Copy link
Member

Here's a PR to enable it by default: #361

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants