perf: use uint MAX_INDEX on arm / aarch64 for uint SIMD #54

Merged Apr 19, 2023 (1 commit)


@jvdd (Owner) commented Apr 18, 2023

Improves the performance of uint operations on arm / aarch64 by using uint instead of int for MAX_INDEX.
This halves the number of exits from the inner SIMD loop, and thus halves the number of (much slower) horizontal SIMD ops 🎉
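
To make the effect of MAX_INDEX concrete, here is a minimal scalar sketch, not the crate's actual NEON code. It assumes that MAX_INDEX bounds how many inner-loop steps can run before the per-lane u8 step counters would overflow, which is one reading of the description above. The names `argmax_chunked`, `LANES`, and the `max_index` parameter are hypothetical.

```rust
/// Number of simulated SIMD lanes (a 128-bit register holds 16 u8 values).
const LANES: usize = 16;

/// Returns (index of the first maximum, number of inner-loop exits).
/// `max_index` is a hypothetical stand-in for MAX_INDEX: the largest number
/// of inner-loop steps allowed before the u8 step counters must be reset.
fn argmax_chunked(data: &[u8], max_index: usize) -> (usize, usize) {
    assert!(!data.is_empty() && max_index >= 1 && max_index <= u8::MAX as usize);
    let (mut best_val, mut best_idx, mut exits) = (data[0], 0usize, 0usize);
    for (c, chunk) in data.chunks(max_index * LANES).enumerate() {
        let base = c * max_index * LANES;
        // Inner loop: each lane keeps its own running max and a narrow step counter.
        let mut lane_val = [0u8; LANES];
        let mut lane_step = [0u8; LANES];
        let mut lane_seen = [false; LANES];
        for (i, &v) in chunk.iter().enumerate() {
            let lane = i % LANES;
            if !lane_seen[lane] || v > lane_val[lane] {
                lane_val[lane] = v;
                lane_step[lane] = (i / LANES) as u8; // fits, since i / LANES < max_index
                lane_seen[lane] = true;
            }
        }
        // Exit from the inner loop: reduce across lanes (the "horizontal" step).
        exits += 1;
        for lane in 0..LANES {
            if !lane_seen[lane] {
                continue;
            }
            let idx = base + lane_step[lane] as usize * LANES + lane;
            if lane_val[lane] > best_val || (lane_val[lane] == best_val && idx < best_idx) {
                best_val = lane_val[lane];
                best_idx = idx;
            }
        }
    }
    (best_idx, exits)
}
```

Under these assumptions, calling `argmax_chunked(&data, 127)` and `argmax_chunked(&data, 255)` on the same 100,000-element slice returns the same index, but the first call leaves the inner loop 50 times and the second only 25 times.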

Illustration of the performance gain

main branch ⬇️

scalar_u8_argminmax     time:   [274.45 µs 274.47 µs 274.48 µs]
scalar_u8_argmin        time:   [274.52 µs 274.53 µs 274.54 µs]
scalar_u8_argmax        time:   [274.51 µs 274.52 µs 274.54 µs]
neon_u8_argminmax       time:   [58.060 µs 58.072 µs 58.089 µs]
neon_u8_argmin          time:   [38.840 µs 38.843 µs 38.846 µs]
neon_u8_argmax          time:   [38.836 µs 38.837 µs 38.840 µs]
impl_u8_argminmax       time:   [58.059 µs 58.062 µs 58.065 µs]
impl_u8_argmin          time:   [38.869 µs 38.870 µs 38.871 µs]
impl_u8_argmax          time:   [38.873 µs 38.886 µs 38.912 µs]

this PR ⬇️

scalar_u8_argminmax     time:   [274.43 µs 274.44 µs 274.44 µs]
scalar_u8_argmin        time:   [274.53 µs 274.54 µs 274.54 µs]
scalar_u8_argmax        time:   [274.49 µs 274.50 µs 274.52 µs]
neon_u8_argminmax       time:   [39.193 µs 39.194 µs 39.195 µs]
neon_u8_argmin          time:   [29.295 µs 29.295 µs 29.296 µs]
neon_u8_argmax          time:   [29.298 µs 29.301 µs 29.307 µs]
impl_u8_argminmax       time:   [39.194 µs 39.195 µs 39.196 µs]
impl_u8_argmin          time:   [29.299 µs 29.300 µs 29.300 µs]
impl_u8_argmax          time:   [29.299 µs 29.300 µs 29.301 µs]

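The scalar timings are unchanged, while the NEON u8 timings drop from about 58.1 µs to 39.2 µs for argminmax (roughly 32% less time) and from about 38.8 µs to 29.3 µs for argmin / argmax (roughly 25% less time). This improvement becomes smaller as the bit-size of the underlying uint datatype increases.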
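
One plausible reason the gain shrinks for wider types is that inner-loop exits are already rare there. The sketch below counts exits for a given array length, assuming 128-bit registers and that each lane may take at most MAX_INDEX steps per inner loop. Both are assumptions rather than details confirmed in this PR, and the helper names `exits` and `signed_vs_unsigned` are hypothetical.

```rust
/// Inner-loop exits needed to scan `n` elements when a 128-bit register
/// holds `128 / bits` lanes and each lane may take at most `max_index` steps.
fn exits(n: u64, bits: u32, max_index: u64) -> u64 {
    let lanes = (128 / bits) as u64;
    let chunk = max_index * lanes;
    (n + chunk - 1) / chunk // ceiling division
}

/// (exits with the signed limit, exits with the unsigned limit) for one width.
/// Only 8, 16 and 32 bits are supported in this sketch.
fn signed_vs_unsigned(n: u64, bits: u32) -> (u64, u64) {
    assert!(bits == 8 || bits == 16 || bits == 32);
    let signed_max = (1u64 << (bits - 1)) - 1; // e.g. i8::MAX = 127
    let unsigned_max = (1u64 << bits) - 1; // e.g. u8::MAX = 255
    (exits(n, bits, signed_max), exits(n, bits, unsigned_max))
}
```

For a hypothetical array of 1,000,000 elements this gives (493, 246) for 8 bits, (4, 2) for 16 bits, and (1, 1) for 32 bits, so halving the exit count saves far fewer horizontal ops as the type gets wider.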


codspeed-hq bot commented Apr 18, 2023

CodSpeed Performance Report

Merging #54 max_index_arm (4a59183) will not alter performances.

Summary

🔥 0 improvements
❌ 0 regressions
✅ 168 untouched benchmarks

🆕 0 new benchmarks
⁉️ 0 dropped benchmarks

@jvdd changed the title from "perf: use uint max_index on arm / aarch64 for uint SIMD" to "perf: use uint MAX_INDEX on arm / aarch64 for uint SIMD" on Apr 18, 2023
@jvdd merged commit 032d085 into main on Apr 19, 2023
@jvdd deleted the max_index_arm branch on June 3, 2023 09:26