There is a roughly 25% performance regression when converting from packed_simd to core_simd.
The implementations are the same, mutatis mutandis, but their performance is not:
core_simd_min 2^20 f32 [286.86 us 289.22 us 292.03 us]
packed_simd_min 2^20 f32 [230.50 us 234.12 us 238.86 us]
nonsimd_min 2^20 f32 [245.75 us 249.19 us 254.00 us]
naive_min 2^20 f32 [2.8560 ms 2.8721 ms 2.8885 ms]
In particular, it seems more efficient to write the code without std::simd (nonsimd_min) than with it (core_simd_min).
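For readers without the benchmark source at hand, here is a minimal sketch of what a scalar loop and a chunked "non-SIMD" minimum might look like. It is not the code from the benchmark: the function bodies, the `LANES` width of 16, and the NaN handling are all assumptions made for illustration, and the actual implementations are in the linked repository.

```rust
/// Scalar baseline: reduce over every element one at a time.
/// (Illustrative only; the benchmarked naive_min may differ.)
fn naive_min(values: &[f32]) -> Option<f32> {
    values.iter().copied().reduce(f32::min)
}

/// Chunked variant written without std::simd: keep LANES independent
/// running minima so the compiler is free to auto-vectorize the inner loop.
/// Assumes the input contains no NaN values (NaNs are skipped by `<`).
fn nonsimd_min(values: &[f32]) -> Option<f32> {
    const LANES: usize = 16; // assumed width, chosen for illustration

    if values.is_empty() {
        return None;
    }

    let mut acc = [f32::INFINITY; LANES];
    let chunks = values.chunks_exact(LANES);
    let tail = chunks.remainder();

    // Lane-wise minimum over all full chunks.
    for chunk in chunks {
        for (a, &v) in acc.iter_mut().zip(chunk) {
            if v < *a {
                *a = v;
            }
        }
    }

    // Horizontal reduction of the lanes, then fold in the leftover elements.
    let mut min = acc.iter().copied().fold(f32::INFINITY, f32::min);
    for &v in tail {
        min = min.min(v);
    }
    Some(min)
}
```

Calling `nonsimd_min(&data)` on a `Vec<f32>` of 2^20 elements corresponds to the input size used in the measurements above; whether the inner loop actually vectorizes depends on the compiler version and target features.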
With target-cpu=native, the results are:
core_simd_min 2^20 f32 [376.98 us 378.40 us 379.72 us]
packed_simd_min 2^20 f32 [181.77 us 182.95 us 185.05 us]
nonsimd_min 2^20 f32 [185.89 us 186.35 us 186.83 us]
naive_min 2^20 f32 [2.0208 ms 2.0274 ms 2.0341 ms]
This is an even larger difference: core_simd_min is about 2x slower than packed_simd_min.
Context
We are considering migrating from packed_simd to std::simd and observed this regression in our benchmarks. See jorgecarleitao/arrow2#747 for details.