This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Migrated to portable simd #747

Merged
merged 5 commits into from
Mar 5, 2022

jorgecarleitao
Owner

@jorgecarleitao jorgecarleitao commented Jan 9, 2022

This PR replaces the packed_simd dependency with std::simd (also known as portable SIMD), which is currently available only on nightly.

Closes #580
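
For readers unfamiliar with the kernels being migrated, here is a minimal sketch of the lane-wise accumulation pattern that a SIMD sum relies on. It is an emulation on stable Rust with a plain array of partial sums, not this PR's code: `LANES` and `sum_lanes` are hypothetical names, and a std::simd version would presumably hold the accumulator in a vector type such as `f32x8` instead of `[f32; 8]`.

```rust
/// Number of partial sums kept in flight (hypothetical choice for illustration).
const LANES: usize = 8;

/// Sums `values` by keeping `LANES` independent partial sums and reducing
/// them once at the end. This mirrors the access pattern of a SIMD kernel,
/// but uses a plain array so it compiles on stable Rust.
fn sum_lanes(values: &[f32]) -> f32 {
    let mut acc = [0.0f32; LANES];
    let chunks = values.chunks_exact(LANES);
    // Elements that do not fill a whole chunk are summed separately.
    let remainder = chunks.remainder();
    for chunk in chunks {
        for (a, v) in acc.iter_mut().zip(chunk) {
            *a += *v;
        }
    }
    let tail: f32 = remainder.iter().sum();
    // Horizontal reduction of the per-lane partial sums, plus the tail.
    acc.iter().sum::<f32>() + tail
}
```

Because floating-point addition is not associative, a lane-wise sum can differ slightly from a strictly sequential one on general inputs.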

@codecov

codecov bot commented Jan 9, 2022

Codecov Report

Merging #747 (db5cdce) into main (1b7e8ee) will decrease coverage by 1.02%.
The diff coverage is 60.00%.

@@            Coverage Diff             @@
##             main     #747      +/-   ##
==========================================
- Coverage   71.52%   70.49%   -1.03%     
==========================================
  Files         337      343       +6     
  Lines       18443    18735     +292     
==========================================
+ Hits        13191    13207      +16     
- Misses       5252     5528     +276     
Impacted Files Coverage Δ
src/types/simd/mod.rs 90.90% <ø> (ø)
src/types/simd/packed.rs 0.00% <0.00%> (ø)
src/compute/aggregate/min_max.rs 65.90% <100.00%> (ø)
src/compute/aggregate/simd/native.rs 94.11% <100.00%> (ø)
src/compute/arithmetics/time.rs 25.68% <0.00%> (-0.92%) ⬇️
src/io/odbc/read/deserialize.rs 0.00% <0.00%> (ø)
src/io/odbc/write/serialize.rs 0.00% <0.00%> (ø)
src/io/odbc/read/schema.rs 0.00% <0.00%> (ø)
src/io/odbc/write/schema.rs 0.00% <0.00%> (ø)
... and 3 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jorgecarleitao jorgecarleitao force-pushed the portable_simd branch 3 times, most recently from e1e7af8 to 099d5ed January 9, 2022 20:30
@jorgecarleitao jorgecarleitao marked this pull request as ready for review January 9, 2022 20:33
@jorgecarleitao
Owner Author

jorgecarleitao commented Jan 9, 2022

Compiling and tests passing 🎉🎉🎉🎉🎉

Unfortunately, we do have major regressions:

git checkout main
cargo bench --bench aggregate --features compute_aggregate,benchmarks,simd -- "2\^20"
cargo bench --bench comparison_kernels --features compute_comparison,benchmarks,simd -- "2\^20"
git checkout portable_simd
cargo bench --bench aggregate --features compute_aggregate,benchmarks,simd -- "2\^20"
cargo bench --bench comparison_kernels --features compute_comparison,benchmarks,simd -- "2\^20"
sum 2^20 f32            time:   [176.60 us 176.83 us 177.07 us]                         
                        change: [+13.785% +16.097% +18.402%] (p = 0.00 < 0.05)
min 2^20 f32            time:   [301.82 us 303.29 us 304.81 us]                         
                        change: [+23.534% +24.444% +25.363%] (p = 0.00 < 0.05)
sum null 2^20 f32       time:   [212.85 us 214.04 us 215.33 us]                              
                        change: [-5.2111% -4.5145% -3.7872%] (p = 0.00 < 0.05)
min null 2^20 f32       time:   [474.66 us 477.09 us 479.59 us]                              
                        change: [+8.4680% +9.2737% +10.047%] (p = 0.00 < 0.05)
f32 2^20                time:   [2.1006 ms 2.1116 ms 2.1240 ms]                      
                        change: [+437.37% +440.05% +443.38%] (p = 0.00 < 0.0
f32 scalar 2^20         time:   [2.1254 ms 2.1360 ms 2.1466 ms]                             
                        change: [+961.65% +968.65% +974.66%] (p = 0.00 < 0.0
bool 2^20               time:   [189.05 us 190.03 us 191.03 us]                      
                        change: [-11.617% -10.306% -9.1736%] (p = 0.00 < 0.05)
bool scalar 2^20        time:   [31.371 us 31.527 us 31.687 us]                              
                        change: [-1.3276% -0.4781% +0.3641%] (p = 0.26 > 0.05)

@ritchie46
Collaborator

Man, those are huge regressions :(. Given that it compiles, maybe just wait a bit and do a benchmark once in a while?

@jorgecarleitao
Owner Author

I agree that we should not merge with these numbers. I will reach out to the working group for feedback and advice.

@jorgecarleitao jorgecarleitao force-pushed the portable_simd branch 3 times, most recently from 8badb7d to 81aacde January 10, 2022 05:38
@jorgecarleitao jorgecarleitao marked this pull request as draft January 11, 2022 07:22
@jorgecarleitao
Owner Author

Converted to draft to indicate that this is blocked by a performance regression.

@jorgecarleitao
Owner Author

jorgecarleitao commented Jan 13, 2022

@jhorstmann
Contributor

FYI, I also experimented with portable_simd today and created an LLVM issue about a missing/incorrect optimization with from_bitmask on machines without AVX-512.
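
To illustrate what a from_bitmask-style conversion does in a null-aware kernel, here is a minimal sketch on stable Rust. It is an emulation, not the std::simd API and not the code in this PR: `mask_from_bitmask` and `sum_valid` are hypothetical names, and the validity bits are assumed to be stored least-significant bit first, as in Arrow validity bitmaps.

```rust
const LANES: usize = 8;

/// Expands one validity byte into a per-lane boolean mask.
/// Assumption: bit 0 (least significant) corresponds to lane 0.
fn mask_from_bitmask(bits: u8) -> [bool; LANES] {
    let mut mask = [false; LANES];
    for (lane, m) in mask.iter_mut().enumerate() {
        *m = (bits >> lane) & 1 == 1;
    }
    mask
}

/// Sums only the values whose validity bit is set.
/// `validity` holds one bit per value, packed LSB-first into bytes.
fn sum_valid(values: &[f32], validity: &[u8]) -> f32 {
    assert!(validity.len() * LANES >= values.len());
    let mut acc = [0.0f32; LANES];
    for (chunk, &bits) in values.chunks(LANES).zip(validity) {
        let mask = mask_from_bitmask(bits);
        for ((a, v), keep) in acc.iter_mut().zip(chunk).zip(mask) {
            // Select: valid lanes contribute their value, null lanes contribute 0.
            *a += if keep { *v } else { 0.0 };
        }
    }
    acc.iter().sum()
}
```

On hardware with mask registers the expansion step can be nearly free, whereas elsewhere it has to be lowered to a sequence of shuffles and compares, which is one plausible reason its code generation matters for the "null" benchmarks.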

@jorgecarleitao
Owner Author

Wow, awesome finding and summary @jhorstmann - thank you for reporting it over there 🙇

@ritchie46
Collaborator

What do you think of keeping this one alongside the current SIMD implementation (until the performance is comparable)? Then there is already SIMD support for the stable compiler, and people can opt in to the nightly version for maximal performance.

@jorgecarleitao
Owner Author

:) portable simd released the required APIs. I am re-benchmarking this now to evaluate where we are with respect to performance. :)

@ritchie46
Collaborator

🤞

@jorgecarleitao
Owner Author

There are still 2 regressions, so this remains blocked.

sum 2^20 f32            time:   [164.88 us 165.03 us 165.23 us]                         
                        change: [-1.3535% -0.2982% +0.8551%] (p = 0.62 > 0.05)
min 2^20 f32            time:   [211.05 us 212.38 us 214.02 us]                         
                        change: [-0.9052% +0.2757% +1.3189%] (p = 0.64 > 0.05)
sum 2^20 i32            time:   [167.82 us 168.38 us 169.12 us]                         
min 2^20 i32            time:   [165.78 us 166.48 us 167.59 us]                         
sum null 2^20 f32       time:   [858.64 us 860.01 us 861.73 us]                              
                        change: [+352.55% +356.83% +360.89%] (p = 0.00 < 0.05)
min null 2^20 f32       time:   [874.36 us 877.98 us 883.40 us]                              
                        change: [+128.24% +131.57% +136.81%] (p = 0.00 < 0.05)
f32 2^20                time:   [349.08 us 354.17 us 364.40 us]                     
                        change: [-3.7451% -2.3785% -0.2625%] (p = 0.00 < 0.05)
f32 scalar 2^20         time:   [201.41 us 202.66 us 204.41 us]                            
                        change: [-2.4794% -1.2193% -0.0931%] (p = 0.05 < 0.05)
bool 2^20               time:   [179.35 us 180.41 us 182.26 us]                      
                        change: [+4.1593% +5.9573% +8.4415%] (p = 0.00 < 0.05)
bool scalar 2^20        time:   [26.768 us 26.920 us 27.116 us]                              
                        change: [+0.2871% +1.5636% +2.6602%] (p = 0.00 < 0.05)
utf8 2^20               time:   [6.6215 ms 6.6433 ms 6.6747 ms]                      
                        change: [-3.8776% -3.2642% -2.6263%] (p = 0.00 < 0.05)                    
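
As a reading aid for the `change` lines above, here is a small sketch of the arithmetic, assuming the reported change is approximately `new / old - 1` expressed in percent. The function names are hypothetical, and the benchmark tool derives its intervals statistically, so the recovered baselines are only approximate.

```rust
/// Baseline time implied by a new time and a relative change in percent.
fn implied_baseline(new_time: f64, change_percent: f64) -> f64 {
    new_time / (1.0 + change_percent / 100.0)
}

/// Relative change in percent when going from `baseline` to `new_time`.
fn relative_change_percent(baseline: f64, new_time: f64) -> f64 {
    (new_time / baseline - 1.0) * 100.0
}
```

Under that assumption, `implied_baseline(860.01, 356.83)` puts the previous "sum null 2^20 f32" time near 188 us, and `implied_baseline(877.98, 131.57)` puts the previous "min null 2^20 f32" time near 379 us.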

@jorgecarleitao
Owner Author

Actually, the regression came only from the last commit, which tries to use the intrinsics for from_bitmask. Without that commit, there is no regression. I just reverted it and will post the updated benchmarks.

@jorgecarleitao
Owner Author

There is some noise, but overall we are now on par with packed_simd:

sum 2^20 f32            time:   [166.41 us 167.83 us 169.74 us]                         
                        change: [-0.0800% +1.1867% +3.0513%] (p = 0.11 > 0.05)
min 2^20 f32            time:   [210.90 us 211.92 us 213.13 us]                         
                        change: [+0.2811% +1.5970% +2.8225%] (p = 0.02 < 0.05)
sum 2^20 i32            time:   [167.38 us 167.71 us 168.16 us]                         
                        change: [-6.1677% -4.9397% -3.5768%] (p = 0.00 < 0.05)
min 2^20 i32            time:   [198.58 us 199.09 us 199.78 us]                         
                        change: [+14.587% +16.306% +18.393%] (p = 0.00 < 0.05)
sum null 2^20 f32       time:   [186.71 us 188.37 us 190.54 us]                              
                        change: [-1.6255% +0.2482% +2.1276%] (p = 0.81 > 0.05)
min null 2^20 f32       time:   [368.12 us 370.28 us 373.81 us]                              
                        change: [-4.1516% -3.1519% -2.1852%] (p = 0.00 < 0.05)
f32 2^20                time:   [351.59 us 352.98 us 355.48 us]                     
                        change: [-0.8055% +0.4256% +1.7513%] (p = 0.52 > 0.05)
f32 scalar 2^20         time:   [235.50 us 237.09 us 239.49 us]                            
                        change: [-4.2012% -1.9893% -0.1382%] (p = 0.06 > 0.05)
bool 2^20               time:   [192.72 us 193.63 us 195.08 us]                      
                        change: [+14.066% +16.238% +19.238%] (p = 0.00 < 0.05)
bool scalar 2^20        time:   [26.444 us 26.557 us 26.692 us]                              
                        change: [-2.1207% -0.6513% +0.5040%] (p = 0.36 > 0.05)
utf8 2^20               time:   [6.5841 ms 6.6033 ms 6.6358 ms]                      
                        change: [-0.3873% +0.1999% +0.8762%] (p = 0.57 > 0.05)

@ritchie46
Collaborator

Awesome. Nice work!

@jorgecarleitao jorgecarleitao marked this pull request as ready for review March 5, 2022 18:13
Labels
feature A new feature
Development

Successfully merging this pull request may close these issues.

Considering migrate to portable-simd
4 participants