Optimize array::transform::utils::set_bits #716
Conversation
Code looks good to me. The chained loop with `bits_to_align` and the remainder could be split into two loops to make the writes more sequential, but the code looks a bit simpler with one loop. I benchmarked this locally by adding to
The results are very good, a speedup between factor 3-4, and the improvement on bigger batches could be even better. Interestingly, the benchmark setup seems to always create a null bitmap, also for the tests that are supposed to be non-null. Otherwise I can't explain why those benches also see a big speedup. There is minimal additional overhead in "concat 1024 arrays i32 4", but that is probably the worst case: concatenating 1024 arrays of length 4.
I plan to review this tomorrow
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #716      +/-   ##
==========================================
+ Coverage   82.46%   82.53%   +0.07%
==========================================
  Files         168      168
  Lines       47419    47705     +286
==========================================
+ Hits        39104    39375     +271
- Misses       8315     8330      +15
```
Continue to review full report at Codecov.
|
arrow/src/array/transform/utils.rs (outdated)

```rust
let chunks = BitChunks::new(data, offset_read + bits_to_align, len - bits_to_align);
chunks.iter().for_each(|chunk| {
    null_count += chunk.count_zeros();
    chunk.to_ne_bytes().iter().for_each(|b| {
```
I think this needs to use `to_le_bytes`; for example, see the comments in `ops.rs`, in `bitwise_bin_op_helper` (which have some typos):

```rust
// we are counting bits starting from the least significant bit, so to_le_bytes should be correct
```
Thank you for the contribution @mathiaspeters-sig.
I am not sure about the test results -- I may be misunderstanding (and bit twiddling is not my specialty), but something looks not quite right.
arrow/src/array/transform/utils.rs (outdated)

```rust
let expected_data: &[u8] = &[
    0b00111000, 0b00111111, 0b00111111, 0b00111111, 0b00111111, 0b00111111,
    0b00111111, 0b00111111, 0b00000111, 0b00000000,
```
I think the 3rd byte looks suspicious -- should it be:

```diff
-    0b00111111, 0b00111111, 0b00000111, 0b00000000,
+    0b00111111, 0b00111111, 0b00111100, 0b00000000,
```
To me this looks correct: the last 3 bits on the left of the source byte end up on the right of the last destination byte (set bits go from right to left).
I can also mention here that I double-checked that the old implementation returned the specified result as well, for all 4 tests.
I've checked the implementation and verified what the tests are doing with the offsets and length. I'm happy with the change.
@alamb anything that is left to do here in your opinion?
Nope -- thanks @mathiaspeters-sig, I am happy with @nevi-me's review. I am sorry for the delay in merging; I have been away and am now catching up.
* Added tests
* Updated tests and improved implementation
* Cleanup
* Stopped collecting bytes before writing to write_data
* Added tests
* Cleanup and comments
* Fixed clippy warning
* Fixed an endianness issue
* Fixed comments and naming
* Made tests less prone to off-by-n errors

Co-authored-by: mathiaspeters-sig <71126763+mathiaspeters-sig@users.noreply.github.com>
Which issue does this PR close?
Closes #397
Rationale for this change
See issue.
What changes are included in this PR?
Two changes:

* Use `BitChunkIterator` on the remaining bytes to set entire `u8`s

Are there any user-facing changes?
Sort of.
The way `set_bits` is currently called, it is always applied to byte arrays where all bits are set to 0. As long as that holds, there are no user-facing changes. However, the old implementation would never overwrite a `1` with a `0` even if that is what the data says, whereas the new implementation will sometimes do that. Where it sets individual bits (the initial bits needed to reach byte alignment, and the remainder bits after the bit-chunk iterator has run out of full `u64`s) it behaves like the old implementation, but the section that sets full bytes writes them regardless of the existing values in `data` and `write_data`.