
Optimize array::transform::utils::set_bits #716

Merged: 10 commits into apache:master on Sep 9, 2021

Conversation

mathiaspeters-sig
Contributor

Which issue does this PR close?

Closes #397

Rationale for this change

See issue.

What changes are included in this PR?

Two changes:

  1. I added unit tests to make sure the function behaves the same as before my changes
  2. I updated the function body to follow the algorithm proposed in the issue, which in a nutshell is (a rough sketch follows this list):
    • Set individual bits until the write position reaches the next byte boundary
    • Use a BitChunkIterator over the remaining bits to write entire bytes (u8s) at once
    • Set the remaining bits individually
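
For illustration, here is a rough, self-contained sketch of those three phases. This is not the PR's code: the real set_bits also returns a null count and reads aligned u64 chunks through arrow's BitChunks iterator, whereas this version builds each full byte bit by bit. The names set_bits_sketch, get_bit, and set_bit are invented for the sketch.

    // Rough sketch only -- a simplified stand-in for the real implementation.
    fn get_bit(src: &[u8], i: usize) -> bool {
        (src[i / 8] >> (i % 8)) & 1 == 1
    }

    fn set_bit(dst: &mut [u8], i: usize) {
        dst[i / 8] |= 1 << (i % 8);
    }

    fn set_bits_sketch(
        write_data: &mut [u8],
        data: &[u8],
        offset_write: usize,
        offset_read: usize,
        len: usize,
    ) {
        // Phase 1: set bits one at a time until the write position is byte aligned.
        let bits_to_align = ((8 - offset_write % 8) % 8).min(len);
        for i in 0..bits_to_align {
            if get_bit(data, offset_read + i) {
                set_bit(write_data, offset_write + i);
            }
        }

        // Phase 2: write whole bytes. The real code reads aligned u64 chunks via
        // BitChunks and stores their little-endian bytes directly.
        let mut i = bits_to_align;
        while len - i >= 8 {
            let mut byte = 0u8;
            for b in 0..8 {
                if get_bit(data, offset_read + i + b) {
                    byte |= 1 << b;
                }
            }
            // Whole bytes are stored, not OR-ed: pre-existing 1s are overwritten.
            write_data[(offset_write + i) / 8] = byte;
            i += 8;
        }

        // Phase 3: set the remaining (fewer than 8) bits individually.
        for j in i..len {
            if get_bit(data, offset_read + j) {
                set_bit(write_data, offset_write + j);
            }
        }
    }

    fn main() {
        let data = vec![0b1111_1111u8; 2];
        let mut out = vec![0u8; 3];
        set_bits_sketch(&mut out, &data, 3, 0, 10);
        assert_eq!(out, vec![0b1111_1000, 0b0001_1111, 0b0000_0000]);
    }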

Are there any user-facing changes?

Sort of.

Currently, set_bits is only ever applied to byte arrays where all bits are set to 0; as long as that holds, there are no user-facing changes. However, the old implementation would never overwrite a 1 with a 0, even when the source data said to, whereas the new implementation sometimes will. Where it sets individual bits (the initial bits written to reach byte alignment, and the remainder bits left after the bit chunk iterator has run out of full u64s), it behaves like the old implementation; but the section that writes full bytes stores each byte outright, regardless of what write_data already contains.
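
A toy example of that difference (hypothetical values, not taken from the PR):

    fn main() {
        let src: u8 = 0b0000_0000; // source byte: every bit is 0

        // Old behavior: per-bit OR writes can set bits but never clear them.
        let mut dst: u8 = 0b0000_0100; // destination already has bit 2 set
        dst |= src;
        assert_eq!(dst, 0b0000_0100); // the stale 1 survives

        // New behavior in the full-byte section: the byte is stored outright,
        // so the stale 1 is replaced by the source's 0.
        dst = src;
        assert_eq!(dst, 0b0000_0000);
    }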

@github-actions bot added the arrow (Changes to the arrow crate) label on Aug 25, 2021
@jhorstmann
Contributor

Code looks good to me. The chained loop over bits_to_align and the remainder could be split into two loops to make the writes more sequential, but the code looks a bit simpler with one loop. I benchmarked this locally by adding the following to concatenate_kernels:

    let v1 = create_boolean_array(1024, 0.5, 0.0);
    let v2 = create_boolean_array(1024, 0.5, 0.0);
    c.bench_function("concat bool 1024", |b| {
        b.iter(|| bench_concat(&v1, &v2))
    });

    let v1 = create_boolean_array(1024, 0.5, 0.5);
    let v2 = create_boolean_array(1024, 0.5, 0.5);
    c.bench_function("concat bool nulls 1024", |b| {
        b.iter(|| bench_concat(&v1, &v2))
    });

The results are very good, a speedup of 3-4x, and the improvement on bigger batches could be even better. Interestingly, the benchmark setup seems to always create a null bitmap, even for the tests that are supposed to be non-null; otherwise I can't explain why those benchmarks also see a big speedup.

There is minimal additional overhead in "concat 1024 arrays i32 4", but that is probably the worst case: concatenating 1024 arrays of length 4.

concat i32 1024         time:   [1.4334 us 1.4357 us 1.4382 us]                             
                        change: [-67.795% -67.484% -67.188%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe

concat i32 nulls 1024   time:   [1.6528 us 1.6549 us 1.6572 us]                                   
                        change: [-58.407% -57.885% -57.194%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  7 (7.00%) high mild
  8 (8.00%) high severe

concat 1024 arrays i32 4                                                                            
                        time:   [162.80 us 162.99 us 163.23 us]
                        change: [+4.3373% +6.1785% +7.9774%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

concat str 1024         time:   [4.1305 us 4.1378 us 4.1471 us]                             
                        change: [-40.416% -40.067% -39.739%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe

concat str nulls 1024   time:   [21.156 us 21.181 us 21.208 us]                                   
                        change: [-3.1958% -2.3516% -1.5638%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

concat bool 1024        time:   [1.4137 us 1.4203 us 1.4281 us]                              
                        change: [-74.572% -74.403% -74.216%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

concat bool nulls 1024  time:   [1.4999 us 1.5033 us 1.5070 us]                                    
                        change: [-74.566% -74.398% -74.230%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

@alamb
Contributor

alamb commented Aug 26, 2021

I plan to review this tomorrow

@codecov-commenter

codecov-commenter commented Aug 26, 2021

Codecov Report

Merging #716 (ba6a71d) into master (8308615) will increase coverage by 0.07%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #716      +/-   ##
==========================================
+ Coverage   82.46%   82.53%   +0.07%     
==========================================
  Files         168      168              
  Lines       47419    47705     +286     
==========================================
+ Hits        39104    39375     +271     
- Misses       8315     8330      +15     
Impacted Files Coverage Δ
arrow/src/array/transform/utils.rs 98.71% <100.00%> (+3.71%) ⬆️
arrow/src/compute/kernels/comparison.rs 95.08% <0.00%> (-0.76%) ⬇️
parquet_derive_test/src/lib.rs 100.00% <0.00%> (ø)
arrow/src/array/array_binary.rs 92.46% <0.00%> (+0.23%) ⬆️
parquet/src/arrow/arrow_array_reader.rs 79.71% <0.00%> (+1.28%) ⬆️
parquet_derive/src/parquet_field.rs 68.86% <0.00%> (+2.69%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8308615...ba6a71d.

let chunks = BitChunks::new(data, offset_read + bits_to_align, len - bits_to_align);
chunks.iter().for_each(|chunk| {
    null_count += chunk.count_zeros();
    chunk.to_ne_bytes().iter().for_each(|b| {
Contributor

I think this needs to use to_le_bytes, for example see the comments in ops.rs, bitwise_bin_op_helper (which has some typos):

// we are counting bits starting from the least significant bit, so to_le_bytes should be correct
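
A small standalone check of that distinction (not code from the PR): because bits are counted from the least significant bit, byte 0 of a serialized chunk must be the chunk's least significant byte. to_le_bytes guarantees that on every host, while to_ne_bytes only matches on little-endian machines.

    fn main() {
        let chunk: u64 = 1; // only bit 0 is set

        // to_le_bytes always puts the least significant byte first ...
        assert_eq!(chunk.to_le_bytes()[0], 1);

        // ... while to_ne_bytes follows the host byte order, so on a
        // big-endian machine the set bit would land in the last byte.
        #[cfg(target_endian = "little")]
        assert_eq!(chunk.to_ne_bytes(), chunk.to_le_bytes());
        #[cfg(target_endian = "big")]
        assert_eq!(chunk.to_ne_bytes()[7], 1);
    }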

@alamb left a comment

Thank you for the contribution @mathiaspeters-sig

I am not sure about the test results -- I may be misunderstanding (and bit twiddling is not my speciality), but something doesn't look quite right.


let expected_data: &[u8] = &[
    0b00111000, 0b00111111, 0b00111111, 0b00111111, 0b00111111, 0b00111111,
    0b00111111, 0b00111111, 0b00000111, 0b00000000,
Contributor

I think the 3rd byte looks suspicious -- should it be:

Suggested change
0b00111111, 0b00111111, 0b00000111, 0b00000000,
0b00111111, 0b00111111, 0b00111100, 0b00000000,

Contributor

To me this looks correct: the last 3 bits on the left of the source byte end up on the right of the last destination byte (set bits go from right to left).
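
A quick standalone illustration of that ordering (generic values, not the exact offsets from this test): bit i of a bitmap lives in byte i / 8 at position i % 8, so each byte fills from its least significant (rightmost in 0b notation) end, and a run of bits that crosses a byte boundary continues at the right-hand side of the next byte.

    fn main() {
        let mut buf = [0u8; 2];
        // Set bits 5..=10: the high 3 bits of byte 0 and the low 3 bits of byte 1.
        for i in 5..11 {
            buf[i / 8] |= 1 << (i % 8);
        }
        assert_eq!(buf, [0b1110_0000, 0b0000_0111]);
    }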

Contributor Author

I can also mention here that I double-checked that the old implementation returned the specified results as well, for all 4 tests.

@nevi-me left a comment

I've checked the implementation and verified what the tests are doing with the offsets and length. I'm happy with the change.

@mathiaspeters-sig
Contributor Author

@alamb anything that is left to do here in your opinion?

@alamb
Contributor

alamb commented Sep 9, 2021

> @alamb anything that is left to do here in your opinion?

Nope -- thanks @mathiaspeters-sig, I am happy with @nevi-me's review. I am sorry for the delay in merging; I have been away and am now catching up.

@alamb alamb merged commit 4221099 into apache:master Sep 9, 2021
alamb pushed a commit that referenced this pull request Sep 9, 2021
* Added tests

* Updated tests and improved implementation

* Cleanup

* Stopped collecting bytes before writing to write_data

* Added tests

* Cleanup and comments

* Fixed clippy warning

* Fixed an endianness issue

* Fixed comments and naming

* Made tests less prone to off-by-n errors
alamb added a commit that referenced this pull request Sep 10, 2021, with the same commit messages as above, plus:

Co-authored-by: mathiaspeters-sig <71126763+mathiaspeters-sig@users.noreply.github.com>
Successfully merging this pull request may close this issue: #397 (Optimize MutableArrayData::extend for null buffers).