
Optimize array::transform::utils::set_bits #716

Merged: 10 commits into apache:master on Sep 9, 2021

Conversation

mathiaspeters-sig
Contributor

Which issue does this PR close?

Closes #397

Rationale for this change

See issue.

What changes are included in this PR?

Two changes:

  1. I added unit tests to make sure the function behaves the same as before my changes
  2. I updated the function body to follow the algorithm proposed in the issue, which in a nutshell is (a rough sketch follows this list):
    • Set individual bits until the write position reaches the next byte boundary
    • Use a BitChunkIterator over the remaining bits to write entire bytes (u8s) at once
    • Set the remaining bits individually
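
For illustration, here is a rough, self-contained sketch of those three phases. This is not the PR's code: the real set_bits also returns a null count and reads aligned u64 chunks through arrow's BitChunks iterator, whereas this version builds each full byte bit by bit. The names set_bits_sketch, get_bit, and set_bit are invented for the sketch.

    // Rough sketch only -- a simplified stand-in for the real implementation.
    fn get_bit(src: &[u8], i: usize) -> bool {
        (src[i / 8] >> (i % 8)) & 1 == 1
    }

    fn set_bit(dst: &mut [u8], i: usize) {
        dst[i / 8] |= 1 << (i % 8);
    }

    fn set_bits_sketch(
        write_data: &mut [u8],
        data: &[u8],
        offset_write: usize,
        offset_read: usize,
        len: usize,
    ) {
        // Phase 1: set bits one at a time until the write position is byte aligned.
        let bits_to_align = ((8 - offset_write % 8) % 8).min(len);
        for i in 0..bits_to_align {
            if get_bit(data, offset_read + i) {
                set_bit(write_data, offset_write + i);
            }
        }

        // Phase 2: write whole bytes. The real code reads aligned u64 chunks via
        // BitChunks and stores their little-endian bytes directly.
        let mut i = bits_to_align;
        while len - i >= 8 {
            let mut byte = 0u8;
            for b in 0..8 {
                if get_bit(data, offset_read + i + b) {
                    byte |= 1 << b;
                }
            }
            // Whole bytes are stored, not OR-ed: pre-existing 1s are overwritten.
            write_data[(offset_write + i) / 8] = byte;
            i += 8;
        }

        // Phase 3: set the remaining (fewer than 8) bits individually.
        for j in i..len {
            if get_bit(data, offset_read + j) {
                set_bit(write_data, offset_write + j);
            }
        }
    }

    fn main() {
        let data = vec![0b1111_1111u8; 2];
        let mut out = vec![0u8; 3];
        set_bits_sketch(&mut out, &data, 3, 0, 10);
        assert_eq!(out, vec![0b1111_1000, 0b0001_1111, 0b0000_0000]);
    }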

Are there any user-facing changes?

Sort of.

Currently, set_bits is only ever applied to byte arrays where all bits are set to 0; as long as that holds, there are no user-facing changes. However, the old implementation would never overwrite a 1 with a 0, even when the source data said to, whereas the new implementation sometimes will. Where it sets individual bits (the initial bits written to reach byte alignment, and the remainder bits left after the bit chunk iterator has run out of full u64s), it behaves like the old implementation; but the section that writes full bytes stores each byte outright, regardless of what write_data already contains.
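
A toy example of that difference (hypothetical values, not taken from the PR):

    fn main() {
        let src: u8 = 0b0000_0000; // source byte: every bit is 0

        // Old behavior: per-bit OR writes can set bits but never clear them.
        let mut dst: u8 = 0b0000_0100; // destination already has bit 2 set
        dst |= src;
        assert_eq!(dst, 0b0000_0100); // the stale 1 survives

        // New behavior in the full-byte section: the byte is stored outright,
        // so the stale 1 is replaced by the source's 0.
        dst = src;
        assert_eq!(dst, 0b0000_0000);
    }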

@github-actions bot added the arrow (Changes to the arrow crate) label on Aug 25, 2021
@jhorstmann
Contributor

Code looks good to me. The chained loop over bits_to_align and the remainder could be split into two loops to make the writes more sequential, but the code looks a bit simpler with one loop. I benchmarked this locally by adding the following to concatenate_kernels:

    let v1 = create_boolean_array(1024, 0.5, 0.0);
    let v2 = create_boolean_array(1024, 0.5, 0.0);
    c.bench_function("concat bool 1024", |b| {
        b.iter(|| bench_concat(&v1, &v2))
    });

    let v1 = create_boolean_array(1024, 0.5, 0.5);
    let v2 = create_boolean_array(1024, 0.5, 0.5);
    c.bench_function("concat bool nulls 1024", |b| {
        b.iter(|| bench_concat(&v1, &v2))
    });

The results are very good, a speedup of 3-4x, and the improvement on bigger batches could be even better. Interestingly, the benchmark setup seems to always create a null bitmap, even for the tests that are supposed to be non-null; otherwise I can't explain why those benchmarks also see a big speedup.

There is minimal additional overhead in "concat 1024 arrays i32 4", but that is probably the worst case: concatenating 1024 arrays of length 4.

concat i32 1024         time:   [1.4334 us 1.4357 us 1.4382 us]                             
                        change: [-67.795% -67.484% -67.188%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe

concat i32 nulls 1024   time:   [1.6528 us 1.6549 us 1.6572 us]                                   
                        change: [-58.407% -57.885% -57.194%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  7 (7.00%) high mild
  8 (8.00%) high severe

concat 1024 arrays i32 4                                                                            
                        time:   [162.80 us 162.99 us 163.23 us]
                        change: [+4.3373% +6.1785% +7.9774%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

concat str 1024         time:   [4.1305 us 4.1378 us 4.1471 us]                             
                        change: [-40.416% -40.067% -39.739%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe

concat str nulls 1024   time:   [21.156 us 21.181 us 21.208 us]                                   
                        change: [-3.1958% -2.3516% -1.5638%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

concat bool 1024        time:   [1.4137 us 1.4203 us 1.4281 us]                              
                        change: [-74.572% -74.403% -74.216%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

concat bool nulls 1024  time:   [1.4999 us 1.5033 us 1.5070 us]                                    
                        change: [-74.566% -74.398% -74.230%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

@alamb
Contributor

alamb commented Aug 26, 2021

I plan to review this tomorrow

@codecov-commenter

codecov-commenter commented Aug 26, 2021

Codecov Report

Merging #716 (ba6a71d) into master (8308615) will increase coverage by 0.07%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #716      +/-   ##
==========================================
+ Coverage   82.46%   82.53%   +0.07%     
==========================================
  Files         168      168              
  Lines       47419    47705     +286     
==========================================
+ Hits        39104    39375     +271     
- Misses       8315     8330      +15     
Impacted Files Coverage Δ
arrow/src/array/transform/utils.rs 98.71% <100.00%> (+3.71%) ⬆️
arrow/src/compute/kernels/comparison.rs 95.08% <0.00%> (-0.76%) ⬇️
parquet_derive_test/src/lib.rs 100.00% <0.00%> (ø)
arrow/src/array/array_binary.rs 92.46% <0.00%> (+0.23%) ⬆️
parquet/src/arrow/arrow_array_reader.rs 79.71% <0.00%> (+1.28%) ⬆️
parquet_derive/src/parquet_field.rs 68.86% <0.00%> (+2.69%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8308615...ba6a71d.

let chunks = BitChunks::new(data, offset_read + bits_to_align, len - bits_to_align);
chunks.iter().for_each(|chunk| {
    null_count += chunk.count_zeros();
    chunk.to_ne_bytes().iter().for_each(|b| {
Contributor

I think this needs to use to_le_bytes, for example see the comments in ops.rs, bitwise_bin_op_helper (which has some typos):

// we are counting bits starting from the least significant bit, so to_le_bytes should be correct
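
A small standalone check of that distinction (not code from the PR): because bits are counted from the least significant bit, byte 0 of a serialized chunk must be the chunk's least significant byte. to_le_bytes guarantees that on every host, while to_ne_bytes only matches on little-endian machines.

    fn main() {
        let chunk: u64 = 1; // only bit 0 is set

        // to_le_bytes always puts the least significant byte first ...
        assert_eq!(chunk.to_le_bytes()[0], 1);

        // ... while to_ne_bytes follows the host byte order, so on a
        // big-endian machine the set bit would land in the last byte.
        #[cfg(target_endian = "little")]
        assert_eq!(chunk.to_ne_bytes(), chunk.to_le_bytes());
        #[cfg(target_endian = "big")]
        assert_eq!(chunk.to_ne_bytes()[7], 1);
    }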

@alamb left a comment

Thank you for the contribution @mathiaspeters-sig

I am not sure about the test results -- I may be misunderstanding (and bit twiddling is not my speciality), but something doesn't look quite right.


let expected_data: &[u8] = &[
    0b00111000, 0b00111111, 0b00111111, 0b00111111, 0b00111111, 0b00111111,
    0b00111111, 0b00111111, 0b00000111, 0b00000000,
Contributor

I think the 3rd byte looks suspicious -- should it be:

Suggested change
0b00111111, 0b00111111, 0b00000111, 0b00000000,
0b00111111, 0b00111111, 0b00111100, 0b00000000,

Contributor

To me this looks correct: the last 3 bits on the left of the source byte end up on the right of the last destination byte (set bits go from right to left).
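
A quick standalone illustration of that ordering (generic values, not the exact offsets from this test): bit i of a bitmap lives in byte i / 8 at position i % 8, so each byte fills from its least significant (rightmost in 0b notation) end, and a run of bits that crosses a byte boundary continues at the right-hand side of the next byte.

    fn main() {
        let mut buf = [0u8; 2];
        // Set bits 5..=10: the high 3 bits of byte 0 and the low 3 bits of byte 1.
        for i in 5..11 {
            buf[i / 8] |= 1 << (i % 8);
        }
        assert_eq!(buf, [0b1110_0000, 0b0000_0111]);
    }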

Contributor Author

I can also mention here that I double-checked that the old implementation returned the specified results as well, for all 4 tests.

@nevi-me left a comment

I've checked the implementation and verified what the tests are doing with the offsets and length. I'm happy with the change.

@mathiaspeters-sig
Contributor Author

@alamb anything that is left to do here in your opinion?

@alamb
Contributor

alamb commented Sep 9, 2021

> @alamb anything that is left to do here in your opinion?

Nope -- thanks @mathiaspeters-sig, I am happy with @nevi-me's review. I am sorry for the delay in merging; I have been away and am now catching up.

@alamb alamb merged commit 4221099 into apache:master Sep 9, 2021
alamb pushed a commit that referenced this pull request Sep 9, 2021
* Added tests

* Updated tests and improved implementation

* Cleanup

* Stopped collecting bytes before writing to write_data

* Added tests

* Cleanup and comments

* Fixed clippy warning

* Fixed an endianness issue

* Fixed comments and naming

* Made tests less prone to off-by-n errors
alamb added a commit that referenced this pull request Sep 10, 2021, with the same commit messages as above, plus:

Co-authored-by: mathiaspeters-sig <71126763+mathiaspeters-sig@users.noreply.github.com>
Successfully merging this pull request may close this issue: #397 (Optimize MutableArrayData::extend for null buffers).