Improve MutableArrayData Null Handling (#1224) (#1230) #1225

Closed

tustvold
Contributor

@tustvold tustvold commented Jan 23, 2022

Which issue does this PR close?

Closes #1224

Rationale for this change

See ticket.

This improves the performance of filtering arrays with no nulls by ~2x on my local machine, and other kernels will likely see similar improvements.

What changes are included in this PR?

This changes MutableArrayData to skip building the null mask when it has already decided that one is not needed.

Are there any user-facing changes?

No
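
As a rough illustration of the change described above, here is a minimal sketch of a builder that only maintains a validity mask when it decided at construction time that one is needed. `ToyMutableData` and its fields are hypothetical stand-ins written for this example; they are not the arrow crate's `MutableArrayData`, and real Arrow bitmaps are bit-packed rather than `Vec<bool>`.

```rust
// Toy model only: a hypothetical stand-in, not arrow's MutableArrayData.
struct ToyMutableData<'a> {
    sources: Vec<&'a [Option<u8>]>,
    values: Vec<u8>,
    // `None` means the builder decided up front that no null mask is needed.
    validity: Option<Vec<bool>>,
}

impl<'a> ToyMutableData<'a> {
    fn new(sources: Vec<&'a [Option<u8>]>, use_nulls: bool) -> Self {
        // A mask is needed if the caller asks for one or any input contains a null.
        let any_nulls = sources.iter().any(|s| s.iter().any(|v| v.is_none()));
        let validity = if use_nulls || any_nulls { Some(Vec::new()) } else { None };
        Self { sources, values: Vec::new(), validity }
    }

    fn extend(&mut self, index: usize, start: usize, end: usize) {
        let slice = &self.sources[index][start..end];
        // Skip all mask work when no mask is being built.
        if let Some(validity) = self.validity.as_mut() {
            validity.extend(slice.iter().map(|v| v.is_some()));
        }
        self.values.extend(slice.iter().map(|v| v.unwrap_or(0)));
    }
}

fn main() {
    let a = [Some(1u8), Some(2), Some(3)];
    let mut data = ToyMutableData::new(vec![&a[..]], false);
    data.extend(0, 0, 2);
    println!("values = {:?}, validity = {:?}", data.values, data.validity);
}
```

In this sketch the saving comes from never touching the mask in `extend` when every input is null-free, which is the case the filter benchmarks in this PR exercise.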

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 23, 2022
             .collect();

-        let null_buffer = if use_nulls {
+        let (null_buffer, extend_null_bits) = if use_nulls {
Contributor Author

use_nulls is always true if any of the input arrays contain nulls

@tustvold
Contributor Author

Benchmarks

filter u8               time:   [289.42 us 290.41 us 291.38 us]                      
                        change: [-40.610% -40.339% -40.079%] (p = 0.00 < 0.05)
                        Performance has improved.

filter u8 high selectivity                                                                             
                        time:   [5.2395 us 5.2626 us 5.2940 us]
                        change: [-58.224% -58.063% -57.846%] (p = 0.00 < 0.05)
                        Performance has improved.

filter u8 low selectivity                                                                             
                        time:   [4.4407 us 4.4438 us 4.4476 us]
                        change: [-32.040% -27.787% -23.268%] (p = 0.00 < 0.05)
                        Performance has improved.

filter context u8       time:   [105.87 us 105.90 us 105.92 us]                              
                        change: [-64.860% -64.836% -64.812%] (p = 0.00 < 0.05)
                        Performance has improved.

filter context u8 high selectivity                                                                             
                        time:   [1.7448 us 1.8098 us 1.8646 us]
                        change: [-80.196% -79.460% -78.803%] (p = 0.00 < 0.05)
                        Performance has improved.

filter context u8 low selectivity                                                                            
                        time:   [419.29 ns 419.96 ns 420.88 ns]
                        change: [-56.222% -55.769% -55.162%] (p = 0.00 < 0.05)
                        Performance has improved.

So a little bit over 2x faster. Performance for arrays containing nulls is unaffected (as expected).
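
For readers comparing the percentages with the "2x" figure: a relative change in run time converts to a speedup factor as 1 / (1 + change). The helper below is a small sketch of that arithmetic; `speedup_from_change` is a name made up for this example and is not part of Criterion. The inputs are the midpoint changes from the output above.

```rust
/// Convert a relative change in run time, in percent (e.g. -58.063 for
/// "-58.063%"), into a speedup factor. Hypothetical helper for illustration.
fn speedup_from_change(change_pct: f64) -> f64 {
    1.0 / (1.0 + change_pct / 100.0)
}

fn main() {
    let results = [
        ("filter u8", -40.339),
        ("filter u8 high selectivity", -58.063),
        ("filter context u8", -64.836),
        ("filter context u8 high selectivity", -79.460),
    ];
    for (name, change) in results {
        println!("{}: {:.2}x", name, speedup_from_change(change));
    }
}
```

A change of -50% is exactly 2x, so the benchmarks reporting more than a 50% reduction are the ones that exceed 2x, while a reduction of about 40% works out to roughly 1.7x.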

@codecov-commenter

Codecov Report

Merging #1225 (bfa0024) into master (fcd37ee) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1225      +/-   ##
==========================================
- Coverage   82.70%   82.70%   -0.01%     
==========================================
  Files         175      175              
  Lines       51711    51714       +3     
==========================================
+ Hits        42769    42771       +2     
- Misses       8942     8943       +1     
Impacted Files                         Coverage Δ
arrow/src/array/transform/mod.rs       85.47% <100.00%> (-0.10%) ⬇️
parquet_derive/src/parquet_field.rs    65.98% <0.00%> (-0.23%) ⬇️
arrow/src/csv/reader.rs                88.12% <0.00%> (+0.01%) ⬆️
arrow/src/datatypes/field.rs           54.10% <0.00%> (+0.30%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jhorstmann
Contributor

Nice, I think this should also speed up the concatenate_kernel benchmarks.

Contributor

@alamb alamb left a comment

Nice find -- I checked that self.extend_null_bits isn't used anywhere else 👍

@tustvold
Contributor Author

I just need to double-check how this interacts with ExtendNulls

@tustvold tustvold marked this pull request as draft January 23, 2022 15:38
@tustvold
Contributor Author

Applied the same treatment to ExtendNulls. FWIW I think this was a bug before, as you could call extend_nulls on a MutableArrayData with null handling disabled and end up with an inconsistent final ArrayData

@tustvold tustvold marked this pull request as ready for review January 23, 2022 16:22
@tustvold tustvold marked this pull request as draft January 23, 2022 16:41
@tustvold
Contributor Author

This is currently blocked by #1230 which is in turn blocked by #1197. Will work on getting fixes for those up

@@ -608,18 +614,28 @@ impl<'a> MutableArrayData<'a> {
     /// This function panics if the range is out of bounds, i.e. if `start + len >= array.len()`.
     pub fn extend(&mut self, index: usize, start: usize, end: usize) {
         let len = end - start;
-        (self.extend_null_bits[index])(&mut self.data, start, len);
+        if !self.extend_null_bits.is_empty() {
+            (self.extend_null_bits[index])(&mut self.data, start, len);
Contributor

I think the name of the use_nulls parameter is confusing and is not necessarily related to whether nulls are present or not, here is an excerpt of the method's doc comment:

use_nulls is a flag used to optimize insertions. It should be false if the only source of nulls are the arrays themselves

Contributor

looking at build_extend_null_bits this is already optimized by returning a no-op function ( Box::new(|_, _, _| {}) ) if use_nulls is false and the array doesn't have a null bitmap

Contributor Author

Yes, so if it is true, it cannot assume that a bitmask won't be needed, because extend_nulls may be called. Otherwise, if it is false and the arrays don't contain nulls, it knows it doesn't need to compute a null bitmask?

At least that was my reading of it, although the fact you can call extend_nulls having specified use_nulls as false and get something other than a panic suggests maybe I'm missing something 😅

Contributor Author

no-op function ( Box::new(|_, _, _| {}) ) if use_nulls is false and the array doesn't have a null bitmap

This is a good point, the particular case the filter benchmarks hit is where the array has a null bitmap, but a zero null count. So an alternative fix might be to fix extend_null_bits. Although now that I think about it, I'm not sure that no-op function is correct in the event of mixed array nullability 🤔
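
The distinction drawn here, between an array that has a null bitmap and an array that actually contains nulls, can be sketched as follows. `ToyArray` and both helper functions are hypothetical names for this example and only model the idea; they are not the arrow crate's types.

```rust
// Toy model only: validity is a Vec<bool> here, not a packed Arrow bitmap.
struct ToyArray {
    validity: Option<Vec<bool>>,
}

impl ToyArray {
    fn null_count(&self) -> usize {
        match &self.validity {
            Some(bits) => bits.iter().filter(|valid| !**valid).count(),
            None => 0,
        }
    }
}

// Decides based on whether a bitmap is present at all.
fn needs_mask_by_bitmap(arrays: &[ToyArray]) -> bool {
    arrays.iter().any(|a| a.validity.is_some())
}

// Decides based on whether any element is actually null.
fn needs_mask_by_null_count(arrays: &[ToyArray]) -> bool {
    arrays.iter().any(|a| a.null_count() > 0)
}

fn main() {
    // A bitmap is present, but every element is valid.
    let arrays = [ToyArray { validity: Some(vec![true, true, true]) }];
    println!("by bitmap: {}", needs_mask_by_bitmap(&arrays));
    println!("by null count: {}", needs_mask_by_null_count(&arrays));
}
```

For an input like the one in `main`, a presence-based check would still do mask work on every insertion, while a count-based check would not.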

Contributor

my reading of the code (and the little documentation that accompanies it) is that use_nulls is used to determine where the nulls will be coming from, not whether nulls are present or not

Contributor Author

@tustvold tustvold Jan 23, 2022

I would agree with you if it weren't for this code

// if any of the arrays has nulls, insertions from any array requires setting bits
// as there is at least one array with nulls.
if arrays.iter().any(|array| array.null_count() > 0) {
    use_nulls = true;
};

Effectively it changes meaning part way through 😅
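
The effect of the snippet quoted above can be summarised as a small truth table. The function below is a sketch written for this discussion; `effective_use_nulls` is a made-up name and takes plain null counts instead of `ArrayData`.

```rust
/// Hypothetical restatement of the override: the flag requested by the caller
/// is forced to `true` as soon as any input array has a non-zero null count.
fn effective_use_nulls(requested: bool, null_counts: &[usize]) -> bool {
    requested || null_counts.iter().any(|&n| n > 0)
}

fn main() {
    for requested in [false, true] {
        for counts in [vec![0, 0], vec![0, 3]] {
            println!(
                "requested = {}, null counts = {:?} -> use_nulls = {}",
                requested,
                counts,
                effective_use_nulls(requested, &counts)
            );
        }
    }
}
```

Only the combination "caller passed `false` and no input contains nulls" leaves the flag `false`, which is why the parameter reads as "where nulls come from" at the call site but as "is a mask needed" after the override.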

Contributor

yes, I agree - this needs clarification or fixing, to make it less confusing

Contributor

I believe @tustvold has resolved this in the latest version of this PR -- by making it clear that use_nulls must be set to true otherwise a panic will result if the buffer is extended with nulls

@tustvold
Contributor Author

tustvold commented Jan 23, 2022

I've pushed a first cut of trying to clarify the null handling in MutableArrayData. This PR also now fixes #1230. It is worth highlighting that this is an API change. In particular until #1234 is merged the tests will fail.

I also changed to using BooleanBufferBuilder instead of MutableBuffer directly, which gives an extra sanity check and also avoids some code duplication. This required making BooleanBufferBuilder::append_packed_range return the number of 0 bits set. This API is relatively new, added in #1039, but this still is technically a breaking change.
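
To make the idea of returning the number of 0 bits concrete, here is a sketch of a range copy that counts unset bits as it goes, so a caller could accumulate a null count without a second pass. `append_range_count_zeros` is a hypothetical helper; it is not `BooleanBufferBuilder::append_packed_range`, whose exact signature is not shown in this thread. It assumes Arrow's least-significant-bit-first packing and appends to a `Vec<bool>` for simplicity.

```rust
/// Hypothetical helper: copies bits `start..end` of an LSB-first packed bitmap
/// into `dst` and returns how many of the copied bits were 0 (i.e. null).
fn append_range_count_zeros(dst: &mut Vec<bool>, src: &[u8], start: usize, end: usize) -> usize {
    let mut zeros = 0;
    for i in start..end {
        let is_set = ((src[i / 8] >> (i % 8)) & 1) == 1;
        if !is_set {
            zeros += 1;
        }
        dst.push(is_set);
    }
    zeros
}

fn main() {
    // Bits 0, 2 and 8 are set; everything else in the first 9 bits is unset.
    let src = [0b0000_0101u8, 0b0000_0001];
    let mut dst = Vec::new();
    let null_count = append_range_count_zeros(&mut dst, &src, 0, 9);
    println!("copied {} bits, {} of them unset", dst.len(), null_count);
}
```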

I will take another look at this with fresh eyes in the morning.

}

array_data_builder
}
}

fn build_extend_null_bits(array: &ArrayData, use_nulls: bool) -> ExtendNullBits {
Contributor Author

Rather than returning a no-op if use_nulls is false, simply don't construct one. An ExtendNullBits that does nothing feels like a potential footgun, and it is better to handle this in a way that will fail loudly.

@tustvold tustvold changed the title Skip building null mask in MutableArrayData (#1224) Improve MutableArrayData Null Handling (#1224, #1230) Jan 23, 2022
@tustvold tustvold changed the title Improve MutableArrayData Null Handling (#1224, #1230) Improve MutableArrayData Null Handling (#1224) (#1230) Jan 23, 2022
@tustvold tustvold marked this pull request as ready for review January 25, 2022 11:09
@tustvold tustvold requested a review from alamb January 25, 2022 11:09
Contributor

@alamb alamb left a comment

I think it looks good. Thank you @tustvold

@yordan-pavlov any other thoughts?

/// be computed only if `arrays` contains nulls.
///
/// Code that plans to call [MutableArrayData::extend_nulls] MUST set `use_nulls` to `true`,
/// in order to ensure that a null bitmap is computed.
Contributor

Suggested change
-    /// in order to ensure that a null bitmap is computed.
+    /// in order to ensure that a null bitmap is computed, otherwise a panic will result.

Contributor

I think it will panic otherwise, right? Rather than get undefined behavior?
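
The "fail loudly" behaviour under discussion can be sketched with an optional mask that `extend_nulls` insists on. This is an illustration under assumptions: `ToyNullBuilder` is a hypothetical type, and the panic message is invented; the thread does not show how the real `MutableArrayData::extend_nulls` reports the error.

```rust
use std::iter;

// Toy model only: not arrow's MutableArrayData.
struct ToyNullBuilder {
    // `None` when the builder was created without null handling.
    validity: Option<Vec<bool>>,
}

impl ToyNullBuilder {
    fn new(use_nulls: bool) -> Self {
        Self { validity: if use_nulls { Some(Vec::new()) } else { None } }
    }

    fn extend_nulls(&mut self, len: usize) {
        // Panics instead of silently producing data without a matching mask.
        let validity = self
            .validity
            .as_mut()
            .expect("extend_nulls called on a builder created with use_nulls = false");
        validity.extend(iter::repeat(false).take(len));
    }
}

fn main() {
    let mut with_mask = ToyNullBuilder::new(true);
    with_mask.extend_nulls(3);
    println!("{:?}", with_mask.validity);
    // ToyNullBuilder::new(false).extend_nulls(3) would panic at the `expect`.
}
```

A panic here is ordinary safe-Rust failure rather than undefined behaviour, and it surfaces the misuse at the call that caused it instead of in a later consumer of the inconsistent data.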

         } else {
-            // create 0 capacity mutable buffer with the intention that it won't be used
-            MutableBuffer::with_capacity(0)
+            // create no null buffer and no extend_null_bits
Contributor

nice 👍

let data = data.freeze();

assert_eq!(data.len(), 9);
assert_eq!(data.null_buffer().unwrap().len(), 2);
Contributor

2 because there are two bytes needed to hold the bitmap?
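
The arithmetic behind that reading: assuming the null buffer is a packed bitmap with one bit per element and no extra padding, 9 elements need ceil(9 / 8) = 2 bytes. `bitmap_len_bytes` below is a helper written for this note, not an arrow function.

```rust
/// Number of bytes needed for a packed bitmap with one bit per element.
/// Hypothetical helper; assumes no padding beyond the last partial byte.
fn bitmap_len_bytes(num_elements: usize) -> usize {
    (num_elements + 7) / 8
}

fn main() {
    for n in [0, 1, 8, 9, 16, 17] {
        println!("{} elements -> {} byte(s)", n, bitmap_len_bytes(n));
    }
}
```

Under that assumption the asserted length of 2 matches the 9-element array in the test.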

/// be computed only if `arrays` contains nulls.
///
/// Code that plans to call [MutableArrayData::extend_nulls] MUST set `use_nulls` to `true`,
/// in order to ensure that a null bitmap is computed.
pub fn new(arrays: Vec<&'a ArrayData>, use_nulls: bool, capacity: usize) -> Self {
Contributor

Having thought some more about this, wouldn't something like compute_nulls or create_null_bitmap (instead of use_nulls) be a better name? Because:
(1) if it's true, then a null bitmap is always created, regardless of whether any of the input arrays have a null bitmap
(2) the documentation comment, I think, reads better as e.g.

 if `compute_nulls` is `true` a null bitmap will be created regardless of the contents of `arrays`

@yordan-pavlov
Contributor

@tustvold are you still seeing a 2x performance improvement in filter benchmarks after the latest changes?

@tustvold
Contributor Author

tustvold commented Jan 26, 2022

cargo criterion --bench filter_kernels
   Compiling arrow v8.0.0 (/home/raphael/repos/external/arrow-rs/arrow)
    Finished bench [optimized] target(s) in 18.23s
filter u8               time:   [291.13 us 293.93 us 298.49 us]                      
                        change: [-40.935% -40.686% -40.239%] (p = 0.00 < 0.05)
                        Performance has improved.

filter u8 high selectivity                                                                             
                        time:   [5.8296 us 5.8316 us 5.8336 us]
                        change: [-54.079% -53.954% -53.829%] (p = 0.00 < 0.05)
                        Performance has improved.

filter u8 low selectivity                                                                             
                        time:   [3.7740 us 3.7783 us 3.7829 us]
                        change: [-12.217% -11.997% -11.788%] (p = 0.00 < 0.05)
                        Performance has improved.

filter context u8       time:   [105.74 us 105.76 us 105.80 us]                              
                        change: [-63.643% -63.614% -63.586%] (p = 0.00 < 0.05)
                        Performance has improved.

filter context u8 high selectivity                                                                             
                        time:   [1.3801 us 1.3816 us 1.3829 us]
                        change: [-82.396% -82.359% -82.319%] (p = 0.00 < 0.05)
                        Performance has improved.

filter context u8 low selectivity                                                                            
                        time:   [401.67 ns 401.79 ns 401.92 ns]
                        change: [-58.196% -58.112% -58.047%] (p = 0.00 < 0.05)
                        Performance has improved.

filter context u8 w NULLs                                                                            
                        time:   [427.53 us 427.66 us 427.80 us]
                        change: [+13.449% +13.527% +13.598%] (p = 0.00 < 0.05)
                        Performance has regressed.

filter context u8 w NULLs high selectivity                                                                             
                        time:   [6.8897 us 6.8919 us 6.8946 us]
                        change: [+0.2869% +0.3711% +0.4592%] (p = 0.00 < 0.05)
                        Change within noise threshold.

filter context u8 w NULLs low selectivity                                                                             
                        time:   [1.0082 us 1.0085 us 1.0088 us]
                        change: [+6.1612% +6.4041% +6.5859%] (p = 0.00 < 0.05)
                        Performance has regressed.

filter f32              time:   [606.18 us 607.55 us 608.93 us]                       
                        change: [+6.1214% +6.3825% +6.6391%] (p = 0.00 < 0.05)
                        Performance has regressed.

filter context f32      time:   [427.36 us 428.01 us 429.08 us]                               
                        change: [+12.435% +12.609% +12.799%] (p = 0.00 < 0.05)
                        Performance has regressed.

filter context f32 high selectivity                                                                             
                        time:   [12.375 us 12.907 us 13.357 us]
                        change: [+1.5047% +4.5855% +7.1816%] (p = 0.00 < 0.05)
                        Performance has regressed.

filter context f32 low selectivity                                                                             
                        time:   [1.0550 us 1.0552 us 1.0554 us]
                        change: [+8.6226% +9.4838% +10.093%] (p = 0.00 < 0.05)
                        Performance has regressed.

filter context string   time:   [534.98 us 535.16 us 535.32 us]                                  
                        change: [+9.7285% +9.8604% +10.001%] (p = 0.00 < 0.05)
                        Performance has regressed.

filter context string high selectivity                                                                            
                        time:   [402.80 us 402.92 us 403.03 us]
                        change: [-2.6457% -2.5796% -2.5140%] (p = 0.00 < 0.05)
                        Performance has improved.

filter context string low selectivity                                                                             
                        time:   [1.3243 us 1.3246 us 1.3249 us]
                        change: [+3.4003% +3.7378% +4.0158%] (p = 0.00 < 0.05)
                        Performance has regressed.

filter single record batch                                                                            
                        time:   [286.47 us 286.77 us 287.09 us]
                        change: [-41.765% -41.668% -41.572%] (p = 0.00 < 0.05)
                        Performance has improved.

So filtering arrays without nulls takes about 50% less time; however, it does seem to make filtering arrays with nulls take about 10% longer. This is likely down to the issue in #1229: the extend_bits function is ludicrously "hot" for these benchmarks, where the runs are typically 1 or 2 elements long.

I'd personally prefer to merge this as is and keep pushing forward, but I can also hold off on this until I've fixed #1229.

@tustvold
Contributor Author

Hmm I've changed my mind - will pause this until I've fixed #1229 as it will influence the benchmarks significantly

@tustvold
Contributor Author

Following #1229, this no longer yields a performance improvement, and for some reason it seems to slow down null mask handling in the filter kernel (possibly related to the change to append_ranges), so I'm going to close this.

Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MutableArrayData Builds Null Mask Having Decided Not To
5 participants