-
Notifications
You must be signed in to change notification settings - Fork 893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Null validity bitmap in ArrowArray #7199
Conversation
This is allowed when we have no nulls, simplifies the code a little, and reduces memory usage.
This looks like a good improvement and is also supported by the Arrow Columnar Format:
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7199 +/- ##
==========================================
+ Coverage 80.06% 81.78% +1.72%
==========================================
Files 190 204 +14
Lines 37181 38121 +940
Branches 9450 9886 +436
==========================================
+ Hits 29770 31179 +1409
+ Misses 2997 2964 -33
+ Partials 4414 3978 -436 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Some minor nits that does not affect the correctness of the patch for any normal platform, and it would be good to add a comment on from what compiler version that the loops start being unrolled.
validity_bitmap[n_total / 64] &= tail_mask; | ||
} | ||
|
||
uint64 *restrict validity_bitmap = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: do you need the restrict
keyword here? AFAICT, it is not used in a manner where aliasing can be a problem. I know this is just copied code, but I think it is not needed in the old location either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is required. We're reading from another array and writing into this one in a loop, so good to signal that they are not aliasing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I am not understanding the situation correctly, but the restrict
keyword is only relevant if you are calling functions in different translation units and hence do not have knowledge of the aliasing situation.
The compiler is fully capable of understanding the situation inside this function and see that there are no aliasing situation.
An example of this is are the memcpy
function, which use the restrict
keyword to signal to the compiler that it does not allow same pointer to be passed in. Note that memset
does not have the restrict
keyword since a single pointer cannot cause such a situation.
It does not make any practical difference though, so you can keep it if you're unsure about if it has an effect, but I cannot see how it would be required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler is fully capable of understanding the situation inside this function and see that there are no aliasing situation.
Why? PG and Timescale are compiled with -fno-strict-aliasing
, are you sure that uint64 *
and bool *
cannot alias with this flag? I think it's easier to slap restrict
on everything that's not const than try to figure it out for each case.
} | ||
|
||
/* Now move the data to account for nulls, and fill the validity bitmap. */ | ||
uint64 *restrict validity_bitmap = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
uint64 *restrict validity_bitmap = NULL; | |
uint64 * validity_bitmap = NULL; |
memset(validity_bitmap, 0xFF, validity_bitmap_bytes); | ||
if (n_total % 64) | ||
{ | ||
const uint64 tail_mask = -1ULL >> (64 - n_total % 64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: -1ULL
can raise warnings with some static checkers. better to use ~0ULL
if you want to have a "all ones" integer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which ones, can we add them to the CI? We have -1ULL
in several other places as well and no ~0U
. Postgres uses ~0U
consistently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't got a list of them, sorry, I just point out that you might run into it. If you don't, then it does not make a difference. All modern systems I am aware of use 2-complement encoding, so it should not cause any other issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently this is MISRA-C 2004 rule 12.9 The unary minus operator shall not be applied to an expression whose underlying type is unsigned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but the standard has changed as well, so I think that in C++20 this is well-defined. 🙄 I just stay clear of these to play it safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed this and all other places as well.
validity_bitmap[n_total / 64] &= tail_mask; | ||
} | ||
|
||
uint64 *restrict validity_bitmap = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
uint64 *restrict validity_bitmap = NULL; | |
uint64 * validity_bitmap = NULL; |
validity_bitmap[n_total / 64] &= tail_mask; | ||
} | ||
|
||
uint64 *restrict validity_bitmap = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint64 *restrict validity_bitmap = NULL; | |
uint64 * validity_bitmap = NULL; |
@@ -74,13 +74,14 @@ vector_nulltest(const ArrowArray *arrow, int test_type, uint64 *restrict result) | |||
const uint64 *validity = (const uint64 *) arrow->buffers[0]; | |||
for (uint16 i = 0; i < bitmap_words; i++) | |||
{ | |||
const uint64 validity_word = validity != NULL ? validity[i] : -1ULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const uint64 validity_word = validity != NULL ? validity[i] : -1ULL; | |
const uint64 validity_word = validity != NULL ? validity[i] : ~0ULL; |
memset(validity_bitmap, 0xFF, validity_bitmap_bytes); | ||
if (n_total % 64) | ||
{ | ||
const uint64 tail_mask = -1ULL >> (64 - n_total % 64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const uint64 tail_mask = -1ULL >> (64 - n_total % 64); | |
const uint64 tail_mask = ~0ULL >> (64 - n_total % 64); |
memset(validity_bitmap, 0xFF, validity_bitmap_bytes); | ||
if (n_total % 64) | ||
{ | ||
const uint64 tail_mask = -1ULL >> (64 - n_total % 64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const uint64 tail_mask = -1ULL >> (64 - n_total % 64); | |
const uint64 tail_mask = ~0ULL >> (64 - n_total % 64); |
memset(validity_bitmap, 0xFF, validity_bitmap_bytes); | ||
if (n_total % 64) | ||
{ | ||
const uint64 tail_mask = -1ULL >> (64 - n_total % 64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const uint64 tail_mask = -1ULL >> (64 - n_total % 64); | |
const uint64 tail_mask = ~0ULL >> (64 - n_total % 64); |
tsl/src/nodes/vector_agg/functions.c
Outdated
|
||
/* | ||
* This loop is not unrolled automatically, so do it manually as usual. | ||
* The value buffer is padded to an even multiple of 64 bytes, i.e. to | ||
* 64 / 4 = 16 elements. The bitmap is an even multiple of 64 elements. | ||
* The number of elements in the inner loop must be less than both these | ||
* values so that we don't go out of bounds. The particular value was | ||
* chosen because it gives some speedup, and the larger values blow up | ||
* the generated code with no performance benefit (checked on clang 16). | ||
*/ | ||
#define INNER_LOOP_SIZE 4 | ||
const int outer_boundary = pad_to_multiple(INNER_LOOP_SIZE, vector->length); | ||
for (int outer = 0; outer < outer_boundary; outer += INNER_LOOP_SIZE) | ||
for (int row = 0; row < vector->length; row++) | ||
{ | ||
for (int inner = 0; inner < INNER_LOOP_SIZE; inner++) | ||
{ | ||
const int row = outer + inner; | ||
const int32 arrow_value = ((int32 *) vector->buffers[1])[row]; | ||
const bool passes_filter = filter ? arrow_row_is_valid(filter, row) : true; | ||
batch_sum += passes_filter * arrow_value * arrow_row_is_valid(vector->buffers[0], row); | ||
} | ||
const int32 arrow_value = ((int32 *) vector->buffers[1])[row]; | ||
batch_sum += arrow_value * arrow_row_is_valid(filter, row) * | ||
arrow_row_is_valid(vector->buffers[0], row); | ||
} | ||
#undef INNER_LOOP_SIZE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old code noted that the loop was not unrolled. Did that change? If so, since what version of the compilers? You should probably add a comment about that since it would clarify the situation when compiled with an old version of the compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just decided to remove this optimization to simplify things, since it doesn't make a difference for the queries we currently have in the benchmark. I think it makes sense to revisit this when we start optimizing the sum
function, because there are surprisingly many implementations possible for it, and this unrolled loop is far from the most optimal anyway. At some other DB engine we benchmarked like twenty of them to get close to a competitor's performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with removing this optimization for now. We can start with some more serious tweaking later, but it makes sense to add this information to the commit for posterity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment.
The unary minus operator shall not be applied to an expression whose underlying type is unsigned
This is allowed when we have no nulls, simplifies the code a little, and reduces memory usage.
No changes in tsbench: https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3642&var-run2=3678&var-threshold=0&var-use_historical_thresholds=true&var-threshold_expression=2.5%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=false&from=now-2d&to=now
Disable-check: force-changelog-file