Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Null validity bitmap in ArrowArray #7199

Merged
merged 5 commits into from
Aug 20, 2024
Merged

Null validity bitmap in ArrowArray #7199

merged 5 commits into from
Aug 20, 2024

Conversation

akuzm
Copy link
Member

@akuzm akuzm commented Aug 14, 2024

This is allowed when we have no nulls, simplifies the code a little, and
reduces memory usage.
@mkindahl
Copy link
Contributor

This looks like a good improvement and is also supported by the Arrow Columnar Format:

Arrays having a 0 null count may choose to not allocate the validity bitmap; how this is represented depends on the implementation (for example, a C++ implementation may represent such an “absent” validity bitmap using a NULL pointer). Implementations may choose to always allocate a validity bitmap anyway as a matter of convenience. Consumers of Arrow arrays should be ready to handle those two possibilities.

@akuzm akuzm marked this pull request as ready for review August 15, 2024 09:03
Copy link

codecov bot commented Aug 15, 2024

Codecov Report

Attention: Patch coverage is 94.33962% with 3 lines in your changes missing coverage. Please review.

Project coverage is 81.78%. Comparing base (59f50f2) to head (792c097).
Report is 298 commits behind head on main.

Files Patch % Lines
tsl/src/compression/algorithms/array.c 85.71% 0 Missing and 1 partial ⚠️
tsl/src/nodes/decompress_chunk/compressed_batch.c 75.00% 0 Missing and 1 partial ⚠️
tsl/src/nodes/vector_agg/exec.c 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7199      +/-   ##
==========================================
+ Coverage   80.06%   81.78%   +1.72%     
==========================================
  Files         190      204      +14     
  Lines       37181    38121     +940     
  Branches     9450     9886     +436     
==========================================
+ Hits        29770    31179    +1409     
+ Misses       2997     2964      -33     
+ Partials     4414     3978     -436     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@mkindahl mkindahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Some minor nits that does not affect the correctness of the patch for any normal platform, and it would be good to add a comment on from what compiler version that the loops start being unrolled.

validity_bitmap[n_total / 64] &= tail_mask;
}

uint64 *restrict validity_bitmap = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: do you need the restrict keyword here? AFAICT, it is not used in a manner where aliasing can be a problem. I know this is just copied code, but I think it is not needed in the old location either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is required. We're reading from another array and writing into this one in a loop, so good to signal that they are not aliasing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I am not understanding the situation correctly, but the restrict keyword is only relevant if you are calling functions in different translation units and hence do not have knowledge of the aliasing situation.

The compiler is fully capable of understanding the situation inside this function and see that there are no aliasing situation.

An example of this is are the memcpy function, which use the restrict keyword to signal to the compiler that it does not allow same pointer to be passed in. Note that memset does not have the restrict keyword since a single pointer cannot cause such a situation.

It does not make any practical difference though, so you can keep it if you're unsure about if it has an effect, but I cannot see how it would be required.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler is fully capable of understanding the situation inside this function and see that there are no aliasing situation.

Why? PG and Timescale are compiled with -fno-strict-aliasing, are you sure that uint64 * and bool * cannot alias with this flag? I think it's easier to slap restrict on everything that's not const than try to figure it out for each case.

}

/* Now move the data to account for nulls, and fill the validity bitmap. */
uint64 *restrict validity_bitmap = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

Suggested change
uint64 *restrict validity_bitmap = NULL;
uint64 * validity_bitmap = NULL;

memset(validity_bitmap, 0xFF, validity_bitmap_bytes);
if (n_total % 64)
{
const uint64 tail_mask = -1ULL >> (64 - n_total % 64);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: -1ULL can raise warnings with some static checkers. better to use ~0ULL if you want to have a "all ones" integer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which ones, can we add them to the CI? We have -1ULL in several other places as well and no ~0U. Postgres uses ~0U consistently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't got a list of them, sorry, I just point out that you might run into it. If you don't, then it does not make a difference. All modern systems I am aware of use 2-complement encoding, so it should not cause any other issues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently this is MISRA-C 2004 rule 12.9 The unary minus operator shall not be applied to an expression whose underlying type is unsigned

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but the standard has changed as well, so I think that in C++20 this is well-defined. 🙄 I just stay clear of these to play it safe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this and all other places as well.

validity_bitmap[n_total / 64] &= tail_mask;
}

uint64 *restrict validity_bitmap = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

Suggested change
uint64 *restrict validity_bitmap = NULL;
uint64 * validity_bitmap = NULL;

validity_bitmap[n_total / 64] &= tail_mask;
}

uint64 *restrict validity_bitmap = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uint64 *restrict validity_bitmap = NULL;
uint64 * validity_bitmap = NULL;

@@ -74,13 +74,14 @@ vector_nulltest(const ArrowArray *arrow, int test_type, uint64 *restrict result)
const uint64 *validity = (const uint64 *) arrow->buffers[0];
for (uint16 i = 0; i < bitmap_words; i++)
{
const uint64 validity_word = validity != NULL ? validity[i] : -1ULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const uint64 validity_word = validity != NULL ? validity[i] : -1ULL;
const uint64 validity_word = validity != NULL ? validity[i] : ~0ULL;

memset(validity_bitmap, 0xFF, validity_bitmap_bytes);
if (n_total % 64)
{
const uint64 tail_mask = -1ULL >> (64 - n_total % 64);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const uint64 tail_mask = -1ULL >> (64 - n_total % 64);
const uint64 tail_mask = ~0ULL >> (64 - n_total % 64);

memset(validity_bitmap, 0xFF, validity_bitmap_bytes);
if (n_total % 64)
{
const uint64 tail_mask = -1ULL >> (64 - n_total % 64);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const uint64 tail_mask = -1ULL >> (64 - n_total % 64);
const uint64 tail_mask = ~0ULL >> (64 - n_total % 64);

memset(validity_bitmap, 0xFF, validity_bitmap_bytes);
if (n_total % 64)
{
const uint64 tail_mask = -1ULL >> (64 - n_total % 64);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const uint64 tail_mask = -1ULL >> (64 - n_total % 64);
const uint64 tail_mask = ~0ULL >> (64 - n_total % 64);

Comment on lines 48 to 70

/*
* This loop is not unrolled automatically, so do it manually as usual.
* The value buffer is padded to an even multiple of 64 bytes, i.e. to
* 64 / 4 = 16 elements. The bitmap is an even multiple of 64 elements.
* The number of elements in the inner loop must be less than both these
* values so that we don't go out of bounds. The particular value was
* chosen because it gives some speedup, and the larger values blow up
* the generated code with no performance benefit (checked on clang 16).
*/
#define INNER_LOOP_SIZE 4
const int outer_boundary = pad_to_multiple(INNER_LOOP_SIZE, vector->length);
for (int outer = 0; outer < outer_boundary; outer += INNER_LOOP_SIZE)
for (int row = 0; row < vector->length; row++)
{
for (int inner = 0; inner < INNER_LOOP_SIZE; inner++)
{
const int row = outer + inner;
const int32 arrow_value = ((int32 *) vector->buffers[1])[row];
const bool passes_filter = filter ? arrow_row_is_valid(filter, row) : true;
batch_sum += passes_filter * arrow_value * arrow_row_is_valid(vector->buffers[0], row);
}
const int32 arrow_value = ((int32 *) vector->buffers[1])[row];
batch_sum += arrow_value * arrow_row_is_valid(filter, row) *
arrow_row_is_valid(vector->buffers[0], row);
}
#undef INNER_LOOP_SIZE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code noted that the loop was not unrolled. Did that change? If so, since what version of the compilers? You should probably add a comment about that since it would clarify the situation when compiled with an old version of the compiler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just decided to remove this optimization to simplify things, since it doesn't make a difference for the queries we currently have in the benchmark. I think it makes sense to revisit this when we start optimizing the sum function, because there are surprisingly many implementations possible for it, and this unrolled loop is far from the most optimal anyway. At some other DB engine we benchmarked like twenty of them to get close to a competitor's performance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with removing this optimization for now. We can start with some more serious tweaking later, but it makes sense to add this information to the commit for posterity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment.

akuzm added 3 commits August 20, 2024 13:10
The unary minus operator shall not be applied to an expression whose
underlying type is unsigned
@akuzm akuzm enabled auto-merge (squash) August 20, 2024 11:15
@akuzm akuzm merged commit 518cd47 into timescale:main Aug 20, 2024
40 of 43 checks passed
@akuzm akuzm deleted the no-validity branch August 20, 2024 11:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants