Null validity bitmap in ArrowArray #7199

akuzm · 2024-08-14T19:55:32Z

This is allowed when we have no nulls, simplifies the code a little, and reduces memory usage.

No changes in tsbench: https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3642&var-run2=3678&var-threshold=0&var-use_historical_thresholds=true&var-threshold_expression=2.5%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=false&from=now-2d&to=now

Disable-check: force-changelog-file

This is allowed when we have no nulls, simplifies the code a little, and reduces memory usage.

mkindahl · 2024-08-15T07:22:04Z

This looks like a good improvement and is also supported by the Arrow Columnar Format:

Arrays having a 0 null count may choose to not allocate the validity bitmap; how this is represented depends on the implementation (for example, a C++ implementation may represent such an “absent” validity bitmap using a NULL pointer). Implementations may choose to always allocate a validity bitmap anyway as a matter of convenience. Consumers of Arrow arrays should be ready to handle those two possibilities.

codecov · 2024-08-15T09:08:05Z

Codecov Report

Attention: Patch coverage is 94.33962% with 3 lines in your changes missing coverage. Please review.

Project coverage is 81.78%. Comparing base (59f50f2) to head (792c097).
Report is 298 commits behind head on main.

Files	Patch %	Lines
tsl/src/compression/algorithms/array.c	85.71%	0 Missing and 1 partial ⚠️
tsl/src/nodes/decompress_chunk/compressed_batch.c	75.00%	0 Missing and 1 partial ⚠️
tsl/src/nodes/vector_agg/exec.c	66.66%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7199      +/-   ##
==========================================
+ Coverage   80.06%   81.78%   +1.72%     
==========================================
  Files         190      204      +14     
  Lines       37181    38121     +940     
  Branches     9450     9886     +436     
==========================================
+ Hits        29770    31179    +1409     
+ Misses       2997     2964      -33     
+ Partials     4414     3978     -436

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

mkindahl

Looks good. Some minor nits that does not affect the correctness of the patch for any normal platform, and it would be good to add a comment on from what compiler version that the loops start being unrolled.

mkindahl · 2024-08-15T11:44:31Z

tsl/src/compression/algorithms/array.c

-		validity_bitmap[n_total / 64] &= tail_mask;
-	}
-
+	uint64 *restrict validity_bitmap = NULL;


Nit: do you need the restrict keyword here? AFAICT, it is not used in a manner where aliasing can be a problem. I know this is just copied code, but I think it is not needed in the old location either.

Yes, it is required. We're reading from another array and writing into this one in a loop, so good to signal that they are not aliasing.

Maybe I am not understanding the situation correctly, but the restrict keyword is only relevant if you are calling functions in different translation units and hence do not have knowledge of the aliasing situation.

The compiler is fully capable of understanding the situation inside this function and see that there are no aliasing situation.

An example of this is are the memcpy function, which use the restrict keyword to signal to the compiler that it does not allow same pointer to be passed in. Note that memset does not have the restrict keyword since a single pointer cannot cause such a situation.

It does not make any practical difference though, so you can keep it if you're unsure about if it has an effect, but I cannot see how it would be required.

The compiler is fully capable of understanding the situation inside this function and see that there are no aliasing situation.

Why? PG and Timescale are compiled with -fno-strict-aliasing, are you sure that uint64 * and bool * cannot alias with this flag? I think it's easier to slap restrict on everything that's not const than try to figure it out for each case.

mkindahl · 2024-08-15T11:46:59Z

tsl/src/compression/algorithms/deltadelta_impl.c

-	}
-
-	/* Now move the data to account for nulls, and fill the validity bitmap. */
+	uint64 *restrict validity_bitmap = NULL;


See above.

Suggested change

uint64 *restrict validity_bitmap = NULL;

uint64 * validity_bitmap = NULL;

mkindahl · 2024-08-15T11:52:16Z

tsl/src/compression/algorithms/deltadelta_impl.c

+		memset(validity_bitmap, 0xFF, validity_bitmap_bytes);
+		if (n_total % 64)
+		{
+			const uint64 tail_mask = -1ULL >> (64 - n_total % 64);


Nit: -1ULL can raise warnings with some static checkers. better to use ~0ULL if you want to have a "all ones" integer.

Which ones, can we add them to the CI? We have -1ULL in several other places as well and no ~0U. Postgres uses ~0U consistently.

I haven't got a list of them, sorry, I just point out that you might run into it. If you don't, then it does not make a difference. All modern systems I am aware of use 2-complement encoding, so it should not cause any other issues.

Apparently this is MISRA-C 2004 rule 12.9 The unary minus operator shall not be applied to an expression whose underlying type is unsigned

Yeah, but the standard has changed as well, so I think that in C++20 this is well-defined. 🙄 I just stay clear of these to play it safe.

Changed this and all other places as well.

mkindahl · 2024-08-15T11:53:19Z

tsl/src/compression/algorithms/dictionary.c

-		validity_bitmap[n_total / 64] &= tail_mask;
-	}
-
+	uint64 *restrict validity_bitmap = NULL;


See above.

Suggested change

uint64 *restrict validity_bitmap = NULL;

uint64 * validity_bitmap = NULL;

mkindahl · 2024-08-15T11:53:47Z

tsl/src/compression/algorithms/gorilla_impl.c

-		validity_bitmap[n_total / 64] &= tail_mask;
-	}
-
+	uint64 *restrict validity_bitmap = NULL;


Suggested change

uint64 *restrict validity_bitmap = NULL;

uint64 * validity_bitmap = NULL;

mkindahl · 2024-08-15T11:55:35Z

tsl/src/nodes/decompress_chunk/vector_predicates.c

@@ -74,13 +74,14 @@ vector_nulltest(const ArrowArray *arrow, int test_type, uint64 *restrict result)
 	const uint64 *validity = (const uint64 *) arrow->buffers[0];
 	for (uint16 i = 0; i < bitmap_words; i++)
 	{
+		const uint64 validity_word = validity != NULL ? validity[i] : -1ULL;


Suggested change

const uint64 validity_word = validity != NULL ? validity[i] : -1ULL;

const uint64 validity_word = validity != NULL ? validity[i] : ~0ULL;

mkindahl · 2024-08-15T12:28:15Z

tsl/src/compression/algorithms/array.c

+		memset(validity_bitmap, 0xFF, validity_bitmap_bytes);
+		if (n_total % 64)
+		{
+			const uint64 tail_mask = -1ULL >> (64 - n_total % 64);


Suggested change

const uint64 tail_mask = -1ULL >> (64 - n_total % 64);

const uint64 tail_mask = ~0ULL >> (64 - n_total % 64);

mkindahl · 2024-08-15T12:28:46Z

tsl/src/compression/algorithms/dictionary.c

+		memset(validity_bitmap, 0xFF, validity_bitmap_bytes);
+		if (n_total % 64)
+		{
+			const uint64 tail_mask = -1ULL >> (64 - n_total % 64);


Suggested change

const uint64 tail_mask = -1ULL >> (64 - n_total % 64);

const uint64 tail_mask = ~0ULL >> (64 - n_total % 64);

mkindahl · 2024-08-15T12:29:09Z

tsl/src/compression/algorithms/gorilla_impl.c

+		memset(validity_bitmap, 0xFF, validity_bitmap_bytes);
+		if (n_total % 64)
+		{
+			const uint64 tail_mask = -1ULL >> (64 - n_total % 64);


Suggested change

const uint64 tail_mask = -1ULL >> (64 - n_total % 64);

const uint64 tail_mask = ~0ULL >> (64 - n_total % 64);

mkindahl · 2024-08-15T12:36:03Z

tsl/src/nodes/vector_agg/functions.c

-
-	/*
-	 * This loop is not unrolled automatically, so do it manually as usual.
-	 * The value buffer is padded to an even multiple of 64 bytes, i.e. to
-	 * 64 / 4 = 16 elements. The bitmap is an even multiple of 64 elements.
-	 * The number of elements in the inner loop must be less than both these
-	 * values so that we don't go out of bounds. The particular value was
-	 * chosen because it gives some speedup, and the larger values blow up
-	 * the generated code with no performance benefit (checked on clang 16).
-	 */
-#define INNER_LOOP_SIZE 4
-	const int outer_boundary = pad_to_multiple(INNER_LOOP_SIZE, vector->length);
-	for (int outer = 0; outer < outer_boundary; outer += INNER_LOOP_SIZE)
+	for (int row = 0; row < vector->length; row++)
 	{
-		for (int inner = 0; inner < INNER_LOOP_SIZE; inner++)
-		{
-			const int row = outer + inner;
-			const int32 arrow_value = ((int32 *) vector->buffers[1])[row];
-			const bool passes_filter = filter ? arrow_row_is_valid(filter, row) : true;
-			batch_sum += passes_filter * arrow_value * arrow_row_is_valid(vector->buffers[0], row);
-		}
+		const int32 arrow_value = ((int32 *) vector->buffers[1])[row];
+		batch_sum += arrow_value * arrow_row_is_valid(filter, row) *
+					 arrow_row_is_valid(vector->buffers[0], row);
 	}
-#undef INNER_LOOP_SIZE


The old code noted that the loop was not unrolled. Did that change? If so, since what version of the compilers? You should probably add a comment about that since it would clarify the situation when compiled with an old version of the compiler.

I just decided to remove this optimization to simplify things, since it doesn't make a difference for the queries we currently have in the benchmark. I think it makes sense to revisit this when we start optimizing the sum function, because there are surprisingly many implementations possible for it, and this unrolled loop is far from the most optimal anyway. At some other DB engine we benchmarked like twenty of them to get close to a competitor's performance.

I'm fine with removing this optimization for now. We can start with some more serious tweaking later, but it makes sense to add this information to the commit for posterity.

Added a comment.

The unary minus operator shall not be applied to an expression whose underlying type is unsigned

Null validity bitmap in ArrowArray

f2b7ed3

This is allowed when we have no nulls, simplifies the code a little, and reduces memory usage.

akuzm mentioned this pull request Aug 14, 2024

Implement more vectorized aggregate functions #7200

Merged

2 tasks

typo

a1d9e0e

akuzm marked this pull request as ready for review August 15, 2024 09:03

mkindahl approved these changes Aug 15, 2024

View reviewed changes

svenklemm approved these changes Aug 19, 2024

View reviewed changes

akuzm added 3 commits August 20, 2024 13:10

MISRA-C 2004 rule 12.9

26eda39

The unary minus operator shall not be applied to an expression whose underlying type is unsigned

Merge remote-tracking branch 'origin/main' into HEAD

283a9eb

comment

792c097

akuzm enabled auto-merge (squash) August 20, 2024 11:15

akuzm merged commit 518cd47 into timescale:main Aug 20, 2024
40 of 43 checks passed

akuzm deleted the no-validity branch August 20, 2024 11:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Null validity bitmap in ArrowArray #7199

Null validity bitmap in ArrowArray #7199

akuzm commented Aug 14, 2024 •

edited

Loading

mkindahl commented Aug 15, 2024

codecov bot commented Aug 15, 2024 •

edited

Loading

mkindahl left a comment

mkindahl Aug 15, 2024

akuzm Aug 15, 2024

mkindahl Aug 15, 2024

akuzm Aug 15, 2024

mkindahl Aug 15, 2024

mkindahl Aug 15, 2024

akuzm Aug 15, 2024

mkindahl Aug 15, 2024

akuzm Aug 15, 2024

mkindahl Aug 15, 2024

akuzm Aug 20, 2024

mkindahl Aug 15, 2024

mkindahl Aug 15, 2024

mkindahl Aug 15, 2024

mkindahl Aug 15, 2024

mkindahl Aug 15, 2024

mkindahl Aug 15, 2024

mkindahl Aug 15, 2024

akuzm Aug 15, 2024

mkindahl Aug 15, 2024

akuzm Aug 20, 2024

	uint64 *restrict validity_bitmap = NULL;
	uint64 * validity_bitmap = NULL;

	const uint64 validity_word = validity != NULL ? validity[i] : -1ULL;
	const uint64 validity_word = validity != NULL ? validity[i] : ~0ULL;

	const uint64 tail_mask = -1ULL >> (64 - n_total % 64);
	const uint64 tail_mask = ~0ULL >> (64 - n_total % 64);

Null validity bitmap in ArrowArray #7199

Null validity bitmap in ArrowArray #7199

Conversation

akuzm commented Aug 14, 2024 • edited Loading

mkindahl commented Aug 15, 2024

codecov bot commented Aug 15, 2024 • edited Loading

Codecov Report

mkindahl left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

akuzm commented Aug 14, 2024 •

edited

Loading

codecov bot commented Aug 15, 2024 •

edited

Loading