Optimize conflict do update (upserts) on Hypercore TAM #7537

erimatnor · 2024-12-13T16:38:06Z

When a conflict-do-update statement executes on a Hypercore TAM table, the conflicting TID (and slot) can point to a "row" in a compressed segment. Therefore, it is possible to directly decompress the conflicting segment before executing the update statement.

In contrast, the existing on-conflict handling for compression doesn't have a unique index that points to the conflicting tuple, so it first needs to scan for a conflicting segment on the compressed chunk, which might, in some cases, be a full sequence scan on the compressed chunk, and decompression of the segments that might conflict.

On a synthetic test case, the improvements to upserts are significant, about 32x faster execution time for a single row upsert, while buffers read is reduced by 2220x. This makes the upsert performance on par with that of non-compressed hypertables.

Disable-check: force-changelog-file
Disable-check: approval-count

erimatnor · 2024-12-13T16:39:48Z

tsl/src/hypercore/hypercore_handler.c

+	/* Is it guaranteed that "slot" is populated with the conflicting segment
+	 * tuple?  Otherwise we need to fetch the compressed segment... Unclear if
+	 * this is ever needed. */
+	if (TTS_EMPTY(slot) && !table_tuple_fetch_row_version(relation, ctid, snapshot, slot))


I am not 100% sure this fetch is needed, hence the comment above.

It looks like it is guaranteed.

Before calling this function in ExecOnConflictUpdate, table_tuple_lock will be called with the TID of the conflicting tuple.

Since you have a check for !is_compressed_tid(ctid) above, we only need to consider the is_compressed_tid path in hypercore_tuple_lock.

This branch in hypercore_tuple_lock fetches the child slot for the slot passed in and calls heapam_tuple_lock on the compressed relation with the TID for the compressed tuple part of the conflicting TID.

heapam_tuple_lock does a lot of voodoo magic to fetch the heap tuple, but ends with ExecStorePinnedBufferHeapTuple(tuple, slot, buffer) which stores the locked tuple in the slot.

Later in ExecOnConflictUpdate this function is called, which will then contain the compressed tuple with the conflicting tuple.

It might be prudent to keep this check to deal with changes upstreams, but we have this comment in tableam.h:

* Lock a tuple in the specified mode. . . . * Output parameters: * *slot: contains the target tuple * *tmfd: filled in failure cases (see below)

So it seems that there is a guarantee that the slot will contain the target tuple, hence also the compressed tuple with the conflict.

codecov · 2024-12-13T16:53:53Z

Codecov Report

Attention: Patch coverage is 84.84848% with 5 lines in your changes missing coverage. Please review.

Project coverage is 82.24%. Comparing base (59f50f2) to head (a81884b).
Report is 659 commits behind head on main.

Files with missing lines	Patch %	Lines
tsl/src/hypercore/hypercore_handler.c	82.60%	0 Missing and 4 partials ⚠️
src/nodes/chunk_dispatch/chunk_dispatch.c	75.00%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7537      +/-   ##
==========================================
+ Coverage   80.06%   82.24%   +2.17%     
==========================================
  Files         190      230      +40     
  Lines       37181    43326    +6145     
  Branches     9450    10898    +1448     
==========================================
+ Hits        29770    35634    +5864     
- Misses       2997     3374     +377     
+ Partials     4414     4318      -96

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

mkindahl

I have a few suggestions for improvements to the comments, and I think you can remove the statement you have a comment about, but since neither of these is a reason to block a merge, I approve the pull request.

mkindahl · 2024-12-13T18:16:14Z

src/nodes/hypertable_modify.c

+	if (ts_is_hypercore_am(resultRelInfo->ri_RelationDesc->rd_rel->relam))
+	{
+		ItemPointerData new_tid;
+		int ntuples =
+			ts_cm_functions->hypercore_decompress_conflict_segment(resultRelInfo->ri_RelationDesc,
+																   conflictTid,
+																   existing,
+																   context->estate->es_snapshot,
+																   &new_tid);
+
+		if (ntuples > 0)
+		{
+			/*
+			 * The conflicting row was decompressed, so must update the
+			 * conflictTid to point to the decompressed row.
+			 */
+			ItemPointerCopy(&new_tid, conflictTid);
+			/*
+			 * Since data was decompressed, the command counter was
+			 * incremented to make it visible. Make sure the executor uses the
+			 * latest command ID to see the changes.
+			 */
+			context->estate->es_output_cid = GetCurrentCommandId(true);
+		}
+	}
+


Just repeating our discussion to clarify why we need this.

The standard ExecOnConflictUpdate first calls table_tuple_lock with the TID of the tuple that conflicts so in theory we should be able to decompress the batch on a call to lock the tuple. Unfortunately, this do not work since that would change the TID from a compressed TID to an non-compressed TID, and the rest of ExecOnConflictUpdate assumes that it does not change.

So, unfortunately, it seems like we need this code for now.

mkindahl · 2024-12-17T13:18:30Z

tsl/src/hypercore/hypercore_handler.c

+	/* Is it guaranteed that "slot" is populated with the conflicting segment
+	 * tuple?  Otherwise we need to fetch the compressed segment... Unclear if
+	 * this is ever needed. */
+	if (TTS_EMPTY(slot) && !table_tuple_fetch_row_version(relation, ctid, snapshot, slot))


It looks like it is guaranteed.

Before calling this function in ExecOnConflictUpdate, table_tuple_lock will be called with the TID of the conflicting tuple.

Since you have a check for !is_compressed_tid(ctid) above, we only need to consider the is_compressed_tid path in hypercore_tuple_lock.

This branch in hypercore_tuple_lock fetches the child slot for the slot passed in and calls heapam_tuple_lock on the compressed relation with the TID for the compressed tuple part of the conflicting TID.

heapam_tuple_lock does a lot of voodoo magic to fetch the heap tuple, but ends with ExecStorePinnedBufferHeapTuple(tuple, slot, buffer) which stores the locked tuple in the slot.

Later in ExecOnConflictUpdate this function is called, which will then contain the compressed tuple with the conflicting tuple.

It might be prudent to keep this check to deal with changes upstreams, but we have this comment in tableam.h:

* Lock a tuple in the specified mode. . . . * Output parameters: * *slot: contains the target tuple * *tmfd: filled in failure cases (see below)

So it seems that there is a guarantee that the slot will contain the target tuple, hence also the compressed tuple with the conflict.

mkindahl · 2024-12-17T13:27:44Z

tsl/src/hypercore/hypercore_handler.c

+/*
+ * Decompress a segment that contains the row given by ctid.
+ *
+ * This function is typically called during an upsert (ON CONFLICT DO UPDATE),


Nit: it is only called in this situation.

Suggested change

* This function is typically called during an upsert (ON CONFLICT DO UPDATE),

* This function is called during an upsert (ON CONFLICT DO UPDATE),

tsl/src/hypercore/hypercore_handler.c

When a conflict-do-update statement executes on a Hypercore TAM table, the conflicting TID (and slot) can point to a "row" in a compressed segment. Therefore, it is possible to directly decompress the conflicting segment before executing the update statement. In contrast, the existing on-conflict handling for compression doesn't have a unique index that points to the conflicting tuple, so it first needs to scan for a conflicting segment on the compressed chunk, which might, in some cases, be a full sequence scan on the compressed chunk, and decompression of the segments that might conflict. On a synthetic test case, the improvements to upserts are significant, about 32x faster execution time for a single row upsert, while buffers read is reduced by 2220x. This makes the upsert performance on par with that of non-compressed hypertables.

erimatnor added upsert table-access-method hypercore labels Dec 13, 2024

erimatnor requested a review from mkindahl December 13, 2024 16:38

erimatnor commented Dec 13, 2024

View reviewed changes

erimatnor force-pushed the hypercore-on-conflict-update branch 2 times, most recently from 6d839eb to 5565ec3 Compare December 13, 2024 16:42

erimatnor force-pushed the hypercore-on-conflict-update branch from 5565ec3 to b4db81e Compare December 16, 2024 14:49

erimatnor marked this pull request as ready for review December 16, 2024 14:49

fabriziomello assigned erimatnor Dec 16, 2024

mkindahl reviewed Dec 17, 2024

View reviewed changes

erimatnor force-pushed the hypercore-on-conflict-update branch 2 times, most recently from 7792505 to a86bc2b Compare December 18, 2024 12:27

erimatnor force-pushed the hypercore-on-conflict-update branch from a86bc2b to a81884b Compare December 18, 2024 12:35

erimatnor requested a review from mkindahl December 18, 2024 12:35

mkindahl approved these changes Dec 18, 2024

View reviewed changes

erimatnor enabled auto-merge (rebase) December 18, 2024 12:48

erimatnor merged commit 93a5a92 into timescale:main Dec 18, 2024
50 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize conflict do update (upserts) on Hypercore TAM #7537

Optimize conflict do update (upserts) on Hypercore TAM #7537

erimatnor commented Dec 13, 2024 •

edited

Loading

erimatnor Dec 13, 2024

mkindahl Dec 17, 2024

codecov bot commented Dec 13, 2024 •

edited

Loading

mkindahl left a comment

mkindahl Dec 13, 2024

mkindahl Dec 17, 2024

mkindahl Dec 17, 2024

	* This function is typically called during an upsert (ON CONFLICT DO UPDATE),
	* This function is called during an upsert (ON CONFLICT DO UPDATE),

Optimize conflict do update (upserts) on Hypercore TAM #7537

Optimize conflict do update (upserts) on Hypercore TAM #7537

Conversation

erimatnor commented Dec 13, 2024 • edited Loading

erimatnor Dec 13, 2024

Choose a reason for hiding this comment

mkindahl Dec 17, 2024

Choose a reason for hiding this comment

codecov bot commented Dec 13, 2024 • edited Loading

Codecov Report

mkindahl left a comment

Choose a reason for hiding this comment

mkindahl Dec 13, 2024

Choose a reason for hiding this comment

mkindahl Dec 17, 2024

Choose a reason for hiding this comment

mkindahl Dec 17, 2024

Choose a reason for hiding this comment

erimatnor commented Dec 13, 2024 •

edited

Loading

codecov bot commented Dec 13, 2024 •

edited

Loading