Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize conflict do update (upserts) on Hypercore TAM #7537

Merged
merged 1 commit into from
Dec 18, 2024

Conversation

erimatnor
Copy link
Contributor

@erimatnor erimatnor commented Dec 13, 2024

When a conflict-do-update statement executes on a Hypercore TAM table, the conflicting TID (and slot) can point to a "row" in a compressed segment. Therefore, it is possible to directly decompress the conflicting segment before executing the update statement.

In contrast, the existing on-conflict handling for compression doesn't have a unique index that points to the conflicting tuple, so it first needs to scan for a conflicting segment on the compressed chunk, which might, in some cases, be a full sequence scan on the compressed chunk, and decompression of the segments that might conflict.

On a synthetic test case, the improvements to upserts are significant, about 32x faster execution time for a single row upsert, while buffers read is reduced by 2220x. This makes the upsert performance on par with that of non-compressed hypertables.

Disable-check: force-changelog-file
Disable-check: approval-count

/* Is it guaranteed that "slot" is populated with the conflicting segment
* tuple? Otherwise we need to fetch the compressed segment... Unclear if
* this is ever needed. */
if (TTS_EMPTY(slot) && !table_tuple_fetch_row_version(relation, ctid, snapshot, slot))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% sure this fetch is needed, hence the comment above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it is guaranteed.

  1. Before calling this function in ExecOnConflictUpdate, table_tuple_lock will be called with the TID of the conflicting tuple.
  2. Since you have a check for !is_compressed_tid(ctid) above, we only need to consider the is_compressed_tid path in hypercore_tuple_lock.
  3. This branch in hypercore_tuple_lock fetches the child slot for the slot passed in and calls heapam_tuple_lock on the compressed relation with the TID for the compressed tuple part of the conflicting TID.
  4. heapam_tuple_lock does a lot of voodoo magic to fetch the heap tuple, but ends with ExecStorePinnedBufferHeapTuple(tuple, slot, buffer) which stores the locked tuple in the slot.
  5. Later in ExecOnConflictUpdate this function is called, which will then contain the compressed tuple with the conflicting tuple.

It might be prudent to keep this check to deal with changes upstreams, but we have this comment in tableam.h:

 * Lock a tuple in the specified mode.
    .
    .
    .
 * Output parameters:
 *	*slot: contains the target tuple
 *	*tmfd: filled in failure cases (see below)

So it seems that there is a guarantee that the slot will contain the target tuple, hence also the compressed tuple with the conflict.

@erimatnor erimatnor force-pushed the hypercore-on-conflict-update branch 2 times, most recently from 6d839eb to 5565ec3 Compare December 13, 2024 16:42
Copy link

codecov bot commented Dec 13, 2024

Codecov Report

Attention: Patch coverage is 84.84848% with 5 lines in your changes missing coverage. Please review.

Project coverage is 82.24%. Comparing base (59f50f2) to head (a81884b).
Report is 659 commits behind head on main.

Files with missing lines Patch % Lines
tsl/src/hypercore/hypercore_handler.c 82.60% 0 Missing and 4 partials ⚠️
src/nodes/chunk_dispatch/chunk_dispatch.c 75.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7537      +/-   ##
==========================================
+ Coverage   80.06%   82.24%   +2.17%     
==========================================
  Files         190      230      +40     
  Lines       37181    43326    +6145     
  Branches     9450    10898    +1448     
==========================================
+ Hits        29770    35634    +5864     
- Misses       2997     3374     +377     
+ Partials     4414     4318      -96     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@erimatnor erimatnor force-pushed the hypercore-on-conflict-update branch from 5565ec3 to b4db81e Compare December 16, 2024 14:49
@erimatnor erimatnor marked this pull request as ready for review December 16, 2024 14:49
Copy link
Contributor

@mkindahl mkindahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few suggestions for improvements to the comments, and I think you can remove the statement you have a comment about, but since neither of these is a reason to block a merge, I approve the pull request.

Comment on lines 2402 to 2427
if (ts_is_hypercore_am(resultRelInfo->ri_RelationDesc->rd_rel->relam))
{
ItemPointerData new_tid;
int ntuples =
ts_cm_functions->hypercore_decompress_conflict_segment(resultRelInfo->ri_RelationDesc,
conflictTid,
existing,
context->estate->es_snapshot,
&new_tid);

if (ntuples > 0)
{
/*
* The conflicting row was decompressed, so must update the
* conflictTid to point to the decompressed row.
*/
ItemPointerCopy(&new_tid, conflictTid);
/*
* Since data was decompressed, the command counter was
* incremented to make it visible. Make sure the executor uses the
* latest command ID to see the changes.
*/
context->estate->es_output_cid = GetCurrentCommandId(true);
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just repeating our discussion to clarify why we need this.

The standard ExecOnConflictUpdate first calls table_tuple_lock with the TID of the tuple that conflicts so in theory we should be able to decompress the batch on a call to lock the tuple. Unfortunately, this do not work since that would change the TID from a compressed TID to an non-compressed TID, and the rest of ExecOnConflictUpdate assumes that it does not change.

So, unfortunately, it seems like we need this code for now.

/* Is it guaranteed that "slot" is populated with the conflicting segment
* tuple? Otherwise we need to fetch the compressed segment... Unclear if
* this is ever needed. */
if (TTS_EMPTY(slot) && !table_tuple_fetch_row_version(relation, ctid, snapshot, slot))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it is guaranteed.

  1. Before calling this function in ExecOnConflictUpdate, table_tuple_lock will be called with the TID of the conflicting tuple.
  2. Since you have a check for !is_compressed_tid(ctid) above, we only need to consider the is_compressed_tid path in hypercore_tuple_lock.
  3. This branch in hypercore_tuple_lock fetches the child slot for the slot passed in and calls heapam_tuple_lock on the compressed relation with the TID for the compressed tuple part of the conflicting TID.
  4. heapam_tuple_lock does a lot of voodoo magic to fetch the heap tuple, but ends with ExecStorePinnedBufferHeapTuple(tuple, slot, buffer) which stores the locked tuple in the slot.
  5. Later in ExecOnConflictUpdate this function is called, which will then contain the compressed tuple with the conflicting tuple.

It might be prudent to keep this check to deal with changes upstreams, but we have this comment in tableam.h:

 * Lock a tuple in the specified mode.
    .
    .
    .
 * Output parameters:
 *	*slot: contains the target tuple
 *	*tmfd: filled in failure cases (see below)

So it seems that there is a guarantee that the slot will contain the target tuple, hence also the compressed tuple with the conflict.

/*
* Decompress a segment that contains the row given by ctid.
*
* This function is typically called during an upsert (ON CONFLICT DO UPDATE),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it is only called in this situation.

Suggested change
* This function is typically called during an upsert (ON CONFLICT DO UPDATE),
* This function is called during an upsert (ON CONFLICT DO UPDATE),

tsl/src/hypercore/hypercore_handler.c Outdated Show resolved Hide resolved
tsl/src/hypercore/hypercore_handler.c Outdated Show resolved Hide resolved
tsl/src/hypercore/hypercore_handler.c Show resolved Hide resolved
tsl/src/hypercore/hypercore_handler.c Show resolved Hide resolved
@erimatnor erimatnor force-pushed the hypercore-on-conflict-update branch 2 times, most recently from 7792505 to a86bc2b Compare December 18, 2024 12:27
When a conflict-do-update statement executes on a Hypercore TAM table,
the conflicting TID (and slot) can point to a "row" in a compressed
segment. Therefore, it is possible to directly decompress the
conflicting segment before executing the update statement.

In contrast, the existing on-conflict handling for compression doesn't
have a unique index that points to the conflicting tuple, so it first
needs to scan for a conflicting segment on the compressed chunk, which
might, in some cases, be a full sequence scan on the compressed chunk,
and decompression of the segments that might conflict.

On a synthetic test case, the improvements to upserts are significant,
about 32x faster execution time for a single row upsert, while buffers
read is reduced by 2220x. This makes the upsert performance on par
with that of non-compressed hypertables.
@erimatnor erimatnor force-pushed the hypercore-on-conflict-update branch from a86bc2b to a81884b Compare December 18, 2024 12:35
@erimatnor erimatnor requested a review from mkindahl December 18, 2024 12:35
@erimatnor erimatnor enabled auto-merge (rebase) December 18, 2024 12:48
@erimatnor erimatnor merged commit 93a5a92 into timescale:main Dec 18, 2024
50 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants