
introduces a faster query to tuple GC #1859

Merged
merged 1 commit into from
Apr 10, 2024

Conversation

vroldanbet
Contributor

we've noticed that even after #1550 was introduced and made the nested GC SELECT query faster, deletions were still causing spikes on Postgres and Aurora instances.

The proposed query appears to be an order of magnitude less costly than the current query on a test PG 15.6 instance with 5M dead tuples and 5M live tuples.

Before

WITH rows AS (SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_xid, deleted_xid FROM relation_tuple WHERE deleted_xid < '901' LIMIT 1000)
DELETE FROM relation_tuple
WHERE (namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_xid, deleted_xid) IN (SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_xid, deleted_xid FROM rows);
| -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Delete on relation_tuple  (cost=65.94..8656.38 rows=0 width=0) (actual time=14.375..14.376 rows=0 loops=1)                                                                                                                                                                                                                                                                                                     |
|   ->  Nested Loop  (cost=65.94..8656.38 rows=1000 width=110) (actual time=1.100..14.026 rows=1000 loops=1)                                                                                                                                                                                                                                                                                                     |
|         ->  HashAggregate  (cost=65.38..75.38 rows=1000 width=184) (actual time=1.061..1.319 rows=1000 loops=1)                                                                                                                                                                                                                                                                                                |
|               Group Key: (rows.namespace)::text, (rows.object_id)::text, (rows.relation)::text, (rows.userset_namespace)::text, (rows.userset_object_id)::text, (rows.userset_relation)::text, rows.created_xid, rows.deleted_xid                                                                                                                                                                              |
|               Batches: 1  Memory Usage: 841kB                                                                                                                                                                                                                                                                                                                                                                  |
|               ->  Subquery Scan on rows  (cost=0.43..45.38 rows=1000 width=184) (actual time=0.150..0.647 rows=1000 loops=1)                                                                                                                                                                                                                                                                                   |
|                     ->  Limit  (cost=0.43..35.38 rows=1000 width=80) (actual time=0.143..0.488 rows=1000 loops=1)                                                                                                                                                                                                                                                                                              |
|                           ->  Index Scan using ix_gc_index on relation_tuple relation_tuple_1  (cost=0.43..178606.61 rows=5110010 width=80) (actual time=0.142..0.444 rows=1000 loops=1)                                                                                                                                                                                                                       |
|                                 Index Cond: (deleted_xid < '901'::xid8)                                                                                                                                                                                                                                                                                                                                        |
|         ->  Index Scan using uq_relation_tuple_living_xid on relation_tuple  (cost=0.56..8.57 rows=1 width=86) (actual time=0.012..0.012 rows=1 loops=1000)                                                                                                                                                                                                                                                    |
|               Index Cond: (((namespace)::text = (rows.namespace)::text) AND ((object_id)::text = (rows.object_id)::text) AND ((relation)::text = (rows.relation)::text) AND ((userset_namespace)::text = (rows.userset_namespace)::text) AND ((userset_object_id)::text = (rows.userset_object_id)::text) AND ((userset_relation)::text = (rows.userset_relation)::text) AND (deleted_xid = rows.deleted_xid)) |
|               Filter: (rows.created_xid = created_xid)                                                                                                                                                                                                                                                                                                                                                         |
| Planning Time: 0.674 ms                                                                                                                                                                                                                                                                                                                                                                                        |
| Execution Time: 14.464 ms                                                                                                                                                                                                                                                                                                                                                                                      |

After

WITH rows AS (SELECT ctid FROM relation_tuple WHERE deleted_xid < '901' LIMIT 1000)
DELETE FROM relation_tuple
WHERE ctid IN (SELECT ctid FROM rows);
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Delete on relation_tuple  (cost=47.89..854.38 rows=0 width=0) (actual time=3.598..3.598 rows=0 loops=1)                                                                                 |
|   ->  Nested Loop  (cost=47.89..854.38 rows=1000 width=36) (actual time=2.682..3.237 rows=1000 loops=1)                                                                                 |
|         ->  HashAggregate  (cost=47.88..49.88 rows=200 width=36) (actual time=2.679..2.758 rows=1000 loops=1)                                                                           |
|               Group Key: rows.ctid                                                                                                                                                      |
|               Batches: 1  Memory Usage: 209kB                                                                                                                                           |
|               ->  Subquery Scan on rows  (cost=0.43..45.38 rows=1000 width=36) (actual time=0.193..2.503 rows=1000 loops=1)                                                             |
|                     ->  Limit  (cost=0.43..35.38 rows=1000 width=6) (actual time=0.183..2.406 rows=1000 loops=1)                                                                        |
|                           ->  Index Scan using ix_gc_index on relation_tuple relation_tuple_1  (cost=0.43..178606.61 rows=5110010 width=6) (actual time=0.182..2.363 rows=1000 loops=1) |
|                                 Index Cond: (deleted_xid < '901'::xid8)                                                                                                                 |
|         ->  Tid Scan on relation_tuple  (cost=0.00..4.01 rows=1 width=6) (actual time=0.000..0.000 rows=1 loops=1000)                                                                   |
|               TID Cond: (ctid = rows.ctid)                                                                                                                                              |
| Planning Time: 3.335 ms                                                                                                                                                                 |
| Execution Time: 3.706 ms                                                                                                                                                                |

@github-actions github-actions bot added the area/datastore Affects the storage system label Apr 8, 2024
@vroldanbet
Contributor Author

vroldanbet commented Apr 9, 2024

I was investigating whether ctid is stable within a CTE, which led me to this post describing a gotcha with partitioned tables:

There actually is a serious pitfall with using ctid for identifying rows to delete: in partitioned tables, ctid is not unique. It is only unique in a given partition so, if one uses it as a criterion for deletion, this will likely result in deleting the desired row as well as the otherwise totally unrelated rows from other partitions (ask me how I know :-/ ).

I ran the query with tableoid and it seems to work just fine: it simply adds another filter phase, which shouldn't add much overhead unless the same ctid is found in multiple tables. So I think this approach seems fine.
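For reference, the plan below presumably corresponds to a delete of roughly this shape (reconstructed from the plan's group key and filter; the original comment shows only the EXPLAIN output):

```sql
-- Reconstructed sketch: qualify the ctid with tableoid so rows in other
-- partitions that happen to share a ctid are not deleted.
WITH rows AS (SELECT tableoid, ctid FROM relation_tuple WHERE deleted_xid < '901' LIMIT 1000)
DELETE FROM relation_tuple
WHERE (tableoid, ctid) IN (SELECT tableoid, ctid FROM rows);
```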

Delete on relation_tuple  (cost=72.12..879.11 rows=0 width=0) (actual time=1.427..1.428 rows=0 loops=1)
  ->  Nested Loop  (cost=72.12..879.11 rows=500 width=40) (actual time=0.569..1.157 rows=1000 loops=1)
        ->  HashAggregate  (cost=72.11..74.11 rows=200 width=44) (actual time=0.565..0.648 rows=1000 loops=1)
              Group Key: rows.tableoid, rows.ctid
              Batches: 1  Memory Usage: 209kB
              ->  Subquery Scan on rows  (cost=0.00..67.11 rows=1000 width=44) (actual time=0.024..0.407 rows=1000 loops=1)
                    ->  Limit  (cost=0.00..57.11 rows=1000 width=10) (actual time=0.020..0.315 rows=1000 loops=1)
                          ->  Seq Scan on relation_tuple relation_tuple_1  (cost=0.00..229045.70 rows=4010286 width=10) (actual time=0.020..0.272 rows=1000 loops=1)
                                Filter: (deleted_xid < '901'::xid8)
        ->  Tid Scan on relation_tuple  (cost=0.00..4.02 rows=1 width=10) (actual time=0.000..0.000 rows=1 loops=1000)
              TID Cond: (ctid = rows.ctid)
              Filter: (rows.tableoid = tableoid)
Planning Time: 0.106 ms
Execution Time: 1.470 ms

@vroldanbet vroldanbet marked this pull request as ready for review April 10, 2024 10:53
@vroldanbet vroldanbet requested a review from a team April 10, 2024 10:53
@vroldanbet vroldanbet self-assigned this Apr 10, 2024
@josephschorr
Member

@vroldanbet Is there a test we can add for GC along a partitioned table to ensure the use of the table ID fixes this potential issue?

@vroldanbet
Contributor Author

@vroldanbet Is there a test we can add for GC along a partitioned table to ensure the use of the table ID fixes this potential issue?

We cannot control tableoid and ctid; those are system columns that Postgres manages internally.

What the docs say about tableoid:

The OID of the table containing this row. This column is particularly handy for queries that select from partitioned tables or inheritance hierarchies, since without it, it's difficult to tell which individual table a row came from. The tableoid can be joined against the oid column of pg_class to obtain the table name.

What the docs say about ctid:

The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier. A primary key should be used to identify logical rows.
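The instability the docs describe is easy to observe; this is a hypothetical sketch on a scratch table (not part of the PR), showing that an UPDATE creates a new row version at a new ctid:

```sql
-- Sketch: ctid identifies a row *version*, not a logical row.
CREATE TABLE scratch (v int);
INSERT INTO scratch VALUES (1);
SELECT ctid FROM scratch;   -- e.g. (0,1)
UPDATE scratch SET v = 2;
SELECT ctid FROM scratch;   -- a different tuple location, e.g. (0,2)
```

This is why ctid is only safe here: the GC query selects the ctids and deletes them within a single statement, so there is no window for the rows to move.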

Testing a scenario where two different relationships to be GCed have the same ctid would require us to:

  • add a new type of test suite that partitions the main tables defined in the datastore. Something we should do at some point, but it should probably be tackled in its own PR
  • generate a database state where two ctids are the same across partitions. I'm not sure how likely that is to happen without understanding how it's implemented. The docs say it's "the physical location of the row version within its table", so it does not sound randomly generated. We'd have to create a test with a row marked as deleted in one partition and a row that is alive in another partition, both with the same ctid.

My interpretation of the docs is that (tableoid, ctid) provides a globally unique internal identifier for a row.
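A hypothetical sketch (table and partition names invented, not from the PR) of how such a collision arises: the first row inserted into each partition typically lands at ctid (0,1), while (tableoid, ctid) stays distinct:

```sql
-- Sketch: ctid repeats across partitions; (tableoid, ctid) does not.
CREATE TABLE rt (id int, deleted_xid xid8) PARTITION BY RANGE (id);
CREATE TABLE rt_p1 PARTITION OF rt FOR VALUES FROM (0) TO (100);
CREATE TABLE rt_p2 PARTITION OF rt FOR VALUES FROM (100) TO (200);
INSERT INTO rt VALUES (1, '5'), (101, '5');
-- Both rows can report ctid (0,1), but tableoid::regclass differs:
SELECT tableoid::regclass, ctid, id FROM rt;
```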

Member

@josephschorr josephschorr left a comment

LGTM

@vroldanbet vroldanbet added this pull request to the merge queue Apr 10, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 10, 2024
@vroldanbet vroldanbet added this pull request to the merge queue Apr 10, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 10, 2024
@vroldanbet vroldanbet added this pull request to the merge queue Apr 10, 2024
Merged via the queue into main with commit 095ffd3 Apr 10, 2024
22 checks passed
@vroldanbet vroldanbet deleted the faster-postgres-gc branch April 10, 2024 17:00
@github-actions github-actions bot locked and limited conversation to collaborators Apr 10, 2024