
perf(postgres): optimize the expired rows cleanup routine in postgres connector #10331

Closed
wants to merge 9 commits

Conversation

@windmgc (Member) commented Feb 21, 2023

Summary

This PR tries to optimize the expired rows cleanup routine in the Postgres strategy connector.

Currently, the cleanup logic is driven by a timer.every callback that runs multiple DELETE sub-queries in one big query to remove expired rows. Concerns have been raised that such queries may put a high load on the database when the number of expired rows is large. The idea originates from FTI-4746 and KAG-516, although the code changed in this PR may not be directly relevant to the root cause of the production issue; the logic itself is still worth optimizing.

This PR brings the following changes to the cleanup logic:

  • Makes the cleanup interval configurable. A new config parameter pg_expired_rows_cleanup_interval is introduced to control how often the timer fires.
  • A cluster mutex is added to ensure that in a "traditional" deployment cluster only one cleanup runs at a time.
  • The big query containing multiple DELETE sub-queries has been split into one query per table, with clearer logging, so that if an error is encountered the table that caused it can be located easily. Each query uses its own connection, so that if one of them times out on Kong's side (for example, while waiting for locks in Postgres), the cleanup of the remaining tables is not affected.
  • Debug logging showing the total time spent on each table is added.
  • Batch deleting is implemented for every DELETE query, to avoid an extremely slow query hogging database resources (see the sketch after this list).
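
For illustration, the batched deletion of one table could look roughly like the following sketch. It assumes a pgmoon-style connector exposing query and escape_identifier, a ttl column, and DELETE results carrying affected_rows; the helper name, batch size handling, and cutoff parameter are illustrative, not the exact code in this PR.

local fmt = string.format

-- delete expired rows from one table in batches of at most `batch_size` rows
local function cleanup_table(connector, table_name, batch_size, cutoff_ts)
  while true do
    -- DELETE has no LIMIT in Postgres, so delete by ctid of a limited sub-select
    local sql = fmt([[
      DELETE FROM %s
      WHERE ctid IN (
        SELECT ctid FROM %s
        WHERE ttl < TO_TIMESTAMP(%s) AT TIME ZONE 'UTC'
        LIMIT %s
      )]],
      connector:escape_identifier(table_name),
      connector:escape_identifier(table_name),
      cutoff_ts, batch_size)

    local res, err = connector:query(sql)
    if not res then
      return nil, err
    end

    -- stop once a batch removed fewer rows than the batch size
    if (res.affected_rows or 0) < batch_size then
      return true
    end
  end
end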

Checklist

Full changelog

  • Optimize the expired rows cleanup routine in Postgres connector.
  • Add a new config parameter pg_expired_rows_cleanup_interval to control the interval of the expired rows cleanup routine.

Issue reference

local connector = connect(config)  -- dedicated connection for this cleanup run

local cleanup_start_time = now_updated()                -- refresh and record the start time
local ttl_escaped = connector:escape_identifier("ttl")  -- escaped "ttl" column identifier
Contributor:

Would it be better if we defined ttl_escaped as a global variable, so that we won't have to run escape_identifier every time for the same result?
By the way, personally I don't think either ttl or expired_at needs to be escaped; neither of them is a keyword in current PostgreSQL syntax.

@windmgc (Member Author) commented Feb 27, 2023

escape_identifier depends on creating a Postgres connector in pgmoon (which makes it feel weird to reuse at module level), and it is merely string manipulation, so the cost is small. I think it's okay to keep the current state.
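
For reference, identifier escaping in pgmoon-style connectors amounts to little more than quoting the name and doubling any embedded quotes, along the lines of this sketch (an illustration of why the cost is negligible, not pgmoon's exact implementation):

-- wrap the identifier in double quotes and double any embedded quotes
local function escape_identifier(ident)
  return '"' .. (tostring(ident):gsub('"', '""')) .. '"'
end

print(escape_identifier("ttl"))  --> "ttl"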

connector:escape_identifier(table_name),
" WHERE ",
column_name,
" < TO_TIMESTAMP(%s) AT TIME ZONE 'UTC' LIMIT ",
Contributor:

Would it be helpful to force the database to use the index here? It might not be better, but at least an index scan wouldn't be worse than a sequential scan.

Member Author (@windmgc):

I did some googling and found no way to force an index in Postgres... it seems this is not doable.
SELECTing a batch of rows here should be fine since we already have a ttl index on these tables, so let's leave it to the query planner to decide whether to use the index.
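
If we want to verify the planner's choice, a hypothetical spot-check could look like this (not part of this PR; the table name, timestamp, and batch size are made up, and pgmoon-style result rows keyed by column name are assumed):

local rows, err = connector:query([[
  EXPLAIN
  SELECT ctid FROM oauth2_tokens
  WHERE ttl < TO_TIMESTAMP(1677600000) AT TIME ZONE 'UTC'
  LIMIT 50000
]])
if rows then
  for _, row in ipairs(rows) do
    print(row["QUERY PLAN"])  -- expect an Index Scan or Bitmap Index Scan node
  end
end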

kong/db/strategies/postgres/connector.lua (outdated review thread, resolved)
local time_elapsed = tonumber(fmt("%.3f", cleanup_end_time - cleanup_start_time))  -- seconds, millisecond precision
log(DEBUG, "cleaning up expired rows from table '", table_name,
"' took ", time_elapsed, " seconds")
connector:disconnect()  -- release this table's dedicated connection
Contributor:

If there is a connection pool and its usage is under our control, we should try to use it rather than disconnecting. Establishing connections is an expensive operation for the DB.

@windmgc (Member Author) commented Feb 27, 2023

I chose to create a new connection for each table here intentionally, to avoid problems caused by connection reuse (for example, a former connection that was waiting for locks and timed out; reusing that connection would be problematic, see OpenResty's tcpsock:receive documentation on read timeouts).

Since this timer only runs on worker 0 and a cluster mutex is also added in this PR, the number of concurrent connections running the cleanup job in the cluster will constantly be 1, so I don't think creating the connections is a problem either.
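
Put together, the per-table flow described above could be sketched as follows; connect and cleanup_table (from the earlier sketch) are illustrative helper names rather than the exact functions in this PR:

local log  = ngx.log
local WARN = ngx.WARN

-- each table gets its own connection, so a timed-out connection for one
-- table cannot poison the cleanup of the next
local function cleanup_expired_rows(config, ttl_tables, batch_size, cutoff_ts)
  for _, table_name in ipairs(ttl_tables) do
    local connector, err = connect(config)
    if not connector then
      log(WARN, "unable to connect for cleanup of table '", table_name, "': ", err)

    else
      local ok, qerr = cleanup_table(connector, table_name, batch_size, cutoff_ts)
      if not ok then
        log(WARN, "cleanup of table '", table_name, "' failed: ", qerr)
      end

      connector:disconnect()
    end
  end
end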

@windmgc windmgc force-pushed the perf-improve-ttl-cleanup branch 2 times, most recently from 0087b53 to 447be41 on February 27, 2023 15:30
@windmgc windmgc marked this pull request as ready for review February 27, 2023 16:16
@bungle (Member) commented Feb 28, 2023

Question: what if we drop the Postgres cleanup routine altogether, and instead create Postgres triggers on entities that have ttl=true? We could run the trigger AFTER INSERT. I think that would be a better solution. This issue only affects traditional deployments: it affects the Control Plane very little, and the Data Planes or DB-less not at all.

The ttl fields on all tables (where the schema says ttl=true) need proper indexes, as they are also used in SELECTs.
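
For illustration, the trigger approach could look roughly like the following migration-style sketch for a single ttl=true table (the table, function, and trigger names, the batch size, and the statement-level trigger choice are all assumptions for the example; this is not code from this PR):

return {
  postgres = {
    up = [[
      CREATE OR REPLACE FUNCTION batch_delete_expired_oauth2_tokens() RETURNS trigger AS $$
      BEGIN
        -- batched delete, mirroring the ctid/LIMIT pattern discussed above
        DELETE FROM oauth2_tokens
        WHERE ctid IN (
          SELECT ctid FROM oauth2_tokens
          WHERE ttl < CURRENT_TIMESTAMP AT TIME ZONE 'UTC'
          LIMIT 50000
        );
        RETURN NULL;  -- return value of an AFTER trigger is ignored
      END;
      $$ LANGUAGE plpgsql;

      DROP TRIGGER IF EXISTS oauth2_tokens_ttl_trigger ON oauth2_tokens;
      CREATE TRIGGER oauth2_tokens_ttl_trigger
        AFTER INSERT ON oauth2_tokens
        FOR EACH STATEMENT
        EXECUTE PROCEDURE batch_delete_expired_oauth2_tokens();
    ]],
  },
}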

@windmgc (Member Author) commented Feb 28, 2023

Hmm, I'm a bit concerned about the AFTER INSERT trigger (if we're going to run a plain DELETE instead of a batched delete): it could also lead to more concurrent queries and higher resource usage, right? Consider a scenario where oauth2_tokens are created frequently; since hybrid mode cannot be used in the oauth2 case, there will still be users running large traditional clusters.

But yes, dropping the background timer and adding a trigger (perhaps with batch deleting in the trigger function) is more straightforward; we could reconsider this option after the CP-DP refactor happens. What do you think? @bungle

@windmgc (Member Author) commented Feb 28, 2023

The ttl fields on all tables (where the schema says ttl=true) need proper indexes, as they are also used in SELECTs.

@bungle Could you please show me where I would need to make changes to guarantee this? I can confirm that all the ttl-enabled tables have a proper index on their ttl fields in the migration files, but I don't know if there is any DAO mechanism that can guarantee the schema field is indexed.

@bungle (Member) commented Feb 28, 2023

@bungle Could you please show me where I would need to make changes to guarantee this?

Yes, just by looking through our Postgres schemas to check that the ttl=true tables have the index. If they don't, we need to add an index for ttl. We have added them over the years, but there might still be some entities that are missing a Postgres index.
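
Where an index is missing, the fix would be a small migration along these lines (the entity and index names below are made up for illustration):

return {
  postgres = {
    up = [[
      -- add the missing ttl index for a ttl=true entity
      CREATE INDEX IF NOT EXISTS some_entity_ttl_idx ON some_entity (ttl);
    ]],
  },
}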

@bungle (Member) commented Feb 28, 2023

Hmm, I'm a bit concerned about the AFTER INSERT trigger (if we're going to run a plain DELETE instead of a batched delete)

The batch deleting you added here is basically a Postgres feature, so it is easy to add it to the trigger's delete too.

it could also lead to more concurrent queries and higher resource usage, right? Consider a scenario where oauth2_tokens are created frequently; since hybrid mode cannot be used in the oauth2 case, there will still be users running large traditional clusters.

We can tune this inside the trigger, e.g. only run the delete some of the time.

What I mean is that the logic in this PR is quite complicated; I think in most cases what we already had is better than what we have here. That probably leaves a single pain point: oauth2 tokens. Perhaps TTLing of oauth2 tokens should be implemented differently, while the rest are fine with the current approach or with triggers. Perhaps triggers can be made to work for the oauth2 tokens problem too.
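
One hypothetical way to "only run the delete some of the time" inside the trigger function (purely illustrative; the random() gate, 1% rate, table name, and batch size are assumptions, not something proposed in this thread):

-- plpgsql body, kept inside a Lua long string as Kong migrations do, that
-- gates the batched delete so only ~1% of inserts pay the cleanup cost
local gated_trigger_body = [[
  BEGIN
    IF random() < 0.01 THEN
      DELETE FROM oauth2_tokens
      WHERE ctid IN (
        SELECT ctid FROM oauth2_tokens
        WHERE ttl < CURRENT_TIMESTAMP AT TIME ZONE 'UTC'
        LIMIT 50000
      );
    END IF;
    RETURN NULL;
  END;
]]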

bungle added a commit that referenced this pull request Mar 1, 2023
… listeners

### Summary

PRs #10389 and #10331 pursued different ways to improve our situation with cleanup timers.

This PR makes it so that the cleanup only happens on nodes that have admin listeners. Fairly simple, but it may already cover the majority of the problem space.
hanshuebner pushed a commit that referenced this pull request Mar 1, 2023
… listeners (#10405)

### Summary

PRs #10389 and #10331 pursued different ways to improve our situation with cleanup timers.

This PR makes it so that the cleanup only happens on nodes that have admin listeners. Fairly simple, but it may already cover the majority of the problem space.
bungle added a commit that referenced this pull request Mar 1, 2023
### Summary

The PR #10405 changed cleanup to only happen on nodes that have Admin API listeners.

This PR makes deletion happen in batches of at most 50,000 rows.

Inspired from #10331 and #10389.
@windmgc (Member Author) commented Mar 1, 2023

superseded by #10389, after discussion

@windmgc windmgc closed this Mar 1, 2023
hanshuebner pushed a commit that referenced this pull request Mar 3, 2023
* feat(db): batch cleanup expired rows from postgres

### Summary

The PR #10405 changed cleanup to only happen on nodes that have Admin API listeners.

This PR makes deletion happen in batches of at most 50,000 rows.

Inspired from #10331 and #10389.

* change to batch delete on every table

* update changelog

* add test for ttl cleanup

---------

Co-authored-by: windmgc <windmgc@gmail.com>
@hbagdi hbagdi deleted the perf-improve-ttl-cleanup branch March 22, 2023 22:15