
perf(postgres): optimize the expired rows cleanup routine in postgres connector #10331

Closed
wants to merge 9 commits

Conversation

@windmgc (Member) commented Feb 21, 2023

Summary

This PR tries to optimize the expired rows cleanup routine in the Postgres strategy connector.

Currently, the cleanup logic is driven by a timer.every callback that runs multiple DELETE sub-queries in one big query to remove expired rows. Concerns have been raised that such queries may put a high load on the database when the number of expired rows is large. The idea originates from FTI-4746 and KAG-516, although the code changed in this PR may not be directly relevant to the root cause of the production issue; the logic itself is still worth optimizing.

This PR brings the following changes to the cleanup logic:

  • Makes the cleanup interval configurable. A new config parameter pg_expired_rows_cleanup_interval is introduced to control how often the timer fires.
  • A cluster mutex is added to ensure that in a "traditional" deployment cluster only one cleanup runs at a time.
  • The big query containing multiple DELETE sub-queries has been split into one query per table, with clearer logging, so that if an error is encountered the table that caused it can be located easily. Each query uses its own connection, so that if one of them times out on Kong's side (for example, while waiting for locks in Postgres), the cleanup of the remaining tables is not affected.
  • Debug logging showing the total time spent on each table is added.
  • Batch deleting is implemented for every DELETE query, to avoid an extremely slow query hogging database resources (see the sketch after this list).
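
For illustration, the batched deletion of one table could look roughly like the following sketch. It assumes a pgmoon-style connector exposing query and escape_identifier, a ttl column, and DELETE results carrying affected_rows; the helper name, batch size handling, and cutoff parameter are illustrative, not the exact code in this PR.

local fmt = string.format

-- delete expired rows from one table in batches of at most `batch_size` rows
local function cleanup_table(connector, table_name, batch_size, cutoff_ts)
  while true do
    -- DELETE has no LIMIT in Postgres, so delete by ctid of a limited sub-select
    local sql = fmt([[
      DELETE FROM %s
      WHERE ctid IN (
        SELECT ctid FROM %s
        WHERE ttl < TO_TIMESTAMP(%s) AT TIME ZONE 'UTC'
        LIMIT %s
      )]],
      connector:escape_identifier(table_name),
      connector:escape_identifier(table_name),
      cutoff_ts, batch_size)

    local res, err = connector:query(sql)
    if not res then
      return nil, err
    end

    -- stop once a batch removed fewer rows than the batch size
    if (res.affected_rows or 0) < batch_size then
      return true
    end
  end
end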

Checklist

Full changelog

  • Optimize the expired rows cleanup routine in Postgres connector.
  • Add a new config parameter pg_expired_rows_cleanup_interval to control the interval of the expired rows cleanup routine.

Issue reference

local connector = connect(config)  -- dedicated connection for this cleanup run

local cleanup_start_time = now_updated()                -- refresh and record the start time
local ttl_escaped = connector:escape_identifier("ttl")  -- escaped "ttl" column identifier
Contributor:

Would it be better if we defined ttl_escaped as a global variable, so that we won't have to run escape_identifier every time for the same result?
By the way, personally I don't think either ttl or expired_at needs to be escaped; neither of them is a keyword in current PostgreSQL syntax.

@windmgc (Member Author) commented Feb 27, 2023

escape_identifier depends on creating a Postgres connector in pgmoon (which makes it feel weird to reuse at module level), and it is merely string manipulation, so the cost is small. I think it's okay to keep the current state.
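
For reference, identifier escaping in pgmoon-style connectors amounts to little more than quoting the name and doubling any embedded quotes, along the lines of this sketch (an illustration of why the cost is negligible, not pgmoon's exact implementation):

-- wrap the identifier in double quotes and double any embedded quotes
local function escape_identifier(ident)
  return '"' .. (tostring(ident):gsub('"', '""')) .. '"'
end

print(escape_identifier("ttl"))  --> "ttl"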

connector:escape_identifier(table_name),
" WHERE ",
column_name,
" < TO_TIMESTAMP(%s) AT TIME ZONE 'UTC' LIMIT ",
Contributor:

Would it be helpful to force the database to use the index here? It might not be better, but at least an index scan wouldn't be worse than a sequential scan.

Member Author (@windmgc):

I did some googling and found no way to force an index in Postgres... it seems this is not doable.
SELECTing a batch of rows here should be fine since we already have a ttl index on these tables, so let's leave it to the query planner to decide whether to use the index.
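
If we want to verify the planner's choice, a hypothetical spot-check could look like this (not part of this PR; the table name, timestamp, and batch size are made up, and pgmoon-style result rows keyed by column name are assumed):

local rows, err = connector:query([[
  EXPLAIN
  SELECT ctid FROM oauth2_tokens
  WHERE ttl < TO_TIMESTAMP(1677600000) AT TIME ZONE 'UTC'
  LIMIT 50000
]])
if rows then
  for _, row in ipairs(rows) do
    print(row["QUERY PLAN"])  -- expect an Index Scan or Bitmap Index Scan node
  end
end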

kong/db/strategies/postgres/connector.lua (outdated review thread, resolved)
local time_elapsed = tonumber(fmt("%.3f", cleanup_end_time - cleanup_start_time))  -- seconds, millisecond precision
log(DEBUG, "cleaning up expired rows from table '", table_name,
"' took ", time_elapsed, " seconds")
connector:disconnect()  -- release this table's dedicated connection
Contributor:

If there is a connection pool and its usage is under our control, we should try to use it rather than disconnecting. Establishing connections is an expensive operation for the DB.

@windmgc (Member Author) commented Feb 27, 2023

I chose to create a new connection for each table here intentionally, to avoid problems caused by connection reuse (for example, a former connection that was waiting for locks and timed out; reusing that connection would be problematic, see OpenResty's tcpsock:receive documentation on read timeouts).

Since this timer only runs on worker 0 and a cluster mutex is also added in this PR, the number of concurrent connections running the cleanup job in the cluster will constantly be 1, so I don't think creating the connections is a problem either.
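
Put together, the per-table flow described above could be sketched as follows; connect and cleanup_table (from the earlier sketch) are illustrative helper names rather than the exact functions in this PR:

local log  = ngx.log
local WARN = ngx.WARN

-- each table gets its own connection, so a timed-out connection for one
-- table cannot poison the cleanup of the next
local function cleanup_expired_rows(config, ttl_tables, batch_size, cutoff_ts)
  for _, table_name in ipairs(ttl_tables) do
    local connector, err = connect(config)
    if not connector then
      log(WARN, "unable to connect for cleanup of table '", table_name, "': ", err)

    else
      local ok, qerr = cleanup_table(connector, table_name, batch_size, cutoff_ts)
      if not ok then
        log(WARN, "cleanup of table '", table_name, "' failed: ", qerr)
      end

      connector:disconnect()
    end
  end
end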

@windmgc windmgc force-pushed the perf-improve-ttl-cleanup branch 2 times, most recently from 0087b53 to 447be41 on February 27, 2023 15:30
@windmgc windmgc marked this pull request as ready for review February 27, 2023 16:16
@bungle (Member) commented Feb 28, 2023

Question: what if we drop the Postgres cleanup routine altogether, and instead create Postgres triggers on entities that have ttl=true? We could run the trigger AFTER INSERT. I think that would be a better solution. This issue only affects traditional deployments: it affects the Control Plane very little, and the Data Planes or DB-less not at all.

The ttl fields on all tables (where the schema says ttl=true) need proper indexes, as they are also used in SELECTs.
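
For illustration, the trigger approach could look roughly like the following migration-style sketch for a single ttl=true table (the table, function, and trigger names, the batch size, and the statement-level trigger choice are all assumptions for the example; this is not code from this PR):

return {
  postgres = {
    up = [[
      CREATE OR REPLACE FUNCTION batch_delete_expired_oauth2_tokens() RETURNS trigger AS $$
      BEGIN
        -- batched delete, mirroring the ctid/LIMIT pattern discussed above
        DELETE FROM oauth2_tokens
        WHERE ctid IN (
          SELECT ctid FROM oauth2_tokens
          WHERE ttl < CURRENT_TIMESTAMP AT TIME ZONE 'UTC'
          LIMIT 50000
        );
        RETURN NULL;  -- return value of an AFTER trigger is ignored
      END;
      $$ LANGUAGE plpgsql;

      DROP TRIGGER IF EXISTS oauth2_tokens_ttl_trigger ON oauth2_tokens;
      CREATE TRIGGER oauth2_tokens_ttl_trigger
        AFTER INSERT ON oauth2_tokens
        FOR EACH STATEMENT
        EXECUTE PROCEDURE batch_delete_expired_oauth2_tokens();
    ]],
  },
}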

@windmgc (Member Author) commented Feb 28, 2023

Hmm, I'm a bit concerned about the AFTER INSERT trigger (if we're going to run a plain DELETE instead of a batched delete): it could also lead to more concurrent queries and higher resource usage, right? Consider a scenario where oauth2_tokens are created frequently; since hybrid mode cannot be used in the oauth2 case, there will still be users running large traditional clusters.

But yes, dropping the background timer and adding a trigger (perhaps with batch deleting in the trigger function) is more straightforward; we could reconsider this option after the CP-DP refactor happens. What do you think? @bungle

@windmgc (Member Author) commented Feb 28, 2023

The ttl fields on all tables (where the schema says ttl=true) need proper indexes, as they are also used in SELECTs.

@bungle Could you please show me where I would need to make changes to guarantee this? I can confirm that all the ttl-enabled tables have a proper index on their ttl fields in the migration files, but I don't know if there is any DAO mechanism that can guarantee the schema field is indexed.

@bungle (Member) commented Feb 28, 2023

@bungle Could you please show me where I would need to make changes to guarantee this?

Yes, just by looking through our Postgres schemas to check that the ttl=true tables have the index. If they don't, we need to add an index for ttl. We have added them over the years, but there might still be some entities that are missing a Postgres index.
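
Where an index is missing, the fix would be a small migration along these lines (the entity and index names below are made up for illustration):

return {
  postgres = {
    up = [[
      -- add the missing ttl index for a ttl=true entity
      CREATE INDEX IF NOT EXISTS some_entity_ttl_idx ON some_entity (ttl);
    ]],
  },
}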

@bungle (Member) commented Feb 28, 2023

Hmm, I'm a bit concerned about the AFTER INSERT trigger (if we're going to run a plain DELETE instead of a batched delete)

The batch deleting you added here is basically a Postgres feature, so it is easy to add it to the trigger's delete too.

it could also lead to more concurrent queries and higher resource usage, right? Consider a scenario where oauth2_tokens are created frequently; since hybrid mode cannot be used in the oauth2 case, there will still be users running large traditional clusters.

We can tune this inside the trigger, e.g. only run the delete some of the time.

What I mean is that the logic in this PR is quite complicated; I think in most cases what we already had is better than what we have here. That probably leaves a single pain point: oauth2 tokens. Perhaps TTLing of oauth2 tokens should be implemented differently, while the rest are fine with the current approach or with triggers. Perhaps triggers can be made to work for the oauth2 tokens problem too.
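
One hypothetical way to "only run the delete some of the time" inside the trigger function (purely illustrative; the random() gate, 1% rate, table name, and batch size are assumptions, not something proposed in this thread):

-- plpgsql body, kept inside a Lua long string as Kong migrations do, that
-- gates the batched delete so only ~1% of inserts pay the cleanup cost
local gated_trigger_body = [[
  BEGIN
    IF random() < 0.01 THEN
      DELETE FROM oauth2_tokens
      WHERE ctid IN (
        SELECT ctid FROM oauth2_tokens
        WHERE ttl < CURRENT_TIMESTAMP AT TIME ZONE 'UTC'
        LIMIT 50000
      );
    END IF;
    RETURN NULL;
  END;
]]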

bungle added a commit that referenced this pull request Mar 1, 2023
… listeners

### Summary

PRs #10389 and #10331 pursued different ways to improve our situation with cleanup timers.

This PR makes it so that the cleanup only happens on nodes that have admin listeners. Fairly simple, but it may already cover the majority of the problem space.
hanshuebner pushed a commit that referenced this pull request Mar 1, 2023
… listeners (#10405)

### Summary

PRs #10389 and #10331 pursued different ways to improve our situation with cleanup timers.

This PR makes it so that the cleanup only happens on nodes that have admin listeners. Fairly simple, but it may already cover the majority of the problem space.
bungle added a commit that referenced this pull request Mar 1, 2023
### Summary

The PR #10405 changed cleanup to only happen on nodes that have Admin API listeners.

This PR makes deletion happen in batches of at most 50,000 rows.

Inspired from #10331 and #10389.
@windmgc (Member Author) commented Mar 1, 2023

superseded by #10389, after discussion

@windmgc windmgc closed this Mar 1, 2023
hanshuebner pushed a commit that referenced this pull request Mar 3, 2023
* feat(db): batch cleanup expired rows from postgres

### Summary

The PR #10405 changed cleanup to only happen on nodes that have Admin API listeners.

This PR makes deletion happen in batches of at most 50,000 rows.

Inspired from #10331 and #10389.

* change to batch delete on every table

* update changelog

* add test for ttl cleanup

---------

Co-authored-by: windmgc <windmgc@gmail.com>
@hbagdi hbagdi deleted the perf-improve-ttl-cleanup branch March 22, 2023 22:15