Loop to create pgbackrest process increases S3 API usage #868
Comments
Same thing happened to me but with Azure backups configured. Converting back to S3 backups resolved it for me.
I am using Backblaze cloud storage through its S3-compatible API, but I think this has nothing to do with the S3 provider; the issue should occur with every S3 provider.
Hey folks, I created https://perconadev.atlassian.net/browse/K8SPG-630 to let you disable this feature so you can control the API usage. It'll be included in the upcoming v2.5.0 release.
@bzp2010 @seacom-ms We run "pgbackrest info --output=json .." to be able to add "latestRestorableTime" to the backup object:
but we understand that we need to have the possibility of disabling it.
@bzp2010 The idea is to have a background worker that watches commit timestamps and updates the latest successful backup's status, so users can see the latest time they can restore to using this backup. If we only did this after a new backup finishes, I don't think we'd provide much value. We can also add a field that lets users control the period of the watcher, so you can have less frequent checks and therefore fewer API calls.
@bzp2010 It is for PITR, and pgBackRest uploads WALs into the repo continuously, so we need to update it more than once.
Report
The upgrade from 2.3.1 to 2.4.1 resulted in a significant increase in API call billing for the S3 service (especially for the list objects API). This is due to a pgbackrest process that is constantly being created in the background. No reports were searched.
More about the problem
A few weeks ago I updated my operator deployment from 2.3.1 to 2.4.1, and since then I've noticed many more calls to my S3 bucket API, which may have increased what I pay for API calls.
So I did some investigating. I checked the logs on the cluster CNI and found that some processes in the main database instance were constantly accessing the S3 API.
When I went into the container of the database instance, I found pgbackrest processes being created in a loop, each running the command
pgbackrest info --output=json --repo=1
I suspected that they were making constant calls to the S3 API, but at the time I didn't know what was starting these processes, so I used a process of elimination, starting by shutting down the operator Pod.
Since then the strange pgbackrest processes have stopped appearing and no program has called the S3 API again, so this proved my suspicions correct.
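For context on what each of these invocations returns, the sketch below decodes a small `pgbackrest info --output=json` style document in Go and picks the most recently finished backup. The sample JSON is hand-written and heavily abbreviated (labels and timestamps are made up), the field names reflect my understanding of pgBackRest's JSON output and should be treated as an assumption, and `latestBackup` is a hypothetical helper, not the operator's parsing code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// stanzaInfo models only the few fields this sketch needs.
type stanzaInfo struct {
	Name   string `json:"name"`
	Backup []struct {
		Label     string `json:"label"`
		Type      string `json:"type"`
		Timestamp struct {
			Start int64 `json:"start"` // Unix seconds
			Stop  int64 `json:"stop"`  // Unix seconds
		} `json:"timestamp"`
	} `json:"backup"`
}

// sampleInfo is an illustrative, abbreviated document, not real output.
const sampleInfo = `[
  {
    "name": "db",
    "backup": [
      {"label": "20240102-000000F", "type": "full",
       "timestamp": {"start": 1704153600, "stop": 1704153900}},
      {"label": "20240102-000000F_20240103-000000I", "type": "incr",
       "timestamp": {"start": 1704240000, "stop": 1704240060}}
    ]
  }
]`

// latestBackup returns the label and stop time of the backup that
// finished last across all stanzas in the document.
func latestBackup(raw []byte) (string, time.Time, error) {
	var stanzas []stanzaInfo
	if err := json.Unmarshal(raw, &stanzas); err != nil {
		return "", time.Time{}, err
	}

	var label string
	var stop time.Time
	for _, s := range stanzas {
		for _, b := range s.Backup {
			t := time.Unix(b.Timestamp.Stop, 0).UTC()
			if t.After(stop) {
				label, stop = b.Label, t
			}
		}
	}
	if label == "" {
		return "", time.Time{}, errors.New("no backups found")
	}
	return label, stop, nil
}

func main() {
	label, stop, err := latestBackup([]byte(sampleInfo))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("latest backup:", label, "finished at", stop.Format(time.RFC3339))
}
```

Producing this document requires pgbackrest to read repository metadata, which is why each run shows up as S3 API traffic.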
Investigating the code further, I found that this may be related to the following code, which is apparently constantly making pgbackrest queries in the master database pod via the Kubernetes Exec API.
pgbackrestInfo, err := pgbackrest.GetInfo(ctx, primary, backup.Spec.RepoName)
It has to do with the operator's attempts to constantly re-reconcile PGBackup, which, upon checking, turns out to have been introduced starting with 2.4.0, in #759. Yes, that fits the guess as well.
The following keeps appearing in the log:
Logs
They occur very regularly, cyclically, and they are also related to the WAL watcher introduced in #759.
Is this indeed the expected behavior (constantly executing pgbackrest info to query backup info from the S3 bucket)?
If this does meet the design intent, how can I turn it off, or at least reduce its frequency, to keep costs down?
Steps to reproduce
The issue appears to be inherent in behavior that is triggered whenever backups are enabled on version 2.4.0 or later of the operator.
No special trigger is needed: simply going into the database pod and checking with top shows the PID of pgbackrest incrementing, which indicates that the program is being executed in a continuous loop.
So this may not require step-by-step reproduction instructions. 🤔
Versions
1.30.0
2.4.1
PostgreSQL 16
Anything else?
When the operator is shut down, most of the S3 API calls disappear. However, pgbackrest is still started occasionally, at a much longer interval.
No response