Loop to create pgbackrest process increases S3 API usage #868

Open
bzp2010 opened this issue Aug 20, 2024 · 7 comments
Labels
bug Something isn't working

Comments

bzp2010 commented Aug 20, 2024

Report

Upgrading the operator from 2.3.1 to 2.4.1 resulted in a significant increase in S3 API call billing (especially for the list objects API). The cause is a pgbackrest process that something keeps creating in the background. I could not find any existing reports of this.

More about the problem

A few weeks ago I updated my operator deployment from 2.3.1 to 2.4.1, and since then I've noticed many more calls to my S3 bucket API, which may have increased the monetary cost of those calls.

So I did some investigating: I checked the logs on the cluster CNI and found that some processes in the primary database instance were constantly accessing the S3 API.

When I went into the database instance's container, I found pgbackrest processes being created in a loop, all running the command pgbackrest info --output=json --repo=1.

I suspected they were the source of the constant S3 API calls, but at the time I didn't know what was starting these processes, so I went through a process of elimination, starting by shutting down the operator Pod.

After that, the strange pgbackrest processes stopped appearing and nothing called the S3 API again, which confirmed my suspicion.


Investigating the code further, I found that this may be related to the following code, which apparently keeps issuing pgbackrest queries in the primary database pod via the Kubernetes Exec API.

pgbackrestInfo, err := pgbackrest.GetInfo(ctx, primary, backup.Spec.RepoName)

It has to do with the operator's constant re-reconciliation of PGBackup, which, upon checking, turns out to have been introduced in 2.4.0 (#759). That fits my guess as well.
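
For context, here is a minimal sketch of how a controller can run a command like pgbackrest info --output=json inside the primary pod through the Kubernetes Exec API, using client-go's remotecommand package. This is an illustrative reconstruction, not the operator's actual code; the execInPrimary helper and the "database" container name are assumptions.

```go
package pgexec

import (
	"bytes"
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/remotecommand"
)

// execInPrimary runs a command in the "database" container of the given pod
// and returns its stdout. Hypothetical helper for illustration only.
func execInPrimary(ctx context.Context, cfg *rest.Config, cs kubernetes.Interface, pod *corev1.Pod, command ...string) (string, error) {
	req := cs.CoreV1().RESTClient().Post().
		Resource("pods").
		Namespace(pod.Namespace).
		Name(pod.Name).
		SubResource("exec").
		VersionedParams(&corev1.PodExecOptions{
			Container: "database",
			Command:   command,
			Stdout:    true,
			Stderr:    true,
		}, scheme.ParameterCodec)

	// Build an SPDY executor and stream the command output back.
	exec, err := remotecommand.NewSPDYExecutor(cfg, "POST", req.URL())
	if err != nil {
		return "", err
	}

	var stdout, stderr bytes.Buffer
	err = exec.StreamWithContext(ctx, remotecommand.StreamOptions{
		Stdout: &stdout,
		Stderr: &stderr,
	})
	return stdout.String(), err
}
```

With an S3-backed repo, each such execution makes pgbackrest issue S3 requests, so the exec frequency maps directly onto billed API calls.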


The following keeps appearing in the log:

Logs

2024-08-20T13:16:38.840Z	INFO	Starting workers	{"controller": "perconapgrestore", "controllerGroup": "pgv2.percona.com", "controllerKind": "PerconaPGRestore", "worker count": 1}
2024-08-20T13:16:38.840Z	INFO	Starting workers	{"controller": "pgupgrade", "controllerGroup": "postgres-operator.crunchydata.com", "controllerKind": "PGUpgrade", "worker count": 1}
2024-08-20T13:16:38.939Z	INFO	Starting workers	{"controller": "postgrescluster", "controllerGroup": "postgres-operator.crunchydata.com", "controllerKind": "PostgresCluster", "worker count": 2}
2024-08-20T13:16:38.965Z	INFO	Starting workers	{"controller": "pgadmin", "controllerGroup": "postgres-operator.crunchydata.com", "controllerKind": "PGAdmin", "worker count": 1}
2024-08-20T13:16:44.312Z	INFO	Superusers are exposed through PGBouncer	{"controller": "postgrescluster", "controllerGroup": "postgres-operator.crunchydata.com", "controllerKind": "PostgresCluster", "PostgresCluster": {"name":"postgres","namespace":"database"}, "namespace": "database", "name": "postgres", "reconcileID": "37cb444a-893f-4b5d-a95f-d314eafecb12"}
+ 2024-08-20T13:16:52.295Z	INFO	Got latest restorable timestamp	{"controller": "perconapgbackup", "controllerGroup": "pgv2.percona.com", "controllerKind": "PerconaPGBackup", "PerconaPGBackup": {"name":"postgres-repo1-diff-d89bt","namespace":"database"}, "namespace": "database", "name": "postgres-repo1-diff-d89bt", "reconcileID": "5dee8eb6-657b-4be9-a6ee-b9c39a232ec0", "request": {"name":"postgres-repo1-diff-d89bt","namespace":"database"}, "timestamp": "2024-08-20 13:16:20.639199 +0000 UTC"}
2024-08-20T13:16:58.033Z	INFO	Superusers are exposed through PGBouncer	{"controller": "postgrescluster", "controllerGroup": "postgres-operator.crunchydata.com", "controllerKind": "PostgresCluster", "PostgresCluster": {"name":"postgres","namespace":"database"}, "namespace": "database", "name": "postgres", "reconcileID": "9bc26f03-85a3-466d-822a-446ba8f42987"}
+ 2024-08-20T13:17:02.822Z	INFO	Latest commit timestamp	{"controller": "perconapgcluster", "controllerGroup": "pgv2.percona.com", "controllerKind": "PerconaPGCluster", "PerconaPGCluster": {"name":"postgres","namespace":"database"}, "namespace": "database", "name": "postgres", "reconcileID": "2f1a14dd-6132-4de7-a9a5-66ad58d68c1f", "timestamp": "2024-08-20 13:16:20.639199 +0000 UTC", "latestRestorableTime": "2024-08-20 13:03:15.036014 +0000 UTC"}
+ 2024-08-20T13:17:02.822Z	INFO	Triggering PGBackup reconcile	{"controller": "perconapgcluster", "controllerGroup": "pgv2.percona.com", "controllerKind": "PerconaPGCluster", "PerconaPGCluster": {"name":"postgres","namespace":"database"}, "namespace": "database", "name": "postgres", "reconcileID": "2f1a14dd-6132-4de7-a9a5-66ad58d68c1f", "latestBackup": "postgres-repo1-incr-m24n6", "latestRestorableTime": "2024-08-20 13:03:15.036014 +0000 UTC", "latestCommitTimestamp": "2024-08-20 13:16:20.639199 +0000 UTC"}
+ 2024-08-20T13:17:05.848Z	INFO	Got latest restorable timestamp	{"controller": "perconapgbackup", "controllerGroup": "pgv2.percona.com", "controllerKind": "PerconaPGBackup", "PerconaPGBackup": {"name":"postgres-repo1-diff-fjg5l","namespace":"database"}, "namespace": "database", "name": "postgres-repo1-diff-fjg5l", "reconcileID": "e2e1fef5-0034-411c-a356-abef3df2541b", "request": {"name":"postgres-repo1-diff-fjg5l","namespace":"database"}, "timestamp": "2024-08-20 13:16:20.639199 +0000 UTC"}
2024-08-20T13:17:11.291Z	INFO	Superusers are exposed through PGBouncer	{"controller": "postgrescluster", "controllerGroup": "postgres-operator.crunchydata.com", "controllerKind": "PostgresCluster", "PostgresCluster": {"name":"postgres","namespace":"database"}, "namespace": "database", "name": "postgres", "reconcileID": "d3ef925f-41d4-476f-b11e-84560de7313b"}
+ 2024-08-20T13:17:15.067Z	INFO	Latest commit timestamp	{"controller": "perconapgcluster", "controllerGroup": "pgv2.percona.com", "controllerKind": "PerconaPGCluster", "PerconaPGCluster": {"name":"postgres","namespace":"database"}, "namespace": "database", "name": "postgres", "reconcileID": "2f1a14dd-6132-4de7-a9a5-66ad58d68c1f", "timestamp": "2024-08-20 13:16:20.639199 +0000 UTC", "latestRestorableTime": "2024-08-20 13:03:15.036014 +0000 UTC"}
+ 2024-08-20T13:17:15.067Z	INFO	Triggering PGBackup reconcile	{"controller": "perconapgcluster", "controllerGroup": "pgv2.percona.com", "controllerKind": "PerconaPGCluster", "PerconaPGCluster": {"name":"postgres","namespace":"database"}, "namespace": "database", "name": "postgres", "reconcileID": "2f1a14dd-6132-4de7-a9a5-66ad58d68c1f", "latestBackup": "postgres-repo1-incr-m24n6", "latestRestorableTime": "2024-08-20 13:03:15.036014 +0000 UTC", "latestCommitTimestamp": "2024-08-20 13:16:20.639199 +0000 UTC"}
+ 2024-08-20T13:17:18.596Z	INFO	Got latest restorable timestamp	{"controller": "perconapgbackup", "controllerGroup": "pgv2.percona.com", "controllerKind": "PerconaPGBackup", "PerconaPGBackup": {"name":"postgres-repo1-full-647nr","namespace":"database"}, "namespace": "database", "name": "postgres-repo1-full-647nr", "reconcileID": "708d3e2c-28c8-42af-8680-c615cfea55e0", "request": {"name":"postgres-repo1-full-647nr","namespace":"database"}, "timestamp": "2024-08-20 13:16:20.639199 +0000 UTC"}
2024-08-20T13:17:24.754Z	INFO	Superusers are exposed through PGBouncer	{"controller": "postgrescluster", "controllerGroup": "postgres-operator.crunchydata.com", "controllerKind": "PostgresCluster", "PostgresCluster": {"name":"postgres","namespace":"database"}, "namespace": "database", "name": "postgres", "reconcileID": "a3062a24-6b6b-48eb-baf9-ed5764499a93"}
+ 2024-08-20T13:17:27.231Z	INFO	Latest commit timestamp	{"controller": "perconapgcluster", "controllerGroup": "pgv2.percona.com", "controllerKind": "PerconaPGCluster", "PerconaPGCluster": {"name":"postgres","namespace":"database"}, "namespace": "database", "name": "postgres", "reconcileID": "2f1a14dd-6132-4de7-a9a5-66ad58d68c1f", "timestamp": "2024-08-20 13:16:20.639199 +0000 UTC", "latestRestorableTime": "2024-08-20 13:03:15.036014 +0000 UTC"}
+ 2024-08-20T13:17:27.231Z	INFO	Triggering PGBackup reconcile	{"controller": "perconapgcluster", "controllerGroup": "pgv2.percona.com", "controllerKind": "PerconaPGCluster", "PerconaPGCluster": {"name":"postgres","namespace":"database"}, "namespace": "database", "name": "postgres", "reconcileID": "2f1a14dd-6132-4de7-a9a5-66ad58d68c1f", "latestBackup": "postgres-repo1-incr-m24n6", "latestRestorableTime": "2024-08-20 13:03:15.036014 +0000 UTC", "latestCommitTimestamp": "2024-08-20 13:16:20.639199 +0000 UTC"}
+ 2024-08-20T13:17:31.686Z	INFO	Got latest restorable timestamp	{"controller": "perconapgbackup", "controllerGroup": "pgv2.percona.com", "controllerKind": "PerconaPGBackup", "PerconaPGBackup": {"name":"postgres-repo1-incr-lv6dp","namespace":"database"}, "namespace": "database", "name": "postgres-repo1-incr-lv6dp", "reconcileID": "2e9daa86-2df8-4758-ac43-82b5f70c4d0c", "request": {"name":"postgres-repo1-incr-lv6dp","namespace":"database"}, "timestamp": "2024-08-20 13:16:20.639199 +0000 UTC"}
2024-08-20T13:17:38.115Z	INFO	Superusers are exposed through PGBouncer	{"controller": "postgrescluster", "controllerGroup": "postgres-operator.crunchydata.com", "controllerKind": "PostgresCluster", "PostgresCluster": {"name":"postgres","namespace":"database"}, "namespace": "database", "name": "postgres", "reconcileID": "306f8c5b-099b-42aa-9764-cae1c3fd0395"}
+ 2024-08-20T13:17:39.776Z	INFO	Latest commit timestamp	{"controller": "perconapgcluster", "controllerGroup": "pgv2.percona.com", "controllerKind": "PerconaPGCluster", "PerconaPGCluster": {"name":"postgres","namespace":"database"}, "namespace": "database", "name": "postgres", "reconcileID": "2f1a14dd-6132-4de7-a9a5-66ad58d68c1f", "timestamp": "2024-08-20 13:17:20.615232 +0000 UTC", "latestRestorableTime": "2024-08-20 13:03:15.036014 +0000 UTC"}
+ 2024-08-20T13:17:39.776Z	INFO	Triggering PGBackup reconcile	{"controller": "perconapgcluster", "controllerGroup": "pgv2.percona.com", "controllerKind": "PerconaPGCluster", "PerconaPGCluster": {"name":"postgres","namespace":"database"}, "namespace": "database", "name": "postgres", "reconcileID": "2f1a14dd-6132-4de7-a9a5-66ad58d68c1f", "latestBackup": "postgres-repo1-incr-m24n6", "latestRestorableTime": "2024-08-20 13:03:15.036014 +0000 UTC", "latestCommitTimestamp": "2024-08-20 13:17:20.615232 +0000 UTC"}
+ 2024-08-20T13:17:45.036Z	INFO	Got latest restorable timestamp	{"controller": "perconapgbackup", "controllerGroup": "pgv2.percona.com", "controllerKind": "PerconaPGBackup", "PerconaPGBackup": {"name":"postgres-repo1-incr-m24n6","namespace":"database"}, "namespace": "database", "name": "postgres-repo1-incr-m24n6", "reconcileID": "881ef426-eabd-4c37-928c-aebb3bd8221c", "request": {"name":"postgres-repo1-incr-m24n6","namespace":"database"}, "timestamp": "2024-08-20 13:17:20.615232 +0000 UTC"}

They occur very regularly and cyclically, and they are also related to the WAL watcher introduced in #759.

I'm wondering whether this is actually expected behavior (constantly executing pgbackrest info to query backup info from the S3 bucket)?

If this is the design intent, how can I turn it off, or at least reduce its frequency, to keep costs down?

Steps to reproduce

The issue appears to be built into behavior that is triggered whenever backups are enabled on operator 2.4.0+.
Without any special trigger, simply going into the database pod and checking with top reveals that the pgbackrest PID keeps incrementing, indicating that the program is being launched in a continuous loop.

So this may not require step-by-step reproduction. 🤔

Versions

  1. Kubernetes: 1.30.0
  2. Operator: 2.4.1
  3. Database: PostgreSQL 16

Anything else?

When the operator is shut down, most of the S3 API calls disappear. However, pgbackrest is still started occasionally, though with a much longer period.

bzp2010 added the bug label Aug 20, 2024
@seacom-ms

The same thing happened to me, but with Azure backups configured. Converting back to S3 backups resolved it for me.

Author

bzp2010 commented Aug 23, 2024

I am using Backblaze through its S3-compatible API, but I think this has nothing to do with the specific S3 provider; the issue should happen with every S3 provider.

Contributor

egegunes commented Aug 28, 2024

Hey folks, I created https://perconadev.atlassian.net/browse/K8SPG-630 to let you disable this feature so you can control the API usage. It'll be included in the upcoming v2.5.0 release.

Collaborator

hors commented Aug 28, 2024

@bzp2010 @seacom-ms We run "pgbackrest info --output=json .." to be able to add "latestRestorableTime" to the backup object:

apiVersion: pgv2.percona.com/v2
kind: PerconaPGBackup
...
spec:
  options:
  - --type=full
  pgCluster: cluster1
  repoName: repo1
status:
  backupName: 20240828-141255F
  backupType: full
  ....
  jobName: cluster1-backup-2wct
  latestRestorableTime: 2024-08-28 14:15:32.570839+0000

but we understand that we need to make it possible to disable this.
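
For illustration, pgbackrest info --output=json prints a JSON array with one object per stanza, which is what makes it convenient to consume programmatically. A minimal decoding sketch follows; the parseStanzas helper is hypothetical, and the exact fields the operator uses to derive latestRestorableTime are not reproduced here.

```go
package pgexec

import (
	"encoding/json"
	"fmt"
)

// parseStanzas decodes the output of `pgbackrest info --output=json`, which
// is a JSON array with one object per stanza. Hypothetical helper; the real
// operator code and the fields it reads may differ.
func parseStanzas(raw []byte) ([]map[string]any, error) {
	var stanzas []map[string]any
	if err := json.Unmarshal(raw, &stanzas); err != nil {
		return nil, fmt.Errorf("parse pgbackrest info output: %w", err)
	}
	return stanzas, nil
}
```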

Author

bzp2010 commented Aug 28, 2024

Thanks a lot! @egegunes @hors
I'm not very well versed in these components, but as a subjective guess, could this step be executed once at the end of each backup, rather than every few seconds? 🤔

@egegunes
Contributor

@bzp2010 The idea is to have a background worker that watches commit timestamps and updates the latest successful backup's status, so users can see the latest point in time they can restore to using that backup. If we only did this after a new backup finishes, I don't think we'd provide much value.

We can also add a field that lets users control the watcher's period, so you can have less frequent checks and therefore fewer API calls.
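
A rough sketch of the kind of period-controlled watcher described above, assuming a hypothetical CheckInterval knob and an updateLatestRestorableTime callback (neither is the operator's actual API):

```go
package pgexec

import (
	"context"
	"log"
	"time"
)

// WatcherConfig holds a hypothetical knob for the check period; a longer
// interval means fewer `pgbackrest info` executions and fewer S3 API calls.
type WatcherConfig struct {
	CheckInterval time.Duration
}

// runWALWatcher periodically refreshes the latest restorable time of the most
// recent successful backup until the context is cancelled.
func runWALWatcher(ctx context.Context, cfg WatcherConfig, updateLatestRestorableTime func(context.Context) error) {
	ticker := time.NewTicker(cfg.CheckInterval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := updateLatestRestorableTime(ctx); err != nil {
				log.Printf("update latest restorable time: %v", err)
			}
		}
	}
}
```

Raising CheckInterval would directly reduce how often pgbackrest info runs and therefore how many S3 API calls are made.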

Collaborator

hors commented Aug 28, 2024

> @bzp2010 The idea is to have a background worker that watches commit timestamps and updates the latest successful backup's status, so users can see the latest point in time they can restore to using that backup. If we only did this after a new backup finishes, I don't think we'd provide much value.
>
> We can also add a field that lets users control the watcher's period, so you can have less frequent checks and therefore fewer API calls.

@bzp2010 It is for PITR, and pgBackRest uploads WALs into the repo continuously, so we need to update it more than once.
