Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make sure to suspend backups if the instance is hibernated #875

Merged
merged 4 commits into from
Jul 11, 2024

Conversation

nhudson
Copy link
Collaborator

@nhudson nhudson commented Jul 10, 2024

Add some logic to patch the ScheduledBackup to suspend backups for paused instances. This causes quite a bit of contention in the CNPG operator if you do not. Basically reconciling the backups for all paused instances every 30 seconds.

{"level":"info","ts":"2024-07-10T20:56:40Z","msg":"Couldn't find target pod, will retry in 30 seconds","controller":"backup","controllerGroup":"postgresql.cnpg.io","controllerKind":"Backup","Backup":{"name":"org-goracz-inst-instance-1-20240630101000","namespace":"org-goracz-inst-instance-1"},"namespace":"org-goracz-inst-instance-1","name":"org-goracz-inst-instance-1-20240630101000","reconcileID":"0188b21b-9fb8-45cf-98db-f6dcf31f42ef","uuid":"e7ba2595-3efe-11ef-8d3a-724d5e302e50","target":"org-goracz-inst-instance-1-1"}

Fixes: CLOUD-1009

@@ -123,7 +124,7 @@ pub(crate) async fn update_coredb_status(
.await
}

// patch_cluster_merge takes a CoreDB, Cluster and serde_json::Value and patch merges the Cluster with the new spec
// patch_cluster_merge takes a CoreDB, context and serde_json::Value and patch merges the Cluster with the new spec
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing comment

@@ -187,7 +213,7 @@ pub(crate) fn update_restarted_at(
restart_annotation_updated
}

// patch_cluster is a async function that takes a CNPG cluster and patch applys it with the new spec
// patch_cluster is a async function that takes a CNPG cluster and patch applies it with the new spec
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct spelling

@@ -484,7 +484,7 @@ async fn get_trunk_project_version(
// Check if version is provided in cdb.spec.trunk_installs
for trunk_install in &cdb.spec.trunk_installs {
if trunk_install.name == trunk_project_name {
trunk_project_version = trunk_install.version.clone();
trunk_project_version.clone_from(&trunk_install.version);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flx clippy warning

@@ -529,7 +529,7 @@ mod tests {

#[tokio::test]
async fn test_get_latest_trunk_project_version() {
let result = get_latest_trunk_project_metadata("pgmq".into()).await;
let result = get_latest_trunk_project_metadata("pgmq").await;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix clippy warning

Copy link
Contributor

@bonesmoses bonesmoses left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will work for now. Eventually we should look into submitting a PR to CNPG to skip the backup reconciliation if an instance is hibernated, but not if the backup type is snapshot, since that seems to be something they want to allow.

Comment on lines 179 to 184
patch_scheduled_backup_merge(cdb, ctx, patch_scheduled_backup_spec.clone()).await?;
info!(
"Toggled scheduled backup suspend of {} to '{}'",
name, scheduled_backup_value
);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is redundant.

Comment on lines 150 to 157
if !scheduled_backup_suspend_status && cdb.spec.stop {
patch_scheduled_backup_merge(cdb, ctx, patch_scheduled_backup_spec.clone()).await?;
info!(
"Toggled scheduled backup suspend of {} to '{}'",
name, scheduled_backup_value
);
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part should handle the case where instances are already paused but the scheduled backup is not suspended.

This will not un-suspend the backups though.

@nhudson nhudson merged commit c248fd4 into main Jul 11, 2024
9 checks passed
@nhudson nhudson deleted the nhudson/CLOUD-1009 branch July 11, 2024 01:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants