MGMT-17893: Don't destroy cluster on detach #6532

@jhernand (Contributor) commented Jul 4, 2024

Currently the procedure to detach a cluster that was created using hosts from a late binding pool is to first delete the ManagedCluster object, then add the preserveOnDelete: true field to the ClusterDeployment, and finally delete that ClusterDeployment. However, the cluster deployment controller doesn't look at the preserveOnDelete field at all. As a result, the hosts of the cluster are returned to the pool and provisioned again, which effectively destroys the cluster.
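The three detach steps can be sketched as `oc` commands generated from Go. This is an illustration, not tooling from this PR: `preserveOnDeletePatch` and `detachCommands` are hypothetical helpers, and the sketch assumes the ManagedCluster has the same name as the ClusterDeployment.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// preserveOnDeletePatch builds a JSON merge patch that sets
// spec.preserveOnDelete to true on a ClusterDeployment.
func preserveOnDeletePatch() (string, error) {
	patch := map[string]any{
		"spec": map[string]any{"preserveOnDelete": true},
	}
	b, err := json.Marshal(patch)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// detachCommands returns, in order, the oc commands for the detach
// procedure. ASSUMPTION (hypothetical): the ManagedCluster is named like
// the ClusterDeployment, and the ClusterDeployment lives in namespace ns.
func detachCommands(name, ns string) ([]string, error) {
	patch, err := preserveOnDeletePatch()
	if err != nil {
		return nil, err
	}
	return []string{
		// 1. Delete the ManagedCluster (cluster-scoped, so no namespace).
		fmt.Sprintf("oc delete managedcluster %s", name),
		// 2. Mark the ClusterDeployment so deleting it should keep the cluster.
		fmt.Sprintf("oc patch clusterdeployment %s -n %s --type merge -p '%s'", name, ns, patch),
		// 3. Delete the ClusterDeployment itself.
		fmt.Sprintf("oc delete clusterdeployment %s -n %s", name, ns),
	}, nil
}
```

The helpers only build command strings, so the sketch runs without cluster access; a real tool would more likely talk to the API server through a Kubernetes client.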

This patch changes the cluster deployment controller so that, in that case, it checks the preserveOnDelete field and, if it is true, deletes the corresponding Agent objects. As a result, the hosts go back to the pool, but they are still marked as provisioned and are not provisioned again, so the cluster is not destroyed.

Note that the BareMetalHosts are not removed; they stay detached.
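The decision the patch adds can be sketched in Go. This is not the controller's actual code: the `ClusterDeployment` and `Agent` types, their fields, `AgentAction` and `agentActionsOnDelete` are hypothetical simplifications, and the sketch only covers the late binding case described above.

```go
package main

// Hypothetical, heavily simplified stand-ins for the real ClusterDeployment
// and Agent types: only the fields the decision needs.
type ClusterDeployment struct {
	Namespace        string
	Name             string
	PreserveOnDelete bool
}

type Agent struct {
	Name string
	// ClusterDeployment the agent is bound to; empty when unbound.
	ClusterNamespace string
	ClusterName      string
}

// AgentAction is what happens to a bound agent when its
// ClusterDeployment is deleted.
type AgentAction int

const (
	// UnbindAgent returns the agent to the pool, where it is provisioned
	// again (the behavior described as the bug).
	UnbindAgent AgentAction = iota
	// DeleteAgent removes the Agent object so the host is not provisioned
	// again (the behavior this PR adds when preserveOnDelete is true).
	DeleteAgent
)

// agentActionsOnDelete maps each agent bound to cd to the action taken
// when cd is deleted. Agents bound elsewhere, or unbound, are ignored.
func agentActionsOnDelete(cd ClusterDeployment, agents []Agent) map[string]AgentAction {
	actions := make(map[string]AgentAction)
	for _, a := range agents {
		if a.ClusterNamespace != cd.Namespace || a.ClusterName != cd.Name {
			continue
		}
		if cd.PreserveOnDelete {
			actions[a.Name] = DeleteAgent
		} else {
			actions[a.Name] = UnbindAgent
		}
	}
	return actions
}
```

Keying the result by agent name keeps the sketch short; it assumes agent names are unique within the input slice.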

Related: https://issues.redhat.com/browse/MGMT-17893

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Tested manually with MCE 2.5.4, replacing the assisted service image with a modified one containing this change.

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

@jhernand jhernand closed this Jul 5, 2024
@jhernand jhernand reopened this Jul 5, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 5, 2024
@openshift-ci-robot
openshift-ci-robot commented Jul 5, 2024

@jhernand: This pull request references MGMT-17893 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.


@openshift-ci openshift-ci bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jul 5, 2024
codecov bot commented Jul 5, 2024

Codecov Report

Attention: Patch coverage is 53.33333% with 7 lines in your changes missing coverage. Please review.

Project coverage is 68.42%. Comparing base (b7116f3) to head (3ee8313).
Report is 3 commits behind head on master.

@@            Coverage Diff             @@
##           master    #6532      +/-   ##
==========================================
- Coverage   68.43%   68.42%   -0.01%     
==========================================
  Files         247      247              
  Lines       36416    36425       +9     
==========================================
+ Hits        24920    24923       +3     
- Misses       9290     9294       +4     
- Partials     2206     2208       +2     
Files Coverage Δ
...oller/controllers/clusterdeployments_controller.go 71.86% <53.33%> (-0.17%) ⬇️

... and 1 file with indirect coverage changes
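The project figures above are consistent with coverage computed as hits divided by all tracked lines (hits + misses + partials), assumed here to match Codecov's documented coverage ratio. A small Go check; `coveragePercent` is a hypothetical helper, and the patch split of 8 covered lines out of 15 is inferred from "53.33333% with 7 lines missing" rather than reported directly.

```go
package main

import "fmt"

// coveragePercent returns hits as a percentage of all tracked lines,
// formatted to two decimals. Partial lines count as not covered.
func coveragePercent(hits, misses, partials int) string {
	lines := hits + misses + partials
	if lines == 0 {
		return "0.00"
	}
	return fmt.Sprintf("%.2f", 100*float64(hits)/float64(lines))
}
```

With the numbers from the report, `coveragePercent(24920, 9290, 2206)` gives the base value and `coveragePercent(24923, 9294, 2208)` the head value; the 9 added lines contribute 3 hits, 4 misses and 2 partials, which is why the project total drops by 0.01%.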

@jhernand jhernand closed this Jul 5, 2024
@jhernand jhernand reopened this Jul 5, 2024
@jhernand (Contributor Author) commented Jul 8, 2024

/retest

@carbonin (Member) left a comment

Looks good. Just a few minor test questions.

Comment on lines 4213 to 4215
TypeMeta: metav1.TypeMeta{
Kind: "InfraEnv",
APIVersion: fmt.Sprintf("%s/%s", aiv1beta1.GroupVersion.Group, aiv1beta1.GroupVersion.Version),
Member

You generally shouldn't need to set TypeMeta like this. Is there a reason you're doing it?

Contributor Author

Copy and paste from the test above. Will remove it.

Currently the procedure to detach a cluster that was created using hosts
from a late binding pool is to first delete the `ManagedCluster` object,
then add the `preserveOnDelete: true` field to the `ClusterDeployment`
and then delete that `ClusterDeployment`. But the cluster deployment
controller doesn't look at the `preserveOnDelete` field at all. As a
result the hosts of the cluster are returned to the pool and they are
provisioned again, which effectively destroys the cluster.

This patch changes the cluster deployment controller so that in that
case it will check the `preserveOnDelete` field and if it is `true` it
will delete the corresponding `Agent` objects. The result of that is
that the hosts will go back to the pool, but they will still be marked
as provisioned and they will not be provisioned again, thus avoiding
the destruction of the cluster.

Note that the `BareMetalHosts` will not be removed, but they will stay
detached.

Related: https://issues.redhat.com/browse/MGMT-17893
Signed-off-by: Juan Hernandez <juan.hernandez@redhat.com>
@jhernand jhernand force-pushed the dont_destroy_cluster_on_detach branch from 6374327 to 3ee8313 Compare July 9, 2024 19:18
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 9, 2024
openshift-ci bot commented Jul 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: carbonin, jhernand

openshift-ci bot commented Jul 9, 2024

@jhernand: all tests passed!

@openshift-merge-bot openshift-merge-bot bot merged commit d2bba26 into openshift:master Jul 9, 2024
16 checks passed
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-agent-installer-api-server-container-v4.17.0-202407100011.p0.gd2bba26.assembly.stream.el9 for distgit ose-agent-installer-api-server.
All builds following this will include this PR.

@jhernand jhernand deleted the dont_destroy_cluster_on_detach branch July 10, 2024 08:39
@jhernand (Contributor Author)

/cherry-pick release-ocm-2.11

@openshift-cherrypick-robot

@jhernand: new pull request created: #6556

5 participants