Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check agent deployment based on Ready condition #591

Merged

Conversation

Danil-Grigorev
Copy link
Contributor

@Danil-Grigorev Danil-Grigorev commented Jul 8, 2024

What this PR does / why we need it:

status.agentDeployed and a corresponding condition in the management v3 cluster is not reflecting if the agent manifests are actually there at all times. If the old manifests are getting removed by cluster re-import or migration, new manifests can be purged by this as well. Ready condition should be better reflecting if turtles need to stop applying manifests.

Rancher is performing cleanup job with: https://github.com/rancher/rancher/blob/1c2b815c8dd66e141b21630904127adc8e03c3de/pkg/controllers/management/usercontrollers/controller.go#L289

We need to check for the job existence before performing import operation, and wait for the job and previous manifests to be gone if those are present.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #587

Special notes for your reviewer:

Checklist:

  • squashed commits into logical changes
  • includes documentation
  • adds unit tests
  • adds or updates e2e tests

@Danil-Grigorev Danil-Grigorev requested a review from a team as a code owner July 8, 2024 13:15
@Danil-Grigorev Danil-Grigorev added the kind/bug Something isn't working label Jul 8, 2024
Copy link
Contributor

@salasberryfin salasberryfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Danil-Grigorev. Just added a nit to align log message with new condition. Otherwise, LGTM.

internal/controllers/import_controller.go Outdated Show resolved Hide resolved
internal/controllers/import_controller_v3.go Outdated Show resolved Hide resolved
Copy link
Member

@alexander-demicev alexander-demicev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@Danil-Grigorev
Copy link
Contributor Author

Seems like neither status.ready is reflecting the state of the agent. Only condition for provisioningv1 and in managementv3 seems to be up-to-date:

  - lastUpdateTime: "2024-07-08T13:41:50Z"
    message: Cluster agent is not connected
    reason: Disconnected
    status: "False"
    type: Ready
  fleetWorkspaceName: creategitops-lfbrqx
  observedGeneration: 2
  ready: true

@Danil-Grigorev Danil-Grigorev changed the title Check agent deployment based on status.ready and ready condition Check agent deployment based on Ready condition Jul 8, 2024
@Danil-Grigorev Danil-Grigorev force-pushed the agent-deployed-based-on-ready branch 2 times, most recently from 4d00729 to 323b721 Compare July 9, 2024 11:48
- Prevent executing cluster import until the child cluster manifests are
  fully removed.
- e2e: Rely on crust-gather for logs collection
- Check for cleanup job before import

Signed-off-by: Danil-Grigorev <danil.grigorev@suse.com>
@alexander-demicev alexander-demicev merged commit 0795c35 into rancher:main Jul 9, 2024
16 checks passed
@Danil-Grigorev Danil-Grigorev mentioned this pull request Jul 16, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Status.AgentDeployed does not reflect actual agent deployment status at all times
3 participants