Status.AgentDeployed does not reflect actual agent deployment status at all times #587

Closed
Danil-Grigorev opened this issue Jun 28, 2024 · 0 comments · Fixed by #591

Danil-Grigorev commented Jun 28, 2024

What steps did you take and what happened?

The management or provisioning cluster manifest reports a .status.agentDeployed field, which should reflect the status of the agent deployment on the child cluster. If something or someone removes components of the agent deployment on the child cluster, the field is not updated, which prevents Turtles from re-applying the agent manifests.

What did you expect to happen?

Turtles should be able to identify missing components on the child cluster and successfully execute the import procedure.
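A rough sketch of the expected decision, not the actual Turtles code. `ChildState` and `needsImport` are hypothetical names, and it assumes the controller can probe the child cluster for the cattle-system namespace and knows whether the agent is connected:

```go
package main

import "fmt"

// ChildState is a hypothetical snapshot of what was observed for the child cluster.
type ChildState struct {
	NamespaceExists bool // assumption: the cattle-system namespace was found
	AgentConnected  bool // assumption: the Connected condition is "True"
}

// needsImport reports whether the import manifests should be (re)applied.
// The status flag alone is not trusted: when the agent is not connected,
// the child cluster is checked for missing components.
func needsImport(agentDeployed bool, child ChildState) bool {
	if !agentDeployed {
		return true // never deployed, import is required
	}
	if child.AgentConnected {
		return false // a connected agent confirms the flag
	}
	// Flag says deployed, but the agent is disconnected:
	// re-import only if the components are actually gone.
	return !child.NamespaceExists
}

func main() {
	// State from this report: flag is true, agent disconnected, namespace missing.
	stale := ChildState{NamespaceExists: false, AgentConnected: false}
	fmt.Println(needsImport(true, stale)) // prints "true"
}
```

With the state from this report the function asks for a re-import, instead of stopping at "agent already deployed, no action needed".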

How to reproduce it?

This is a race condition that occurs while the provisioning cluster manifest is being replaced with the management cluster manifest.

It can be reproduced by removing some of the import components on the child cluster while the agent is not connected.

Example of the failure: https://github.com/rancher/turtles/actions/runs/9709318524/attempts/1?pr=575

apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  annotations:
    field.cattle.io/creatorId: system:serviceaccount:rancher-turtles-system:rancher-turtles-manager
  creationTimestamp: "2024-06-28T08:06:06Z"
  finalizers:
  - wrangler.cattle.io/cloud-config-secret-remover
  - wrangler.cattle.io/provisioning-cluster-remove
  - wrangler.cattle.io/rke-cluster-remove
  generation: 2
  labels:
    cluster-api.cattle.io/owned: ""
  name: clusterv1-docker-rke2-capi
  namespace: creategitops-r2rd5y
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    name: clusterv1-docker-rke2
    uid: 33b500f2-55ae-4759-b8f9-f8e6f57f45a9
  resourceVersion: "16050"
  uid: d1c4bd11-7c2c-4204-8710-902f87699055
spec:
  localClusterAuthEndpoint: {}
status:
  agentDeployed: true
  clusterName: c-m-vxg8zckk
  conditions:
  - lastUpdateTime: "2024-06-28T08:06:06Z"
    reason: Reconciling
    status: "True"
    type: Reconciling
  - lastUpdateTime: "2024-06-28T08:06:06Z"
    status: "False"
    type: Stalled
  - lastUpdateTime: "2024-06-28T08:08:00Z"
    status: "True"
    type: Created
  - lastUpdateTime: "2024-06-28T08:06:07Z"
    status: "True"
    type: RKECluster
  - lastUpdateTime: "2024-06-28T08:06:07Z"
    status: "True"
    type: BackingNamespaceCreated
  - lastUpdateTime: "2024-06-28T08:06:07Z"
    status: "True"
    type: DefaultProjectCreated
  - lastUpdateTime: "2024-06-28T08:06:07Z"
    status: "True"
    type: SystemProjectCreated
  - lastUpdateTime: "2024-06-28T08:06:07Z"
    status: "True"
    type: InitialRolesPopulated
  - lastUpdateTime: "2024-06-28T08:06:07Z"
    status: "True"
    type: CreatorMadeOwner
  - lastUpdateTime: "2024-06-28T08:07:40Z"
    status: "True"
    type: Pending
  - lastUpdateTime: "2024-06-28T08:06:07Z"
    message: Waiting for API to be available
    status: Unknown
    type: Waiting
  - lastUpdateTime: "2024-06-28T08:06:07Z"
    status: "True"
    type: NoDiskPressure
  - lastUpdateTime: "2024-06-28T08:06:07Z"
    status: "True"
    type: NoMemoryPressure
  - lastUpdateTime: "2024-06-28T08:06:07Z"
    status: "True"
    type: SecretsMigrated
  - lastUpdateTime: "2024-06-28T08:06:07Z"
    status: "True"
    type: ServiceAccountSecretsMigrated
  - lastUpdateTime: "2024-06-28T08:06:07Z"
    status: "True"
    type: RKESecretsMigrated
  - lastUpdateTime: "2024-06-28T08:06:07Z"
    status: "True"
    type: ACISecretsMigrated
  - lastUpdateTime: "2024-06-28T08:08:00Z"
    status: "False"
    type: Connected
  - lastUpdateTime: "2024-06-28T08:07:40Z"
    status: "True"
    type: GlobalAdminsSynced
  - lastUpdateTime: "2024-06-28T08:07:41Z"
    status: "True"
    type: SystemAccountCreated
  - lastUpdateTime: "2024-06-28T08:07:42Z"
    status: "True"
    type: AgentDeployed
  - lastUpdateTime: "2024-06-28T08:08:00Z"
    message: Cluster agent is not connected
    reason: Disconnected
    status: "False"
    type: Ready
  fleetWorkspaceName: creategitops-r2rd5y
  observedGeneration: 2
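
The manifest above shows AgentDeployed as "True" while Connected is "False". A minimal sketch of flagging that combination follows. The `Condition` type and both helpers are hypothetical and only mirror the `type` and `status` fields shown above. The combination is a hint only, since a freshly deployed agent can also be briefly disconnected:

```go
package main

import "fmt"

// Condition is a hypothetical, trimmed-down mirror of the status conditions above.
type Condition struct {
	Type   string
	Status string
}

// conditionStatus returns the status of the first condition with the given type,
// or "Unknown" when the condition is absent.
func conditionStatus(conds []Condition, condType string) string {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status
		}
	}
	return "Unknown"
}

// agentStatusSuspect reports whether AgentDeployed is "True" while the agent
// is not reported as connected, which is the state captured in this issue.
func agentStatusSuspect(conds []Condition) bool {
	return conditionStatus(conds, "AgentDeployed") == "True" &&
		conditionStatus(conds, "Connected") != "True"
}

func main() {
	// Values copied from the relevant conditions in the manifest above.
	conds := []Condition{
		{Type: "Connected", Status: "False"},
		{Type: "AgentDeployed", Status: "True"},
		{Type: "Ready", Status: "False"},
	}
	fmt.Println(agentStatusSuspect(conds)) // prints "true"
}
```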

Last turtles logs:


08:22:55.134263       1 import_controller.go:131] "Reconciling CAPI cluster" controller="cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="creategitops-r2rd5y/clusterv1-docker-rke2" namespace="creategitops-r2rd5y" name="clusterv1-docker-rke2" reconcileID="22efd157-5031-4918-a44f-5a332ccca168"
08:22:55.134326       1 import_controller.go:251] "found cluster name" controller="cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="creategitops-r2rd5y/clusterv1-docker-rke2" namespace="creategitops-r2rd5y" reconcileID="22efd157-5031-4918-a44f-5a332ccca168" name="c-m-vxg8zckk"
08:22:55.134338       1 import_controller.go:254] "agent already deployed, no action needed" controller="cluster" controllerGroup="cluster.x-k8s.io" controllerKind="Cluster" Cluster="creategitops-r2rd5y/clusterv1-docker-rke2" namespace="creategitops-r2rd5y" name="clusterv1-docker-rke2" reconcileID="22efd157-5031-4918-a44f-5a332ccca168"

The child cluster does not have a cattle-system namespace:

default            Active   55m
kube-node-lease    Active   55m
kube-public        Active   55m
kube-system        Active   55m
local              Active   54m

Rancher Turtles version

v0.8.0

Anything else you would like to add?

No response

Label(s) to be applied

/kind bug
