Delay in managedCluster status update #117

Closed
josecastillolema opened this issue Dec 4, 2022 · 6 comments

josecastillolema commented Dec 4, 2022

I have a managed cluster configured with a heartbeat of 60 seconds:

% oc get -o yaml managedcluster cluster1 | grep lease
  leaseDurationSeconds: 60

However, when I scale down the klusterlet-registration-agent in the managed cluster, it takes almost 5 minutes for the cluster to be reported with Unknown availability:

% k scale deployment.apps/klusterlet-registration-agent --replicas 0
deployment.apps/klusterlet-registration-agent scaled
% time cat
cat  0.00s user 0.00s system 0% cpu 4:42.50 total

Conversely, it takes only 5-6 seconds for the cluster to be reported as Available again when the agent is scaled back up:

% k scale deployment.apps/klusterlet-registration-agent --replicas 1
deployment.apps/klusterlet-registration-agent scaled
% time cat                                                          
cat  0.00s user 0.00s system 0% cpu 6.344 total

I would have expected a smaller delay, since the heartbeat is configured to 60 seconds.

Notes:

  • Both environments are kind clusters configured as per the Quick Start guide
  • Setting leaseDurationSeconds to a lower value (e.g. 1 or 5 seconds) does not change the outcome
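
For reference, the availability flip can be timed more directly than with the manual time cat trick above. The following is a rough sketch added for illustration and is not part of the original report: a small Go program that polls the ManagedCluster on the hub and prints how long the ManagedClusterConditionAvailable condition takes to change, assuming the default kubeconfig points at the hub and the cluster is named cluster1 as above.

package main

import (
    "context"
    "fmt"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/client-go/dynamic"
    "k8s.io/client-go/tools/clientcmd"
)

// GVR of the ManagedCluster API (cluster-scoped).
var managedClusterGVR = schema.GroupVersionResource{
    Group:    "cluster.open-cluster-management.io",
    Version:  "v1",
    Resource: "managedclusters",
}

// availableStatus returns the status ("True"/"False"/"Unknown") of the
// ManagedClusterConditionAvailable condition on the named cluster.
func availableStatus(ctx context.Context, client dynamic.Interface, name string) (string, error) {
    mc, err := client.Resource(managedClusterGVR).Get(ctx, name, metav1.GetOptions{})
    if err != nil {
        return "", err
    }
    conditions, _, err := unstructured.NestedSlice(mc.Object, "status", "conditions")
    if err != nil {
        return "", err
    }
    for _, c := range conditions {
        cond, ok := c.(map[string]interface{})
        if ok && cond["type"] == "ManagedClusterConditionAvailable" {
            return fmt.Sprintf("%v", cond["status"]), nil
        }
    }
    return "", nil
}

func main() {
    // Assumes the default kubeconfig (~/.kube/config) points at the hub.
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    client := dynamic.NewForConfigOrDie(cfg)

    ctx := context.Background()
    last, _ := availableStatus(ctx, client, "cluster1")
    start := time.Now()
    for { // poll once per second until interrupted (Ctrl-C)
        time.Sleep(time.Second)
        cur, err := availableStatus(ctx, client, "cluster1")
        if err != nil || cur == last {
            continue
        }
        fmt.Printf("Available condition changed %q -> %q after %s\n", last, cur, time.Since(start))
        last, start = cur, time.Now()
    }
}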

josecastillolema (Author) commented:

Logs from the hub cluster-manager-registration-controller:

I1204 12:27:21.129691       1 event.go:285] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"open-cluster-management-hub", Name:"open-cluster-management-hub", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ManagedClusterAvailableConditionUpdated' update managed cluster "cluster1" available condition to unknown, due to its lease is not updated constantly
I1204 12:27:21.185928       1 event.go:285] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"open-cluster-management-hub", Name:"open-cluster-management-hub", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ManagedClusterConditionAvailableUpdated' Update the original taints to the [{Key:cluster.open-cluster-management.io/unreachable Value: Effect:NoSelect TimeAdded:0001-01-01 00:00:00 +0000 UTC}]
I1204 12:30:29.673784       1 event.go:285] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"open-cluster-management-hub", Name:"open-cluster-management-hub", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ManagedClusterConditionAvailableUpdated' Update the original taints to the []


mikeshng commented Dec 5, 2022

I am guessing the lease is being refreshed every 60 seconds, but the hub check that evaluates the lease is still on a 5-minute interval.

CC @qiujian16 @deads2k

qiujian16 commented:

/assign @skeeey


skeeey commented Dec 5, 2022

@josecastillolema

A managed cluster has a grace period before its availability is set to unknown. The grace period is leaseDurationSeconds x 5, so when the managed cluster stops updating its lease, by default the cluster becomes unknown after 5 minutes (the default leaseDurationSeconds is 60s). The grace period helps tolerate unexpected situations such as network latency. Once the managed cluster resumes updating its lease, we expect it to become Available immediately.

We do support reducing leaseDurationSeconds to shrink the grace period (as you did), but unfortunately it seems there is a bug in the current implementation. I will take a look.

Thanks.
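
To make the arithmetic concrete, here is a minimal sketch of the grace-period check skeeey describes (an illustration only; leaseExpired is a hypothetical helper, not the actual registration-controller code), assuming the grace period is leaseDurationSeconds x 5:

package main

import (
    "fmt"
    "time"
)

// leaseExpired reports whether the hub should mark the cluster's availability
// as unknown: the lease has not been renewed for more than
// leaseDurationSeconds * 5 (the grace period).
func leaseExpired(lastRenew time.Time, leaseDurationSeconds int32, now time.Time) bool {
    gracePeriod := time.Duration(leaseDurationSeconds) * 5 * time.Second
    return now.After(lastRenew.Add(gracePeriod))
}

func main() {
    stoppedAt := time.Now() // last lease renewal before the agent was scaled down

    // Default 60s lease: the cluster only turns unknown ~5 minutes after the
    // last renewal, which lines up with the ~4:42 observed above.
    fmt.Println(leaseExpired(stoppedAt, 60, stoppedAt.Add(4*time.Minute))) // false
    fmt.Println(leaseExpired(stoppedAt, 60, stoppedAt.Add(6*time.Minute))) // true

    // A lower leaseDurationSeconds should shrink the grace period
    // proportionally (5s lease -> 25s), which is the part reported as buggy.
    fmt.Println(leaseExpired(stoppedAt, 5, stoppedAt.Add(30*time.Second))) // true
}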

josecastillolema (Author) commented:

Thanks for the quick response!


github-actions bot added the Stale label Jul 14, 2023
github-actions bot closed this as not planned Jul 29, 2023
zhujian7 added a commit to zhujian7/ocm that referenced this issue Sep 9, 2024
…s deleted (open-cluster-management-io#117)

* Refresh external managed token secret if service account ns changes (open-cluster-management-io#458)

* 🐛 Refresh external managed token secret if service account is deleted (open-cluster-management-io#504) (open-cluster-management-io#78)

* Debug e2e

Signed-off-by: zhujian <jiazhu@redhat.com>