🐛 Fix: hello template e2e error. #558

Member

@xuezhaojun xuezhaojun commented Jul 5, 2024

Summary

Recently the addon-manager e2e cases have been failing very frequently with this flaky error:

  [FAILED] Timed out after 300.000s.
  Expected success, but got an error:
      <*errors.errorString | 0xc000b835c0>: 
      the addon e2e-universal-managedcluster-phbngt/hello-template available condition is not true, [{Progressing True 0 2024-07-05 04:47:53 +0000 UTC Progressing progressing... work is not ready} {RegistrationApplied True 0 2024-07-05 04:47:53 +0000 UTC SetPermissionApplied Registration of the addon agent is configured} {ClusterCertificateRotated False 0 2024-07-05 04:47:53 +0000 UTC ClientCertificateUpdateFailed Failed to rotated client certificate namespaces "open-cluster-management-agent-addon" not found} {ManifestApplied False 0 2024-07-05 04:47:53 +0000 UTC AddonManifestAppliedFailed failed to apply the manifests of addon} {Available False 0 2024-07-05 04:47:53 +0000 UTC WorkNotApplied 3 of 4 resources are not available}]
      {
          s: "the addon e2e-universal-managedcluster-phbngt/hello-template available condition is not true, [{Progressing True 0 2024-07-05 04:47:53 +0000 UTC Progressing progressing... work is not ready} {RegistrationApplied True 0 2024-07-05 04:47:53 +0000 UTC SetPermissionApplied Registration of the addon agent is configured} {ClusterCertificateRotated False 0 2024-07-05 04:47:53 +0000 UTC ClientCertificateUpdateFailed Failed to rotated client certificate namespaces \"open-cluster-management-agent-addon\" not found} {ManifestApplied False 0 2024-07-05 04:47:53 +0000 UTC AddonManifestAppliedFailed failed to apply the manifests of addon} {Available False 0 2024-07-05 04:47:53 +0000 UTC WorkNotApplied 3 of 4 resources are not available}]",
      }

From the log, the root cause is this condition:

ClusterCertificateRotated False 0 2024-07-05 04:47:53 +0000 UTC ClientCertificateUpdateFailed Failed to rotated client certificate namespaces "open-cluster-management-agent-addon" not found
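To make this kind of triage repeatable, the condition list in the failure message can be filtered for every condition that is not True. The following is only a minimal sketch using the Go standard library: the `condition` type, `parseConditions`, `notTrue`, and the regular expression are hypothetical helpers written for this illustration, not part of the OCM e2e code, and the pattern assumes the conditions are printed exactly in the layout shown in the log above.

```go
package main

import (
	"fmt"
	"regexp"
)

// condition holds the fields of one printed addon condition.
// This type is a hypothetical helper, not an OCM API type.
type condition struct {
	Type, Status, Reason, Message string
}

// conditionRE assumes each condition is printed as
// {Type Status Generation Date Time Offset Zone Reason Message}.
var conditionRE = regexp.MustCompile(
	`\{(\w+) (True|False|Unknown) \d+ \S+ \S+ \S+ \S+ (\w+) ([^}]*)\}`)

// parseConditions extracts every condition found in an error message.
func parseConditions(msg string) []condition {
	var out []condition
	for _, m := range conditionRE.FindAllStringSubmatch(msg, -1) {
		out = append(out, condition{Type: m[1], Status: m[2], Reason: m[3], Message: m[4]})
	}
	return out
}

// notTrue keeps only the conditions whose status is not "True".
func notTrue(conds []condition) []condition {
	var out []condition
	for _, c := range conds {
		if c.Status != "True" {
			out = append(out, c)
		}
	}
	return out
}

// sampleMessage is the condition list copied from the failure above.
const sampleMessage = `[{Progressing True 0 2024-07-05 04:47:53 +0000 UTC Progressing progressing... work is not ready} {RegistrationApplied True 0 2024-07-05 04:47:53 +0000 UTC SetPermissionApplied Registration of the addon agent is configured} {ClusterCertificateRotated False 0 2024-07-05 04:47:53 +0000 UTC ClientCertificateUpdateFailed Failed to rotated client certificate namespaces "open-cluster-management-agent-addon" not found} {ManifestApplied False 0 2024-07-05 04:47:53 +0000 UTC AddonManifestAppliedFailed failed to apply the manifests of addon} {Available False 0 2024-07-05 04:47:53 +0000 UTC WorkNotApplied 3 of 4 resources are not available}]`

func main() {
	// Print one line per condition that is not True.
	for _, c := range notTrue(parseConditions(sampleMessage)) {
		fmt.Printf("%s (%s): %s\n", c.Type, c.Reason, c.Message)
	}
}
```

Of the conditions that are not True, ClusterCertificateRotated is the one whose message names the missing namespace.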

This happens because the namespace for addons is currently fixed to open-cluster-management-agent-addon, and there are five places in the e2e tests that call cleanKlusterletResources:

[image: screenshot of the cleanKlusterletResources call sites]

It is possible for open-cluster-management-agent to be cleaned up just before the addon-manager cases start to run.

However, the addon-manager cases use the universal klusterlet that is created when the e2e suite initializes. That klusterlet has not changed, so the controller is not triggered to reconcile and does not create open-cluster-management-agent-addon again.

That is why we get the namespaces "open-cluster-management-agent-addon" not found error.


A simple way to fix it is to use a different namespace in the addon-manager cases.
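The reasoning can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `fakeCluster`, `cleanSharedNamespace`, `ensureNamespace`, `installAddon`, and the namespace name `addon-manager-e2e` are all hypothetical and do not come from the OCM test code. The model only tracks which namespaces exist, which is enough to show why a namespace owned by the addon-manager cases is not affected by another case's cleanup.

```go
package main

import "fmt"

// defaultAddonNamespace is the fixed namespace named in the error above.
const defaultAddonNamespace = "open-cluster-management-agent-addon"

// fakeCluster is a hypothetical in-memory stand-in for a managed cluster.
// It only tracks which namespaces exist.
type fakeCluster struct {
	namespaces map[string]bool
}

// cleanSharedNamespace models another e2e case's cleanup deleting the
// shared addon namespace.
func (c *fakeCluster) cleanSharedNamespace() {
	delete(c.namespaces, defaultAddonNamespace)
}

// ensureNamespace models a test creating the namespace it depends on.
func (c *fakeCluster) ensureNamespace(name string) {
	c.namespaces[name] = true
}

// installAddon fails when the target namespace is missing, mirroring the
// "not found" error in the log.
func (c *fakeCluster) installAddon(ns string) error {
	if !c.namespaces[ns] {
		return fmt.Errorf("namespaces %q not found", ns)
	}
	return nil
}

func main() {
	c := &fakeCluster{namespaces: map[string]bool{defaultAddonNamespace: true}}

	// Another case cleans up; nothing recreates the shared namespace.
	c.cleanSharedNamespace()
	fmt.Println("shared namespace:", c.installAddon(defaultAddonNamespace))

	// The addon-manager case uses a namespace it creates itself.
	const dedicated = "addon-manager-e2e" // hypothetical name
	c.ensureNamespace(dedicated)
	fmt.Println("dedicated namespace:", c.installAddon(dedicated))
}
```

The design choice here is isolation rather than synchronization: instead of ordering the cleanup calls, the addon-manager cases stop depending on a namespace that other cases may delete.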

@openshift-ci openshift-ci bot requested review from zhiweiyin318 and zhujian7 July 5, 2024 07:01

codecov bot commented Jul 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 62.41%. Comparing base (b3f15c1) to head (164c90a).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #558   +/-   ##
=======================================
  Coverage   62.41%   62.41%           
=======================================
  Files         177      177           
  Lines       13857    13857           
=======================================
  Hits         8649     8649           
  Misses       4339     4339           
  Partials      869      869           
Flag Coverage Δ
unit 62.41% <ø> (ø)


@xuezhaojun xuezhaojun closed this Jul 5, 2024
@xuezhaojun xuezhaojun reopened this Jul 5, 2024
Signed-off-by: GitHub <noreply@github.com>
@xuezhaojun xuezhaojun force-pushed the fix-e2e-flaky-error-hello-temaplte branch from a4d9d76 to 164c90a Compare July 5, 2024 09:14
@xuezhaojun
Member Author

xuezhaojun commented Jul 5, 2024

/assign @zhujian7
please review, thanks!

/assign @qiujian16
please approve, thanks!

@zhujian7
Member

zhujian7 commented Jul 8, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jul 8, 2024
@qiujian16
Member

/approve

Contributor

openshift-ci bot commented Jul 8, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qiujian16, xuezhaojun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Jul 8, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 990a639 into open-cluster-management-io:main Jul 8, 2024
17 checks passed
@xuezhaojun xuezhaojun deleted the fix-e2e-flaky-error-hello-temaplte branch July 8, 2024 02:34
zhujian7 pushed a commit to zhujian7/ocm that referenced this pull request Dec 3, 2024
Signed-off-by: GitHub <noreply@github.com>
openshift-merge-bot bot pushed a commit to stolostron/ocm that referenced this pull request Dec 3, 2024
* Only bind the agent role for the addon group

Signed-off-by: zhujian <jiazhu@redhat.com>

* Update addon rolebinding

Signed-off-by: zhujian <jiazhu@redhat.com>

* Start an addon informor for each addon

Signed-off-by: zhujian <jiazhu@redhat.com>

* Fix: addon manager flaky error. (open-cluster-management-io#558)

Signed-off-by: GitHub <noreply@github.com>

---------

Signed-off-by: zhujian <jiazhu@redhat.com>
Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: xuezhao <zxue@redhat.com>