Print warning instead of error in case of unstable cluster #4315

vyasgun · 2024-08-12T10:58:35Z

Fixes: Issue #4284

Solution/Idea

Since the code doesn't exit at this point, an error-level message might confuse users, so the message is now logged as a warning.

INFO Operator network is progressing
INFO Operator network is progressing
INFO Operator network is progressing
INFO Operator network is progressing
INFO Operator network is progressing
INFO Operator network is progressing
INFO Operator network is progressing
WARN Cluster is not ready: cluster operators are still not stable after 10m0.695631268s
INFO Adding crc-admin and crc-developer contexts to kubeconfig...
ERRO Cannot update kubeconfig: Head "https://oauth-openshift.apps-crc.testing:443": read tcp 127.0.0.1:60782->127.0.0.1:443: read: connection reset by peer
Started the OpenShift cluster.

The server is accessible via web console at:
  https://console-openshift-console.apps-crc.testing

Log in as administrator:
  Username: kubeadmin
  Password: 3NM8K-C5kvg-YTRW4-FhiUM

Log in as user:
  Username: developer
  Password: developer

Use the 'oc' command line interface:
  $ eval $(crc oc-env)
  $ oc login -u developer https://api.crc.testing:6443

Testing

Run crc start in a state where the cluster operators do not become ready within the timeout. I reproduced this by cordoning the cluster node and running start again.

- Since the code doesn't exit, the error messaging might be confusing to users

openshift-ci · 2024-08-12T10:58:46Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign cfergeau for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2024-08-12T11:39:10Z

@vyasgun: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/security | `65d4041` | link | false | `/test security` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

praveenkumar · 2024-08-13T09:11:55Z

@vyasgun can you check whether the exit code is non-zero when the output contains `ERRO Cannot update kubeconfig: Head "https://oauth-openshift.apps-crc.testing:443": read tcp 127.0.0.1:60782->127.0.0.1:443: read: connection reset by peer` followed by `Started the OpenShift cluster.`?

vyasgun · 2024-08-13T09:15:27Z

@praveenkumar No. Do we want it to return a non-zero exit code?

praveenkumar · 2024-08-13T09:48:28Z

> @praveenkumar No. Do we want it to return a non-zero exit code?

For any error, yes, we should return non-zero; otherwise we should change the message to a warning. But I think if we are not able to update the kubeconfig file, we should tell the user how they can still access the cluster.

vyasgun · 2024-08-13T11:34:50Z

@praveenkumar In the Start function, only 2 logging.Errorf() statements are used, and neither of them is followed by a non-zero return.
Also, kubeconfig is updated in other places inside the function, and all of them except the last one return an error. For example:
https://github.com/crc-org/crc/blob/main/pkg/crc/machine/start.go#L603
https://github.com/crc-org/crc/blob/main/pkg/crc/machine/start.go#L528

I'm not sure if there is a particular reason for these statements and the differences (or if it's just an oversight). Additionally, I think if failing to update kubeconfig is grounds for a non-zero return, so is an unstable cluster (to indicate a failure in Start). We should return these errors at the end and put any extra processing that might still be required in a defer so it's always executed.
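
A rough sketch of that proposal, not the actual Start implementation: each step's failure is recorded instead of returned early, the remaining processing sits in a `defer`, and the collected errors are returned together at the end. The function signature, the step callbacks, and `errUnstable` are all hypothetical; `errors.Join` requires Go 1.20 or newer.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnstable is a hypothetical error used to simulate an unstable cluster.
var errUnstable = errors.New("cluster operators are still not stable")

// start runs each step, records failures instead of returning early, and
// reports them together at the end. All parameters are hypothetical
// stand-ins; the real Start function has a different signature.
func start(waitStable, updateKubeconfig func() error, finish func()) error {
	// Deferred so the remaining processing runs on every return path.
	defer finish()

	var errs []error
	if err := waitStable(); err != nil {
		errs = append(errs, fmt.Errorf("cluster is not ready: %w", err))
	}
	if err := updateKubeconfig(); err != nil {
		errs = append(errs, fmt.Errorf("cannot update kubeconfig: %w", err))
	}
	// errors.Join returns nil when no failure was recorded.
	return errors.Join(errs...)
}

func main() {
	unstable := func() error { return errUnstable }
	ok := func() error { return nil }

	err := start(unstable, ok, func() { fmt.Println("post-start processing done") })
	code := 0
	if err != nil {
		fmt.Println("WARN", err)
		code = 1
	}
	// A real CLI would pass this value to os.Exit.
	fmt.Println("exit code:", code)
}
```

With this shape the caller decides how to turn the returned error into an exit code, while the deferred step runs whether or not any step failed.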

cfergeau · 2024-09-05T11:15:52Z

> Additionally, I think if updating kubeconfig is grounds for a non-zero return, so is an unstable cluster (to indicate a failure in Start).

We could return a different error code for each case when crc completes. Distinct error codes matter because we might want to ignore 'cluster unstable'.

The 'cannot update kubeconfig' message deserves to be made a lot more user-friendly :) It should explain what won't work when this fails (I think this only means kube contexts can't be used, and that an explicit login to the cluster will be needed).
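
One way the distinct exit codes could look, as a sketch only. The sentinel errors, the numeric codes, and `exitCodeFor` are hypothetical and do not exist in crc; they just show how a caller could tell 'cluster unstable' apart from 'cannot update kubeconfig' and choose to ignore the former.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors; none of these are defined in crc.
var (
	errClusterUnstable  = errors.New("cluster operators are still not stable")
	errKubeconfigUpdate = errors.New("cannot update kubeconfig")
)

// Hypothetical exit codes, chosen only for illustration.
const (
	exitOK               = 0
	exitGeneric          = 1
	exitClusterUnstable  = 2
	exitKubeconfigUpdate = 3
)

// exitCodeFor maps an error returned by a start routine to a process exit
// code. errors.Is also matches wrapped errors.
func exitCodeFor(err error) int {
	switch {
	case err == nil:
		return exitOK
	case errors.Is(err, errClusterUnstable):
		return exitClusterUnstable
	case errors.Is(err, errKubeconfigUpdate):
		return exitKubeconfigUpdate
	default:
		return exitGeneric
	}
}

func main() {
	err := fmt.Errorf("start: %w", errClusterUnstable)
	fmt.Println("exit code:", exitCodeFor(err))
}
```

A script wrapping the command could then treat the 'cluster unstable' code as tolerable and fail on anything else.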

praveenkumar · 2024-09-11T03:44:26Z

> > Additionally, I think if updating kubeconfig is grounds for a non-zero return, so is an unstable cluster (to indicate a failure in Start).
>
> We could return a different error code for each case when crc completes. Distinct error codes matter because we might want to ignore 'cluster unstable'.
>
> The 'cannot update kubeconfig' message deserves to be made a lot more user-friendly :) It should explain what won't work when this fails (I think this only means kube contexts can't be used, and that an explicit login to the cluster will be needed).

Yes, it should tell the user how to access the cluster (for example, export KUBECONFIG=$HOME/.crc/machine/crc/kubeconfig or use oc --kubeconfig=$HOME/.crc/machine/crc/kubeconfig). They should be able to debug or check which cluster operator is not in an available state.
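
A sketch of what such a message could look like, assuming the kubeconfig path quoted above. `kubeconfigPath` and `kubeconfigHint` are hypothetical helpers, the wording is only a suggestion, and the statement about contexts follows the "I think" in the earlier comment rather than verified behavior.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// kubeconfigPath builds the path mentioned in the review comment:
// $HOME/.crc/machine/crc/kubeconfig.
func kubeconfigPath(home string) string {
	return filepath.Join(home, ".crc", "machine", "crc", "kubeconfig")
}

// kubeconfigHint explains what failed and how the cluster can still be
// reached. The message text is an assumption, not crc's actual output.
func kubeconfigHint(home string, cause error) string {
	path := kubeconfigPath(home)
	var b strings.Builder
	fmt.Fprintf(&b, "Cannot update kubeconfig: %v\n", cause)
	b.WriteString("The crc-admin and crc-developer contexts may be unavailable.\n")
	b.WriteString("To reach the cluster, use one of:\n")
	fmt.Fprintf(&b, "  export KUBECONFIG=%s\n", path)
	fmt.Fprintf(&b, "  oc --kubeconfig=%s get clusteroperators\n", path)
	return b.String()
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		home = "$HOME" // fall back to the literal variable in the printed hint
	}
	fmt.Print(kubeconfigHint(home, errors.New("connection reset by peer")))
}
```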

cfergeau · 2024-10-01T09:03:53Z

We probably can get this in, and create follow-up issues for the various improvements that have been discussed during the review?

praveenkumar · 2024-10-08T06:34:07Z

@vyasgun Can you create a follow-up issue for what was discussed here? Once the follow-up issue is created, we can merge this one.

vyasgun · 2024-10-08T10:26:31Z

Created a follow up issue: #4395

Print warning instead of error in case of unstable cluster

65d4041

- Since the code doesn't exit, the error messaging might be confusing to users

openshift-ci bot requested review from anjannath and evidolob August 12, 2024 10:58

vyasgun requested a review from praveenkumar August 12, 2024 11:03

praveenkumar merged commit ab7c92f into crc-org:main Oct 8, 2024
20 of 29 checks passed
