-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle failures in routing reconciler in recoverable way #1166
Conversation
When a DevWorkspaceRouting fails to provision objects, it enters the 'Failed' phase. In order to prevent repeated unnecessary reconciles, the controller ignores DWRs that are already Failed, which leads to the 'Failed' phase being permanent. To avoid this, we instead set a Stopped phase when the parent workspace is stopped. This will clear the Failed phase and any message that was set, so the DevWorkspace controller will need to propagate it into the DevWorkspace object to allow it to be shown. Signed-off-by: Angel Misevski <amisevsk@redhat.com>
I've also checked that this PR doesn't break |
Codecov ReportPatch coverage has no change and project coverage change:
Additional details and impacted files@@ Coverage Diff @@
## main #1166 +/- ##
==========================================
- Coverage 52.53% 52.51% -0.02%
==========================================
Files 82 82
Lines 7460 7468 +8
==========================================
+ Hits 3919 3922 +3
- Misses 3260 3266 +6
+ Partials 281 280 -1
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
I was unable to reproduce the original bug following the reproduer issue: Che's validating webhook prevents the creation of 2 Che Cluster CR's so I had to delete the webhook and quickly create another Che Cluster CR. Even then, the second Che Cluster CR wouldn't get fully set up (The dashboard deployment would endlessly be created and deleted), and creating a workspace with the che routing class wouldn't cause the workspace to fail.
However, I was able to patch the DWR as suggested in the second testing approach and got the workspace to fail with the appropriate message from the DWR controller:
NAME DEVWORKSPACE ID PHASE INFO
plain-devworkspace workspace56a1660c61624a42 Failed Failed to set up networking for workspace: testing failure
I undid the patch, set the workspace to started and it deployed successfully as expected.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: amisevsk, AObuchow The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What does this PR do?
When a DevWorkspaceRouting fails to provision objects, it enters the 'Failed' phase. In order to prevent repeated unnecessary reconciles, the controller ignores DWRs that are already Failed, which leads to the 'Failed' phase being permanent.
To avoid this, we instead set a Stopped phase when the parent workspace is stopped. This will clear the Failed phase and any message that was set, so the DevWorkspace controller will need to propagate it into the DevWorkspace object to allow it to be shown.
What issues does this PR fix or reference?
Closes #923
Is it tested? How?
To test, follow the reproducer in #923 and verify this PR fixes the issue (allows failed workspaces to be restarted):
alternatively, this PR can be tested by patching the DevWorkspace controller pod to mark routing objects as failed instead of ready by replacing this block with
in order to get a failed DevWorkspaceRouting. Then undo the patch and retest to ensure workspace can be restarted after failure.
In testing, when the DevWorkspaceRouting fails, the DevWorkspace should also be marked as failed, and the failure message on the DevWorkspace should include all applicable context from the DevWorkspaceRouting.
PR Checklist
/test v8-devworkspace-operator-e2e, v8-che-happy-path
to trigger)v8-devworkspace-operator-e2e
: DevWorkspace e2e testv8-che-happy-path
: Happy path for verification integration with Che