Add logic to recover from quarantined and completed states #13071
// A non-deleted workflow being in a COMPLETED state is unexpected, and should be corrected
throw new UnreachableWorkflowException(
    String.format("ConnectionManagerWorkflow for connection %s is unreachable due to having COMPLETED status.", connectionId));
Added this case because, when testing, I saw an occurrence of a workflow with COMPLETED status that was not deleted. Since it was COMPLETED, its state could still be queried, but no signals could be executed on it. This logic corrects this by treating workflows in this state as "unreachable".
LGTM, nit comment
} catch (final Exception e) {
  log.warn(
      "Could not terminate temporal workflow due to the following error; "
          + "this may be because there is currently running workflow for this connection.",
Do you mean "no running workflow" here?
Yes I did, good catch
…#13071)
* add logic to recover from quarantined and completed states
* move status retrieval into try-catch
* fix typo in log
* add one more test
* move isWorkflowStateRunning into ConnectionManagerUtils to be more direct
* format
What
Resolves #10932
How
Adds logic to handle the quarantined and "completed" states of workflows by detecting workflows in either state and reporting them as "unreachable". This PR also adds logic to attempt termination of a workflow before restarting it. This is important for cases like the quarantined case, where we cannot start a fresh workflow for the connection until we have terminated the existing one.
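The flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `WorkflowClientStub`, `Status`, `assertReachable`, and `ensureReachable` are hypothetical names, and the stub interface stands in for the real Temporal client calls. Modeling "quarantined" as a status value is also an assumption made only to keep the sketch short.

```java
final class WorkflowRecoverySketch {

  /** Hypothetical status values; modeling "quarantined" as a status is an assumption. */
  public enum Status { RUNNING, QUARANTINED, COMPLETED }

  /** Hypothetical stand-in for the client calls the real code would make. */
  public interface WorkflowClientStub {
    Status getStatus(String connectionId);
    void terminate(String connectionId);
    void start(String connectionId);
  }

  public static class UnreachableWorkflowException extends Exception {
    public UnreachableWorkflowException(final String message) {
      super(message);
    }
  }

  /** Reports quarantined and completed workflows as unreachable. */
  public static void assertReachable(final WorkflowClientStub client, final String connectionId)
      throws UnreachableWorkflowException {
    final Status status = client.getStatus(connectionId);
    if (status == Status.QUARANTINED || status == Status.COMPLETED) {
      throw new UnreachableWorkflowException(String.format(
          "ConnectionManagerWorkflow for connection %s is unreachable due to having %s status.",
          connectionId, status));
    }
  }

  /** Returns true if the workflow was unreachable and had to be restarted. */
  public static boolean ensureReachable(final WorkflowClientStub client, final String connectionId) {
    try {
      assertReachable(client, connectionId);
      return false;
    } catch (final UnreachableWorkflowException e) {
      try {
        // Terminate first: a quarantined workflow blocks starting a fresh one.
        client.terminate(connectionId);
      } catch (final RuntimeException terminateError) {
        // Termination may fail when no workflow is running; continue to the restart.
      }
      client.start(connectionId);
      return true;
    }
  }
}
```

The termination failure is swallowed deliberately in this sketch, mirroring the warn-and-continue behavior discussed in the review thread: a failed terminate usually means there was nothing left to terminate, so the restart should still go ahead.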
Recommended reading order
x.java
y.python