Improve error handling in Lancium adapter #281
I think it looks fine 👍
97cf41e to c3a02d6
I have a quick question on this. Does handling a drone as crashed have any further meaning? It can (or should) happen that an HTTP Status Code 404 is returned even when the drone has not crashed but was correctly terminated by TARDIS (or something else), e.g. the confirmation might just have been lost. In case it does not make any difference, I will approve :)
c3a02d6 to 709453d
Yes, that can happen, however the result is the same. The drone is gone and it is not handled anymore by `TARDIS`.
So it does not make any difference! :-)
Still fine for me :D
While testing the integration of Lancium Compute into the infrastructure, `TARDIS` sometimes crashes for two reasons, both of which can be handled internally by `TARDIS`.

First, the status of a drone that is `queued` or `running` can change in the meantime to `error` or `finished`. If `TARDIS` tries to terminate this drone, the Lancium API returns HTTP Status Code 419 ("Unable to terminate a job that is not queued or running"). In that case `TARDIS` should handle it as a `TardisResourceStatusUpdateFailed` exception, so that after the next resource status update the life cycle management can take care of it.

In the second case, `TARDIS` should handle it as a `TardisDroneCrashed` exception. Afterwards the life cycle management takes care of it.

FYI, the `handle_exceptions` context manager is involved here.
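
For readers who want to see the idea in code, here is a minimal sketch of how a `handle_exceptions` context manager could translate API errors into the two exceptions named above. It is not the code from this PR. `ApiError` and `terminate_drone` are hypothetical names, the two exception classes are local stand-ins so the snippet runs without TARDIS installed, and the 404 mapping is taken from the review discussion rather than from the description.

```python
from contextlib import contextmanager


# Local stand-ins for the TARDIS exceptions named in the description,
# defined here only so the sketch is self-contained.
class TardisResourceStatusUpdateFailed(Exception):
    pass


class TardisDroneCrashed(Exception):
    pass


class ApiError(Exception):
    """Hypothetical error carrying the HTTP status code of an API reply."""

    def __init__(self, status, message=""):
        super().__init__(message)
        self.status = status


@contextmanager
def handle_exceptions():
    """Translate API errors into exceptions the life cycle management handles."""
    try:
        yield
    except ApiError as error:
        if error.status == 419:
            # Job is no longer queued or running: wait for the next status update.
            raise TardisResourceStatusUpdateFailed(str(error)) from error
        if error.status == 404:
            # Assumption from the review thread: the drone is gone, treat as crashed.
            raise TardisDroneCrashed(str(error)) from error
        raise  # anything else is passed through unchanged


def terminate_drone(status):
    """Hypothetical terminate call that fails with the given HTTP status."""
    with handle_exceptions():
        raise ApiError(status, f"terminate failed with HTTP {status}")
```

Calling `terminate_drone(419)` raises `TardisResourceStatusUpdateFailed` and `terminate_drone(404)` raises `TardisDroneCrashed`. Any other status code propagates as the original `ApiError`.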