Update state transition for stopped workers #234
I would like to ponder this a bit, but it seems sensible to me. Overall, our model is that individual drones are expendable, so being safe and discarding one seems preferable to trying to squeeze it back to life at all costs.
Codecov Report
@@ Coverage Diff @@
## master #234 +/- ##
=======================================
Coverage 99.34% 99.34%
=======================================
Files 54 54
Lines 2139 2139
=======================================
Hits 2125 2125
Misses 14 14
Sure, that was the idea of this pull request. ;-)
I think this change is fine. Since the drone has to be in |
That is already in place as a workaround. However, others could run into this problem, too, so I think we should fix it.
Your request has been pondered and deemed to be... worthy. Merge it so!
👍
During the integration of OpenStack cloud resources for PUNCH4NFDI, I noticed unexpected behaviour for Drones in `AvailableState`. In this state, the status of the resource (OpenStack VM) and the machine status (availability in HTCondor) are checked. Both statuses are used to decide on a possible state transition.

Due to missing payloads, the HTCondor daemon on this node shut down automatically, causing the machine status to become `NotAvailable` while the resource status remained `Running`. In that case, the drone state is reset to `IntegratingState`. Since HTCondor is not restarted, the Drone remains in this state forever.

I would like to discuss the following change in this pull request. In the case of the behaviour described above, I would like to move the Drone into the `ShutDown` state, causing the VM to be stopped in OpenStack. What do you think?
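
To make the proposal concrete, here is a minimal, self-contained sketch of the two behaviours described above. It is not the project's actual code: the enums, the `DroneState` names, the function `next_state_from_available`, and the `shut_down_stopped_workers` flag are all hypothetical and only model the single case discussed here (resource `Running`, machine `NotAvailable`).

```python
from enum import Enum, auto


# Hypothetical stand-ins for the statuses mentioned in the description.
class ResourceStatus(Enum):
    Running = auto()


class MachineStatus(Enum):
    Available = auto()
    NotAvailable = auto()


# Hypothetical stand-ins for the drone states mentioned in the description.
class DroneState(Enum):
    Available = auto()
    Integrating = auto()
    ShutDown = auto()


def next_state_from_available(resource, machine, shut_down_stopped_workers=True):
    """Decide the follow-up state for a drone that is currently available.

    Only the combinations discussed in this pull request are modelled;
    anything else raises so the sketch does not pretend to be complete.
    """
    if resource is ResourceStatus.Running and machine is MachineStatus.Available:
        # Everything is healthy, so the drone stays where it is.
        return DroneState.Available
    if resource is ResourceStatus.Running and machine is MachineStatus.NotAvailable:
        if shut_down_stopped_workers:
            # Proposed behaviour: discard the drone so the VM gets stopped.
            return DroneState.ShutDown
        # Behaviour described as problematic: the drone waits for a worker
        # daemon that is never restarted.
        return DroneState.Integrating
    raise NotImplementedError("combination not covered by this sketch")


if __name__ == "__main__":
    print(next_state_from_available(ResourceStatus.Running, MachineStatus.NotAvailable))
```

With the flag set to `False`, the sketch reproduces the stuck case: the drone returns to the integrating state even though nothing will bring the worker back.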