-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Engine: Ensure node is sealed when process excepts #6549
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #6549 +/- ##
==========================================
+ Coverage 77.51% 77.83% +0.33%
==========================================
Files 560 566 +6
Lines 41444 41984 +540
==========================================
+ Hits 32120 32674 +554
+ Misses 9324 9310 -14 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sphuber! The engine is still a mysterious beast to me, but I added some questions to start improving my understanding.
from aiida.engine.utils import set_process_state_change_timestamp | ||
|
||
super().on_entered(from_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the parent class version of this method do? Why was it called at the end of the method first and is there any risk of moving it up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parent class is the Process
class in plumpy
and above that the StateMachine
. It makes sure that all state transitions are handled properly and callbacks that are registered are invoked. To be completely honest, I am not sure why it was below, I just put it back where it should be really. It may have been moved down together with the change relating to the huge comment of the update_outputs
having to be called before updating the node state. But those are operations that work on the node, not the process itself, so don't see how that should affect the logic called in the parent Process
class.
from aiida.engine.utils import set_process_state_change_timestamp | ||
|
||
super().on_entered(from_state) | ||
|
||
if self._state.LABEL is ProcessState.EXCEPTED: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we don't move this logic into the try
-except
below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where exactly? The point is that if this conditional is true, we already excepted and we don't want to except again, because that would cause the bug we are fixing. I guess perhaps one other solution would be to wrap all custom code in this overridden method inside a try-except and only reraise if the current state is not already excepted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering why you don't add it after the except ValueError
on line 439, only raising in case the process state is not already excepted. Then the process state would be set in the finally
part anyways. I suppose in the current version, all exceptions are pre-empted, but is this what we want?
Also, do the lines on 444-445 no longer have to be executed?
self._save_checkpoint()
set_process_state_change_timestamp(self.node)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reasoning was that if we are already excepted, we want to do the bare minimum necessary because any additional operation can cause another exception which would display the bug we are trying to fix where the node is not sealed. The updating of the checkpoint and process state change timestamp are not critical. So out of safety, I was just thinking to only do the node state and then just wrap things up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, this is good to go for me, thanks @sphuber!
Processes that hit a certain exception were not being sealed. This would cause problems when trying to export them, which only allows sealed nodes. The problem occurs when another exception occurs while handling the original exception. An example is when `Process.update_outputs` would raise a `ValueError` because an unstored node had be attached as an output. Since this method is called in `on_entered`, which is called when the process entered a new state, it would be called again when it entered the excepted state. Since the process was already excepted, the rest of the state changes is cut short by `plumpy`. This would cause the process to never go to the final `TERMINATED` state and so the `on_terminated` method would not be called, which is where the process' node is sealed. The solution is to check the current state in `on_entered` and if it is `EXCEPTED` to simply return and no longer perform any updates on the node. This should prevent any other exceptions from being hit and ensure the process transitions properly to the final terminated state. The only update that is still performed is to update the process state on the process' node, otherwise it would not properly be shown as excepted.
28636f7
to
c7ffdd1
Compare
Processes that hit a certain exception were not being sealed. This would cause problems when trying to export them, which only allows sealed nodes. The problem occurs when another exception occurs while handling the original exception. An example is when `Process.update_outputs` would raise a `ValueError` because an unstored node had be attached as an output. Since this method is called in `on_entered`, which is called when the process entered a new state, it would be called again when it entered the excepted state. Since the process was already excepted, the rest of the state changes is cut short by `plumpy`. This would cause the process to never go to the final `TERMINATED` state and so the `on_terminated` method would not be called, which is where the process' node is sealed. The solution is to check the current state in `on_entered` and if it is `EXCEPTED` to simply return and no longer perform any updates on the node. This should prevent any other exceptions from being hit and ensure the process transitions properly to the final terminated state. The only update that is still performed is to update the process state on the process' node, otherwise it would not properly be shown as excepted. Cherry-pick: cbf672f
Processes that hit a certain exception were not being sealed. This would cause problems when trying to export them, which only allows sealed nodes. The problem occurs when another exception occurs while handling the original exception. An example is when `Process.update_outputs` would raise a `ValueError` because an unstored node had be attached as an output. Since this method is called in `on_entered`, which is called when the process entered a new state, it would be called again when it entered the excepted state. Since the process was already excepted, the rest of the state changes is cut short by `plumpy`. This would cause the process to never go to the final `TERMINATED` state and so the `on_terminated` method would not be called, which is where the process' node is sealed. The solution is to check the current state in `on_entered` and if it is `EXCEPTED` to simply return and no longer perform any updates on the node. This should prevent any other exceptions from being hit and ensure the process transitions properly to the final terminated state. The only update that is still performed is to update the process state on the process' node, otherwise it would not properly be shown as excepted.
Fixes #6548
Processes that hit a certain exception were not being sealed. This would cause problems when trying to export them, which only allows sealed nodes. The problem occurs when another exception occurs while handling the original exception.
An example is when
Process.update_outputs
would raise aValueError
because an unstored node had be attached as an output. Since this method is called inon_entered
, which is called when the process entered a new state, it would be called again when it entered the excepted state. Since the process was already excepted, the rest of the state changes is cut short byplumpy
. This would cause the process to never go to the finalTERMINATED
state and so theon_terminated
method would not be called, which is where the process' node is sealed.The solution is to check the current state in
on_entered
and if it isEXCEPTED
to simply return and no longer perform any updates on the node. This should prevent any other exceptions from being hit and ensure the process transitions properly to the final terminated state. The only update that is still performed is to update the process state on the process' node, otherwise it would not properly be shown as excepted.