Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Engine: Ensure node is sealed when process excepts #6549

Merged
merged 1 commit into from
Aug 6, 2024

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Jul 24, 2024

Fixes #6548

Processes that hit a certain exception were not being sealed. This would cause problems when trying to export them, which only allows sealed nodes. The problem occurs when another exception occurs while handling the original exception.

An example is when Process.update_outputs would raise a ValueError because an unstored node had be attached as an output. Since this method is called in on_entered, which is called when the process entered a new state, it would be called again when it entered the excepted state. Since the process was already excepted, the rest of the state changes is cut short by plumpy. This would cause the process to never go to the final TERMINATED state and so the on_terminated method would not be called, which is where the process' node is sealed.

The solution is to check the current state in on_entered and if it is EXCEPTED to simply return and no longer perform any updates on the node. This should prevent any other exceptions from being hit and ensure the process transitions properly to the final terminated state. The only update that is still performed is to update the process state on the process' node, otherwise it would not properly be shown as excepted.

Copy link

codecov bot commented Jul 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.83%. Comparing base (ef60b66) to head (c7ffdd1).
Report is 127 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6549      +/-   ##
==========================================
+ Coverage   77.51%   77.83%   +0.33%     
==========================================
  Files         560      566       +6     
  Lines       41444    41984     +540     
==========================================
+ Hits        32120    32674     +554     
+ Misses       9324     9310      -14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sphuber sphuber requested review from mbercx and GeigerJ2 July 24, 2024 22:17
Copy link
Member

@mbercx mbercx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sphuber! The engine is still a mysterious beast to me, but I added some questions to start improving my understanding.

from aiida.engine.utils import set_process_state_change_timestamp

super().on_entered(from_state)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the parent class version of this method do? Why was it called at the end of the method first and is there any risk of moving it up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parent class is the Process class in plumpy and above that the StateMachine. It makes sure that all state transitions are handled properly and callbacks that are registered are invoked. To be completely honest, I am not sure why it was below, I just put it back where it should be really. It may have been moved down together with the change relating to the huge comment of the update_outputs having to be called before updating the node state. But those are operations that work on the node, not the process itself, so don't see how that should affect the logic called in the parent Process class.

from aiida.engine.utils import set_process_state_change_timestamp

super().on_entered(from_state)

if self._state.LABEL is ProcessState.EXCEPTED:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we don't move this logic into the try-except below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where exactly? The point is that if this conditional is true, we already excepted and we don't want to except again, because that would cause the bug we are fixing. I guess perhaps one other solution would be to wrap all custom code in this overridden method inside a try-except and only reraise if the current state is not already excepted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering why you don't add it after the except ValueError on line 439, only raising in case the process state is not already excepted. Then the process state would be set in the finally part anyways. I suppose in the current version, all exceptions are pre-empted, but is this what we want?

Also, do the lines on 444-445 no longer have to be executed?

        self._save_checkpoint()
        set_process_state_change_timestamp(self.node)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning was that if we are already excepted, we want to do the bare minimum necessary because any additional operation can cause another exception which would display the bug we are trying to fix where the node is not sealed. The updating of the checkpoint and process state change timestamp are not critical. So out of safety, I was just thinking to only do the node state and then just wrap things up.

@mbercx mbercx self-requested a review August 6, 2024 06:41
Copy link
Member

@mbercx mbercx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, this is good to go for me, thanks @sphuber!

Processes that hit a certain exception were not being sealed. This would
cause problems when trying to export them, which only allows sealed
nodes. The problem occurs when another exception occurs while handling
the original exception.

An example is when `Process.update_outputs` would raise a `ValueError`
because an unstored node had be attached as an output. Since this method
is called in `on_entered`, which is called when the process entered a
new state, it would be called again when it entered the excepted state.
Since the process was already excepted, the rest of the state changes is
cut short by `plumpy`. This would cause the process to never go to the
final `TERMINATED` state and so the `on_terminated` method would not be
called, which is where the process' node is sealed.

The solution is to check the current state in `on_entered` and if it is
`EXCEPTED` to simply return and no longer perform any updates on the
node. This should prevent any other exceptions from being hit and ensure
the process transitions properly to the final terminated state. The only
update that is still performed is to update the process state on the
process' node, otherwise it would not properly be shown as excepted.
@sphuber sphuber force-pushed the fix/6548/seal-excepted-process branch from 28636f7 to c7ffdd1 Compare August 6, 2024 07:16
@sphuber sphuber merged commit cbf672f into aiidateam:main Aug 6, 2024
11 checks passed
@sphuber sphuber deleted the fix/6548/seal-excepted-process branch August 6, 2024 08:41
sphuber added a commit that referenced this pull request Aug 7, 2024
Processes that hit a certain exception were not being sealed. This would
cause problems when trying to export them, which only allows sealed
nodes. The problem occurs when another exception occurs while handling
the original exception.

An example is when `Process.update_outputs` would raise a `ValueError`
because an unstored node had be attached as an output. Since this method
is called in `on_entered`, which is called when the process entered a
new state, it would be called again when it entered the excepted state.
Since the process was already excepted, the rest of the state changes is
cut short by `plumpy`. This would cause the process to never go to the
final `TERMINATED` state and so the `on_terminated` method would not be
called, which is where the process' node is sealed.

The solution is to check the current state in `on_entered` and if it is
`EXCEPTED` to simply return and no longer perform any updates on the
node. This should prevent any other exceptions from being hit and ensure
the process transitions properly to the final terminated state. The only
update that is still performed is to update the process state on the
process' node, otherwise it would not properly be shown as excepted.

Cherry-pick: cbf672f
mikibonacci pushed a commit to mikibonacci/aiida-core that referenced this pull request Sep 3, 2024
Processes that hit a certain exception were not being sealed. This would
cause problems when trying to export them, which only allows sealed
nodes. The problem occurs when another exception occurs while handling
the original exception.

An example is when `Process.update_outputs` would raise a `ValueError`
because an unstored node had be attached as an output. Since this method
is called in `on_entered`, which is called when the process entered a
new state, it would be called again when it entered the excepted state.
Since the process was already excepted, the rest of the state changes is
cut short by `plumpy`. This would cause the process to never go to the
final `TERMINATED` state and so the `on_terminated` method would not be
called, which is where the process' node is sealed.

The solution is to check the current state in `on_entered` and if it is
`EXCEPTED` to simply return and no longer perform any updates on the
node. This should prevent any other exceptions from being hit and ensure
the process transitions properly to the final terminated state. The only
update that is still performed is to update the process state on the
process' node, otherwise it would not properly be shown as excepted.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Excepted Workchain is not sealed
2 participants