-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Checkpointing of Trailblaze Algorithm #1159
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me! Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
My only comments are:
- We could likely easily make checkpointing optional, not required, through the Python API (though Yank should always checkpoint it)
- We should migrate the trailblazing code to openmmtools soon.
@@ -1962,7 +1963,7 @@ def _get_number_box_waters(self, *args): | |||
# ============================================================================== | |||
|
|||
def trailblaze_alchemical_protocol(thermodynamic_state, sampler_state, mcmc_move, state_parameters, | |||
std_energy_threshold=0.5, threshold_tolerance=0.05, | |||
checkpoint_path, std_energy_threshold=0.5, threshold_tolerance=0.05, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to make checkpointing optional, or required via the Python API? This approach seems to make it required, but it seems like it would be simple to make it optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it would be useful to have it optional in the function (although exposing it in the YAML will be slightly more complicated). It's possible tests are not passing also because you're defining a positional argument after a keyword one.
I'm going to merge these changes so that I can pick up from here and store also the configurations generated by trailblaze in a separate PR. |
Ah! I didn't notice the test failures. I'm going to make the changes directly here then. |
The failing tests have been already fixed in #1168 so I think we can merge safely. I'll correct eventual issues in the next PR if there are any. |
Description
This PR aims to implement a basic checkpointing system for the trailblazing algorithm - currently if the algorithm is interrupted while running (e.g. if an insufficient wall clock limit is set by a job submission script), the partially computed optimized lambda states are lost.
Status