Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scheduler: abstract generation of submit script env variables #5283

Merged

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Dec 17, 2021

The environment variables defined in the JobTemplate need to be
written to the submit script header. The Scheduler.get_submit_script
relied on the __get_submit_script_header abstract method to do this.
This forces each plugin to write this code, even though this is very
likely to be scheduler independent.

Therefore it is best to abstract this functionality to the base class
such that each scheduler plugin automatically has this implemented. If
really needed, the plugin can still override the behavior by explicitly
reimplementing the _get_submit_script_environment_variables method.

Note that it looks that _get_submit_script_environment_variables could
be a staticmethod, but unfortunately it is not possible to call the
super when overriding the staticmethod in a subclass.

@sphuber sphuber force-pushed the fix/scheduler-environment-variables-to-base branch 2 times, most recently from 9b11c69 to f2963d9 Compare December 17, 2021 14:57
@sphuber
Copy link
Contributor Author

sphuber commented Dec 17, 2021

In terms of backwards compatibility; I think the worst that can happen is that plugins out there that currently also generate the environment variables themselves in their _get_submit_script_header implementation, the variables will be declared twice. But the new declaration of the base class will be added after, so should overrule. At least that ensures the upcoming feature that customizes the quoting will be respected, but it could potentially override important environment variable declarations.

@giovannipizzi
Copy link
Member

Just two comments:

  • I would keep the empty lines above and below for readability of the script (but I won't fight over it if you don't agree)
  • For external plugins who already copy-pasted the code, they will end up now writing the header twice? Would it be instead better for backward-compatibility to just provide the code in the base class, but call it explicitly in each plugin rather than in the base class?

@sphuber
Copy link
Contributor Author

sphuber commented Dec 17, 2021

I would keep the empty lines above and below for readability of the script (but I won't fight over it if you don't agree)

Well prepare to fight 🤼‍♂️ ;) Actually, I was aiming to replicate the current behavior. There are tests for the direct scheduler that compare the literal output of a generated script. When running tests locally, with the empty lines, the test fail. But now it fails remotely. I will look into it.

For external plugins who already copy-pasted the code, they will end up now writing the header twice? Would it be instead better for backward-compatibility to just provide the code in the base class, but call it explicitly in each plugin rather than in the base class?

They won't end up writing the entire header twice. It will just include the environment variables declaration twice if they also copy pasted this from the _get_submit_script_header method. I described this in my second comment, but don't see how we can fix this. We can call it in the plugins that we include in aiida-core, but there is no way to influence this for external plugins, obviously.

The environment variables defined in the `JobTemplate` need to be
written to the submit script header. The `Scheduler.get_submit_script`
relied on the `__get_submit_script_header` abstract method to do this.
This forces each plugin to write this code, even though this is very
likely to be scheduler independent.

Therefore it is best to abstract this functionality to the base class
such that each scheduler plugin automatically has this implemented. If
really needed, the plugin can still override the behavior by explicitly
reimplementing the `_get_submit_script_environment_variables` method.

Note that it looks that `_get_submit_script_environment_variables` could
be a `staticmethod`, but unfortunately it is not possible to call the
super when overriding the staticmethod in a subclass.
@sphuber sphuber force-pushed the fix/scheduler-environment-variables-to-base branch from f2963d9 to 6d9199e Compare December 17, 2021 16:08
@codecov
Copy link

codecov bot commented Dec 17, 2021

Codecov Report

Merging #5283 (5af8dce) into develop (7fad822) will increase coverage by 0.05%.
The diff coverage is 64.29%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #5283      +/-   ##
===========================================
+ Coverage    81.46%   81.51%   +0.05%     
===========================================
  Files          530      530              
  Lines        37139    37103      -36     
===========================================
- Hits         30253    30241      -12     
+ Misses        6886     6862      -24     
Flag Coverage Δ
django 76.98% <64.29%> (+0.05%) ⬆️
sqlalchemy 75.97% <64.29%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiida/schedulers/plugins/lsf.py 72.80% <0.00%> (+1.76%) ⬆️
aiida/schedulers/plugins/pbsbaseclasses.py 58.53% <0.00%> (+1.34%) ⬆️
aiida/schedulers/plugins/slurm.py 71.59% <0.00%> (+1.45%) ⬆️
aiida/schedulers/plugins/direct.py 66.67% <50.00%> (-0.19%) ⬇️
aiida/schedulers/scheduler.py 84.32% <87.50%> (+0.18%) ⬆️
aiida/schedulers/plugins/sge.py 70.26% <100.00%> (-0.57%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7fad822...5af8dce. Read the comment docs.

@sphuber
Copy link
Contributor Author

sphuber commented Dec 17, 2021

@giovannipizzi the empty lines are fixed now.

@giovannipizzi
Copy link
Member

They won't end up writing the entire header twice. It will just include the environment variables declaration twice if they also copy pasted this from the _get_submit_script_header method.

Yes, sure. Still it might be a problem if the environment variables depend on each other and are not idempotent? E.g. something like

export VARIABLE=${VARIABLE}+xxx

Running this twice is a problem.

I don't see how we can fix this.

As I said, we leave (as now) the responsibility to the plugin to write those lines calling the method that you now implemented in the super class. (i.e. for each plugin where you removed the code, you just add a 1-line call to the method _get_submit_script_environment_variables).
However, I see that this gives weaker guarantees on the escaping, in case plugins don't call that method. But I would say this is a bug in the plugin, at someone will at some point open an issue and this will be fixed in the plugin?

In any case, the number of scheduler plugins is small, so maybe we can add to #5284 to create PRs or issues to fix all the scheduler plugins we are aware of (in the aiida registry), as well as add this information to the wiki page for 2.0.

@giovannipizzi
Copy link
Member

I.e.: if you still feel that the second approach is better, I'm OK to approve this and I'd kindly ask you to update both #5284 and the wiki.

giovannipizzi
giovannipizzi previously approved these changes Dec 17, 2021
Copy link
Member

@giovannipizzi giovannipizzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-approving for simplicity, but please dismiss this if instead you prefer to go with my other suggestion (each plugin should manually call the part to write the env vars).

@sphuber sphuber force-pushed the fix/scheduler-environment-variables-to-base branch 2 times, most recently from bcf768b to ccb2850 Compare December 19, 2021 08:48
Have `_get_submit_script_environment_variables` include the starting and
end markers, and do not call it in `Scheduler.get_submit_script`, but
instead let the subclasses call it in `_get_submit_script_header`
themselves. This prevents existing external plugins printing the
environment variables twice, and still the logic is provided in a single
place.

This still require external plugins to update to use the new method to
print the environment variables for it to pick up the changes if
`aiida-core` decides to improve the implementation.
@sphuber sphuber force-pushed the fix/scheduler-environment-variables-to-base branch from ccb2850 to 5af8dce Compare December 19, 2021 11:31
@sphuber
Copy link
Contributor Author

sphuber commented Dec 19, 2021

I followed your suggestion @giovannipizzi and made the plugins call the new abstracted method. I agree that this is safer as it won't break existing functionality and we can suggest existing plugins to adopt the new method.

@sphuber sphuber merged commit c743a33 into aiidateam:develop Dec 19, 2021
@sphuber sphuber deleted the fix/scheduler-environment-variables-to-base branch December 19, 2021 14:32
@sphuber
Copy link
Contributor Author

sphuber commented Dec 19, 2021

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants