Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Double quote escape for bash script #5280

Closed
wants to merge 16 commits into from

Conversation

unkcpz
Copy link
Member

@unkcpz unkcpz commented Dec 17, 2021

I separate the implementation out from the container code PR. Also slightly touch and trying to address the issue of environment_variables of calcjob mentioned at #4836.

I make a demo change on SGE scheduler and found a bug there (please see the second commit 9d2f639e6). If you think this is the right direction for #4836. I will continue on implement this for other schedulers.

pinning @bosonie for comment since you open the issue mentioned @giovannipizzi to double-check this is the right way to do the escape. Also @mbercx since maybe you need more magic about escape in you hyperqueue plugin?

@sphuber sphuber self-requested a review December 17, 2021 14:09
Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with the principle of adding a new option that can change the default behavior. Just have a question about the name and the fact that each subclass explicitly has to be updated to respect the new option. I think this requires a change in the setup of Scheduler._get_submit_script_header.

@@ -244,6 +244,8 @@ def define(cls, spec: CalcJobProcessSpec) -> None: # type: ignore[override]
help='If set to true, the submission script will load the system environment variables',)
spec.input('metadata.options.environment_variables', valid_type=dict, default=lambda: {},
help='Set a dictionary of custom environment variables for this calculation',)
spec.input('metadata.options.use_double_quotes', valid_type=bool, default=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this only applies to the environment_variables option, should we name this option more explicitly, like environment_variables_double_quotes? This makes it a lot clearer that it only affects that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe envvar_double_quotes instead of environment_variables to avoid to have a super-long name that users have to type? I'm not 100% sure of this name though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is not precise to naming it as use_double_quotes but environment_variables_double_quotes really too long. My initial consideration is that use_double_quotes also influent other parts like command line parameters. Do you think it is too much to do this in the calcjob metadata setting with one option?

aiida/schedulers/plugins/sge.py Outdated Show resolved Hide resolved
Copy link
Member

@giovannipizzi giovannipizzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing that is missing is how to use double quotes for the computer and code properties. E.g. the mpirun command (this is what we need for the hyperqueue plugin). Pinging also @mbercx.

Would a simple, backward-compatible option be to add a _USE_DOUBLE_QUOTES class variable for scheduler plugins? And AiiDA checks if it exists and honours it if present (going back to single quotes if it's not present of if the value is False).

Or even, if we want to give more flexibility, we could have _QUOTE_MODE variable that accepts a value from an Enum, e.g. SINGLE (default), DOUBLE (new option), or even NONE. @sphuber what do you think?

@@ -244,6 +244,8 @@ def define(cls, spec: CalcJobProcessSpec) -> None: # type: ignore[override]
help='If set to true, the submission script will load the system environment variables',)
spec.input('metadata.options.environment_variables', valid_type=dict, default=lambda: {},
help='Set a dictionary of custom environment variables for this calculation',)
spec.input('metadata.options.use_double_quotes', valid_type=bool, default=False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe envvar_double_quotes instead of environment_variables to avoid to have a super-long name that users have to type? I'm not 100% sure of this name though

@sphuber
Copy link
Contributor

sphuber commented Dec 17, 2021

The thing that is missing is how to use double quotes for the computer and code properties.

If the point of this is really to affect quoting in general, then having an option called quote_style (that can have a number of options as you said) would probably be better yeah. Big question remains though how we can make sure that plugins respect this. As I indicated in another comment, with the current design, the option would probably not be respected for the environment variables since each plugin has to reimplement this. Existing ones outside of aiida-core will not do this clearly, after this change.

Anyway, I opened PR #5283 to abstract the environment variable part to solve that, but if we extend the quoting also to other things of the script, such as the mpi_command, then I am not sure if there is an easy way for us to guarantee this is respected by plugins.

@sphuber
Copy link
Contributor

sphuber commented Dec 17, 2021

And just for my understanding, is @unkcpz implementing this just because of @bosonie feature request, or is this needed elsewhere as well? Would be good to have those specific use case requirements as well if that is the case.

@unkcpz
Copy link
Member Author

unkcpz commented Dec 17, 2021

And just for my understanding, is @unkcpz implementing this just because of @bosonie feature request, or is this needed elsewhere as well?

I implement this in order to affect the quotes style in general, just thought @bosonie feature request is another case that can be addressed by extra parameter of scape_for_bash and fixed first.

@giovannipizzi
Copy link
Member

It's needed at least for the aiida-hyperqueue plugin (see aiidateam/aiida-hyperqueue#2).
Also, the PR of @unkcpz on supporting containerised code seemed to need this, right (to specify mount point etc.)

@unkcpz unkcpz force-pushed the double-quote-bash branch 3 times, most recently from e0b9b13 to f85cac4 Compare December 20, 2021 10:57
@unkcpz unkcpz requested a review from sphuber December 20, 2021 10:58
@unkcpz
Copy link
Member Author

unkcpz commented Dec 20, 2021

@sphuber Thanks for the updating! I rebased this pr and make the changes.

@codecov
Copy link

codecov bot commented Dec 20, 2021

Codecov Report

Merging #5280 (0b2db9d) into develop (dce0041) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #5280      +/-   ##
===========================================
+ Coverage    82.13%   82.14%   +0.02%     
===========================================
  Files          533      533              
  Lines        38485    38515      +30     
===========================================
+ Hits         31605    31635      +30     
  Misses        6880     6880              
Flag Coverage Δ
django 77.23% <100.00%> (+0.02%) ⬆️
sqlalchemy 76.53% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiida/common/datastructures.py 100.00% <ø> (ø)
aiida/schedulers/datastructures.py 94.55% <ø> (ø)
aiida/cmdline/commands/cmd_code.py 90.58% <100.00%> (+0.04%) ⬆️
aiida/cmdline/params/options/commands/code.py 100.00% <100.00%> (ø)
aiida/common/escaping.py 100.00% <100.00%> (ø)
aiida/engine/processes/calcjobs/calcjob.py 90.32% <100.00%> (+0.12%) ⬆️
aiida/manage/tests/pytest_fixtures.py 94.32% <100.00%> (+0.07%) ⬆️
aiida/orm/computers.py 82.42% <100.00%> (+0.24%) ⬆️
aiida/orm/nodes/data/code.py 91.86% <100.00%> (+0.23%) ⬆️
aiida/orm/utils/builders/code.py 85.13% <100.00%> (+0.26%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dce0041...0b2db9d. Read the comment docs.

Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @unkcpz . Some changes to the docstrings, but the rest of the implementation is ok. I have to admit that I am still not a fan of envvar_double_quotes. It may very well be a bit shorter, but I tend to think that shorter is not better if you are sacrificing consistency and clarity. But if others think this is the best option then I won't stop it.

aiida/common/escaping.py Show resolved Hide resolved
aiida/engine/processes/calcjobs/calcjob.py Outdated Show resolved Hide resolved
aiida/schedulers/datastructures.py Outdated Show resolved Hide resolved
@unkcpz
Copy link
Member Author

unkcpz commented Dec 20, 2021

@sphuber Thanks, I make the changes as per your advice.

For envvar_double_quotes I personally think this is name is clear and concise, although this is not the pythonic naming style for sure.

@giovannipizzi
Copy link
Member

Mmm, I see the point on the naming - I realise that there is also an environment_variables option, so indeed it's not very consistent. However very long names are also a bit annoying...

So I see two options (for the variable indicating whether to escape with double quotes (instead of single quotes) the value of the environment variables:

  • envvar_double_quotes
  • environment_variables_double_quotes

Maybe someone else who'd be actually using this feature might be interested in chipping in? E.g. @bosonie @mbercx

More generally - I see this PR is only about env vars. What about telling to use double quotes e.g. for the command line arguments of a code (will this be a new attribute in the CalcInfo returned by the plugin?), or to the mpirun_command of the computer (will this be an option in the computer setup?). I imagine this is the (partial) focus of #5250 - just asking to make sure we don't want this option defined in this PR also control escaping of other parts of the script (thinking to it, I think this is OK as it is, and the other properties are not specific to CalcJob but to either Code, CalcInfo or Computer).

@unkcpz
Copy link
Member Author

unkcpz commented Jan 7, 2022

I add an extra commit 8d3463d to this PR to enable double-quotes setting for calcjob job script code command. This option is a computer property and can be set by computer.set_use_double_quotes(True). I think it will work for both @mbercx hyperquene case and my container code to have ENV in the code executable command of the job script.

To summary, the main dilemma in adding this double-quotes setting is where this option should be set, if it is set in multiple places below what is the order of override?

  1. Set double-quotes as a computer property. (in the new commit, only this is applied. There is no problem for hyperqueue case since one computer has only one scheduler and hyperqueue implement a new scheduler and it is bound to the computer.)
  2. Set it along with the code set up. (With this we can only use double quotes for container code, it is more flexible and backward compatible, but it is complex do decide whether this should override the double quotes setting above).
  3. Allowed to set it in CalcJob object option. (Have the same issue in how to override the option set for computer, but this will give flexibility for CalcJob users to tune this parameter as they expect.)

@unkcpz unkcpz force-pushed the double-quote-bash branch 5 times, most recently from 3c800d6 to 4758dbb Compare January 25, 2022 21:25
@unkcpz
Copy link
Member Author

unkcpz commented Jan 25, 2022

The way of command execute line in script generating from code_info and code is now shaped to:

computer_cmdline_params[0] computer_cmdline_params[1] ... cmdline_params[0] cmdline_params[1] ... < stdin > stdout 
  1. add computer_cmdline_params for CodeInfo which separate the parameters such as mpirun and scheduler related (in aiida-hyperqueue case) from cmdline_params.
  2. add use_double_quotes which if True the cmdline_params will all escape by double-quotes.
  3. add custom_cmdline_string, if this string is set, the whole executable line will not be escaped but left as it is. This will be the solution for the tricky command line of executable code. such as the docker run and keep the maximum flexibility to configure the line.
'docker' 'run' '-it' '-v' "$PWD:/workdir:rw" '-w' "/workdir" 'ubuntu' 'sh' '-c' '/bin/bash < aiida.in > aiida.out'

@unkcpz unkcpz force-pushed the double-quote-bash branch from 3b212fd to ba40f5a Compare February 8, 2022 22:59
@unkcpz
Copy link
Member Author

unkcpz commented Feb 11, 2022

Close this PR since it is included in the containerized PR and will be separated into different small PRs include #5349

@unkcpz unkcpz closed this Feb 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants