Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use /usr/bin/env -S bash instead of /bin/bash on the host system #5684

Closed
wants to merge 1 commit into from

Conversation

rollf
Copy link

@rollf rollf commented Jan 20, 2025

On systems without /bin/bash (e.g. NixOS), invoking tasks may fail because of some hard-coded references to /bin/bash. Those references are either

  • fine (because the task runs in a container where /bin/bash is available),
  • don't matter (because the shebang is ignored)
  • or cause failure (if /bin/bash cannot be invoked).

This PR adapts Nextflow for the last scenario. Instead of invoking /bin/bash, /usr/bin/env -S bash is used where necessary. This approach (or similar) has been discussed previously:

However, compared the previous discussions/implementations, I specifically do not propose to change /bin/bash unless really necessary. By "necessary" I mean that bash is invoked on a local host system where it may not be available. Namely, when <task>/.command.run shall be invoked (always on the host) and when <task>/.command.sh is invoked in a non-container context (i.e. also on the host). I have not touched any other occurrences of /bin/bash, esp. not within the container.

I'm happy to change the the code as requested. I have not added any comments since I expect (at least) change requests. Please review this with an unbiased mind set. Multiple people have expressed their intention to use nextflow on systems without /bin/bash.

I have tested solely on NixOS.

Copy link

netlify bot commented Jan 20, 2025

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 3b55cc6
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/67907cf0c510180008a52d85

@pditommaso pditommaso force-pushed the master branch 2 times, most recently from 1f834a0 to 6454605 Compare January 20, 2025 15:13
@@ -85,6 +87,7 @@ class BashWrapperBuilder {
log.warn "Invalid value for `NXF_DEBUG` variable: $str -- See http://www.nextflow.io/docs/latest/config.html#environment-variables"
}
BASH = Collections.unmodifiableList( level > 0 ? ['/bin/bash','-uex'] : ['/bin/bash','-ue'] )
ENV_BASH = Collections.unmodifiableList(['/usr/bin/env', '-S', 'bash', BASH[1]])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why introducing this variable instead of relying on process.shell?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this. process.shell is /bin/bash ... by default. Does this answer your question?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that after #4292 you should be able to use process.shell = ['/usr/bin/env, '-S', 'bash'] in your config.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I still need clarification: Do you suggest this PR is not necessary at all because one could set process.shell = ['/usr/bin/env, '-S', 'bash'] in a pipeline's configuration or do you suggest a different implementation (not using ENV_BASH as I have proposed) would be a technically better approach to achieve this PR's goal?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Yes", to my first or second interpretation? I feel like the are mutually exclusive. 🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I suggest this PR is not necessary at all because one could set process.shell

Copy link
Author

@rollf rollf Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Disclaimer: I assume that the additional usage of /usr/bin/env -S bash on the host system (to invoke .command.run as well as for non-container task execution) is acceptable by the Nextflow maintainers and we just need to sort out the technical implementation. Note that the POSIX-compliant existence of /usr/bin/env is already expected by Nextflow and please also consider the PR description above.]

procoess.shell cannot not be used for various reasons:

  • The value of process.shell will be used for task execution in both the non-container as well as the container scenario. This means that one would need to set this to a value that works in the host as well as in the container env if one wants to freely disable/enable containerization. And this contradicts the whole point.
  • Setting process.shell =/usr/bin/env -S bash -ue (because I want -ue) in the nextflow.config causes problems during tracing (because due to Use shell directive with trace command in wrapper script #4292, Nextflow will favor process.shell's actual value [it's not BASH] and then use -ue which Nextflow specifically doesn't expect for tracing)
  • Even if it worked, that means that everybody without /bin/bash needs to adapt every pipeline they want to run. I'm not trying to fix a specific problem I have with a specific pipeline, I'd like to contribute here in order to solve the general problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of /usr/bin/env is not viable because adds another level of indirection of in the process execution making (even) more complex the composition of the launch command and the passing of shell option.

I propose to address the problem of providing a custom Bash path in a more structural manner in this PR #5696

@@ -65,6 +65,8 @@ class BashWrapperBuilder {

static final public List<String> BASH
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to rename BASH to BIN_BASH to make the difference clear.

@@ -131,7 +131,7 @@ class LocalTaskHandler extends TaskHandler implements FusionAwareTask {
}

protected ProcessBuilder localProcessBuilder() {
final cmd = new ArrayList<String>(BashWrapperBuilder.BASH) << wrapperFile.getName()
final cmd = new ArrayList<String>(BashWrapperBuilder.ENV_BASH) << wrapperFile.getName()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sure that .command.run gets executed correctly on a host without /bin/bash.

@@ -136,7 +136,7 @@ class TaskBean implements Serializable, Cloneable {
this.useMicromamba = task.getCondaConfig()?.useMicromamba()
this.spackEnv = task.getSpackEnv()
this.moduleNames = task.config.getModule()
this.shell = task.config.getShell() ?: BashWrapperBuilder.BASH
this.shell = task.config.getShell(task.isContainerEnabled()) ?: BashWrapperBuilder.BASH
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, task will always (?) have a shell since it comes with the process defaults. I have not touched the "else"-block here since I think it is actually not necessary (and it did not cause issues for me even though I don't have /bin/bash).

@@ -374,10 +374,10 @@ class TaskConfig extends LazyMap implements Cloneable {
return (List<String>) get('secret')
}

List<String> getShell() {
List<String> getShell(boolean isContainerEnabled = true) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to adapt the unit tests but I don't know how 🤷 .

@@ -85,6 +87,7 @@ class BashWrapperBuilder {
log.warn "Invalid value for `NXF_DEBUG` variable: $str -- See http://www.nextflow.io/docs/latest/config.html#environment-variables"
}
BASH = Collections.unmodifiableList( level > 0 ? ['/bin/bash','-uex'] : ['/bin/bash','-ue'] )
ENV_BASH = Collections.unmodifiableList(['/usr/bin/env', '-S', 'bash', BASH[1]])
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this. process.shell is /bin/bash ... by default. Does this answer your question?

@rollf rollf changed the title Allow Nextflow to be used on (host) systems without /bin/bash Use /usr/bin/env -S bash instead of /bin/bash on the host system Jan 21, 2025
@rollf rollf marked this pull request as ready for review January 21, 2025 12:38
On systems without /bin/bash (e.g. NixOS), invoking tasks may fail because
of some hard-coded references to /bin/bash. Those references are either fine
(because the task runs in a container where /bin/bash *is* available),
don't matter (because the shebang is ignored) or cause failure (because
/bin/bash cannot be invoked). This commit adapts Nextflow for the last
scenario. Instead of invoking `/bin/bash`, the actually available bash
executable is called.

Signed-off-by: Rolf Schröder <rolf.schroeder@limbus-medtec.com>
@rollf rollf force-pushed the nextflow-env-bash branch from a1eb9a5 to 3b55cc6 Compare January 22, 2025 05:06
@@ -125,7 +125,6 @@ class ProcessConfig implements Map<String,Object>, Cloneable {
static final Map<String,Object> DEFAULT_CONFIG = [
debug: false,
cacheable: true,
shell: BashWrapperBuilder.BASH,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this file and the next one make sure that Nextflow does not assume /bin/bash execution by default. Rather, it used /bin/bash when the configuration requests task containerization, otherwise, /usr/bin/env -S bash is used (i.e. when the task is to be executed on the host system itself).

@pditommaso
Copy link
Member

Closing in favour of #5696

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants