
Implicit variables are null when using kuberun #1050

Closed

bentsherman opened this issue Feb 26, 2019 · 23 comments

@bentsherman (Member)

Bug report

Expected behavior and actual behavior

When I use ${baseDir} in my config file, it is properly recognized when I run my pipeline using nextflow run:

$ cat nextflow.config
[...]
      clip_path = "${baseDir}/files/fasta_adapter.txt"
[...]
$ nextflow -C nextflow.config run systemsgenetics/GEMmaker -with-docker
[...]
  Trimmomatic clip path:      ~/.nextflow/assets/systemsgenetics/GEMmaker/files/fasta_adapter.txt
[...]

Here, nextflow.config is taken from nextflow.config.example in the GEMmaker repo. However, when I try the same thing with kuberun, nextflow does not recognize ${baseDir}:

$ nextflow -C nextflow.config kuberun systemsgenetics/GEMmaker
[...]
  Trimmomatic clip path:      null/files/fasta_adapter.txt
[...]

Steps to reproduce the problem

See above example.

Program output

See above example.

Environment

  • Nextflow version: 19.01.0
  • Java version: 1.8.0_191
  • Operating system: Ubuntu
@pditommaso (Member)

I ran some tests but was not able to replicate this.

@bentsherman (Member, Author)

@pditommaso Could you be more specific? Do you have a kubernetes cluster that you could test it on?

Also I'm using 19.01 but is there an easy way to test the latest build, like a nightly build?

@pditommaso (Member)

I'm using a local K8s cluster deployed along with Docker.

Also I'm using 19.01 but is there an easy way to test the latest build, like a nightly build?

Yes, you can try the latest edge release using this command:

NXF_VER=19.03.0-edge nextflow run .. etc

Or export the NXF_VER variable in your (host) environment and use NF as usual.
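
For example, pinning the version for the shell session and rerunning the command from this report:

$ export NXF_VER=19.03.0-edge
$ nextflow -C nextflow.config kuberun systemsgenetics/GEMmaker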

@bentsherman (Member, Author)

It looks like $baseDir is evaluated in nextflow.config before it arrives on the k8s cluster. I checked the configmap which transmits nextflow.config:

$ kubectl get configmap nf-config-f5791ed2 -o yaml
[...]
             clip_path = 'null/files/fasta_adapter.txt'
[...]

Could we change nextflow so that it doesn't evaluate variables like $baseDir until the config file is in kubernetes?

@pditommaso (Member)

ah, this helps a lot. Let me check.

@pditommaso (Member)

This is quite tricky to solve. Unfortunately, for now, you will need to avoid using baseDir in the config when deploying via kuberun.

@bentsherman (Member, Author)

Yes, I suspected it might be that way. It's no problem, I can work around it pretty easily.

stale bot commented Apr 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Apr 27, 2020
@pditommaso pditommaso added stale and removed wontfix labels Apr 27, 2020
@stale stale bot closed this as completed Jun 26, 2020
@bentsherman (Member, Author)

Just wanted to add that projectDir has the same issue, so I think this problem is general to all implicit variables, as they are resolved in nextflow.config before nextflow run is executed on the k8s cluster. For anyone dealing with this problem: for any param that uses an implicit variable, you can work around it by specifying that param with the explicit project directory on k8s, via the command line or a params file (see the sketch below).
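
A minimal sketch of that workaround, assuming the project is staged at /workspace/GEMmaker on the cluster (the path and params file name are illustrative; use whatever your PVC layout actually is):

$ cat params.yml
clip_path: "/workspace/GEMmaker/files/fasta_adapter.txt"

$ nextflow -C nextflow.config kuberun systemsgenetics/GEMmaker -params-file params.yml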

@bentsherman bentsherman changed the title baseDir in config file is null when using kuberun Implicit variables are null when using kuberun Sep 28, 2021
@xhejtman (Contributor)

Paolo pointed me to this issue. I might have a similar issue: functions defined in nextflow.config are not passed along when using kuberun, which makes some pipelines fail, e.g. nf-core/sarek. Is there any workaround for this? Ours is to use patched nextflow binaries so that a hook is executed before running nextflow, adding the functions back to the pod's nextflow.config. Is there any possibility to do this without a patch? Or are you open to implementing some kind of hook support? I can create a PR for hooks.

@bentsherman (Member, Author)

What if you include the functions in the main nextflow.config in the github repo? Does it work then?

kuberun is really only intended for basic usage of k8s, so it doesn't support everything you might want to do. You can always fall back to doing manually what kuberun does automatically: create a yaml for a submitter pod that launches your nextflow pipeline with whatever options, then create the pod. In fact I created a script to do just this, which I made by reverse engineering the pod yaml generated by kuberun. IIRC it does not have any problem with implicit variables, and it shouldn't have a problem with extra functions.
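
A minimal sketch of such a submitter pod, with illustrative names throughout (the image tag, pod name, PVC, and mount path are assumptions; the yaml that kuberun actually generates is more elaborate):

$ kubectl create -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nf-submitter
spec:
  restartPolicy: Never
  containers:
    - name: nextflow
      image: nextflow/nextflow:22.04.0   # assumed image/tag
      workingDir: /workspace
      command: ["nextflow", "run", "nf-core/sarek", "-profile", "test"]
      volumeMounts:
        - name: workdir
          mountPath: /workspace
  volumes:
    - name: workdir
      persistentVolumeClaim:
        claimName: <pvc-name>
EOF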

Feel free to submit a PR based on your work. We might be able to figure out a solution from it. But in general it is recommended that you use an external service like Nextflow Tower to manage submitter pods instead of kuberun.

@bentsherman bentsherman reopened this Jan 18, 2022
@stale stale bot removed the stale label Jan 18, 2022
@xhejtman (Contributor)

Are you able to run the nf-core/sarek pipeline then? I think you do not need any data, as it fails on the functions.

To be more clear, I do not use functions in my local nextflow.config.

As can be seen at the very end of https://github.com/nf-core/sarek/blob/master/nextflow.config, there is a function check_resources. kuberun downloads this config from git (I know you can alter the local copy, but that is a bit ugly to me) and creates a config map from it. In this config map the function is missing, and the run fails because check_resources is not defined.

So we created a script that is injected into the command of the master pod; it simply appends the function to the nextflow.config taken from the config map.
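
The hook is roughly the following (paths are illustrative, and this assumes the config has already been copied to a writable location rather than being read directly from the read-only config map mount):

$ cat append-functions.sh
#!/bin/bash
# executed in the master pod before nextflow itself:
# re-add the function definitions that were lost from the config map
cat /opt/hooks/missing-functions.config >> /workspace/nextflow.config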

As for the PR, I was wondering whether adding a script name to be executed in the master pod before executing nextflow itself would be accepted. If so, I will happily do the PR. I suppose a command-line option to specify the path and script name, something like -headscript?

@bentsherman (Member, Author)

I will try nf-core/sarek with kuberun when I get a chance this week. But I understand the problem you describe.

I don't think your general solution of injecting an arbitrary pre-script hook will be accepted into Nextflow. We might be able to fix this particular problem with kuberun. I will look into it when I get a chance, this week or next week hopefully.

Either way, you can provision your own submitter node (like a head node on an HPC cluster) and launch the pipeline from that node using nextflow run with the k8s executor. That is essentially what kuberun does for you in the background.
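
For reference, a minimal sketch of the config for that setup (the claim name is a placeholder):

process.executor = 'k8s'

k8s {
  storageClaimName = '<pvc-name>'
}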

@xhejtman (Contributor)

I don't think your general solution of injecting an arbitrary pre-script hook will be accepted into Nextflow. We might be able to fix this particular problem with kuberun. I will look into it when I get a chance, this week or next week hopefully.

Well, I see that tower.nf offers to run a pre-run script. So why not add the same for the CLI?

@bentsherman (Member, Author)

Ah, I haven't played with that feature in Tower, so I'll have to look at it and see how it might be imitated in Nextflow itself. But also, if you're already using Tower then why do you still need kuberun?

@xhejtman (Contributor) commented Jan 22, 2022

No, I just tried your demo to see what Tower offers and whether it can be used by someone who is not familiar with nextflow, pipelines, and the command line. But some users still prefer the CLI, as they have a single shell script that they just need to run.

@bentsherman (Member, Author)

I was finally able to spin up a GKE cluster and test the sarek pipeline. It fails with kuberun as you say, but I was able to get around kuberun with the following steps (a kubectl sketch follows below):

  • provision a head pod manually via kubectl create
  • log in to the head pod via kubectl exec
  • run the nextflow pipeline like normal: nextflow run nf-core/sarek -profile test

I provide the PVC through a config setting:

k8s {
  storageClaimName = '<pvc-name>'
}
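
In kubectl terms, the steps above look roughly like this (the pod name and yaml are whatever you provisioned; they are illustrative here):

$ kubectl create -f head-pod.yaml
$ kubectl exec -it head-pod -- bash

and then, inside the pod:

$ nextflow run nf-core/sarek -profile test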

I used kube-login.sh from this repo to provision the head node.

With Tower it makes sense to have a pre-run script because Tower is responsible for provisioning the head node. Nextflow isn't intended to provision head nodes in general; it just so happens that it can hack one together in k8s via kuberun. But kuberun has many limitations, including this issue, so it is not fit for production use. If someone can find a solution to this issue then a PR is welcome, but ultimately the solution is to use an external service like Tower.

If someone prefers the CLI, there is now a Tower CLI too. If they just have a small test and kuberun isn't working for them, they can use the solution I described here.

@xhejtman (Contributor) commented Feb 5, 2022

OK, thanks for testing.

Do you know why kuberun manipulates the nextflow.config? Is there some strong reason for it? I basically need kuberun just to provision the head node and run nextflow, mostly the same thing kube-login.sh does. Was kuberun intended for something more?

We already run nextflow from a 'proxy' pod, but you need to run it at least from tmux/screen to keep it running in the background.

To conclude, if kuberun is not for production use, we can provide a workaround (e.g. as you suggested); however, I see kuberun as easier for the end user.

@bentsherman (Member, Author) commented Feb 7, 2022

Eventually I'd like to study how kuberun works under the hood, but I think the problem is that kuberun resolves the nextflow config prematurely, which is why implicit variables and functions get lost in the process.

The other script in that repo, kube-run.sh, really does what I want kuberun to do, which is to provision a head pod, copy any local nextflow config, launch nextflow like normal from the head pod, and then sit back and forward the logs. That script also avoids the implicit variable issue, and you don't have to stay connected to the pod. It is really the ideal workaround, more so than kube-login.sh, and it's what I recommend you use as long as this issue persists in kuberun.
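
A rough sketch of that flow in plain kubectl, with illustrative names (pod name, paths, and log file are assumptions; the actual script is more careful):

$ kubectl create -f head-pod.yaml
$ kubectl cp nextflow.config nf-head:/workspace/nextflow.config
$ kubectl exec nf-head -- bash -c 'cd /workspace && nohup nextflow run nf-core/sarek -profile test > run.log 2>&1 &'
$ kubectl exec nf-head -- tail -f /workspace/run.log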

@bentsherman (Member, Author)

See also #870 (comment)

@olgabot commented Jun 2, 2022

Seems related to this: nextflow-io/rnaseq-nf#13

stale bot commented Jan 16, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@bentsherman (Member, Author)

Closing since we are moving away from kuberun. Use a workflow platform (e.g. Tower) to manage your submitter pods, or use Wave+Fusion to run Nextflow without a submitter pod / PVC.

@bentsherman bentsherman closed this as not planned Jan 17, 2023