creating a custom config file for slurm cluster? #644

Closed
Thomieh73 opened this issue Jul 27, 2024 · 1 comment
Comments

@Thomieh73

Hi, I have started to try out the mag pipeline. Instead of starting with my own data, I first decided to run the test profile. Makes more sense :-) But I ran into some configuration trouble.

Here is what I did.
I can run the command:

 nextflow run nf-core/mag -r 3.0.2 -profile test,apptainer --outdir mag_test -work-dir $USERWORK/nf_mag -resume

That finished okay, but runs on the login node of our HPC cluster, which is not okay.

So I created a custom config file, based on the base.config file provided with the pipeline, and modified it slightly.
See attached here: saga_mag.config.txt

In short, I added these three lines to the process:

executor       = 'slurm'
clusterOptions = '--job-name=Saga_nxf --account=nn10070k'
queueSize      = 24

and to jobs with large memory I added:

clusterOptions  = '--job-name=Saga_nxf --account=nn10070k --partition=bigmem'
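For reference, a minimal Slurm setup along these lines is usually split across two scopes: `executor` and `clusterOptions` are process directives, while `queueSize` is an executor-scope setting. A sketch of that layout (the account and partition values are the ones used above; the `process_high_memory` label is the standard nf-core label for large-memory jobs, shown here as an assumption about how the bigmem override would be attached):

```groovy
// Sketch of a minimal custom Slurm config.
// Account/partition values are taken from the text above.
process {
    executor       = 'slurm'
    clusterOptions = '--job-name=Saga_nxf --account=nn10070k'

    // Route large-memory jobs to the bigmem partition
    withLabel: process_high_memory {
        clusterOptions = '--job-name=Saga_nxf --account=nn10070k --partition=bigmem'
    }
}

executor {
    // queueSize is an executor-scope setting, not a process directive
    queueSize = 24
}
```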

I then also found out that I needed to define the missing check_max function in the config file. I added the entire function by copying this into my config file:

// Function to ensure that resource requirements don't go beyond
// a maximum limit
def check_max(obj, type) {
    if (type == 'memory') {
        try {
            if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1)
                return params.max_memory as nextflow.util.MemoryUnit
            else
                return obj
        } catch (all) {
            println "   ### ERROR ###   Max memory '${params.max_memory}' is not valid! Using default value: $obj"
            return obj
        }
    } else if (type == 'time') {
        try {
            if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1)
                return params.max_time as nextflow.util.Duration
            else
                return obj
        } catch (all) {
            println "   ### ERROR ###   Max time '${params.max_time}' is not valid! Using default value: $obj"
            return obj
        }
    } else if (type == 'cpus') {
        try {
            return Math.min( obj, params.max_cpus as int )
        } catch (all) {
            println "   ### ERROR ###   Max cpus '${params.max_cpus}' is not valid! Using default value: $obj"
            return obj
        }
    }
}
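For context, in the nf-core base.config this function is used inside the process resource directives, so that each task's request is capped at the configured maximums. A typical pattern looks like the following (the exact values vary per pipeline; these are illustrative defaults):

```groovy
// How check_max is typically invoked in an nf-core base.config.
// Values shown are illustrative; each pipeline sets its own defaults.
process {
    withLabel: process_medium {
        cpus   = { check_max( 6                   , 'cpus'   ) }
        memory = { check_max( 36.GB * task.attempt, 'memory' ) }
        time   = { check_max( 8.h   * task.attempt, 'time'   ) }
    }
}
```

Without check_max defined, these closures fail to evaluate, which is why the function has to be available wherever such directives are used.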

Next I ran the pipeline with this command:

nextflow run nf-core/mag -r 3.0.2 -profile test,apptainer --outdir mag_test -work-dir $USERWORK/nf_mag -resume -c saga_mag.config

However, it fails because the Slurm jobs do not get any memory allocated. I can see this when I check the .command.run files in the work directories of different tasks.
I see something like this in the top of the file:

#!/bin/bash
#SBATCH -J nf-NFCORE_MAG_MAG_KRONA_KRONADB
#SBATCH -o /cluster/work/users/thhaverk/nf_mag/0c/a7ed03141b73e94a5d8286720673ea/.command.log
#SBATCH --no-requeue
#SBATCH --signal B:USR2@30
#SBATCH --job-name=Saga_nxf --account=nn10070k
NXF_CHDIR=/cluster/work/users/thhaverk/nf_mag/0c/a7ed03141b73e94a5d8286720673ea
### ---
### name: 'NFCORE_MAG:MAG:KRONA_KRONADB'
### container: '/cluster/work/users/thhaverk/apptainer_img/quay.io-biocontainers-krona-2.7.1--pl526_5.img'
### outputs:
### - 'taxonomy/taxonomy.tab'
### - 'versions.yml'
### ...
set -e
set -u
NXF_DEBUG=${NXF_DEBUG:=0}; [[ $NXF_DEBUG > 1 ]] && set -x
NXF_ENTRY=${1:-nxf_main}

So it seems that a Slurm job does get requested, but the .command.run file does not get memory, time, and CPUs assigned.

I do not understand why it is failing. I checked the Nextflow Slack and found this thread, where a similar set-up is described: https://nextflow.slack.com/archives/CEQBS091V/p1614245714076900

Any ideas what I am missing here?

@jfy133
Member

jfy133 commented Jul 27, 2024

Hmm, I'm not sure. Unfortunately it's hard to look from my phone...

However, if you've copied from base.config, that is maybe overkill and makes it harder to debug. You don't need everything in there just to get the pipeline working with your cluster.

I would suggest starting from scratch and making the config by following this tutorial:

https://nf-co.re/docs/tutorials/use_nf-core_pipelines/config_institutional_profile

And see if it's still not working then.
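The tutorial's approach boils down to a small standalone config that sets only the executor, cluster options, and the cluster's maximum resources. A sketch of that shape (the resource limits below are placeholders to adapt to the actual cluster; the account value is the one from the text above):

```groovy
// Minimal institutional-style config sketch, per the tutorial's approach.
// max_* limits are placeholders to be adapted to the real cluster.
params {
    config_profile_description = 'Saga cluster profile (sketch)'
    max_memory = 180.GB
    max_cpus   = 40
    max_time   = 168.h
}

process {
    executor       = 'slurm'
    clusterOptions = '--account=nn10070k'
}

executor {
    queueSize = 24
}
```

Starting from something this small makes it much easier to see which setting, once added, breaks the generated job scripts.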

I'm going to close this issue as this is a nextflow configuration issue rather than a pipeline issue, but feel free to keep commenting here and I'll try to keep checking it.

Or even better, if you're still having issues, ask in the #configs channel on the nf-core Slack (https://nf-co.re/join for instructions if you're not already there).

@jfy133 jfy133 closed this as completed Jul 27, 2024