
drmaa.errors.InvalidAttributeValueException #102

Closed
amayer21 opened this issue Oct 17, 2019 · 14 comments

Comments

amayer21 commented Oct 17, 2019

Hello,
I'm trying to set up cgat-core on a slurm cluster and can't make it send jobs to the nodes. I have the feeling it's something stupid I'm missing...

I wrote a very simple pipeline (attached) that's running perfectly well with the command

python pipeline_test.py make full -v5 --no-cluster

However if I remove the no-cluster option, I have the following error (see logs of Test5 attached for more detailed error message):

drmaa.errors.InvalidAttributeValueException: code 14: Invalid native specification: cpus-per-task=1 (Unsupported option: --cpus-per-task)

I also noticed in the log file that the cluster queue and queue manager were wrong

# cluster_queue                           : None \
# cluster_queue_manager                   : sge \

despite the fact that my ~/.cgat.yml file specifies

tmpdir: /scratch_giga/Alice
shared_tmpdir: /scratch_giga
cluster_queue_manager: slurm
queue: ptfgen
memory_ulimit: False

I know the ~/.cgat.yml file is read because the log file also specifies

# 2019-10-15 17:35:13,639 WARNING local temporary directory /scratch_giga/Alice did not exist - created

And further down in the same log file, I find --partition=all.q which confuses me quite a lot (I think it's the default partition on cgat cluster but not the one specified in the ~/.cgat.yml or in the pipeline.yml).

Of note, if I specify the queue and queue manager in the python command and run the pipeline with

python pipeline_test.py make full -v5 --cluster-queue-manager=slurm --cluster-queue=ptfgen

the log file (Test6, attached) now indicates the right cluster queue and queue manager but the partition is still all.q in

# 2019-10-15 17:36:03,313 INFO job-options: -J NGS19-J229_Yumie_B2_HTO_S1_L001_LineCount.txt  --cpus-per-task=1 --mem-per-cpu=4000 --partition=all.q

and I still have the same drmaa error

# 'drmaa.errors.InvalidAttributeValueException(code 14: Invalid native specification: cpus-per-task=1 (Unsupported option: --cpus-per-task))' raised in\
#                                      Task = def counting_line_fastq(...): \
#                                      Job  = [../Data/NGS19-J229_Yumie_B2_HTO_S1_L001_R1_001.fastq.gz -> Data/NGS19-J229_Yumie_B2_HTO_S1_L001_LineCount.txt] \

I've attached the pipeline_test.py.txt, the pipeline.yml.txt (.txt needed to be allowed to upload here) and the log files of Test5 with make full -v5 and Test6 with --cluster-queue-manager=slurm --cluster-queue=ptfgen. Happy to provide more info if needed or to test everything you think may help.

Thank you very much!

All the best,

Alice

pipeline_test.py.txt
pipeline-yml.txt
cgatcore_Test5_4374626.log
cgatcore_Test6_4374627.log

Acribbs (Contributor) commented Oct 17, 2019

You haven't included the error, however have you set up your .cgat file correctly? cluster config documentation will help.

amayer21 (Author) commented

Sorry, I made a mistake with a keyboard shortcut when I started writing the message and have been editing it since then ;-)
I think the .cgat file is correct...

amayer21 (Author) commented

I've also tried adding

cluster:
  queue: "ptfgen"
  queue_manager: "slurm"

to my pipeline.yml but it doesn't change anything (cluster_queue_manager:sge and same error message)

Acribbs (Contributor) commented Oct 17, 2019

Can you try using queue_manager: slurm instead of cluster_queue_manager: slurm in your .cgat.yml file?

amayer21 (Author) commented

I just tried with

tmpdir: /scratch_giga/Alice
shared_tmpdir: /scratch_giga

cluster: 
    queue_manager: slurm
    queue: ptfgen
    memory_ulimit: False

which is copied from Kevin Rue-Albrecht's .cgat.yml, and it gave the exact same output...
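My understanding (an assumption on my part, not something I've checked in the source) is that cgat-core flattens nested YAML sections into underscore-joined parameter names, so cluster: queue_manager: should end up as cluster_queue_manager. Roughly like this sketch (hypothetical flatten helper, not cgatcore's actual code):

```python
def flatten(config, prefix=""):
    """Flatten nested dicts into underscore-joined keys,
    e.g. {"cluster": {"queue": "ptfgen"}} -> {"cluster_queue": "ptfgen"}."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "_"))
        else:
            flat[name] = value
    return flat

cfg = {
    "tmpdir": "/scratch_giga/Alice",
    "cluster": {"queue_manager": "slurm", "queue": "ptfgen", "memory_ulimit": False},
}
print(flatten(cfg)["cluster_queue_manager"])  # slurm
```

So if that mapping holds, the nested form above should be equivalent to the flat cluster_queue_manager key I tried first.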

amayer21 (Author) commented Oct 17, 2019

Not sure if it matters but the version of slurm installed on our cluster is 14.11.11 (quite old one I think).

Acribbs (Contributor) commented Oct 17, 2019

OK, your .cgat.yml looks fine to me. Oh, that's quite an old version: it was released 26th Nov 2014, and it looks like 19.05 is the most recent stable Slurm release.

It's quite difficult to debug on my end, but looking at the original error message (Unsupported option: --cpus-per-task), it looks like there isn't a --cpus-per-task option in your version of slurm. However, that can't be the case, because this option was added to slurm in 2008.

However, given that the options at the beginning of the pipeline suggest it is still defaulting to sge and not slurm, this is most likely an issue with cgatcore rather than with slurm.

I will have a think. Just checking the obvious: your ~/.cgat.yml is in your home folder and not elsewhere?

amayer21 (Author) commented

I just googled Unsupported option: --cpus-per-task and found this: natefoo/slurm-drmaa#2

I assume that's the DRMAA library I'm using? According to the release notes, it has only supported --cpus-per-task since the latest version (1.1.0). Before running my pipeline, I have to do:

export DRMAA_LIBRARY_PATH=/cm/shared/apps/slurm-drmaa/lib/libdrmaa.so.1.0.6

so I guess our version doesn't support the --cpus-per-task option...
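A quick sanity check, assuming the shared-object name tracks the slurm-drmaa release version (which seems to be the case here, but that's an assumption): look at the filename the library path points to.

```shell
# Path copied from the export above; basename shows the versioned so-name.
DRMAA_LIBRARY_PATH=/cm/shared/apps/slurm-drmaa/lib/libdrmaa.so.1.0.6
basename "$DRMAA_LIBRARY_PATH"
# libdrmaa.so.1.0.6 -> slurm-drmaa 1.0.6, older than the 1.1.0 release
# whose notes mention --cpus-per-task support
```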

amayer21 (Author) commented

Yes .cgat.yml is in my home folder

amayer21 (Author) commented

We are sharing our cluster with a diagnostic team from the hospital, which is why we can't update slurm :-( Having said that, the slurm-drmaa library 1.0.6 is from 2017, so that may be something we could update...

Acribbs (Contributor) commented Oct 17, 2019

Ah yeah, it seems like your slurm-drmaa is too old then; you will need 1.1.0 or later.

amayer21 (Author) commented

Hi Adam,
We've installed slurm-drmaa 1.1.0 and it solved the problem.
I also had to remove the --units and ElapsedRaw from the sacct options in the script cluster.py because they don't exist in slurm 14.11.11. My understanding is that these are there only to be printed in the pipeline.log for benchmarking. Is that right?
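Rather than deleting them by hand, a defensive pattern (purely illustrative, not cgatcore's actual code; field names taken from this thread) would be to filter the requested format fields against what the local sacct advertises via sacct --helpformat:

```python
def supported_fields(requested, available):
    """Keep only the sacct --format fields that the local Slurm
    advertises (e.g. parsed from `sacct --helpformat` output).
    Comparison is case-insensitive, as sacct field names are."""
    avail = {f.lower() for f in available}
    return [f for f in requested if f.lower() in avail]

# Hypothetical field lists: an old Slurm (14.11) without ElapsedRaw.
requested = ["JobID", "ElapsedRaw", "MaxRSS", "AveVMSize"]
available = ["JobID", "Elapsed", "MaxRSS", "AveVMSize"]
print(",".join(supported_fields(requested, available)))  # JobID,MaxRSS,AveVMSize
```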

It's now working, and I don't even have to specify the queue manager and the queue on the command line :-)

Just FYI, in the log file, it's still written

cluster_queue                           : None \
cluster_queue_manager                   : sge \

For me it's not a problem, as it's actually using slurm and the partition specified in the ~/.cgat.yml file, but I wanted to mention it just in case (log file attached: pipeline.log).

Thanks for your help!
All the best,
Alice

amayer21 (Author) commented

PS: I'm so happy it works and to finally be able to write proper pipelines for my analysis :-)

Acribbs (Contributor) commented Oct 21, 2019

Ah great! Very happy it's working. Also, thanks for the points you made. Happy sciencing.
