
drmaa.errors.InvalidAttributeValueException #102

Closed
amayer21 opened this issue Oct 17, 2019 · 14 comments

Comments

amayer21 commented Oct 17, 2019

Hello,
I'm trying to set up cgat-core on a slurm cluster and can't make it send jobs to the nodes. I have the feeling it's something stupid I'm missing...

I wrote a very simple pipeline (attached) that's running perfectly well with the command

python pipeline_test.py make full -v5 --no-cluster

However if I remove the no-cluster option, I have the following error (see logs of Test5 attached for more detailed error message):

drmaa.errors.InvalidAttributeValueException: code 14: Invalid native specification: cpus-per-task=1 (Unsupported option: --cpus-per-task)

I also noticed in the log file that the cluster queue and queue manager were wrong

# cluster_queue                           : None \
# cluster_queue_manager                   : sge \

despite the fact that my ~/.cgat.yml file specifies

tmpdir: /scratch_giga/Alice
shared_tmpdir: /scratch_giga
cluster_queue_manager: slurm
queue: ptfgen
memory_ulimit: False

I know the ~/.cgat.yml file is read because the log file also specifies

# 2019-10-15 17:35:13,639 WARNING local temporary directory /scratch_giga/Alice did not exist - created

And further down in the same log file, I find --partition=all.q which confuses me quite a lot (I think it's the default partition on cgat cluster but not the one specified in the ~/.cgat.yml or in the pipeline.yml).

Of note, if I specify the queue and queue manager in the python command and run the pipeline with

python pipeline_test.py make full -v5 --cluster-queue-manager=slurm --cluster-queue=ptfgen

the log file (Test6, attached) now indicates the right cluster queue and queue manager but the partition is still all.q in

# 2019-10-15 17:36:03,313 INFO job-options: -J NGS19-J229_Yumie_B2_HTO_S1_L001_LineCount.txt  --cpus-per-task=1 --mem-per-cpu=4000 --partition=all.q

and I still have the same drmaa error

# 'drmaa.errors.InvalidAttributeValueException(code 14: Invalid native specification: cpus-per-task=1 (Unsupported option: --cpus-per-task))' raised in\
#                                      Task = def counting_line_fastq(...): \
#                                      Job  = [../Data/NGS19-J229_Yumie_B2_HTO_S1_L001_R1_001.fastq.gz -> Data/NGS19-J229_Yumie_B2_HTO_S1_L001_LineCount.txt] \

I've attached the pipeline_test.py.txt, the pipeline.yml.txt (.txt needed to be allowed to upload here) and the log files of Test5 with make full -v5 and Test6 with --cluster-queue-manager=slurm --cluster-queue=ptfgen. Happy to provide more info if needed or to test everything you think may help.

Thank you very much!

All the best,

Alice

pipeline_test.py.txt
pipeline-yml.txt
cgatcore_Test5_4374626.log
cgatcore_Test6_4374627.log

Acribbs (Contributor) commented Oct 17, 2019

You haven't included the error, however have you set up your .cgat file correctly? cluster config documentation will help.

amayer21 (Author) commented

Sorry, I made a mistake with a keyboard shortcut when I started writing the message and have been editing it since then ;-)
I think the .cgat file is correct...

amayer21 (Author) commented

I've also tried adding

cluster:
  queue: "ptfgen"
  queue_manager: "slurm"

to my pipeline.yml but it doesn't change anything (cluster_queue_manager:sge and same error message)

Acribbs (Contributor) commented Oct 17, 2019

Can you try using queue_manager: slurm instead of cluster_queue_manager: slurm in your .cgat.yml file?

amayer21 (Author) commented

I just tried with

tmpdir: /scratch_giga/Alice
shared_tmpdir: /scratch_giga

cluster: 
    queue_manager: slurm
    queue: ptfgen
    memory_ulimit: False

which is copied from Kevin Rue-Albrecht's .cgat.yml, and it gave the exact same output...
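My understanding (an assumption on my part, not something I've checked in the source) is that cgat-core flattens nested YAML sections into underscore-joined parameter names, so cluster: queue_manager: should end up as cluster_queue_manager. Roughly like this sketch (hypothetical flatten helper, not cgatcore's actual code):

```python
def flatten(config, prefix=""):
    """Flatten nested dicts into underscore-joined keys,
    e.g. {"cluster": {"queue": "ptfgen"}} -> {"cluster_queue": "ptfgen"}."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "_"))
        else:
            flat[name] = value
    return flat

cfg = {
    "tmpdir": "/scratch_giga/Alice",
    "cluster": {"queue_manager": "slurm", "queue": "ptfgen", "memory_ulimit": False},
}
print(flatten(cfg)["cluster_queue_manager"])  # slurm
```

So if that mapping holds, the nested form above should be equivalent to the flat cluster_queue_manager key I tried first.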

amayer21 (Author) commented Oct 17, 2019

Not sure if it matters but the version of slurm installed on our cluster is 14.11.11 (quite old one I think).

Acribbs (Contributor) commented Oct 17, 2019

OK, your .cgat.yml looks fine to me. Oh, that's quite an old version: it was released 26th Nov 2014, and it looks like 19.05 is the most recent stable Slurm release.

It's quite difficult to debug on my end, but looking at the original error message (Unsupported option: --cpus-per-task), it looks like there isn't a --cpus-per-task option in your version of slurm. However, that can't be the case, because this option was added to slurm in 2008.

However, given that the options at the beginning of the pipeline suggest it is still defaulting to sge and not slurm, this is most likely an issue with cgatcore rather than with slurm.

I will have a think. Just checking the obvious: your ~/.cgat.yml is in your home folder and not elsewhere?

amayer21 (Author) commented

I just googled Unsupported option: --cpus-per-task and found this: natefoo/slurm-drmaa#2

I assume that's the DRMAA library I'm using? According to the release notes, it has only supported --cpus-per-task since the latest version (1.1.0). Before running my pipeline, I have to do:

export DRMAA_LIBRARY_PATH=/cm/shared/apps/slurm-drmaa/lib/libdrmaa.so.1.0.6

so I guess our version doesn't support the --cpus-per-task option...
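A quick sanity check, assuming the shared-object name tracks the slurm-drmaa release version (which seems to be the case here, but that's an assumption): look at the filename the library path points to.

```shell
# Path copied from the export above; basename shows the versioned so-name.
DRMAA_LIBRARY_PATH=/cm/shared/apps/slurm-drmaa/lib/libdrmaa.so.1.0.6
basename "$DRMAA_LIBRARY_PATH"
# libdrmaa.so.1.0.6 -> slurm-drmaa 1.0.6, older than the 1.1.0 release
# whose notes mention --cpus-per-task support
```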

amayer21 (Author) commented

Yes .cgat.yml is in my home folder

amayer21 (Author) commented

We are sharing our cluster with a diagnostic team from the hospital, which is why we can't update slurm :-( Having said that, the slurm-drmaa library 1.0.6 is from 2017, so that may be something we could update...

Acribbs (Contributor) commented Oct 17, 2019

Ah yeah, it seems like your slurm-drmaa is too old then; you will need 1.1.0 or later.

amayer21 (Author) commented

Hi Adam,
We've installed slurm-drmaa 1.1.0 and it solved the problem.
I also had to remove the --units and ElapsedRaw from the sacct options in the script cluster.py because they don't exist in slurm 14.11.11. My understanding is that these are there only to be printed in the pipeline.log for benchmarking. Is that right?
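Rather than deleting them by hand, a defensive pattern (purely illustrative, not cgatcore's actual code; field names taken from this thread) would be to filter the requested format fields against what the local sacct advertises via sacct --helpformat:

```python
def supported_fields(requested, available):
    """Keep only the sacct --format fields that the local Slurm
    advertises (e.g. parsed from `sacct --helpformat` output).
    Comparison is case-insensitive, as sacct field names are."""
    avail = {f.lower() for f in available}
    return [f for f in requested if f.lower() in avail]

# Hypothetical field lists: an old Slurm (14.11) without ElapsedRaw.
requested = ["JobID", "ElapsedRaw", "MaxRSS", "AveVMSize"]
available = ["JobID", "Elapsed", "MaxRSS", "AveVMSize"]
print(",".join(supported_fields(requested, available)))  # JobID,MaxRSS,AveVMSize
```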

It's now working, and I don't even have to specify the queue manager and the queue on the command line :-)

Just FYI, in the log file, it's still written

cluster_queue                           : None \
cluster_queue_manager                   : sge \

For me it's not a problem, as it's actually using slurm and the partition specified in the ~/.cgat.yml file, but I wanted to mention it just in case (log file attached: pipeline.log).

Thanks for your help!
All the best,
Alice

amayer21 (Author) commented

PS: I'm so happy it works and to finally be able to write proper pipelines for my analysis :-)

Acribbs (Contributor) commented Oct 21, 2019

Ah great! Very happy it's working. Also, thanks for the points you made. Happy sciencing.
