Error with Singularity #427

Closed
npavlovikj opened this issue Jun 4, 2020 · 7 comments
Labels
question Further information is requested

Comments


npavlovikj commented Jun 4, 2020

Hi,

Has anyone encountered the following error when running "rnaseq" with Singularity on a cluster?

Command error: FATAL: resolved path $WORK/.singularity/nfcore-rnaseq-1.4.2.img doesn't match with opened path $WORK/.singularity/nfcore-rnaseq-1.4.2.img (deleted)

I don't get this error every time, only sometimes (usually the second time I run the command), and I haven't been able to figure out the cause.
The image exists in $WORK/.singularity when I see the error.

I use the following modules and variables:

module load singularity
module load nextflow
export NXF_SINGULARITY_CACHEDIR=$WORK/.singularity
export NXF_HOME=$WORK
export SINGULARITY_CACHEDIR=$WORK/.singularity

and the commands I have tried are:

nextflow run nf-core/rnaseq -r 1.4.2 -profile singularity --genome 'UMD3.1' --singleEnd --reads '*.fq.gz' -resume --max_cpus 16 --max_memory '100.GB' --max_time '168.h' -c nextflow.config

and

nextflow run nf-core/rnaseq -r 1.4.2 -with-singularity $WORK/.singularity/nfcore-rnaseq-1.4.2.img --genome 'UMD3.1' --singleEnd --reads '../*.fq.gz' -resume --max_cpus 16 --max_memory '100.GB' --max_time '168.h' -c nextflow.config

Both of them worked the first time, and they usually fail with this error the second time they are executed (not resumed).

Is this a caching issue, and should I remove the cache after every run, or am I missing some additional configuration?
Any suggestions and possible ideas are highly appreciated!

Thank you,
Natasha

apeltzer added the question label Jun 6, 2020

apeltzer commented Jun 6, 2020

Hi Natasha!

Thanks for opening the issue and letting us know about this problem. The cache shouldn't need to be removed after every run; leaving it in place should be fine. Can you also tell us which Singularity version you are using? The error does seem a bit odd to me, so I might need to think about it a bit more.

@nf-core/core does anyone else have an idea?

npavlovikj commented

Hi @apeltzer ,

Thank you for your reply!

We are using Singularity 3.5.3 on EL6. I do agree that the error seems odd, especially since it doesn't happen every time.

This is a more detailed log from the latest run:

[49/50e8a9] process > output_documentation           [100%] 1 of 1, failed: 1 ✘
Error executing process > 'output_documentation'
Caused by:
  Process `output_documentation` terminated with an error exit status (255)
Command executed:
  markdown_to_html.r output.md results_description.html
Command exit status:
  255
Command output:
  (empty)
Command error:
  FATAL:   resolved path $WORK/.singularity/cache/oci-tmp/683625354fdcdefaad1a2710e4ba0517ac5316e47dddd6148aa8d685282a857b/rnaseq_1.4.2.sif doesn't match with opened path $WORK/.singularity/cache/oci-tmp/683625354fdcdefaad1a2710e4ba0517ac5316e47dddd6148aa8d685282a857b/rnaseq_1.4.2.sif (deleted)

I switched to using conda as a profile, and multiple runs of the same data worked fine. While using conda is a feasible and working solution for me, I am really curious to understand the reason behind the Singularity issue I encountered.

Thank you,
Natasha


apeltzer commented Jun 8, 2020

Thank you Natasha for providing more feedback on this!

Could you test whether it is always failing on the same node, for example? We have a debug profile that you could run alongside to trace back whether, e.g., the Singularity installation is broken on a specific node. It sounds a bit like searching for a needle in a haystack, but I've seen this before and it could explain what happens here (in one case I remember from a while ago, the bind paths for Singularity were not correct on one node).

Simply use your command and add debug to it:

nextflow run nf-core/rnaseq.... -profile foobar,singularity,debug (or similar)
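
As a rough sketch of how the "same node?" question could be checked after such a run: the function below walks the Nextflow work directory and prints the host of every task that exited with a nonzero status. It rests on assumptions that are not confirmed in this thread: that the debug profile makes each task print its hostname before anything else, that this ends up as the first line of the task's .command.log, and that the exit status is recorded in .exitcode. The name failed_task_hosts is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the host each failed task ran on.
# Assumption: with the debug profile, the first line of .command.log
# in each task directory is the hostname of the node that ran the task.

# failed_task_hosts [WORKDIR]
# Prints "<exit status> TAB <hostname> TAB <task dir>" for every task
# whose .exitcode file holds a nonzero status.
failed_task_hosts() {
  local workdir="${1:-work}"
  local exitfile taskdir status host
  # Task directories sit two levels below the work directory,
  # e.g. work/49/50e8a9...
  for exitfile in "$workdir"/*/*/.exitcode; do
    if [ ! -f "$exitfile" ]; then continue; fi
    taskdir="$(dirname "$exitfile")"
    status="$(tr -d '[:space:]' < "$exitfile")"
    if [ "$status" = "0" ]; then continue; fi
    host="unknown"
    if [ -f "$taskdir/.command.log" ]; then
      host="$(head -n 1 "$taskdir/.command.log")"
    fi
    printf '%s\t%s\t%s\n' "$status" "$host" "$taskdir"
  done
}
```

Piping the result through `cut -f2 | sort | uniq -c` would then show whether the failures cluster on one node or are spread across the cluster.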

I'd also be interested in finding out what caused this - glad you shared your experience here.

npavlovikj commented

I did a little bit more searching and found this comment: apptainer/singularity#3249 (comment).
It looks like this is a known issue with Singularity 3.5.* on EL6 systems that should be fixed in the newest 3.6 release.
Although using -with-singularity should have worked in this case, I do think the combination of Singularity and OS versions is the reason for the error I have observed.
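
For anyone who wants to check whether their cluster falls into the same version range, here is a minimal sketch. It assumes that `singularity --version` prints something like "singularity version 3.5.3" with the version number as the last word; the function names singularity_series and is_affected_series are hypothetical, and the check looks only at the Singularity version, not at the OS.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting the 3.5.x series mentioned above.

# singularity_series VERSION_STRING
# Prints "major.minor" taken from the last word of the string.
singularity_series() {
  local ver="${1##* }"      # last word, e.g. "3.5.3"
  local major="${ver%%.*}"  # text before the first dot
  local rest="${ver#*.}"    # text after the first dot
  local minor="${rest%%.*}" # text up to the next dot
  printf '%s.%s\n' "$major" "$minor"
}

# is_affected_series VERSION_STRING
# Succeeds when the version belongs to the 3.5 series.
is_affected_series() {
  [ "$(singularity_series "$1")" = "3.5" ]
}
```

A typical call would be `is_affected_series "$(singularity --version)" && echo "Singularity 3.5.x found; consider 3.6 or the conda profile"`.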

@apeltzer , I really appreciate you looking into this error with me, and please feel free to close this issue.

Thank you,
Natasha


apeltzer commented Jun 9, 2020

Thank you (!) for pointing us towards this issue and thereby resolving another issue we never got fixed (and now we know why!) 🥇

Thank you!

Alex


apeltzer commented Jun 9, 2020

(I've linked from the other issue to this thread ... -> thanks for that super helpful link to Singularity :-))

npavlovikj commented

Hah, I was not aware of the other issue, but I am glad that it all unintentionally worked out :)
