Unexpected calls to www.ebi.ac.uk when using cram files #180

Closed
bartcharbon opened this issue Mar 9, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@bartcharbon

We use Clair3 for our variant calling and provide it with our own reference FASTA file.
We are seeing unexpected calls to https://www.ebi.ac.uk when we use CRAM files as input.

From the log:
Calling variants ...

[W::find_file_url] Failed to read reference "https://www.ebi.ac.uk/ena/cram/md5/3210fecf1eb92d5489da4346b3fddc6e": Broken pipe
Total processed positions in chr4 (chunk 3/39) : 0

Our hypothesis is that the samtools call that reads the CRAM files is not given the reference file (the -T argument) and therefore falls back to www.ebi.ac.uk.

We would like our pipeline to be self-contained and not dependent on external servers. In addition, this can lead to quite severe performance losses if the connection to www.ebi.ac.uk is slow.

Currently we work around the issue by converting our CRAMs to BAMs and using those for Clair3.
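
For readers who want to reproduce this workaround, a minimal sketch follows. It assumes samtools is on the PATH; the function name `cram_to_bam` and the file names in the usage line are hypothetical placeholders, not part of Clair3. Passing the reference with -T lets samtools decode the CRAM from the local FASTA.

```shell
#!/usr/bin/env bash
# Sketch only: convert a CRAM to an indexed BAM with an explicit reference.
# cram_to_bam is a hypothetical helper name, not a Clair3 or samtools command.
cram_to_bam() {
    # $1 = input CRAM, $2 = reference FASTA, $3 = output BAM
    # -b selects BAM output; -T names the reference FASTA used to decode the CRAM.
    samtools view -b -T "$2" -o "$3" "$1" &&
        samtools index "$3"
}

# Usage (placeholder file names):
#   cram_to_bam sample.cram reference.fa sample.bam
```

The resulting BAM can then be given to Clair3 in place of the CRAM, at the cost of extra disk space and conversion time.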

@SamStudio8

This is an unfortunate side effect of CRAM. The good news is that this lookup can be avoided with a "reference cache"; see the section on REF_PATH and REF_CACHE here: https://www.htslib.org/workflow/cram.html. samtools is bundled with a Perl script that builds a cache for a reference. You'll want to build the cache as per their instructions and then export REF_PATH in any scripts that use the CRAM.
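
As a rough sketch of that setup, following the pattern on the htslib page linked above: the script is `seq_cache_populate.pl` from the samtools distribution, while `CACHE_DIR` and `REF_FASTA` are hypothetical paths chosen for illustration. The sketch assumes the script's default two-level directory layout, and it leaves the EBI URL out of REF_PATH on the assumption that lookups should stay local.

```shell
#!/usr/bin/env bash
# Sketch only: build a local CRAM reference cache and point htslib at it.
# CACHE_DIR and REF_FASTA are hypothetical placeholders; adjust to your setup.
CACHE_DIR="${CACHE_DIR:-$PWD/ref_cache}"
REF_FASTA="${REF_FASTA:-reference.fa}"

# seq_cache_populate.pl writes one file per reference sequence, named by its MD5,
# under the given root. Skipped here if the script or the FASTA is not available.
if command -v seq_cache_populate.pl >/dev/null 2>&1 && [ -f "$REF_FASTA" ]; then
    seq_cache_populate.pl -root "$CACHE_DIR" "$REF_FASTA"
fi

# %2s/%2s/%s expands to the first two, the next two, and the remaining
# characters of the MD5, matching the cache layout assumed above.
export REF_PATH="$CACHE_DIR/%2s/%2s/%s"
export REF_CACHE="$CACHE_DIR/%2s/%2s/%s"
```

These exports need to be in effect in the environment of whatever process opens the CRAM, so they belong in the same shell or job script that launches the pipeline.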

@aquaskyline
Member

aquaskyline commented Apr 20, 2023

The original plan was to require REF_PATH to be set whenever CRAM is used, starting with the next release. That plan was overturned. Instead, users are asked to set REF_PATH in their own setup if necessary.

@zhengzhenxian
Collaborator

zhengzhenxian commented Apr 26, 2023

@bartcharbon Based on our local testing, it seems that setting the REF_PATH environment variable can resolve the URL issue you encountered. You can try export REF_PATH=${ABSOLUTE_REF_PATH} before running run_clair3.sh.
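
Since setting REF_PATH is left to the user, a wrapper script could check for it before launching the run. The sketch below is one possible guard; `require_ref_path_for_cram` is a hypothetical helper name and is not part of Clair3, and it only inspects the file extension.

```shell
#!/usr/bin/env bash
# Sketch only: refuse to start when the input is a CRAM and REF_PATH is unset.
# require_ref_path_for_cram is a hypothetical helper, not a Clair3 function.
require_ref_path_for_cram() {
    case "$1" in
        *.cram)
            if [ -z "${REF_PATH:-}" ]; then
                echo "REF_PATH is not set; reference lookups for $1 may go to a remote server" >&2
                return 1
            fi
            ;;
    esac
    return 0
}

# Usage (INPUT is a placeholder; run_clair3.sh arguments omitted):
#   require_ref_path_for_cram "$INPUT" || exit 1
#   then invoke run_clair3.sh as usual
```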

Additionally, we noticed that the errors in connecting to EBI can increase the processing time but do not affect the final result. Therefore, you can directly use the completed jobs' results without re-running them to save time.

@SamStudio8

> we noticed that the errors in connecting to EBI can increase the processing time but do not affect the final result

Indeed, it's just fetching the reference sequence from EBI using the checksum in the CRAM SQ lines.
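
To see which checksum a given reference would be looked up under, something like the following could be used. It is a sketch that assumes a FASTA file holding a single sequence and the GNU coreutils `md5sum` command; `seq_md5` is a hypothetical helper name. The checksum is taken over the sequence with whitespace removed and letters uppercased, which is how the M5 value is defined.

```shell
#!/usr/bin/env bash
# Sketch only: compute the MD5 of a single-sequence FASTA the way the M5 tag
# is defined (header line dropped, whitespace stripped, letters uppercased).
# seq_md5 is a hypothetical helper name; md5sum is assumed to be available.
seq_md5() {
    grep -v '^>' "$1" | tr -d '[:space:]' | tr '[:lower:]' '[:upper:]' | md5sum | cut -d' ' -f1
}

# Usage (placeholder file name):
#   echo "https://www.ebi.ac.uk/ena/cram/md5/$(seq_md5 chr4.fa)"
# The M5 values stored in a CRAM can be listed with:
#   samtools view -H sample.cram | grep '^@SQ'
```

If the value printed for a local sequence matches the M5 in the CRAM header, the local reference is the same sequence the remote lookup would have returned.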

@aquaskyline
Member

Fixed in v1.0.6.
