possible bug: slowdown and crashing when using uncompressed reference. #53
Comments
As a follow-up, I've also come across an old support thread saying an older version of BioPerl could be to blame, but AFAIK we're using 1.6.924.
Cheers
I'm surprised to see the VEP uses Bio::DB::Fasta if it can't use Bio::DB::HTS::Faidx, and this has a number of issues:
However, even using Bio::DB::HTS::Faidx, generating HGVS adds significant runtime to VEP due to various internal overheads (sequence lookup being one of them), so if you can in any way avoid this that will help. It is something we are looking at improving in future versions.
Hi Will, Thanks
I'd say that you may not see much faster performance using an uncompressed file, but the library is far better with regard to index stability etc. If possible, definitely use a bgzipped FASTA file; it's faster (I think around 10x) and of course takes up way less disk space too.
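For anyone unsure whether their compressed reference is actually bgzipped (and not plain gzip, which the faidx route cannot index), here is a rough Python sketch. It relies only on the BGZF header layout from the SAM/BAM specification; the function name `is_bgzf` is made up for illustration, and it only inspects the first extra subfield, which is where bgzip normally writes its `BC` marker.

```python
import struct


def is_bgzf(path):
    """Return True if the file appears to start with a BGZF (bgzip) block."""
    with open(path, "rb") as handle:
        header = handle.read(16)
    if len(header) < 16:
        return False
    # gzip magic bytes (1f 8b), deflate method (08), FEXTRA flag bit (04)
    id1, id2, method, flags = header[0], header[1], header[2], header[3]
    if (id1, id2, method) != (0x1F, 0x8B, 0x08) or not flags & 0x04:
        return False
    # Bytes 10-11 hold XLEN; bytes 12-15 hold the first extra subfield
    # identifier (two bytes) and its payload length (two bytes).
    xlen, si1, si2, slen = struct.unpack("<HBBH", header[10:16])
    # Assumption: the BGZF 'BC' subfield comes first in the extra field.
    return xlen >= 6 and (si1, si2) == (ord("B"), ord("C")) and slen == 2
```

Calling `is_bgzf("hg38.fa.gz")` (path is only an example) on a file produced with plain `gzip` should give `False`; in that case the file would need to be recompressed with bgzip before indexing.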
Hi Will, I did some debugging and found the issue is caused by a "missing" htslib file. Cheers
Presumably the lzma library? We also had this issue and had to switch to using a fixed release of htslib for Travis...
Actually, the missing library is M
Hi Will, I was checking out the automated installer and noticed that the default install of htslib is v1.3.2. Cheers
Eventually we may, yes, but really it makes little difference - the differences between the versions AFAIK have no bearing on the functionality that VEP uses.
Hi Will,
We're experiencing the following:
using the command:
I'm getting a
gzip: stdout: Broken pipe
error, which doesn't really do anything, but just looks ugly. What's worse is that this command takes forever to run (8h+) and then crashes. The issue is solved by removing the
--hgvs --shift_hgvs 1 --fasta /genomes/Hsapiens/hg38/seq/hg38.fa
flags, but then we are left without some crucial annotations. Since our VEP installation has some HTSLIB issues, we're not able to test this with the default bgzipped FASTA file.
I have a suspicion that the issue stems from the indexing of the FASTA file. Instead of hg38.fa.index, as was usual, I'm getting hg38.fa.index.dir and hg38.fa.index.pag.
Another (smaller) annoyance is that both Genesplicer and MaxEntScan don't work without specifying a FASTA file using the --fasta flag. Any ideas on this?
Cheers
M
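To make the index-file observation above easier to reproduce, a small Python sketch like the following could list which index files sit next to a FASTA file. The suffix descriptions are my own reading, not something confirmed in this thread: the `.dir`/`.pag` pair is what SDBM-style DBM backends typically write, and `.fai` is the faidx index format. The helper name `find_fasta_indexes` is hypothetical.

```python
import os

# Suffix -> likely origin. These descriptions are assumptions for
# illustration; the DBM note reflects how SDBM-style databases split
# their storage into a .dir and a .pag file.
INDEX_SUFFIXES = {
    ".fai": "faidx index (samtools / Bio::DB::HTS::Faidx)",
    ".index": "single-file Bio::DB::Fasta index",
    ".index.dir": "DBM-backed Bio::DB::Fasta index (directory part)",
    ".index.pag": "DBM-backed Bio::DB::Fasta index (page part)",
}


def find_fasta_indexes(fasta_path):
    """Return {index_path: description} for index files found beside fasta_path."""
    found = {}
    for suffix, description in INDEX_SUFFIXES.items():
        candidate = fasta_path + suffix
        if os.path.exists(candidate):
            found[candidate] = description
    return found
```

Running `find_fasta_indexes("/genomes/Hsapiens/hg38/seq/hg38.fa")` before and after a VEP run would show whether the index format changed between runs, which might help narrow down which DBM backend Perl is picking up.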