[E::bwa_idx_load_from_disk] fail to locate the index files #14

Open
v-mukhina opened this issue Mar 5, 2024 · 7 comments

Comments

@v-mukhina

Hi Sara,
Could you please help me?
My issue is probably related to #10.
I'm using the following Singularity command to run FastViFi on the test files:

python run_kraken_vifi_container.py \
    --singularity \
    --input-file test/test_reads_1.fq \
    --input-file-2 test/test_reads_2.fq \
    --output-dir ../test_out \
    --virus hpv \
    --kraken-db-path ../kraken_datasets \
    --vifi-viral-ref-dir ../viral_data/ \
    --human-chr-list test/human_chr_list.txt \
    --vifi-human-ref-dir ../data_repo \
    --level sample-level --skip-bwa-filter --keep-intermediate-files

Right after Kraken finishes, I get a bwa-related error:

...
[E::bwa_idx_load_from_disk] fail to locate the index files
Traceback (most recent call last):
  File "/home/ViFi/scripts/get_trans_new.py", line 104, in <module>
    bamFile = pysam.Samfile(opts.dataName[0], 'rb')
  File "pysam/libcalignmentfile.pyx", line 747, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 996, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False
[E::hts_open_format] Failed to open file "/home/output/output_hpv.unknown.bam" : No such file or directory
...

All subsequent bam files are also empty.

I downloaded data_repo and viral_data from Google Drive using the link in the README, and neither contains anything that looks like a bwa index file. The error persists after I index the hg38 and hg19 FASTA files in data_repo. How do I fix this error?

By the way, it appears that the hg19 value is hardcoded here:
https://github.com/sara-javadzadeh/ViFi/blob/b1a649685af0620a1d16a8940bb3e21db0fa17b5/scripts/cluster.sh#L10C1-L17C1
I am not sure if this script is used anywhere.

Best, Vera

@sara-javadzadeh
Owner

Hi Vera,

Thanks for reaching out, and sorry for the delayed response. The error says that the input BAM file is missing when ViFi tries to process it. This can happen if one of the filtering steps that run before ViFi fails, leaving ViFi with an empty input file. Could you please share all the non-empty intermediate FASTA/FASTQ and BAM files created by the command? I'm trying to figure out which step is causing the problem.
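One way to see which intermediate files are non-empty is a small script like the one below. It is only a sketch: the function name `split_by_emptiness` and the list of file extensions are assumptions made for illustration, not part of FastViFi. It also treats "empty" as zero bytes, so a BAM file that contains only a header would still be reported as non-empty.

```python
from pathlib import Path

# Extensions treated as intermediate sequence/alignment files.
# This list is an assumption for the sketch, not taken from FastViFi.
INTERMEDIATE_SUFFIXES = (".fa", ".fas", ".fasta", ".fq", ".fastq", ".sam", ".bam")


def split_by_emptiness(output_dir):
    """Return (non_empty, empty) lists of intermediate files under output_dir."""
    non_empty, empty = [], []
    for path in sorted(Path(output_dir).rglob("*")):
        # Skip directories and files with unrelated extensions.
        if not path.is_file() or path.suffix.lower() not in INTERMEDIATE_SUFFIXES:
            continue
        # A size of zero bytes is the only emptiness criterion used here.
        if path.stat().st_size > 0:
            non_empty.append(path)
        else:
            empty.append(path)
    return non_empty, empty
```

Calling `split_by_emptiness("../test_out")` on the output directory from the command above would return the two lists, which shows where in the pipeline the files start coming out empty.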

Best,
Sara

@v-mukhina
Author

v-mukhina commented Mar 19, 2024

Unfortunately, I already deleted all the related files and switched to another tool. However, it looks like the issue is not the BAM file itself but the reference. The bwa_idx_load_from_disk error usually appears when the reference FASTA file has not been indexed with bwa index. I believe ViFi crashes on the very first bwa command (bwa mem?) that needs those index files for the reference, and then all following BAM files are empty or simply absent.
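A quick way to test this hypothesis is to check whether the index files sit next to the reference. The sketch below assumes the classic bwa layout, where `bwa index ref.fa` writes `ref.fa.amb`, `ref.fa.ann`, `ref.fa.bwt`, `ref.fa.pac` and `ref.fa.sa`. The function name `missing_bwa_index_files` is hypothetical.

```python
from pathlib import Path

# Suffixes that classic `bwa index` appends to the reference file name.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(reference):
    """Return the expected BWA index files that are absent or zero bytes."""
    missing = []
    for suffix in BWA_INDEX_SUFFIXES:
        # The suffix is appended to the full name, e.g. ref.fas -> ref.fas.bwt.
        index_file = Path(str(reference) + suffix)
        if not index_file.is_file() or index_file.stat().st_size == 0:
            missing.append(index_file)
    return missing
```

An empty list means all five index files exist and are non-empty. Any returned paths point at the files bwa would fail to locate for that reference.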

@v-mukhina
Author

Oh wait, I found them! All the BAM files are empty, but the FASTQ files are not.
Archive.zip

@sara-javadzadeh
Owner

Hi Vera,

Sorry to hear about your troubles with FastViFi, and thanks for sharing the output files. As you mentioned, it looks like the Kraken step works and the ViFi step fails. I could not reproduce the problem: your exact command runs correctly on my end. Your point about index files for the reference FASTA files sounds valid, but the problem persisted after you indexed the GRCh38 reference file in the data_repo directory.
ViFi uses the viral_data/hpv/grch38_hpv.fas file to map the input FASTQ files to the reference human and viral genomes. Is this file present in the downloaded viral_data directory? If so, could you please try indexing this FASTA file as well and running again?
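The indexing step could be scripted roughly as follows. This is a sketch under assumptions: `bwa` must be installed on the host and on `PATH`, the index it writes must be readable by the bwa version inside the container, and the helper names `bwa_index_command` and `index_reference` are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def bwa_index_command(reference):
    """Build the argument list for indexing a FASTA file with bwa."""
    return ["bwa", "index", str(Path(reference))]


def index_reference(reference):
    """Run `bwa index` on reference; raise if the file or bwa is missing."""
    reference = Path(reference)
    if not reference.is_file():
        raise FileNotFoundError(f"reference FASTA not found: {reference}")
    if shutil.which("bwa") is None:
        raise RuntimeError("bwa executable not found on PATH")
    # check=True raises CalledProcessError if bwa exits with an error.
    subprocess.run(bwa_index_command(reference), check=True)
```

For the file named above, the call would be `index_reference("../viral_data/hpv/grch38_hpv.fas")`, with the path taken relative to the directory the original command was run from.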

Also, could you please share the version of singularity you are using? I am successfully running tests with singularity version 3.8.6.

The hard-coded hg19 reference you pointed out is a good catch, but that code is not called for viral read detection.

Best,
Sara

@v-mukhina
Author

v-mukhina commented Mar 20, 2024

This is what I have in the viral_data/hpv folder (viral_data.tar.gz was downloaded from the ViFi repository as suggested in the README):
[screenshot of the viral_data/hpv folder; image not included]

@v-mukhina
Author

I've indexed hpv.unaligned.fas on my own to ensure this was not the reason for my issue.

@sara-javadzadeh
Owner

Hi Vera,

I believe I understand the source of the problem. There should be a grch38_hpv.fas file and a corresponding index file in the viral_data/hpv folder. This file is created automatically by these two lines in setup_linux_mac.sh in the ViFi repo. I suggest running the whole setup_linux_mac.sh script. Since you have already downloaded data_repo and viral_data, please copy or move them into the ViFi directory, next to setup_linux_mac.sh, before running it, so that the script does not download the two directories again. The script creates human-viral reference files for three viruses: HPV, HBV and HCV. If you are only interested in HPV, feel free to edit this line in the script so that it runs only for HPV.

You should have a grch38_hpv.fas and corresponding index file in the viral_data/hpv directory after running this command. If you cannot see these files after running the setup script, please let me know.
