Kraken2 database structure - "Missing output file(s) */*.k2d" #187

Closed
skrakau opened this issue May 6, 2021 · 9 comments

@skrakau
Member

skrakau commented May 6, 2021

Thanks. I'm not sure how this is supposed to be used. I pulled the k2_pluspf_20210127.tar.gz database. The pipeline errored out; it seems to run only "tar xf" and not "tar xfz", so I decompressed the file myself, but I still got an error. I even untarred everything so that the expected k2d files could be found. I still get the following error when running "nextflow run nf-core/mag -profile singularity --input 'CSJP002A_{R1,R2}.fastq.gz' -c nextflow.config --max_memory '186.GB' --max_cpus 20 --kraken2_db ../kraken2_db_plus_protozoa_fungi/k2_pluspf_20210127.tar"

and

[gailr@node005 Assembly_Binning_old]$ ls -l ../kraken2_db_plus_protozoa_fungi/
total 125734856
-rw-r--r-- 1 gailr gailr 2633638 Jan 25 14:08 database100mers.kmer_distrib
-rw-r--r-- 1 gailr gailr 2368734 Jan 25 15:56 database150mers.kmer_distrib
-rw-r--r-- 1 gailr gailr 2172211 Jan 25 18:15 database200mers.kmer_distrib
-rw-r--r-- 1 gailr gailr 2015809 Jan 25 21:04 database250mers.kmer_distrib
-rw-r--r-- 1 gailr gailr 1856924 Jan 26 00:23 database300mers.kmer_distrib
-rw-r--r-- 1 gailr gailr 3008505 Jan 25 11:50 database50mers.kmer_distrib
-rw-r--r-- 1 gailr gailr 2831915 Jan 25 12:51 database75mers.kmer_distrib
-rw-r--r-- 1 gailr gailr 53493044692 Jan 25 10:49 hash.k2d
-rw-r--r-- 1 gailr gailr 2182856 Jan 25 10:56 inspect.txt
-rw-rw-r-- 1 gailr gailr 53518428160 Jan 27 07:17 k2_pluspf_20210127.tar
-rw-r--r-- 1 gailr gailr 64 Jan 25 10:49 opts.k2d
-rw-r--r-- 1 gailr gailr 3766899 Jan 25 06:37 seqid2taxid.map
-rw-r--r-- 1 gailr gailr 2529898 Jan 25 06:50 taxo.k2d


Error executing process > 'kraken2_db_preparation (1)'

Caused by:
Missing output file(s) */*.k2d expected by process kraken2_db_preparation (1)

Command executed:

tar -xf "k2_pluspf_20210127.tar"

Command exit status:
0

Command output:
(empty)

Work dir:
/ifs/groups/eces450650Grp/data/Assembly_Binning_old/work/ad/090d71c71ff568764e9e894c69eab2

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

Originally posted by @gailrosen in #186 (comment)

@skrakau
Member Author

skrakau commented May 6, 2021

See also #186

@skrakau
Member Author

skrakau commented May 6, 2021

The mag pipeline currently assumes that the input *.tgz or *.tar.gz archive (both should work) contains a folder that holds the *.k2d files, i.e. it looks for files matching */*.k2d after unpacking. This was the case for previous prebuilt kraken2 databases. If the archive layout has changed, this causes an error.
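
To check an archive against this expectation before starting a run, a minimal sketch along these lines may help. The function name `has_nested_k2d` is hypothetical and not part of the pipeline; it only inspects the archive listing for entries exactly one folder deep, which approximates the `*/*.k2d` pattern from the error message.

```shell
# Return 0 if the archive holds at least one *.k2d file exactly one
# folder deep (e.g. "mydb/hash.k2d"), and non-zero otherwise.
# has_nested_k2d is a hypothetical helper, not part of nf-core/mag.
has_nested_k2d() {
    # tar -tf lists the members; GNU tar and bsdtar detect gzip
    # compression on their own when reading an archive.
    # Some archives prefix members with "./", so strip that first.
    tar -tf "$1" | sed 's|^\./||' | grep -Eq '^[^/]+/[^/]+\.k2d$'
}
```

For example, `has_nested_k2d k2_pluspf_20210127.tar || echo "flat layout"` would be expected to report a flat layout for an archive whose listing shows `hash.k2d` at the top level, as in the directory listing above.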

If this is the case please let us know and provide a link to the respective database which does not work.

However, as an immediate workaround, you could create this directory structure yourself and compress it accordingly.
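
One way to build that structure is sketched below, assuming a directory that already holds the unpacked, flat *.k2d files. The function name `repack_kraken2_db`, the subfolder name, and the output file name are all hypothetical. The sketch symlinks the files into a temporary folder instead of copying them, because hash.k2d alone is about 53 GB in the listing above.

```shell
# Repack flat *.k2d files into <subfolder>/ inside a gzipped tarball.
# Usage: repack_kraken2_db <flat_db_dir> <subfolder_name> <output.tgz>
# repack_kraken2_db is a hypothetical helper, not part of nf-core/mag.
repack_kraken2_db() {
    local flat_dir subfolder out_tgz staging status f
    flat_dir="$(cd "$1" && pwd)" || return 1   # absolute path for the symlinks
    subfolder="$2"
    out_tgz="$3"
    staging="$(mktemp -d)" || return 1
    mkdir "$staging/$subfolder" || return 1
    status=0
    for f in "$flat_dir"/*.k2d; do
        # An unmatched glob stays literal, so check that the file exists.
        [ -e "$f" ] || { echo "no .k2d files in $flat_dir" >&2; status=1; break; }
        ln -s "$f" "$staging/$subfolder/" || { status=1; break; }
    done
    if [ "$status" -eq 0 ]; then
        # -h stores the contents the symlinks point to, not the links.
        tar -czhf "$out_tgz" -C "$staging" "$subfolder" || status=1
    fi
    rm -rf "$staging"   # removes only the links, never the original files
    return "$status"
}
```

A call such as `repack_kraken2_db ../kraken2_db_plus_protozoa_fungi k2_pluspf "$PWD/k2_pluspf_nested.tgz"` would then produce an archive that can be passed to `--kraken2_db`. Compressing a database of this size takes a while.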

@gailrosen

But you can see above that when I decompress and untar it, it does have k2d files:
-rw-r--r-- 1 gailr gailr 64 Jan 25 10:49 opts.k2d
-rw-r--r-- 1 gailr gailr 3766899 Jan 25 06:37 seqid2taxid.map
-rw-r--r-- 1 gailr gailr 2529898 Jan 25 06:50 taxo.k2d

I have tried putting the database in my current directory... I have also tried setting "singularity {
enabled = true
autoMounts = true
}" in the nextflow.config

I got the file from here (and as you can clearly see above, it has k2d files in it). https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_20210127.tar.gz

Very frustrated... but I wouldn't be surprised if it's trying to uncompress/untar it into a directory that is not bound into the Singularity container, so that all the files disappear when it looks for them?

@skrakau
Member Author

skrakau commented May 6, 2021

These files need to be in an extra subfolder inside the archive (see https://github.com/nf-core/test-datasets/raw/mag/test_data/minigut_kraken.tgz as an example).

@gailrosen

That seems to have worked and it is processing. Holding my breath.

@gailrosen

This is fantastic -- I actually saw a "successfully completed" this time. Now, I have to figure out.. if I want to now do CAT analysis... can I just do the -resume flag and not have to recompute everything all over again? Thank you for being patient with all my noob questions!

@d4straub
Collaborator

d4straub commented May 7, 2021

> if I want to now do CAT analysis... can I just do the -resume flag and not have to recompute everything all over again?

Correct, just add -resume and any other parameters. To be on the safe side, you could also add --outdir results_new so that your previous results are not overwritten in case something goes wrong.
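
As a rough sketch of that advice, the wrapper below appends `-resume` and `--outdir results_new` to whatever arguments the earlier run used. The function name `resume_mag` and the `RUN` switch are hypothetical. By default it only prints the command so it can be checked first; any CAT-related parameters would be passed along with the other arguments.

```shell
# Print (default) or execute (RUN=1) the earlier nf-core/mag command
# with -resume and a separate output directory appended.
# resume_mag and RUN are hypothetical names used for illustration.
resume_mag() {
    # "$@" carries the arguments of the earlier run plus any new ones.
    set -- nextflow run nf-core/mag "$@" -resume --outdir results_new
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Dry run using arguments from the command quoted earlier in this thread:
resume_mag -profile singularity --input 'CSJP002A_{R1,R2}.fastq.gz' -c nextflow.config
```

Run it from the same launch directory as the earlier run so that Nextflow can find its existing work directory and cached tasks.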

@gailrosen

Excellent advice.. thank you.

@skrakau
Member Author

skrakau commented May 25, 2021

Addressed in #194.

@skrakau skrakau closed this as completed May 25, 2021