
Failing to build custom database for HBV #9

Open

wskang1202 opened this issue May 19, 2023 · 18 comments

@wskang1202 commented May 19, 2023

Hi Sara,

I've been trying to build custom databases by following the FastViFi README. Building the databases for HCV and EBV was successful; however, building the HBV databases for k=18 and k=22 failed. The following message appeared in the log file:

scan_fasta_file.pl: unable to determine taxonomy ID for sequence hbv_ref7
No preliminary seqid/taxid mapping files found, aborting.

Is there a way to solve this problem?

Best,
Wonseok

@sara-javadzadeh (Owner)

Hi Wonseok,

It looks like the file prelim_map.txt is missing. Does the file exist in the kraken2/<your HBV db name>/taxonomy directory? If not, one reason could be that downloading the library failed. Could you please run download_custom_kraken_library.sh for HBV again and check whether the prelim_map.txt file appears in your HBV database directory?
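
For example, a quick sketch (replace the placeholder with your actual database directory name):

DB="kraken2/<your HBV db name>"         # placeholder path from the README layout
ls -l "$DB/taxonomy/prelim_map.txt"     # the file should exist ...
wc -c "$DB/taxonomy/prelim_map.txt"     # ... and report a non-zero byte count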

Please let me know if this doesn't work for you.
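
As an aside, per the Kraken2 documentation, a custom sequence can also carry its taxonomy ID directly in the FASTA header via a kraken:taxid tag, which sidesteps the seqid/taxid lookup that is failing here. A hypothetical header for the sequence named in your log (10407 is the NCBI taxonomy ID for hepatitis B virus):

>hbv_ref7|kraken:taxid|10407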

Best,
Sara

@wskang1202 (Author)

Hi Sara,

I ran download_custom_kraken_library.sh for HBV again, and I can see that there is a prelim_map.txt file in kraken2/Kraken2StandardDB_k_18_hbv/taxonomy, but the file itself is empty.

Best,
Wonseok

@sara-javadzadeh (Owner)

Hi Wonseok,

Do you get an error when running download_custom_kraken_library.sh for the HBV dataset?
Could you please check if the prelim_map.txt is present and non-empty in the HCV and EBV databases that you created successfully before?

Best,
Sara

@wskang1202 (Author)

Hi Sara,

prelim_map.txt is present and non-empty in the successfully built databases (HCV and EBV, as well as the k_25_hbv_hg database). However, the file is empty for the unsuccessful k_18_hbv and k_22_hbv databases. I've attached the log.txt file in case you want to check it out.

Thank you,
Wonseok

@sara-javadzadeh (Owner)

Hi Wonseok,

Did you try running the build_custom_kraken_index.sh script on the k_18_hbv database after running download_custom_kraken_library.sh? If so, was there any error?

@mrzResearchArena commented Jun 5, 2023

Hi Javadzadeh,

I downloaded your suggested dataset for sample-level FastViFi for the HPV virus: https://drive.google.com/file/d/1QYn5lDWjvhtIWCrwmzDc_1fy8ANrXWz1/view?usp=sharing. However, when I attempted to extract it (tar -xzvf kraken_datasets.tar.gz), it showed errors. Could you please suggest how I can fix this?

@sara-javadzadeh (Owner)

Hi Muhammod,

Thanks for reaching out.
Could you please share the error messages when running tar -xzvf kraken_datasets.tar.gz?

@mrzResearchArena

Hi Javadzadeh,

Thank you so much for your response. I was getting the errors below. The downloaded file size is 15796400321 bytes.

gzip: stdin: invalid compressed data--crc error
tar: Child returned status 1
tar: Error is not recoverable: exiting now
ls -l kraken_datasets.tar.gz 

@sara-javadzadeh (Owner)

Hi again,

Thanks! Although the output of the ls -l command is truncated in your reply, I can see the file size in your text above. The file size seems correct.

Did you try running gunzip kraken_datasets.tar.gz and then tar -xvf kraken_datasets.tar? If that fails, could you please share the error?

By the way, the uncompressed data should be about 60 GB. Have you taken that into account (e.g., enough free disk space)?
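
A minimal sketch of the two-step extraction, plus a free-space check:

gunzip kraken_datasets.tar.gz    # decompress; replaces the .gz with a .tar
tar -xvf kraken_datasets.tar     # then unpack the tar
df -h .                          # roughly 60 GB of free space is needed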

Thanks,
Sara

@mrzResearchArena commented Jun 12, 2023

Hi Javadzadeh,

Yes, I tried that, but it didn't work either.

gzip: kraken_datasets.tar.gz: invalid compressed data--crc error

@mrzResearchArena

Hi Javadzadeh, could you please provide a different download link?

@sara-javadzadeh (Owner)

I can provide another link; it'll take a couple of hours to upload the database.
In the meantime, could you please check the following?

  1. Could you please share the output of the following command: file kraken_datasets.tar.gz
  2. Check whether tar -tf kraken_datasets.tar.gz can list the files without errors. If there is an error, could you please share it? (See the sketch after this list.)
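
For example:

file kraken_datasets.tar.gz      # a healthy download typically reports "gzip compressed data"
tar -tf kraken_datasets.tar.gz   # lists the archive contents without extracting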

Sara

@mrzResearchArena

Yes, it shows errors. You can see them in the output below.

tar -tf kraken_datasets.tar.gz > errors-text.txt

Output:

kraken_datasets/
kraken_datasets/Kraken2StandardDB_k_22_hpv/
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/readme.txt
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/merged.dmp
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/taxdump.tar.gz
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/names.dmp
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/taxdump.untarflag
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/accmap.dlflag
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/delnodes.dmp
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/citations.dmp
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/nodes.dmp
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/nucl_gb.accession2taxid
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/gc.prt
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/nucl_wgs.accession2taxid
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/division.dmp
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/gencode.dmp
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/taxdump.dlflag
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxonomy/prelim_map.txt
kraken_datasets/Kraken2StandardDB_k_22_hpv/seqid2taxid.map
kraken_datasets/Kraken2StandardDB_k_22_hpv/hash.k2d
kraken_datasets/Kraken2StandardDB_k_22_hpv/taxo.k2d
kraken_datasets/Kraken2StandardDB_k_22_hpv/library/
kraken_datasets/Kraken2StandardDB_k_22_hpv/library/added/
kraken_datasets/Kraken2StandardDB_k_22_hpv/library/added/prelim_map.txt
kraken_datasets/Kraken2StandardDB_k_22_hpv/library/added/9TbkQmfdkG.fna.masked
kraken_datasets/Kraken2StandardDB_k_22_hpv/library/added/9TbkQmfdkG.fna
kraken_datasets/Kraken2StandardDB_k_22_hpv/library/added/prelim_map_3IwJCtpJpX.txt
kraken_datasets/Kraken2StandardDB_k_22_hpv/opts.k2d
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/taxo.k2d
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/human/
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/human/prelim_map.txt
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/human/assembly_summary.txt
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/human/library.fna.masked
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/human/library.fna
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/human/manifest.txt
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/added/
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/added/prelim_map_SeYmVYHiCd.txt
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/added/prelim_map.txt
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/added/rKtNPyn11J.fna
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/library/added/rKtNPyn11J.fna.masked
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/opts.k2d
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/taxonomy/
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/taxonomy/gencode.dmp
kraken_datasets/Kraken2StandardDB_k_25_hpv_hg/taxonomy/nucl_wgs.accession2taxid
tar: Skipping to next header
tar: Archive contains ‘9.1\t5748’ where numeric mode_t value expected
tar: Archive contains ‘.1\t57486\t478’ where numeric time_t value expected
7486\t47861343\nAG288467\tAG288467.1\t57486\t47861344\nAG288468\tAG288468.1\t57486\t47861345\nAG288469\tAG28846
tar: Skipping to next header
tar: Archive contains ‘0672.1\t4113\t’ where numeric off_t value expected
tar: Archive contains ‘119.1\t262687’ where numeric off_t value expected
tar: Archive contains ‘1.1\t1639’ where numeric mode_t value expected
tar: Archive contains ‘.1\t1639\t1129’ where numeric time_t value expected
tar: Archive contains ‘\t1129612’ where numeric uid_t value expected
639\t112961221\nDQ844259\tDQ844259.1\t1639\t112961224\nDQ844260\tDQ844260.1\t1639\t112961227\nDQ844261\tDQ84426
tar: Skipping to next header
tar: Archive contains ‘1.1\t6253’ where numeric mode_t value expected
tar: Archive contains ‘.1\t6253\t1132’ where numeric time_t value expected
253\t113251528\nED394649\tED394649.1\t6253\t113251529\nED394650\tED394650.1\t6253\t113251530\nED394651\tED39465
tar: Skipping to next header
tar: Archive contains ‘1609\nEZ97768’ where numeric off_t value expected
tar: Archive contains ‘\tHE793950.1\t’ where numeric off_t value expected
322560303\nJG336704\tJG336704.1\t30301\t322560304\nJG336705\tJG336705.1\t30301\t322560305\nJG336706\tJG336706.
tar: Skipping to next header
tar: Archive contains ‘1759748\t’ where numeric mode_t value expected
tar: Archive contains ‘95526170’ where numeric uid_t value expected
697\nKR112558\tKR112558.1\t1387109\t955261699\nKR112559\tKR112559.1\t1690892\t955261701\nKR112560\tKR112560.1\t
tar: Skipping to next header
tar: Archive contains ‘\tLA487646.1\t’ where numeric off_t value expected
tar: Archive contains ‘29\tMC492929.’ where numeric off_t value expected
tar: Archive contains ‘31460994\nMM1’ where numeric time_t value expected
tar: Archive contains ‘993\nMM16’ where numeric uid_t value expected
0\t1531460990\nMM160627\tMM160627.1\t0\t1531460991\nMM160628\tMM160628.1\t0\t1531460992\nMM160629\tMM160629.1\t0
tar: Skipping to next header
tar: Archive contains ‘_019029293.1’ where numeric off_t value expected
tar: Archive contains ‘\t50390\t15815’ where numeric off_t value expected
tar: Archive contains ‘OC673270’ where numeric mode_t value expected
tar: Archive contains ‘\tOC673271.1\t’ where numeric time_t value expected
tar: Archive contains ‘.1\t61476’ where numeric uid_t value expected
tar: Archive contains ‘\t1946114’ where numeric gid_t value expected
61476\t1946114713\nOC673268\tOC673268.1\t61476\t1946114714\nOC673269\tOC673269.1\t61476\t1946114715\nOC673270\t
tar: Skipping to next header
tar: Archive contains ‘\tOD59341’ where numeric mode_t value expected
tar: Archive contains ‘0.1\t6147’ where numeric uid_t value expected
\t61472\t1948381426\nOD593408\tOD593408.1\t61472\t1948381428\nOD593409\tOD593409.1\t61472\t1948381430\nOD593410
tar: Skipping to next header
tar: Archive contains ‘OD855125’ where numeric mode_t value expected
tar: Archive contains ‘\tOD855126.1\t’ where numeric time_t value expected
tar: Archive contains ‘.1\t61472’ where numeric uid_t value expected
tar: Archive contains ‘\t1947471’ where numeric gid_t value expected
61472\t1947471274\nOD855123\tOD855123.1\t61472\t1947471275\nOD855124\tOD855124.1\t61472\t1947471276\nOD855125\t
tar: Skipping to next header
tar: Archive contains ‘\tOE36610’ where numeric mode_t value expected
tar: Archive contains ‘6.1\t6147’ where numeric uid_t value expected
\t61474\t1962876452\nOE366104\tOE366104.1\t61474\t1962876453\nOE366105\tOE366105.1\t61474\t1962876454\nOE366106
tar: Skipping to next header
tar: Archive contains ‘OE507501’ where numeric mode_t value expected
tar: Archive contains ‘\tOE507502.1\t’ where numeric time_t value expected
tar: Archive contains ‘.1\t61474’ where numeric uid_t value expected
tar: Archive contains ‘\t1964446’ where numeric gid_t value expected
61474\t1964446754\nOE507499\tOE507499.1\t61474\t1964446757\nOE507500\tOE507500.1\t61474\t1964446760\nOE507501\t
tar: Skipping to next header
tar: Archive contains ‘081\nOE597102’ where numeric off_t value expected
tar: Archive contains ‘\tOE60725’ where numeric mode_t value expected
tar: Archive contains ‘9\tOE607259.1’ where numeric time_t value expected
tar: Archive contains ‘8.1\t6147’ where numeric uid_t value expected
\t61474\t1965131656\nOE607256\tOE607256.1\t61474\t1965131659\nOE607257\tOE607257.1\t61474\t1965131662\nOE607258
tar: Skipping to next header
tar: Archive contains ‘03024007.1\t6’ where numeric time_t value expected
tar: Archive contains ‘\t3026648’ where numeric uid_t value expected
003024004.1\t663202\t302664848\nXM_003024005\tXM_003024005.1\t663202\t302664850\nXM_003024006\tXM_003024006.
tar: Skipping to next header
tar: Archive contains ‘008481066.2\t’ where numeric off_t value expected

gzip: stdin: invalid compressed data--crc error

gzip: stdin: invalid compressed data--length error
tar: Child returned status 1
tar: Error is not recoverable: exiting now

@sara-javadzadeh (Owner)

Thanks for checking.
I'm uploading the databases again; it'll take another couple of hours to fully upload. I'll share the link here as soon as it's done.
In the meantime, it might be worth setting up a new Conda environment, installing tar, and trying to extract the database files in that clean environment. Let me know if you still get the errors.
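
A sketch of what I mean, assuming the tar and gzip packages are available on your channels (e.g. conda-forge):

conda create -n extract-env -c conda-forge tar gzip
conda activate extract-env
tar -xzvf kraken_datasets.tar.gz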

Sara

@sara-javadzadeh (Owner)

Hi again,

Here's a second link for the same Kraken databases: https://drive.google.com/file/d/1DrKgDE7fl5Tff2bV8K9XBxLYsbTeOcgh/view?usp=sharing

I suspect this might be a tar library incompatibility rather than a file problem. I was able to list the contents of kraken_datasets.tar.gz from the first link (provided in the README file). Here's my tar version on macOS 12.1:

tar --version
bsdtar 3.5.1 - libarchive 3.5.1 zlib/1.2.11 liblzma/5.0.5 bz2lib/1.0.8

That's why I recommend updating your tar package, or creating a new Conda environment and trying again as above. Let me know how it goes.

Sara

@mrzResearchArena

Thank you, Ms. Javadzadeh. It helped me a lot.

I used a Python script instead of tar, and this time there were no errors. After extracting, the total size is 61.4 GB. Is that the correct size?

import tarfile

sourcePATH = '/mnt/sdb1/kraken2/kraken_datasets.tar.gz'
destinationPATH = '/mnt/sdb1/kraken2/'

# Extract the whole archive; the "with" block closes the file automatically,
# so no explicit tar.close() is needed.
with tarfile.open(sourcePATH) as tar:
    tar.extractall(destinationPATH)
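
To double-check the extracted size against the ~60 GB mentioned above:

du -sh /mnt/sdb1/kraken2/kraken_datasets    # assumed output directory from the script above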

@sara-javadzadeh (Owner) commented Jun 14, 2023 via email

@cubense commented Dec 5, 2023

> Hi Wonseok,
>
> Did you try running the build_custom_kraken_index.sh script on the k_18_hbv database after running download_custom_kraken_library.sh? If so, was there any error?

Hi Sara,

I'm hitting the same error as Wonseok. There were no errors when running build_custom_kraken_index.sh and download_custom_kraken_library.sh for k_18 and k_22, but prelim_map.txt is empty for both k_18 and k_22. The prelim_map.txt in k_25_hg is fine. When I run the Docker image, it reports an error that the database does not contain the necessary file taxo.k2d.
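
For reference, a fully built Kraken2 database directory should contain hash.k2d, opts.k2d, and taxo.k2d at the top level, as in the archive listing above. A quick check (database name assumed from the earlier comments):

ls kraken2/Kraken2StandardDB_k_18_hbv/*.k2d    # expect hash.k2d, opts.k2d, taxo.k2d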
