GTDB #19

andrewjmc · 2021-10-11T15:24:13Z

Hi,

I would love to remove human-contaminated sequences from the GTDB bacteria and archaea (r95) and the NCBI viruses and fungi, as classifications are being badly affected in some of my samples with a high proportion of human DNA. I already have a concatenated .faa file for kraken and a seqid2taxid.map file. However, because it is a custom-built database that incorporates GTDB, the taxids bear no relation to NCBI taxonomy IDs. I also have names.dmp and nodes.dmp files.

Could I tweak conterminator to process this database? It is a 120 Gb sequence database. I can't find how much RAM is required, but naively assuming linear-time scaling, I would hope to process the database in under a day.

Best wishes,

Andrew


martin-steinegger · 2022-10-25T05:24:08Z

The database module should allow you to download the GTDB database. It will build names.dmp and nodes.dmp based on the GTDB taxonomy.
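
However names.dmp and nodes.dmp are produced, it may be worth checking that every taxid in seqid2taxid.map actually appears in nodes.dmp before starting a long run. The following is only a minimal sketch in Python, not part of conterminator. It assumes a Kraken-style two-column, tab-separated seqid2taxid.map and an NCBI-style nodes.dmp with `\t|\t` field separators; the function names `read_node_taxids` and `find_unknown_taxids` are hypothetical.

```python
def read_node_taxids(nodes_path):
    """Collect the taxid (first field) of every row in an NCBI-style nodes.dmp."""
    taxids = set()
    with open(nodes_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            # Assumption: fields are separated by "\t|\t", so the text before
            # the first "|" is the taxid once whitespace is stripped.
            taxids.add(line.split("|", 1)[0].strip())
    return taxids


def find_unknown_taxids(map_path, nodes_path):
    """Return {seqid: taxid} for map entries whose taxid is missing from nodes.dmp."""
    known = read_node_taxids(nodes_path)
    unknown = {}
    with open(map_path) as handle:
        for line in handle:
            parts = line.rstrip("\n").split("\t")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            seqid, taxid = parts[0], parts[1].strip()
            if taxid not in known:
                unknown[seqid] = taxid
    return unknown
```

Calling `find_unknown_taxids("seqid2taxid.map", "nodes.dmp")` returns an empty dict when the two files are consistent. Taxids are compared as strings, so custom GTDB-derived IDs do not need to be numeric.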
