Hatchet

pchaumeil edited this page Apr 5, 2022 · 2 revisions

Hatchet is a tool used to split the GTDB-Tk reference tree into smaller subtrees in order to reduce the memory footprint of GTDB-Tk.

Hatchet is an internal tool and may break outside of ACE.

Prune Reference Tree

This step reduces the number of genomes in the reference tree to one genome per taxon at the rank of interest.

hatchet pick --domain bac -r f --tree release89/pplacer/gtdb_r89_bac120.refpkg/bac120_r89_unroot.pplacer.tree --msa release89/pplacer/gtdb_r89_bac120.refpkg/bac120_msa_r89.faa --taxonomy release89/taxonomy/gtdb_taxonomy.tsv --red_file release89/mrca_red/gtdbtk_r89_bac120.tsv --output_dir output

hatchet pick generates a shell script, 'pick_one_genome.sh', that calls several third-party tools.
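To make the idea of "one genome per taxon" concrete, here is a minimal sketch that is not part of Hatchet. It assumes that `-r f` selects the family rank and that the taxonomy file uses the usual GTDB layout (genome ID, a tab, then a semicolon-separated lineage such as `d__...;p__...;c__...;o__...;f__...;g__...;s__...`). The function name `pick_one_per_family` is hypothetical, and keeping the first genome listed for each family is an arbitrary choice made for illustration; the page does not say how Hatchet selects its representative.

```shell
# Hypothetical helper (not part of Hatchet): keep the first genome listed
# for each family (f__) in a GTDB-style taxonomy TSV.
# Input line format (assumed): genome_id <TAB> d__..;p__..;c__..;o__..;f__..;g__..;s__..
pick_one_per_family() {
    awk -F'\t' '
    {
        n = split($2, ranks, ";")
        for (i = 1; i <= n; i++) {
            gsub(/^ +/, "", ranks[i])            # tolerate "; " separators
            if (ranks[i] ~ /^f__/ && ranks[i] != "f__" && !(ranks[i] in seen)) {
                seen[ranks[i]] = 1               # first genome wins for this family
                print $1 "\t" ranks[i]
            }
        }
    }' "$1"
}
```

Calling `pick_one_per_family release89/taxonomy/gtdb_taxonomy.tsv` would print one genome ID and family name per line, which gives a rough idea of how many leaves to expect in the pruned tree.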

Once the pruned tree has been generated, the RED (relative evolutionary divergence) file must be recreated for the remaining nodes.

hatchet red --raw_tree ../release89/pplacer/gtdb_r89_bac120.refpkg/bac120_r89_unroot.pplacer.tree --pruned_tree gtdb_pruned.tree --red_file ../release89/mrca_red/gtdbtk_r89_bac120.tsv --output new_red_file.tsv

Use the Hatchet workflow

hatchet hatchet_wf -d bac -t gtdb_r207_bac120_decorated_fullids.tree --msa /srv/projects/gtdbtk/test_for_ms/benchmark_time_r207/tk_package/pplacer/gtdb_r207_bac120.refpkg/gtdb_r207_bac120_concatenated_gtdb_headers.faa --tax ../../taxonomy/bac120_taxonomy_r207_reps.tsv -o split/hatchet_wf_use_original_log --red_file phylorank_outliers/gtdb_r207_bac120_decorated_fullids.node_rd.tsv --original_log gtdb_r207_bac120_fasttree.log