Create GTDB SSU tree
Pierre Chaumeil edited this page May 9, 2023
- gtdb metadata export --format csv --output gtdb_metadata_rXX_.csv
- gtdb genomes ssu_export --output gtdb_rXX_.fna
- genometreetk ssu_tree gtdb_metadata_rXX_.csv gtdb_rXX_.fna . -c 24 --min_scaffold_length 5000
- genometreetk outgroup ...
- phylorank decorate ...
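The three fully specified commands above can be assembled for a given release with a small driver script. This is a minimal sketch, not part of the GTDB tooling. It assumes that "rXX" in the file names is replaced by a release tag (for example the hypothetical "r214"). It leaves out the outgroup and decorate steps because their arguments are elided. `build_commands` and `run_commands` are hypothetical helper names.

```python
"""Sketch: assemble the export and ssu_tree commands for one GTDB release.

Assumption: 'rXX' in the wiki file names is replaced by a release tag.
The 'genometreetk outgroup' and 'phylorank decorate' steps are omitted
because their arguments are elided in the wiki page.
"""
import shlex
import subprocess


def build_commands(release: str, threads: int = 24,
                   min_scaffold_length: int = 5000) -> list[list[str]]:
    """Return the three commands as argv lists, in execution order."""
    metadata = f"gtdb_metadata_{release}_.csv"
    ssu_fna = f"gtdb_{release}.fna"
    return [
        ["gtdb", "metadata", "export", "--format", "csv", "--output", metadata],
        ["gtdb", "genomes", "ssu_export", "--output", ssu_fna],
        ["genometreetk", "ssu_tree", metadata, ssu_fna, ".",
         "-c", str(threads), "--min_scaffold_length", str(min_scaffold_length)],
    ]


def run_commands(commands: list[list[str]]) -> None:
    """Echo and run each command in order, stopping at the first failure."""
    for argv in commands:
        print("+", shlex.join(argv))
        subprocess.run(argv, check=True)
```

Calling `run_commands(build_commands("r214"))` would run the steps in sequence. Using `check=True` makes a failed export stop the pipeline before `ssu_tree` reads incomplete files.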
Genomes, and the scaffolds carrying their 16S rRNA genes, are filtered to try to avoid erroneous genes (i.e., 16S rRNA genes that represent contamination within the genome). Contaminating 16S rRNA genes are preferentially found in genomes with low estimated quality or poor assembly statistics, and on short scaffolds. A BLAST-based filtering step also removes 16S rRNA genes that appear incongruent with the taxonomic assignment of the genome (see the GTDB manuscript for details).
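The quality and scaffold-length criteria described above can be illustrated with a minimal filter sketch. Only the 5,000 bp minimum scaffold length comes from the `ssu_tree` command on this page, and treating it as inclusive is an assumption. The field names and the completeness, contamination and N50 thresholds are hypothetical placeholders, not the values genometreetk uses. The BLAST-based congruence check is not modelled.

```python
"""Sketch of quality-based filtering of 16S rRNA genes before tree building.

Hypothetical: all field names and every threshold except the 5,000 bp
minimum scaffold length (from the ssu_tree command). The BLAST-based
taxonomic congruence check is not modelled here.
"""
from dataclasses import dataclass


@dataclass
class SsuCandidate:
    genome_id: str
    completeness: float   # estimated completeness, percent (hypothetical field)
    contamination: float  # estimated contamination, percent (hypothetical field)
    n50: int              # assembly N50 in bp (hypothetical field)
    scaffold_length: int  # length of the scaffold carrying the 16S gene, bp


def passes_filters(c: SsuCandidate,
                   min_completeness: float = 50.0,    # hypothetical
                   max_contamination: float = 10.0,   # hypothetical
                   min_n50: int = 5000,               # hypothetical
                   min_scaffold_length: int = 5000) -> bool:
    """Return True if the genome and its 16S-carrying scaffold look trustworthy."""
    if c.completeness < min_completeness or c.contamination > max_contamination:
        return False  # low estimated genome quality
    if c.n50 < min_n50:
        return False  # poor assembly statistics
    # Short scaffolds are more likely to carry contaminating 16S genes.
    return c.scaffold_length >= min_scaffold_length  # inclusive bound assumed


def filter_candidates(candidates: list[SsuCandidate]) -> list[SsuCandidate]:
    """Keep only candidates that pass every filter, preserving input order."""
    return [c for c in candidates if passes_filters(c)]
```

Each check returns early, so it is easy to see which criterion rejected a genome when the tests are extended or logging is added.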
- get the list of representative genomes (bac120_reps.lst) and the SSU file generated for the website (ssu_all_.fna)
- run scripts_dev/ssu_tree/convert_sequence.py to get the list of 16S sequences longer than 700 bp for the representative genomes
- run SINA on this list of sequences:
sina -i reps_ssu_non_aligned_gt700.fna -o reps_ssu_aligned_4frames_NR99_gt700.fasta --db ../../SILVA_138.1_SSURef_NR99_12_06_20_opt.arb -t all
- run convert_arb_file to recreate the ARB metadata file, replacing the genome sequences with their aligned SSU sequences (50K positions long)
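The two conversion steps around SINA can be sketched in plain Python: first select the representatives' 16S sequences longer than 700 bp, then swap each genome's sequence in the ARB records for its SSU alignment. This is not the actual scripts_dev code. The FASTA header layout (accession as the first token), the one-accession-per-line reps file, keeping the longest sequence per genome, and the ARB record layout (`key=value` lines between `BEGIN` and `END`, with hypothetical `db_name` and `aligned_seq` keys) are all assumptions.

```python
"""Sketch of the steps around SINA (hypothetical, not the scripts_dev code).

Assumptions: FASTA headers start with the genome accession; bac120_reps.lst
has one accession per line; ARB records are 'key=value' lines between BEGIN
and END, keyed by 'db_name' with the sequence under 'aligned_seq'.
"""


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_reps(lines):
    """Return the set of representative accessions, one per non-empty line."""
    return {line.strip() for line in lines if line.strip()}


def select_rep_ssu(fasta_lines, reps, min_len=700):
    """Keep the longest 16S sequence (> min_len bp) of each representative."""
    kept = {}
    for header, seq in read_fasta(fasta_lines):
        genome_id = header.split()[0]
        if genome_id in reps and len(seq) > min_len \
                and len(seq) > len(kept.get(genome_id, "")):
            kept[genome_id] = seq
    return kept


def replace_aligned_seqs(arb_lines, aligned,
                         id_field="db_name", seq_field="aligned_seq"):
    """Rewrite ARB records so each genome's sequence is its SSU alignment.

    Records without an SSU alignment are dropped. Raises ValueError if the
    aligned sequences do not all share one width.
    """
    if len({len(s) for s in aligned.values()}) > 1:
        raise ValueError("aligned sequences differ in width")
    out, fields = [], None
    for raw in arb_lines:
        line = raw.rstrip("\n")
        if line == "BEGIN":
            fields = []
        elif line == "END" and fields is not None:
            values = dict(fields)
            genome_id = values.get(id_field)
            if genome_id in aligned:
                values[seq_field] = aligned[genome_id]
                out.append("BEGIN")
                out.extend(f"{k}={v}" for k, v in values.items())
                out.append("END")
            fields = None
        elif fields is not None and "=" in line:
            key, value = line.split("=", 1)
            fields.append((key, value))
    return out
```

The output of `select_rep_ssu` would be written out as the SINA input. The aligned FASTA that SINA produces can be read back with `read_fasta`, keyed by the first header token, and passed to `replace_aligned_seqs`. The width check guards against mixing alignments made with different reference databases.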