This pipeline was designed for the subgenome identification of homeologous diploid genomes present in allopolyploids, also considering potentially unknown progenitors.
The Phylogenomic Subgenomic Detection (PhyloSD) pipeline includes three sequential Nearest Diploid Species Node, Bootstrapping Refinement, and Subgenome Assignment algorithms.
The protocol is explained in detail in:
R Sancho, LA Inda, A Diaz-Perez,L Des Marais, S Gordon, JP Vogel, J Lusinska, R Hasterok, B Contreras-Moreira, P Catalan (2022) Tracking the Ancestry of Known and 'Ghost' Homeologous Subgenomes in Model Grass Brachypodium Polyploids. The Plant Journal http://doi.org/10.1111/tpj.15650
The supporting information (supplementary tables, figures, appendix and methods) that accompanies this paper can be found at Dryad repository https://doi.org/10.5061/dryad.ncjsxksqw
A Docker container with dependencies installed can be found at:
https://hub.docker.com/r/eeadcsiccompbio/allopolyploids
- 0) Pre-processing of the data set and installation of dependencies
- 1) NEAREST DIPLOID SPECIES NODE algorithm
- 2) BOOTSTRAPPING REFINEMENT algorithm
- 3) SUBGENOME ASSIGNMENT algorithm
- 4) OPTIONAL DATING ANALYSIS
This pipeline has several dependencies, which can be installed as explained below in folder bin/:
software | URL |
---|---|
TrimAl | https://github.com/inab/trimal |
newick_utils | https://github.com/tjunier/newick_utils |
IQ-TREE | http://www.iqtree.org |
GET_HOMOLOGUES-EST | https://github.com/eead-csic-compbio/get_homologues |
concat_alignments.pl | https://github.com/vinuesa/get_phylomarkers |
Consensus.pl | https://github.com/josephhughes/Sequence-manipulation |
R | https://www.r-project.org |
In addition, the instructions below require wget, curl, make, git, perl and parallel on your system.
In Debian-like systems like Ubuntu please copy and paste the following command lines in your terminal to install:
# system dependencies
sudo apt-get install -y libdb-dev parallel git curl cpanminus
# R
sudo apt-get install -y r-base
# third-party binaries and Perl dependencies
make install
In RedHat-like systems:
# Perl
sudo yum install libdb-devel
curl -L https://cpanmin.us | perl - --sudo App::cpanminus
# R
sudo yum install epel-release
sudo apt-get install R
# third-party binaries and Perl dependencies
make install
In order to check yout installation please run:
make test
In order to download the data used for the Brachypodium, used in the tutorial below, benchmark please do:
make brachy
Note that Brachypodium RNA-seq data were also deposited at the European Nucleotide Archive with run accessions numbers:
- ERR3633153 (B. arbuscula)
- ERR3634426 (B. boissieri)
- ERR3634464 (B. hybridum)
- ERR3634593 (B. mexicanum) *
- ERR3634717 (B. phoenicoides Bpho422)
- ERR3634894 (B. phoenicoides Bpho6)
- ERR3634970 (B. pinnatum)
- ERR3636695 (B. retusum)
- ERR3636791 (B. rupestre)
- ERR3636828 (B. stacei)
B. sylvaticum RNA-seq data from accession Brasy-Esp were obtained from the study by Fox et al., 2013.
Transcriptomic data were also retrieved for the outgroup species Oryza sativa (SRX738077) and Hordeum vulgare (ERR159679), which were used to root the phylogenetic Brachypodium trees and to provide a stem branch for the grafting of ancestral homeologs/subgenomes.
The following abbreviations were used:
Bmex --> B.mexicanum
Bret --> B.retusum
Bboi --> B.boissieri
Bhyb --> B.hybridum
Brup --> B.rupestre
Bpho --> B.phoenicoides (accession Bpho6)
B422 --> B.phoenicoides (accession Bpho422)
Bsta --> B.stacei
Bdis --> B.distachyon
Barb --> B.arbuscula
Bpin --> B.pinnatum
Bsyl --> B.sylvaticum
Osat --> Oryza sativa
Hvul --> Hordeum vulgare
This first step requires https://github.com/eead-csic-compbio/get_homologues and the set of transcripts in folder genome_transcripts, most of them assembled de novo with https://github.com/trinityrnaseq/trinityrnaseq . With the following command we request clusters with sequence identity >= 80% and 75% coverage of the shortest sequence
bin/get_homologues/get_homologues-est.pl -d genome_transcripts \
-m cluster -I genome_transcripts/species.list -M -A -S 80 \
&> gen.M.A.S80.core.cluster.log
This produces 3675 clusters.
We can now compute pan-gene matrices for these core clusters:
bin/get_homologues/compare_clusters.pl \
-d genome_transcripts_est_homologues/arb8075_alltaxa_no_sorghum_no_sylCor.list_algOMCL_e0_S80_/ \
-o core_clusters_Brachypodium -m -n &> compare.core.log
We can optionally plot an Average Nucleotide Identity (ANI) matrix:
# Note: this requires additional R dependencies
bin/get_homologues/install_R_deps.R
bin/get_homologues/plot_matrix_heatmap.sh -i \
genome_transcripts_est_homologues/arb8075_alltaxa_no_sorghum_no_sylCor.list_algOMCL_e0_S80_Avg_identity.tab \
-H 14 -W 26 -t "ANI of transcripts in 3675 core clusters" -N -o pdf
From the clusters obtained earlier we can now produce MVIEW collapsed, multiple sequence alignments (MSA), while also annotating Pfam domains:
for FILE in `ls core_clusters/*.fna`; do
echo $FILE;
bin/get_homologues/annotate_cluster.pl -D -f $FILE -o $FILE.aln.fna -c 20 &>> collapse.align.core.log;
done
1.1) Edit polyconfig.pm file to adapt the previous information to your specific analyses. Define diploids, polyploids, outgroup, rooted species, ad-hoc labelling rules, ...
We might need to simplify the FASTA headers, which is something that should be taylored according to the user's data.
for FILE in *collapsed.fna.aln.fna; do
echo $FILE;
perl -p -i -e 's/>(.+?) .+/>$1/g; s/:\d+:\d+:[+-]//g' $FILE;
done
Original header:
>c43592_g1_i1_chr8:11969473:11975162:+[B422_80_75] collapsed:6,{12}, Pfam:Microtubule associated protein (MAP65/ASE1 family);(PF03999,)
Simplified header:
>c43592_g1_i1_chr8[B422_80_75]
The goal is to leave the species identifier as the last bit, in this case within [brackets]. You might need to follow a sligthly different approach with your own data.
You can also simplify the names of your files (optional):
for f in *.fna;
do mv -- "$f" "$(basename -- "$f" .collapsed.fna.aln.fna).fa";
done
Original names:
99998_c33211_g1_i1_chr2-26015115-26017843--.collapsed.fna.aln.fna
Renamed names:
99998_c33211_g1_i1_chr2-26015115-26017843--.fa
Note: All FASTA files are compressed in fasta.files.tar.bz2 and saved in 01_collapsed_core_clusters directory
1.3) Filter Mutiple Sequences Alignments (MSAs) according to the diploid block and remove inconsistent positions
We now take these alignments of nucleotide sequences of both diploid and polyploid species and produce trimmed FASTA files suitable for phylogenetic tree inference. The goal is to define a solid diploid backbone, which should be covered by outgroup sequences as well, and then use it to filter out polyploid sequences/alleles with diploid block overlap < $MINBLOCKOVERLAP. Therefore, some sequences and species might be removed from the initial input.
This process is computed using scripts/_trim_MSA_block.pl and the defauld parameters are defined in polyconfig.pm file:
Default values for paremeters controlling how aligned blocks of sequences are produced in script _trim_MSA_block.pl our $MINBLOCKLENGTH = 100; our $MAXGAPSPERBLOCK = 100; # tolerated gaps for diploids in block our $MINBLOCKOVERLAP = 0.50; # fraction of diploid block covered by outgroups & polyploids
We run the tool:
for FILE in *.fa;
do echo $FILE;
perl scripts/_trim_MSA_block.pl -i $FILE -o $FILE.block.fna \
-m 100 -M 100 -O 0.5 -R 'c\d+_g\d+_i' \
-t scripts/species2rename.tsv &>> blocks.log;
done
We recover 2001 aligned core clusters with a consistent diploid block. You can save the filter files in 02_blocks directory
Warning: In this step, some alignments can lost outgroups (Osat or Hvul). If we count the number of block.fna files with each taxon we recover: Osat --> 1925/2001 Hvul --> 1944/2001 Bsta --> 2001/2001 Bdis --> 2001/2001 Barb --> 2001/2001 Bpin --> 2001/2001 Bsyl --> 2001/2001
You can count the files with all diploid and outgroup taxa using the regular expression:
grep -e '^>.*_Osat' -e '^>.*_Hvul' -e '^>.*_Bsta' -e '^>.*_Bdis' -e '^>.*_Barb' -e '^>.*_Bpin' -e '^>.*_Bsyl' *.block.fna \
| cut -d":" -f1 | uniq -c | cut -d" " -f7,8 | grep -c "^7"
We recover 1878 alignments with all diploid and outgroup taxa.
We save a list of files with all diploid and ougroup taxa:
grep -e '^>.*_Osat' -e '^>.*_Hvul' -e '^>.*_Bsta' -e '^>.*_Bdis' -e '^>.*_Barb' -e '^>.*_Bpin' -e '^>.*_Bsyl' *.block.fna | cut -d":" -f1 | uniq -c | cut -d" " -f7,8 | grep "^7" | cut -d" " -f2 > list_block_all_fasta.txt
We can now trim the resulting blocks with https://vicfero.github.io/trimal to remove the inconsistent positions of the alignments:
for FILE in *block.fna; do
echo $FILE;
bin/trimal-1.4.1/source/trimal -in $FILE -out $FILE.trimmed.fna -automated1;
done
We can now run IQ-TREE in parallel. The results are saved in the folder 03_iqtree_treefiles/iqtree_treefile (see results in treefile.tar.bz2 file)
ls *.trimmed.fna | parallel --gnu -j 3 bin/iqtree-1.6.12-Linux/bin/iqtree \
-alrt 1000 -bb 1000 -nt AUTO -AICc -s {} :::
We will now root and ladderize the nodes (starting with rice. Root defined in polyconfig.pm) in the resulting trees, which are stored in folder 03_iqtree_treefiles/rooted_trees (see root.ph.tar.bz2 file).
Note: this script requires Perl modules Bio::TreeIO and Bio::Phylo:
sudo cpan -i Bio::TreeIO Bio::Phylo
for FILE in *.treefile;
do perl scripts/_reroot_tree.pl $FILE > $FILE.root.ph;
echo $FILE;
done
Warning: if there is not "root" taxon, an empty file is created. Remove this files are recomended
find . -empty -type f -delete
Warning: some files have not Hvul outgroup. This will be filtered upstream. We save them into a list.
We save the list of files (root.ph) with all taxa:
ls *.ph | grep -f ../02_blocks/list_block_all_fasta.txt > list_root_tree_all_taxa.txt
Note: one of this files would be lost in the next step because of "Calling node's branch length will be zero (set -FORCE to force)--exiting". This file starts with "12615".
We can now ask how many different diploid skeletons are there and select gene trees with congruent diploid topology. The next script must be run within folder 03_iqtree_treefiles/rooted_trees and requires http://cegg.unige.ch/newick_utils
for FILE in *root.ph;
do perl scripts/_check_diploids.pl $FILE;
done > diploids.log
The tool also create the pruned tree files (trees with only diploid species. Not save in this example).
Note: There are 1924 pruned trees (and 1925 rooted trees) because one tree show the error "Calling node's branch length will be zero (set -FORCE to force)--exiting". This file starts with "12615"
We check and count the diferent diploid topologies:
grep -v "#" diploids.log | sort | uniq -c | sort -n
Examples (number of trees/genes and diploid topologies): e.g. Counts Diploid skeleton/topologies 75 (Osat,(Hvul,(Bsta,(Bdis,(Barb,(Bsyl,Bpin)))))); 124 (Osat,(Hvul,(Bsta,(Bdis,(Bsyl,(Barb,Bpin)))))); 165 (Osat,(Hvul,(Bsta,(Bdis,(Bpin,(Barb,Bsyl)))))); 254 (Osat,(Hvul,(Bsta,(Bdis,(Barb,(Bpin,Bsyl))))));
Congruent diploid skeletons/topologies (all variants):
(Osat,(Hvul,(Bsta,(Bdis,(Barb,(Bpin,Bsyl)))))); (Osat,(Hvul,(Bsta,(Bdis,(Barb,(Bsyl,Bpin))))));
(Osat,(Hvul,(Bsta,(Bdis,((Bsyl,Bpin),Barb))))); (Osat,(Hvul,(Bsta,(Bdis,((Bpin,Bsyl),Barb)))));
We convert the .log file to a table (tsv format):
grep -e '# .*fna.treefile.root.ph*' -e '(' "diploids.log" \
| cut -d" " -f1,2 | sed 's/# //g' | awk '{printf "%s%s",$0,NR%2?"\t":RS}' > table_diploids.tsv
We create the table with all congruent diploid topologies:
grep -w \
-e '(Osat,(Hvul,(Bsta,(Bdis,(Barb,(Bpin,Bsyl))))));' \
-e '(Osat,(Hvul,(Bsta,(Bdis,(Barb,(Bsyl,Bpin))))));' \
-e '(Osat,(Hvul,(Bsta,(Bdis,((Bsyl,Bpin),Barb)))));' \
-e '(Osat,(Hvul,(Bsta,(Bdis,((Bpin,Bsyl),Barb)))));' \
table_diploids.tsv > table_diploids_congruent.tsv
We check and count the congruent diploid topologies
cut -f2 table_diploids_congruent.tsv | sort | uniq -c
Counts Diploid skeleton/topologies 254 (Osat,(Hvul,(Bsta,(Bdis,(Barb,(Bpin,Bsyl)))))); 75 (Osat,(Hvul,(Bsta,(Bdis,(Barb,(Bsyl,Bpin))))));
We recover 329 trees with the congruent diploid topology.
We create the list of congruent FASTA files (save in folder 02_blocks).
cut table_diploids_congruent.tsv -f1 | sed 's/.treefile.root.ph//' > list_congruent_fasta.txt
We create the list of congruent tree (rooted.ph) files (save in folder 03_iqtree_treefiles/rooted_trees).
cut table_diploids_congruent.tsv -f1 > list_congruent_trees.txt
1.7) We will now copy files (.fna, root.ph, root.ph.pruned) with congruent diploid topologies to folder 04_congruent_and_labelled_files
Run the commands in 02_blocks
ls | grep -f list_congruent_fasta.txt | xargs cp -t ../04_congruent_and_labelled_files
Run in 03_iqtree_treefiles/rooted_trees
ls | grep -f list_congruent_trees.txt | xargs cp -t ../../04_congruent_and_labelled_files
Note: These results are not included but you can run the code to recover them
In this step, we will label polyploid sequences (homeolog) with some pre-defined codes according to their position with respect to the diploid skeleton. We will do that with scripts/_check_lineages_polyploids.pl, which has several parameters defined previously (see polyconfig.pm file). This script takes as input an MSA and tree file with congruent diploid skeletons and produces two labelled MSA files with sorted diploid species and labelled polyploid homeologs (A,B,C,etc) species, a full MSA with some only-gap rows where one polypoid sequence is missed and a reduced MSA excluding missing polyploid sequences (homeologs), and the labelled trees. Ad-hoc rules encoded (see polyconfig.pm) are defined using the species or clades of diploids used such as a reference to do the labelling process. These rules should be carefully adapted to other clades. Labelled files are stored in the folder 04_congruent_and_labelled_files with the names label.fna.tar.bz2 (full labelled MSAs), label.reduced.fna.tar.bz2 (reduced labelled MSAs) and label.ph.tar.bz2 (Labelled trees), respectively.
for FILE in *.fna; do
perl scripts/_check_lineages_polyploids.pl -t $FILE\.treefile.root.ph -f $FILE;
done
You can run the script using a loop:
nohup loop_check_lineage_polyploid.sh &> lineage_codes_blocks_ABCDEFGHI.log &
The preliminary statistic of labelled polyploid homeologs can be calculated using the scripts/_make_lineages_stats.pl. Note: these labels correspond to homeologs, which are represented as lower-case letters in the phylogenies of the paper.
perl scripts/_make_lineages_stats.pl lineage_codes_blocks_ABCDEFGHI_Brachypodium.log
Table of preliminary homeolog-type statistics:
A B C D E F G H I all
Bmex 111 54 51 7 16 0 1 0 0 240
Bboi 96 48 58 12 37 3 7 1 1 263
Bret 46 13 49 6 83 22 40 18 15 292
Bhyb 2 157 2 114 0 0 0 0 1 276
Brup 1 1 1 5 62 39 84 48 57 298
Bpho 3 0 5 3 80 53 72 34 50 300
B422 1 2 2 4 70 53 79 34 51 296
-------
Total alleles: 1965
We use 05_bootstrapping_analysis such as working directory of this step
We create the pruned aligment using such as input files the reduced labelled MSAs (04_congruent_and_labelled_files/label.reduced.fna.tar.bz2). Firstly, we do the files that will be include in the pruned alignment. One list/file per polyploid homeolog (see folder 05_bootstrapping_analysis/list_taxa)
For each labelled homeolog (above e.g. Bmex_A):
for FILE in *.label.reduced.fna.trimmed.fna; do cat ./list_taxa_Brachypodium/Bmex_A.list.txt \
| awk '{gsub("_","\\_",$0);$0="(?s)^>"$0".*?(?=\\n(\\z|>))"}1' | pcregrep -oM -f - $FILE > $FILE.extract_Bmex_A.fna; done
Some pruned MSAs (.extract_Bmex_A.fna, ...) will not include any polyploid. Thus, we have to save only the pruned MSA with polyploid sequences.
grep -E 'Bmex|Bboi|Bret|Bhyb|Brup|Bpho|B422' *.fna | cut -d":" -f1 > list_files_extract_with_polyploid.txt
Move pruned FASTA files with diploids + outgroups + one homeolog to one_allopolyploid_plus_diploids/2_1_files_extract_with_polyploid directory (now empty)
ls *.fna | grep -f list_files_extract_with_polyploid.txt | xargs mv -t files_extract_with_polyploid
Removed the other pruned FASTA ".extract" files without polyploid sequences
ls *.fna | parallel --gnu -j 60 bin/iqtree-1.6.12-Linux/bin/iqtree -b 1000 -nt 1 -AICc -s {} :::
You can run it using a bash script and nohup:
nohup iqtree_non_parametric_bootstrapping_1000.sh > nohup_bootstrapping_1000.txt &
These .boottrees files will be saved in the folder 2_2_iqtree_boottrees_1000.
The script/_bootstrap_label_stats.pl is used for this task. If only root (rootonly) flag is actived, two files for each boottree are created: trees (.ph) extracted from .boottrees, and rooted and sorted trees (.root.ph)
nohup scripts/_bootstrap_label_stats.pl 2_2_iqtree_boottrees_1000/ rootonly &
The statistics have to take in count just in trees with the congruent diploid skeleton (see next step).
for FILE in ./*.relabelled/*.root.ph;
do perl scripts/_check_diploids.pl $FILE;
done &>> stats_diploid_skeleton_1000boottrees.log
See diploid skeletons topologies in "stats_diploid_skeleton_1000boottrees.log" created previously
grep -e '# .fa.block.label.reduced.fna.trimmed' -e '(' "stats_diploid_skeleton_1000boottrees.log"
| cut -d" " -f1,2 | sed 's/# .///g' | awk '{printf "%s%s",$0,NR%2?"\t":RS}'
| sort -t/ -k1,1 -k2,2n > table_diploid_skeleton_1000boottrees.tsv
1,965,000 boottrees (1,965 alleles/homeolog x 1,000 boottrees per homeolog),
Create a file with the list of directories (alleles/homeologs):
grep "# ./" stats_diploid_skeleton_1000boottrees.log | cut -d"/" -f2 \
| uniq > stats_diploid_skeleton_1000boottrees_directories.txt
Again, we check the congruent diploid topology (all variants) and save the files with this diploid skeleton:
(Osat,(Hvul,(Bsta,(Bdis,(Barb,(Bpin,Bsyl)))))); (Osat,(Hvul,(Bsta,(Bdis,(Barb,(Bsyl,Bpin))))));
(Osat,(Hvul,(Bsta,(Bdis,((Bsyl,Bpin),Barb))))); (Osat,(Hvul,(Bsta,(Bdis,((Bpin,Bsyl),Barb)))));
We create a table with the files and all congruent topologies:
grep -w \
-e '(Osat,(Hvul,(Bsta,(Bdis,(Barb,(Bpin,Bsyl))))));' \
-e '(Osat,(Hvul,(Bsta,(Bdis,(Barb,(Bsyl,Bpin))))));' \
-e '(Osat,(Hvul,(Bsta,(Bdis,((Bsyl,Bpin),Barb)))));' \
-e '(Osat,(Hvul,(Bsta,(Bdis,((Bpin,Bsyl),Barb)))));' \
table_diploid_skeleton_1000boottrees.tsv > table_diploid_skeleton_1000boottrees_congruent.tsv
A total of 1,002,542 boottrees show the congruent diploid skeleton
We create a table with the congruent directories:
cut -d"/" -f1 table_diploid_skeleton_1000boottrees_congruent.tsv \
| uniq > table_diploid_skeleton_1000boottrees_congruent_directories.txt
1,960 directories/alleles/homeologs show the congruent diploid skeleton
*Note: five directories do not appear in this list because there are not any boottrees with congruent diploid skeleton. However, those should be considered to remove them as part of the "congruent_diploid_skeleton_1000" files with less than 100 boottrees showing congruent diploid skeleton.
diff stats_diploid_skeleton_1000boottrees_directories.txt table_diploid_skeleton_1000boottrees_congruent_directories.txt
Five directories (alleles) have 0 boottrees with congruent diploid skeleton:
108136_c31693_g1_i1_chr5-23779430-23783492-+.fa.block.label.reduced.fna.trimmed.fna.extract_Bboi_B.fna.boottrees.relabelled
108136_c31693_g1_i1_chr5-23779430-23783492-+.fa.block.label.reduced.fna.trimmed.fna.extract_Bpho_E.fna.boottrees.relabelled
111530_c38754_g2_i5_chr5-9236742-9244175-+.fa.block.label.reduced.fna.trimmed.fna.extract_Bhyb_B.fna.boottrees.relabelled
93237_c39035_g2_i1_chr2-30560610-30570642-+.fa.block.label.reduced.fna.trimmed.fna.extract_Bret_E.fna.boottrees.relabelled
93349_c39377_g2_i1_chr4-4428242-4434064--.fa.block.label.reduced.fna.trimmed.fna.extract_Bboi_C.fna.boottrees.relabelled
If you execute the folowing command, it returns nothing because there are not any cogruent diploid skeletons:
grep -A8 "111530_c38754_g2_i5_chr5-9236742-9244175-+.fa.block.label.reduced.fna.trimmed.fna.extract_Bhyb_B.fna.boottrees.relabelled" \
stats_diploid_skeleton_1000boottrees.log \
| grep -w -e '(Osat,(Hvul,(Bsta,(Bdis,(Barb,(Bpin,Bsyl))))));' \
-e '(Osat,(Hvul,(Bsta,(Bdis,(Barb,(Bsyl,Bpin))))));' \
-e '(Osat,(Hvul,(Bsta,(Bdis,((Bsyl,Bpin),Barb)))));' \
-e '(Osat,(Hvul,(Bsta,(Bdis,((Bpin,Bsyl),Barb)))));'
We check and count the congruent diploid topologies (1000 boottrees)
cut -f2 table_diploid_skeleton_1000boottrees_congruent.tsv | sort | uniq -c
Counts Diploid skeletons/topologies
929020 (Osat,(Hvul,(Bsta,(Bdis,(Barb,(Bpin,Bsyl))))));
73522 (Osat,(Hvul,(Bsta,(Bdis,(Barb,(Bsyl,Bpin))))));
So, there are 1002542 boottrees of total 1965000 with congruent diploid skeleton. We count the boottrees per each allele/homeolog with congruent diploid skeleton.
sed -r 's/\/[0-9]+\.root.ph//g' table_diploid_skeleton_1000boottrees_congruent.tsv | uniq -c \
| awk '{a[$2]+=$1}END{for(i in a) print i,a[i]}' > table_diploid_skeleton_1000boottrees_congruent_counts.tsv
We create a list of congruent files (boottrees):
cut -f1 table_diploid_skeleton_1000boottrees_congruent.tsv | sort -t"/" -k1,1 -k2,2n \
> list_diploid_skeleton_boottrees_congruent_files_sorted.txt
e.g. The list shows the directory/allele (100065_c35185..../) and the boottrees files with congruent diploid skeleton (1.root.ph, ...)
100065_c35185_g1_i1_chr2-2924216-2928564--.fa.block.label.reduced.fna.trimmed.fna.extract_Bboi_A.fna.boottrees.relabelled/1.root.ph
100065_c35185_g1_i1_chr2-2924216-2928564--.fa.block.label.reduced.fna.trimmed.fna.extract_Bboi_A.fna.boottrees.relabelled/2.root.ph
100065_c35185_g1_i1_chr2-2924216-2928564--.fa.block.label.reduced.fna.trimmed.fna.extract_Bboi_A.fna.boottrees.relabelled/3.root.ph
100065_c35185_g1_i1_chr2-2924216-2928564--.fa.block.label.reduced.fna.trimmed.fna.extract_Bboi_A.fna.boottrees.relabelled/4.root.ph
100065_c35185_g1_i1_chr2-2924216-2928564--.fa.block.label.reduced.fna.trimmed.fna.extract_Bboi_A.fna.boottrees.relabelled/5.root.ph
100065_c35185_g1_i1_chr2-2924216-2928564--.fa.block.label.reduced.fna.trimmed.fna.extract_Bboi_A.fna.boottrees.relabelled/6.root.ph
100065_c35185_g1_i1_chr2-2924216-2928564--.fa.block.label.reduced.fna.trimmed.fna.extract_Bboi_A.fna.boottrees.relabelled/7.root.ph
...
We have to check if any allele shows less than 100 boottrees with congruent diploid skeleton. In our analysis, 177 alleles (directories) show less than 100 boottrees with congruente diploid skeleton (five alleles show zero; see previous step).
e.g. Above you can see the directory/allele id and the counts of boottrees with congruent diploid skeleton
Allele id boottrees
108136_c31693_g1_i1_chr5-23779430-23783492-+.fa.block.label.reduced.fna.trimmed.fna.extract_Bhyb_D.fna.boottrees.relabelled 1
110196_c13070_g1_i1_chr6-8389172-8390724-+.fa.block.label.reduced.fna.trimmed.fna.extract_Bpho_H.fna.boottrees.relabelled 1
108136_c31693_g1_i1_chr5-23779430-23783492-+.fa.block.label.reduced.fna.trimmed.fna.extract_Bmex_C.fna.boottrees.relabelled 2
110196_c13070_g1_i1_chr6-8389172-8390724-+.fa.block.label.reduced.fna.trimmed.fna.extract_B422_H.fna.boottrees.relabelled 2
93349_c39377_g2_i1_chr4-4428242-4434064--.fa.block.label.reduced.fna.trimmed.fna.extract_Brup_F.fna.boottrees.relabelled 2
110196_c13070_g1_i1_chr6-8389172-8390724-+.fa.block.label.reduced.fna.trimmed.fna.extract_Bhyb_D.fna.boottrees.relabelled 3
...
We create a list (e.g. list_diploid_skeleton_boottrees_congruent_directories.txt) of congruent directories (alleles) with at least 100 boottrees with congruent diploid skeleton. This list includes 1,788 alleles (directories) (we do not include the 177 alleles (directories) with less than 100 boottrees with congruent diploid skeleton) e.g. to sort the alleles of the list:
cat list_diploid_skeleton_boottrees_congruent_directories.txt | sort > list_diploid_skeleton_boottrees_congruent_directories_sorted.txt
100065_c35185_g1_i1_chr2-2924216-2928564--.fa.block.label.reduced.fna.trimmed.fna.extract_Bboi_A.fna.boottrees.relabelled
100065_c35185_g1_i1_chr2-2924216-2928564--.fa.block.label.reduced.fna.trimmed.fna.extract_Bhyb_D.fna.boottrees.relabelled
100065_c35185_g1_i1_chr2-2924216-2928564--.fa.block.label.reduced.fna.trimmed.fna.extract_Bmex_A.fna.boottrees.relabelled
100065_c35185_g1_i1_chr2-2924216-2928564--.fa.block.label.reduced.fna.trimmed.fna.extract_Bpho_E.fna.boottrees.relabelled
100128_c35206_g3_i1_chr2-21577376-21581870-+.fa.block.label.reduced.fna.trimmed.fna.extract_B422_F.fna.boottrees.relabelled
100128_c35206_g3_i1_chr2-21577376-21581870-+.fa.block.label.reduced.fna.trimmed.fna.extract_Bboi_E.fna.boottrees.relabelled
...
We create a list with the congruent files (boottrees) according to congruent directories (alleles) with at least 100 boottrees with congruent diploid skeleton:
grep -f list_diploid_skeleton_boottrees_congruent_directories_sorted.txt \
list_diploid_skeleton_boottrees_congruent_files_sorted.txt \
> list_diploid_skeleton_boottrees_congruent_files_filtered.txt
We can run this command in the folder 2_2_iqtree_boottrees_1000
cat list_diploid_skeleton_boottrees_congruent_files_filtered.txt | \
xargs cp --parents -t ../2_3_iqtree_boottrees_1000_congruent/
You can check the number of boottrees into 2_3_iqtree_boottrees_1000_congruent directory
ls -R | grep -c "root.ph"
We count 993,743 boottrees with congruent diploid skeleton (1,788 directories, alleles)
2.7) Random selection of 100 boottrees (root.ph) for each directory/allele in the folder 2_3_iqtree_boottrees_1000_congruent
We run the commands in the folder 2_3_iqtree_boottrees_1000_congruent
for d in *.relabelled;
do for f in *.root.ph;
do shuf -n 100 -e $d/$f | xargs cp --parents -t ../2_4_iqtree_boottrees_100_random_congruent;
done; done
We can check the number of boottrees in 2_4_iqtree_boottrees_100_random_congruent directory´
ls -R | grep -c "root.ph"
We count 178,800 boottrees (1,788 directories/alleles x 100 boottrees per directory)
We count the number of boottrees per directory (allele) to make sure the number is 100:
for d in *.relabelled;
do (cd $d && ls | wc | cut -d" " -f5);
done
100
100
100
...
We put together in a single boottrees file each of the hundred trees of each allele (it will be create into each directory the boottrees file):
for d in *.relabelled;
do for f in *.ph;
do (cd $d && cat $f | sed 's/;/;\n/g' > $d.boottrees);
done; done
We move the new boottrees files to a new directory 2_5_iqtree_boottrees_100_random_congruent_concatenate (run in the folder 2_4_iqtree_boottrees_100_random_congruent)
for d in *.relabelled;
do for f in *.boottrees;
do (cd $d && mv $f ../../2_5_iqtree_boottrees_100_random_congruent_concatenate);
done; done
We can rename boottrees files in 2_5_iqtree_boottrees_100_random_congruent_concatenate directory:
for file in *.boottrees;
do mv "$file" "$(basename "$file" .boottrees.relabelled.boottrees).boottrees";
done
2.8) Re-label and compute the statistics of the 100 random boottrees with congruent diploid skeleton for each directory/allele:
We run the scripts/_bootstrap_label_stats.pl tool to re-label the boottrees and calculate the statistics of homeolog-types (A, B, C,...)
nohup scripts/_bootstrap_label_stats.pl \
2_5_iqtree_boottrees_100_random_congruent_concatenate/ > stats_100_boottrees_ramdom_congruent.tsv &
There may be alleles where not all of your boottrees have been labelled. This can occur if one polyploid homeolog branch is inserted between the two outgroups (g. Osat and Hvul) and you have not been defined any ad-hoc labelling rules for this possibility because the genus under study is more recent than both outgroups. To get more consistent statistics in this cases, we re-label all boottrees with a congruent diploid skeleton (not just 100 boottrees) and conduct the random selection of 100 re-labelled boottrees.
Firstly, we create a list (e.g. list_alleles_round_2.txt in the folder 2_3_iqtree_boottrees_1000_congruent) of alleles (directories) that do not show 100 random boottrees re-labelled. We detect 79 alleles with some homeolog without labelling. We copy these to the filder 2_6_iqtree_boottrees_1000_congruent_round_2 and repeat the previous steps. Run in the folder 2_3_iqtree_boottrees_1000_congruent
cat list_alleles_round_2.txt | xargs cp -r --parents -t ../2_6_iqtree_boottrees_1000_congruent_round_2/
2.10) We put together in a single boottrees file each of the 100 boottrees (additional round) of each allele.
Run in the folder 2_6_iqtree_boottrees_1000_congruent_round_2
for d in *.relabelled;
do for f in *.ph;
do (cd $d && cat $f | sed 's/;/;\n/g' > $d.boottrees);
done; done
We move the concatenated boottrees files to new directory 2_7_iqtree_boottrees_1000_congruent_round_2_concatenate
for d in *.relabelled;
do for f in *.boottrees;
do (cd $d && mv $f ../../2_7_iqtree_boottrees_1000_congruent_round_2_concatenate);
done; done
We rename the boottrees files in 2_7_iqtree_boottrees_1000_congruent_round_2_concatenate directory
for file in *.boottrees;
do mv "$file" "$(basename "$file" .boottrees.relabelled.boottrees).boottrees";
done
2.11) Re-label and compute the statistics of the boottrees (additional round) with congruent diploid skeleton for each directory/allele
We run the scripts/_bootstrap_label_stats.pl tool to re-label the boottrees and calculate the statistics of homeolog-types (A, B, C,...)
nohup scripts/_bootstrap_label_stats.pl \
2_7_iqtree_boottrees_1000_congruent_round_2_concatenate > stats_all_boottrees_congruent_round_2.tsv &
Run in 2_7_iqtree_boottrees_1000_congruent_round_2_concatenate to copy to 2_8_iqtree_boottrees_100_random_congruent_round_2 directory
for d in *.relabelled;
do for f in *.root.ph;
do shuf -n 100 -e $d/$f | xargs cp --parents -t ../2_8_iqtree_boottrees_100_random_congruent_round_2;
done; done
We check the amount of boottrees in 2_8_iqtree_boottrees_100_random_congruent_round_2 directory
ls -R | grep -c "root.ph"
7,900 boottrees (79 directories/alleles x 100 boottrees per directory)
We count the number of boottrees per directory (allele) to make sure the number is 100:
for d in *.relabelled;
do (cd $d && ls | wc | cut -d" " -f5);
done
100
100
100
...
we put together in a single boottrees file each of the hundred trees of each allele (it will be create into each directory the boottrees file)
for d in *.relabelled;
do for f in *.ph;
do (cd $d && cat $f | sed 's/;/;\n/g' > $d.boottrees);
done; done
We move the new boottrees files to the new 2_9_iqtree_boottrees_100_random_congruent_round_2_concatenate directory
for d in *.relabelled;
do for f in *.boottrees;
do (cd $d && mv $f ../../2_9_iqtree_boottrees_100_random_congruent_round_2_concatenate);
done; done
We recover the 79 .boottrees files and rename the boottrees files
for file in *.boottrees;
do mv "$file" "$(basename "$file" .boottrees.relabelled.boottrees).boottrees";
done
2.13) Re-label and compute the statistics of the random selection of 100 boottrees (additional round) with congruent diploid skeleton for each directory/allele. We run the scripts/_bootstrap_label_stats.pl tool to re-label the boottrees and calculate the statistics of homeolog-types (A, B, C,...)
nohup scripts/_bootstrap_label_stats.pl \
2_9_iqtree_boottrees_100_random_congruent_round_2_concatenate/ > stats_100_boottrees_ramdom_congruent_round_2.tsv &
According to the congruent diploid skeleton statistics and the re-labelling of boottrees (two rounds), we replace with gaps those homeologs that do not reach the minimum threshold of 100 boottrees with right diploid skeleton, or wrong labelling according to the bootstrapping analyses. In some cases, the complete gene (with multiple homeolgos) with every homeolgo wrong can be removed. We copy the files "*.fa.block.label.fna.trimmed.fna (04_congruent_and_labelled_files)" to the nwe directory "06_labelled_alignment_corrected". These files (congruent genes) are edited changing wrong labelled alleles, according to bootstrapping analyses, by gaps. Remember to update the labelled statistics subtracting the gapped alleles. Finally 560 homeologs and 7 complete genes (alignment) are removed/gapped. We keep 1,405 homeologs and 322 alignments/genes (without removing under-represeted alleles (see next steps)). These file are saved such as compressed file fasta.label.trimmed.tar.bz2.
During the previous steps, some columns could have been transformed into "allgaps" columns. To check this, trimAl is used. We run the commands in 06_labelled_alignment_corrected. These files are saved such as compressed file fasta.label.trimmed.noallgaps.tar.bz2.
for FILE in *.fna; do
echo $FILE;
bin/trimal-1.4.1/source/trimal -in $FILE -out $FILE.noallgaps.fna -noallgaps -keepheader -keepseqs;
done
2.16) Remove the underrepresented alleles/homeolgos (keep only the homeologs with 10% of representation in the polyploid species) from .trimmed files (see /06_labelled_alignment_corrected/fasta.label.trimmed.tar.bz2)
We removed the following underrepresented alleles/homeologs for each gene (MSAs)
Bmex_D; Bmex_E; Bmex_F; Bmex_G; Bmex_H; Bmex_I
Bboi_D; Bboi_F; Bboi_G; Bboi_H; Bboi_I
Bret_B; Bret_D; Bret_F; Bret_H; Bret_I
Bhyb_A; Bhyb_C; Bhyb_E; Bhyb_F; Bhyb_G; Bhyb_H; Bhyb_I
Brup_A; Brup_B; Brup_C; Brup_D
Bpho_A; Bpho_B; Bpho_C; Bpho_D;
B422_A; B422_B; B422_C; B422_D;
We make a list of taxa includign dipoloids (Osat, Hvul, ...) + polyploid homeologs (Bmex_A, ..) represented (list_taxa.txt) in the new folder 07_labelled_alignment_corrected_filtered.
e.g.
Osat
Hvul
Bsta
Bdis
Barb
Bpin
Bsyl
Bmex_A
...
We copy the files .trimmed from 06_labelled_alignment_corrected directory to 07_labelled_alignment_corrected_filtered directory and execute the folowing lines:
for FILE in *.trimmed.fna; do cat list_taxa.txt \
| awk '{gsub("_","\\_",$0);$0="(?s)^>"$0".*?(?=\\n(\\z|>))"}1' | pcregrep -oM -f - $FILE > $FILE.filtered.fna; done
After this step, we check the MSA to confirm that there is still at least one homologous sequence.
grep -E 'Bmex|Bboi|Bret|Bhyb|Brup|Bpho|B422' *filtered.fna | cut -d":" -f1 | uniq > list_files_with_polyploid_allele_represented.txt
In this case we recover 322 MSA with at least one homeolgos, so all genes (MSAs) have at least one polyploid allele (homeolog) and we do not remove any complete gen. These .filtered.fna files are saved such as compressed file fasta.label.trimmed.filtered.tar.bz2 in the folder 07_labelled_alignment_corrected_filtered
During the process some columns could have been transformed into "allgaps" columns. To check this, trimAl is used.
Run in 07_labelled_alignment_corrected_filtered directory
for FILE in *filtered.fna; do
echo $FILE;
bin/trimal-1.4.1/source/trimal -in $FILE -out $FILE.noallgaps.fna -noallgaps -keepheader -keepseqs;
done
The filtered and corrected files are saved such as compressed fasta.label.trimmed.filtered.noallgaps.tar.bz2 files.
2.18) Phylogenomic analysis of concatenated labelled, filtered and corrected genes/MSAs ("Homeologs' ML consensus tree")
We create the coordinate file and concatenate the MSAs using get_phylomarkers/concat_alignments.pl tool
ls -1 *.filtered.fna.noallgaps.fna > list_genes.txt
bin/concat_alignments.pl list_genes.txt > MSA_labelled_filtered.fna
We format the concatenation_coordinates.txt to crate the partitional file input for iqtree execution.
grep -v -P "^#" concatenation_coordinates.txt | perl -lne 'if(/(\d+-\d+)$/){ print "DNA, part$. = $1" }' > MSA_labelled_filtered_partitions.txt
We run iqtree to infer the "Homeologs' ML consensus tree"
nohup bin/iqtree-1.6.12-Linux/bin/iqtree -s MSA_labelled_filtered.fna -spp MSA_labelled_filtered_partitions.txt -m MFP -AICc -alrt 1000 -bb 1000 -nt AUTO &
**Note:**This result (MSA_labelled_filtered_partitions.txt.treefile) is the "Homeologs' ML consensus tree" (in the paper the labelled homeologs are represented using lower-case letters (a, b,...)
To find out if some of these homeologs (a, b, c ...) refer to the same subgenome, and therefore to amalgamate or collapse them, we use different approaches (chromosome counts, patristic distances, PCoA-MST (Principal coordinates analysis-minimum spanning tree), homeolog grafting distributions (from homeolog grafting frequencies from 100 re-labelled boottrees). If the homeolog types match subgenomes (single-allelic subgenomes), it will be not necessary to amalgamate or collapse the homeolog sequences. However, if multiple homeolog types match one subgenome (compound-allelic subgenomes), it will be necessary to amalgamate these homeologs into one to obtain the corresponding subgenome. According to chromosome counts, patristic distances, PCoA and homeolog distribution, we will compute the "Subgenomic ML consensus tree". Note: The subgenomes are shown such as upper-case letters (A1, A2, B...) in the "Subgenomic ML consensus tree" of the paper. Note2: The files of this stage are saved in the folder 08_labelled_alignment_corrected_filtered_subgenomes.
See previous studies and ourself chromosome counts and genomic size analyses (see paper).
Pairwise patristic distances between the Brachypodium diploid orthologous branches and the polyploid homeologous subgenomic branches of the "Homeologs' ML consensus tree" based on 322 nuclear core genes, and 1,307 orthologous and homeologous sequences. Patristic distances were calculated with Geneious R11.1.5. See Excel file "08_1_patristic distances.xlsx"
**Note:**You can calculate the patristic distances using other softwares, includig open source softwares, such as R packages (e.g. adephylo and ape)
3.3) Principal Coordinate Analysis (PCoA) and superimposed Minimum Spanning Tree (MST) generated from patristic distances and the Homeologs’ ML consensus tree.
Separate Principal Coordinate Analysis (PCoA) was performed using the patristic distance values of the data matrices with NTSYS-pc v2.10j software, and minimum spanning trees (MST) were superimposed on each of the PCoA plots (Figures are shown in the paper).
**Note:**Alternative open source software can be used for this step.
The homeolog grafting distributions are cumputed analysing the re-labelling bootstrapping frequencies (100 boottrees), removing underrepresented alleles, using the R-script script/homeolog_distribution.R.
Rscript scripts/homeolog_distribution.R 10 08_2_input_homeolog_distribution.tsv 08_2_output_homeolog_distribution.txt
Note: The value 10 corresponds to the 10% of the highest frequency value, which represents the lowest threshold allowed to include additional branches in the distribution of that homeolog-type
According to the previous analyses computed using the Subgenome Assignment algorithm, we can amalgamate or collapse the homeologs what correspond to the same subgenome. This step is developed in the folder 08_3_amalgamate_homeologs.
Correspondences between homeologs (lower-case) and subgenomes (upper-case):
Bhyb --> B (b); D (d)
Bmex --> A (a+b+c)
Bboi --> A (a+b+c+e)
Bret --> A (a+c); E_core (e+g)
Brup --> E (e); G (f+g+h+i)
Bpho, B422 --> E (e); G (f+g+h+i)
We extract from 07_labelled_alignment_corrected_filtered/MSA_labellel_filtered.fna the aligned sequences to collapse (consensus) for each compound-allelic subgenomes.
First, we extract the aligment including diploid, outgroup and single allelic subgenomes sequences:
perl -lne 'if(/^(>.*)/){ $head=$1 } else { $fa{$head} .= $_ } END{ foreach $s (keys(%fa)){ print "$s\n$fa{$s}\n" if($s =~ /Osat/ || $s =~ /Hvul/ || $s =~ /Bsta/ || $s =~ /Bdis/ || $s =~ /Barb/ || $s =~ /Bpin/ || $s =~ /Bsyl/ || $s =~ /Bhyb_B/ || $s =~ /Bhyb_D/ || $s =~ /Brup_E/ || $s =~ /Bpho_E/|| $s =~ /B422_E/) }}' MSA_labelled_filtered.fna > Diploids_outgroups_Bhyb_b_Bhyb_d_Brup_e_Bpho_e_B422_e.fna
Second, we extract the alignment for each compound-allelic subgenomes sequences:
perl -lne 'if(/^(>.*)/){ $head=$1 } else { $fa{$head} .= $_ } END{ foreach $s (keys(%fa)){ print "$s\n$fa{$s}\n" if($s =~ /Bmex_A/ || $s =~ /Bmex_B/ || $s =~ /Bmex_C/) }}' MSA_labelled_filtered.fna > Bmex_a_b_c.fna
perl -lne 'if(/^(>.*)/){ $head=$1 } else { $fa{$head} .= $_ } END{ foreach $s (keys(%fa)){ print "$s\n$fa{$s}\n" if($s =~ /Bboi_A/ || $s =~ /Bboi_B/ || $s =~ /Bboi_C/ || $s =~ /Bboi_E/) }}' MSA_labelled_filtered.fna > Bboi_a_b_c_e.fna
perl -lne 'if(/^(>.*)/){ $head=$1 } else { $fa{$head} .= $_ } END{ foreach $s (keys(%fa)){ print "$s\n$fa{$s}\n" if($s =~ /Bret_A/ || $s =~ /Bret_C/) }}' MSA_labelled_filtered.fna > Bret_a_c.fna
perl -lne 'if(/^(>.*)/){ $head=$1 } else { $fa{$head} .= $_ } END{ foreach $s (keys(%fa)){ print "$s\n$fa{$s}\n" if($s =~ /Bret_E/ || $s =~ /Bret_G/) }}' MSA_labelled_filtered.fna > Bret_e_g.fna
perl -lne 'if(/^(>.*)/){ $head=$1 } else { $fa{$head} .= $_ } END{ foreach $s (keys(%fa)){ print "$s\n$fa{$s}\n" if($s =~ /Bpho_F/ || $s =~ /Bpho_G/ || $s =~ /Bpho_H/|| $s =~ /Bpho_I/) }}' MSA_labelled_filtered.fna > Bpho_f_g_h_i.fna
perl -lne 'if(/^(>.*)/){ $head=$1 } else { $fa{$head} .= $_ } END{ foreach $s (keys(%fa)){ print "$s\n$fa{$s}\n" if($s =~ /B422_F/ || $s =~ /B422_G/ || $s =~ /B422_H/|| $s =~ /B422_I/) }}' MSA_labelled_filtered.fna > B422_f_g_h_i.fna
perl -lne 'if(/^(>.*)/){ $head=$1 } else { $fa{$head} .= $_ } END{ foreach $s (keys(%fa)){ print "$s\n$fa{$s}\n" if($s =~ /Brup_F/ || $s =~ /Brup_G/ || $s =~ /Brup_H/|| $s =~ /Brup_I/) }}' MSA_labelled_filtered.fna > Brup_f_g_h_i.fna
We can use the script Consensus.pl:
e.g.
perl bin/Consensus.pl -iupac -in Bmex_a_b_c.fna -out Bmex_A_consensus.fna
We concatenate the diploids, outgroups, single-allelic and compound-allelic (.consensus.fna files) subgenomes sequences.
cat Diploids_outgroups_Bhyb_b_Bhyb_d_Brup_e_Bpho_e_B422_e.fna *consensus.fna > MSA_labelled_filtered_consensus.fna
We check sequences, rename and sort them, and we save the result such as MSA_labelled_filtered_consensus_final.fna in the folder 08_labelled_alignment_corrected_filtered_subgenomes.
Note: the coordinates of partitions are the same as previous step (07_labelled_alignment_corrected_filtered)
nohup bin/iqtree-1.6.12-Linux/bin/iqtree -s MSA_labelled_filtered_consensus_final.fna -spp MSA_labelled_filtered_consensus_partitions.txt -m MFP -AICc -alrt 1000 -bb 1000 -nt AUTO &
The tree is saved such as MSA_labelled_filtered_consensus_partitions.txt.treefile file.
Once we have inferred the subgenomic ML consensus tree, we can optionally perform dating analyses with BEAST2, which should be installed accordingly (not provided). We have conducted this analysis in the folder 09_Beast2_analysis.
The final subgenomes inferred for each polyploid species are:
Bhyb --> B; D
Bmex --> A1
Bboi --> A2
Bret --> A2; E1
Brup --> E2; G
Bpho6, Bpho422 --> E2; G
We format the FASTA file MSA_labelled_filtered_consensus_final.fna to NEXUS file MSA_labelled_filtered_consensus_final.nex (saved such as compressed file nex.gz) and create the XML file MSA_labelled_filtered_consensus_final.xml (saved such as compressed file xml.gz) using BEAUTI software. Thus, we exectute Beast2:
DISPLAY="" nohup java -Xms200g -Xmx400g -jar beast/lib/beast.jar -beagle_SSE -instances 10 -threads 10 MSA_labelled_filtered_consensus_final.xml &
We check the log file to confirm that all parameters show ESS > 200.
java -Xms200g -Xmx300g -jar soft/Tracer_v1.6/lib/tracer.jar file.log
We run Treeannotator to compute the MCC tree.
nohup beast/bin/treeannotator -burnin 10 -heights median -lowMem MSA_labelled_filtered_consensus_final.trees MSA_labelled_filtered_consensus_beast2_phylogram.tre &
The dating MCC tree is shown in the file MSA_labelled_filtered_consensus_beast2_phylogram.tre