
Infinitesites

sabifo4 edited this page Nov 13, 2024 · 4 revisions

infinitesites

You can compile the infinitesites program as follows:

cc -o infinitesites -DINFINITESITES -O3 mcmctree.c tools.c -lm

This PAML program generates the limiting posterior distribution when the number of sites in the sequence alignment approaches infinity (Yang and Rannala 2006; Rannala and Yang 2007). Instead of reading and analysing sequence alignments, the program uses the estimated branch lengths as the data, treating them as error-free. For the clock model (clock = 1), the input file is called FixedDsClock1.txt, while for clock = 2 or 3, the file is called FixedDsClock23.txt. There is an example in the examples/DatingSoftBound/ folder, and the mcmctree tutorial explains how to run this program.
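The mapping between the clock setting and the expected input file can be written down as a small lookup. This is only an illustrative sketch: the function name `fixed_ds_filename` is hypothetical and not part of PAML, and the file names are the ones given in the paragraph above.

```python
def fixed_ds_filename(clock):
    """Return the input file name that the text associates with a clock setting.

    Hypothetical helper for illustration; PAML itself does not provide it.
    """
    if clock == 1:
        return "FixedDsClock1.txt"
    if clock in (2, 3):
        return "FixedDsClock23.txt"
    raise ValueError(f"unsupported clock value: {clock!r}")
```

A script that prepares input for infinitesites could use such a helper to check that the right file exists before launching the program.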

With clock = 1, the file FixedDsClock1.txt should have the following format:

9
1.0 0.7 0.2 0.4 0.1 0.8 0.3 0.5
1.5
0.8
1.8

The first number is the number of species, $s = 9$ in the example. The next line has $s - 1$ node ages (the distances from the $s - 1$ internal nodes in the tree to the present time). The implied tree topology must be the same as that in the tree file named by the treefile variable in the control file (commonly called mcmctree.ctl, although you may have given your control file a different name), so that the node numbers stay the same. If you used BASEML or CODEML (with clock = 1 in the control files) to estimate the branch lengths under the clock using the same rooted tree, the output should be in the correct order.

If there is more than one locus, the following lines give the ages of the root for the additional loci, again measured by distance. The example above has 4 loci. The distances from the root to the tips are 1, 1.5, 0.8 and 1.8 at the four loci, respectively (i.e., the first number on the node-age line, followed by the three single-number lines below it). Note that if the clock holds, the node ages should be proportional between loci, so that additional loci provide no extra information about the relative node ages.
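The layout described above can be checked with a short script. The sketch below is not PAML code: the function names are hypothetical, it assumes the file can be read as whitespace-separated numbers, and it takes the first node age to be the root age at the first locus, as in the example. The rescaling step simply illustrates the proportionality of node ages between loci under the clock.

```python
EXAMPLE = """9
1.0 0.7 0.2 0.4 0.1 0.8 0.3 0.5
1.5
0.8
1.8
"""

def parse_fixed_ds_clock1(text):
    """Parse the clock = 1 layout (hypothetical helper, not part of PAML).

    Returns (number of species, node ages at locus 1, root ages at all loci).
    """
    tokens = text.split()  # assumption: whitespace-separated tokens
    n_species = int(tokens[0])
    n_internal = n_species - 1
    node_ages = [float(t) for t in tokens[1:1 + n_internal]]
    if len(node_ages) != n_internal:
        raise ValueError(f"expected {n_internal} node ages, got {len(node_ages)}")
    # Any remaining numbers are root ages for the additional loci.
    extra_root_ages = [float(t) for t in tokens[1 + n_internal:]]
    # Assumption: the first node age is the root age at the first locus.
    root_ages = [node_ages[0]] + extra_root_ages
    return n_species, node_ages, root_ages

def scale_node_ages(node_ages, root_ages):
    """Rescale the locus-1 node ages so that each locus has its own root age."""
    base = root_ages[0]
    return [[age * root / base for age in node_ages] for root in root_ages]

n_species, node_ages, root_ages = parse_fixed_ds_clock1(EXAMPLE)
per_locus_ages = scale_node_ages(node_ages, root_ages)
print(n_species, len(root_ages))
```

Because every locus is a rescaled copy of the first, the ratios between node ages are identical across loci, which is why the extra loci add no information about relative ages.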

With clock = 2 or clock = 3, the input file FixedDsClock23.txt should have the branch lengths for the unrooted trees at the multiple loci. The following example is for 3 loci, and there are 7 species in the input tree file:

7
((((human: 0.029043, (chimpanzee: 0.014557, bonobo: 0.010908): 0.016729):
0.015344, gorilla: 0.033888): 0.033816, (orangutan: 0.026872, sumatran:
0.022437): 0.069648): 0.073309, gibbon: 0.024637);

((((human: 0.012463, (chimpanzee: 0.002782, bonobo: 0.003835): 0.003331):
0.004490, gorilla: 0.014278): 0.006308, (orangutan: 0.010818, sumatran:
0.008845): 0.030551): 0.004363, gibbon: 0.029246);

((((human: 0.270862, (chimpanzee: 0.066698, bonobo: 0.056883): 0.124104):
0.139082, gorilla: 0.310797): 0.391342, (orangutan: 0.152555, sumatran:
0.114176): 0.696518): 0.017607, gibbon: 1.394718);

The tree topology is rooted, with a bifurcation at the root. The program then collapses the two branches around the root into one branch before doing any analysis. I think the rooted tree should be the same tree as in the tree file referred to by mcmctree.ctl. Every species has to be present at every locus.
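The two requirements above (a bifurcating root and the same species at every locus) can be checked before running the program. The sketch below is an illustration only, not how infinitesites itself reads the file: all function names are hypothetical, the tip-name pattern assumes simple alphanumeric labels, and trees are assumed to be separated by semicolons. It also computes the length of the single branch obtained by merging the two branches around the root.

```python
import re

EXAMPLE = """7
((((human: 0.029043, (chimpanzee: 0.014557, bonobo: 0.010908): 0.016729):
0.015344, gorilla: 0.033888): 0.033816, (orangutan: 0.026872, sumatran:
0.022437): 0.069648): 0.073309, gibbon: 0.024637);

((((human: 0.012463, (chimpanzee: 0.002782, bonobo: 0.003835): 0.003331):
0.004490, gorilla: 0.014278): 0.006308, (orangutan: 0.010818, sumatran:
0.008845): 0.030551): 0.004363, gibbon: 0.029246);

((((human: 0.270862, (chimpanzee: 0.066698, bonobo: 0.056883): 0.124104):
0.139082, gorilla: 0.310797): 0.391342, (orangutan: 0.152555, sumatran:
0.114176): 0.696518): 0.017607, gibbon: 1.394718);
"""

def parse_fixed_ds_clock23(text):
    """Split the file into the species count and one Newick string per locus."""
    head, _, rest = text.strip().partition("\n")
    trees = [t.strip() + ";" for t in rest.split(";") if t.strip()]
    return int(head), trees

def tip_names(newick):
    """Collect tip labels (assumption: labels are plain words followed by ':')."""
    return set(re.findall(r"([A-Za-z_]\w*)\s*:", newick))

def root_children(newick):
    """Return the subtrees directly below the root as Newick substrings."""
    s = "".join(newick.split()).rstrip(";")
    if not (s.startswith("(") and s.endswith(")")):
        raise ValueError("expected a parenthesised tree without a root branch length")
    inner, parts, depth, start = s[1:-1], [], 0, 0
    for i, ch in enumerate(inner):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:  # a comma at depth 0 separates root children
            parts.append(inner[start:i])
            start = i + 1
    parts.append(inner[start:])
    return parts

def collapsed_root_branch(newick):
    """Sum the two branch lengths around a bifurcating root."""
    parts = root_children(newick)
    if len(parts) != 2:
        raise ValueError("root is not a bifurcation")
    return sum(float(p.rsplit(":", 1)[1]) for p in parts)

def check_loci(n_species, trees):
    """Raise if any locus lacks a species or has an unexpected tip count."""
    reference = tip_names(trees[0])
    for i, tree in enumerate(trees, start=1):
        names = tip_names(tree)
        if len(names) != n_species or names != reference:
            raise ValueError(f"locus {i}: tip set does not match {n_species} species")

n_species, trees = parse_fixed_ds_clock23(EXAMPLE)
check_loci(n_species, trees)
root_branches = [collapsed_root_branch(t) for t in trees]
```

For the first locus, for instance, the merged branch is the sum of the branch leading to gibbon and the branch leading to the remaining species.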
