sabifo4 edited this page Nov 1, 2024 · 3 revisions

yn00

The program yn00 implements the method of Yang and Nielsen (2000) for estimating synonymous and nonsynonymous substitution rates between two sequences ($d_{S}$ and $d_{N}$). The method of Nei and Gojobori (1986) is also included. The ad hoc method implemented in the program accounts for the transition/transversion rate bias and codon usage bias, and is an approximation to the ML method that accounts for the transition/transversion rate ratio and assumes the "F3x4" codon frequency model. We recommend that you use the ML method (i.e., by specifying runmode = -2 and CodonFreq = 2 in the control file to execute CODEML) whenever possible, even for pairwise sequence comparisons.

Below, you can find an example of a control file to run yn00, normally named yn00.ctl:

seqfile    = abglobin.nuc * path to input sequence file
outfile    = yn           * path to main output file

verbose    = 0            * 1: detailed output (list sequences), 0: concise output
icode      = 0            * 0:universal code; 1:mammalian mt; 2-10:see below or check the PAML documentation

weighting  = 0            * weighting pathways between codons (0/1)?
commonf3x4 = 0            * use one set of codon freqs for all pairs (0/1)?

In this example, the path to the input sequence file (seqfile) and the path to the main output file (outfile) have been specified. Note that, if the control file is saved in the same folder as the input files or the folder where the output files are to be saved, you can simply type the file names (i.e., no absolute/relative paths are needed). File names should not contain spaces or special characters. In addition, sites (codons) involving alignment gaps or ambiguous nucleotides in any sequence are removed from all sequences. As in other programs, variable verbose controls how much information is printed in the output file, and variable icode specifies the genetic code (see below for more details).
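When yn00 is run over many alignments, it can be convenient to generate the control file from a script. The following is a minimal Python sketch and not part of PAML: the function name `write_yn00_ctl` and its validation checks are hypothetical, while the variable names and allowed values are those listed in the example control file above.

```python
from pathlib import Path


def write_yn00_ctl(path, seqfile, outfile, verbose=0, icode=0,
                   weighting=0, commonf3x4=0):
    """Write a yn00 control file (hypothetical helper) and return its text."""
    # File names must not contain spaces (see the note on file names above).
    for name, value in (("seqfile", seqfile), ("outfile", outfile)):
        if any(ch.isspace() for ch in value):
            raise ValueError(f"{name} must not contain spaces: {value!r}")
    # icode takes one of the genetic codes 0-10.
    if icode not in range(11):
        raise ValueError("icode must be an integer between 0 and 10")
    # The remaining variables are 0/1 switches.
    for name, value in (("verbose", verbose), ("weighting", weighting),
                        ("commonf3x4", commonf3x4)):
        if value not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1")

    settings = [
        ("seqfile", seqfile),
        ("outfile", outfile),
        ("verbose", verbose),
        ("icode", icode),
        ("weighting", weighting),
        ("commonf3x4", commonf3x4),
    ]
    # Pad variable names to 10 characters so the "=" signs line up.
    text = "\n".join(f"{key:<10} = {value}" for key, value in settings) + "\n"
    Path(path).write_text(text)
    return text


if __name__ == "__main__":
    print(write_yn00_ctl("yn00.ctl", "abglobin.nuc", "yn"))
```

The resulting file can then be passed to the program, typically as `yn00 yn00.ctl`, assuming the yn00 executable is on your PATH.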

The variable weighting decides whether equal weighting or unequal weighting will be used when counting differences between codons. The two approaches will be different for divergent sequences, and unequal weighting is much slower computationally. The transition/transversion rate ratio $\kappa$ is estimated for all sequences in the data file and used in subsequent pairwise comparisons. Variable commonf3x4 specifies whether codon frequencies (based on the "F3x4 model" in CODEML) should be estimated for each pair or for all sequences in the data.

Besides the main result file, the program also generates three distance matrices, saved in the following files: 2YN.dS for synonymous rates, 2YN.dN for nonsynonymous rates, and 2YN.t for the combined codon rate ($t$ is measured as the number of nucleotide substitutions per codon). These are lower-diagonal distance matrices and are directly readable by some distance programs such as NEIGHBOR in Felsenstein's PHYLIP package.
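For downstream analyses outside PHYLIP, the lower-diagonal matrices can be expanded into full symmetric matrices. The sketch below assumes the usual PHYLIP lower-triangular layout: a first line with the number of sequences, then one line per sequence holding its name followed by the distances to all preceding sequences. It also assumes that sequence names contain no whitespace. The function name `parse_lower_triangular` and the sample values are hypothetical and are not actual yn00 output.

```python
def parse_lower_triangular(text):
    """Parse a lower-diagonal distance matrix into (names, full matrix)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])          # first line: number of sequences
    rows = lines[1:n + 1]
    if len(rows) != n:
        raise ValueError(f"expected {n} rows, found {len(rows)}")

    names = []
    dist = [[0.0] * n for _ in range(n)]  # diagonal stays at 0.0
    for i, line in enumerate(rows):
        fields = line.split()
        names.append(fields[0])
        values = [float(x) for x in fields[1:]]
        if len(values) != i:              # row i holds i distances
            raise ValueError(f"row {i + 1} should have {i} values")
        for j, v in enumerate(values):
            dist[i][j] = dist[j][i] = v   # mirror across the diagonal
    return names, dist


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    sample = "3\nseqA\nseqB  0.10\nseqC  0.30 0.40\n"
    names, dist = parse_lower_triangular(sample)
    for name, row in zip(names, dist):
        print(name, row)
```

To read a real file, pass its contents to the function, e.g. `parse_lower_triangular(open("2YN.dS").read())`. Check the layout of your own output first, since the format described here is an assumption.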


The genetic codes implemented in PAML and enabled via variable icode are the following:

  • 0: universal
  • 1: mammalian mt.
  • 2: yeast mt.
  • 3: mold mt.
  • 4: invertebrate mt.
  • 5: ciliate nuclear
  • 6: echinoderm mt.
  • 7: euplotid mt.
  • 8: alternative yeast nu.
  • 9: ascidian mt.
  • 10: blepharisma nu.

Note

These codes correspond to transl_table 1 to 11 of GenBank.
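In scripts that set icode programmatically, a small lookup table makes the chosen genetic code explicit in logs and catches out-of-range values early. This is a minimal sketch: the names `GENETIC_CODES` and `describe_icode` are hypothetical, and the labels are copied from the list above.

```python
# icode values and labels as listed above (hypothetical helper, not part of PAML).
GENETIC_CODES = {
    0: "universal",
    1: "mammalian mt.",
    2: "yeast mt.",
    3: "mold mt.",
    4: "invertebrate mt.",
    5: "ciliate nuclear",
    6: "echinoderm mt.",
    7: "euplotid mt.",
    8: "alternative yeast nu.",
    9: "ascidian mt.",
    10: "blepharisma nu.",
}


def describe_icode(icode):
    """Return the label for an icode value, or raise ValueError if unknown."""
    try:
        return GENETIC_CODES[icode]
    except KeyError:
        raise ValueError(f"unknown icode: {icode!r} (expected 0-10)") from None


if __name__ == "__main__":
    print(describe_icode(1))
```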
