Replies: 1 comment 1 reply
-
Hi, There may be different ways to approach this. A simple method is to use binary encoding to represent the allele frequency for each marker. In this encoding: 0000 represents 0 copies of reference alleles, This binary representation reflects the cumulative presence of reference alleles, enabling the calculation of evolutionary distances that correlate with differences in allele frequency. For example, consider marker i, where sample X, Y, and Z have 3, 4, and 5 copies of reference alleles, respectively. Using this encoding, the evolutionary distance between sample X and sample Y would likely be shorter than the distance between sample X and sample Z for marker i. This is because the binary encoding preserves the relative differences in allele frequency. After conversion, the data can be analyzed using IQ-TREE under substitution models designed for binary traits. Please let me know if anyone suggests a better method — I would like to learn more. |
Beta Was this translation helpful? Give feedback.
-
Hi,
I am trying to generate a phylogenetic tree for tetraploid species using allele dosage genotype. Below is the format of our data:
marker 1 marker 2 marker 3 …… marker 2000
sample 1 0 3 2 …… 2
sample 2 4 1 3 …… 4
sample 3 2 4 0 …… 0
…… …… …… …… …… ……
Sample 1000 2 0 1 …… 3
Where ‘0, 1, 2, 3 and 4” represent four, three, two, one and zero copies of reference alleles (and 0, 1, 2, 3, and 4 copies of alternative alleles). Could you please provide suggestions on how to prepare the input file for this purpose?
Thank you for your help!
Beta Was this translation helpful? Give feedback.
All reactions