Skip to content

Commit

Permalink
Merge pull request #33 from bioDS/fix_gt16_error_model_annotations
Browse files Browse the repository at this point in the history
Fix gt16 error model annotations
  • Loading branch information
kche309 authored Aug 21, 2022
2 parents fe1e873 + 9996584 commit 40d3df3
Show file tree
Hide file tree
Showing 10 changed files with 137 additions and 35 deletions.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# BEAST phylonco
This is a BEAST2 package for Bayesian inference of molecular data for cancer evolution. This package implements error models and substitution models for inference of timed trees in a Bayesian MCMC framework.

[Paper](https://doi.org/10.1101/2021.03.17.435906): Chen et al. "Accounting for errors in data improves timing in single-cell cancer evolution." (2022)
[Paper](https://doi.org/10.1093/molbev/msac143): Chen, K., Moravec, J. C., Gavryushkin, A., Welch, D., & Drummond, A. J. (2022). Accounting for errors in data improves divergence time estimates in single-cell cancer evolution. Molecular biology and evolution, 39(8), msac143.

See [bioDS/beast-phylonco-paper](https://github.com/bioDS/beast-phylonco-paper) for datasets and analyses in the paper.

Expand Down Expand Up @@ -84,7 +84,7 @@ For manual package installation instructions, see [here](http://www.beast2.org/m
## Citations
* BEAST v2.5: [Bouckaert at al. (2019)](https://doi.org/10.1371/journal.pcbi.1006650)

* BEAST2 Error models: [Chen et al. (2022)](https://doi.org/10.1101/2021.03.17.435906)
* BEAST2 Error models: [Chen et al. (2022)](https://doi.org/10.1093/molbev/msac143)

* GT16 model: [Kozlov et al. (2022)](https://doi.org/10.1186/s13059-021-02583-w)

Expand Down
10 changes: 8 additions & 2 deletions phylonco-lphy/doc/distributions/GT16ErrorModel.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,21 @@
GT16ErrorModel distribution
===========================
GT16ErrorModel()
----------------
GT16ErrorModel(Double **epsilon**, Double **delta**, Alignment **alignment**)
-----------------------------------------------------------------------------

An error model distribution on an phased genotype alignment.

### Parameters

- Double **epsilon** - the sequencing and amplification error probability.
- Double **delta** - the allelic drop out probability.
- Alignment **alignment** - the genotype alignment.

### Return type

- Alignment

### Reference

Alexey Kozlov, Joao Alves, Alexandros Stamatakis, David Posada (2021). CellPhy: accurate and fast probabilistic inference of single-cell phylogenies from scDNA-seq data. bioRxiv 2020.07.31.230292.[https://doi.org/10.1101/2020.07.31.230292](https://doi.org/10.1101/2020.07.31.230292)

3 changes: 2 additions & 1 deletion phylonco-lphy/doc/index.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
Phylonco Language Reference (version 0.0.1)
Phylonco Language Reference (version 0.0.2)
===========================================
This an automatically generated language reference of the LinguaPhylo (LPhy) statistical phylogenetic modeling language.

Expand Down Expand Up @@ -32,6 +32,7 @@ Types
- [Taxa](types/Taxa.md)
- [Alignment](types/Alignment.md)
- [ContinuousCharacterData](types/ContinuousCharacterData.md)
- [MetaDataAlignment](types/MetaDataAlignment.md)
- [SiteModel](types/SiteModel.md)
- [TimeTree](types/TimeTree.md)

8 changes: 7 additions & 1 deletion phylonco-lphy/doc/types/Alignment.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,10 @@ Methods

- **ages**
- gets the ages of these taxa as an array of doubles.
- **canonicalStateCount**
- the number of canonical states excluding ambiguous states in the alignment.
- **canonicalStates**
- the canonical states excluding ambiguous states.
- **dataType**
- get the data type of this alignment.
- **length**
Expand All @@ -14,7 +18,9 @@ Methods
- **nodeCount**
- the total number of nodes (left + internal) in a binary tree with these taxa.
- **stateCount**
- the number of possible states in the alignment.
- the number of possible states including ambiguous states in the alignment.
- **states**
- the possible states including ambiguous states.
- **taxa**
- the taxa of the alignment.
- **taxaNames**
Expand Down
31 changes: 31 additions & 0 deletions phylonco-lphy/doc/types/MetaDataAlignment.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
MetaDataAlignment
=================
Methods
-------

- **ages**
- gets the ages of these taxa as an array of doubles.
- **canonicalStateCount**
- the number of canonical states excluding ambiguous states in the alignment.
- **canonicalStates**
- the canonical states excluding ambiguous states.
- **charset**
- return a partition alignment. If the string doesn't match charset's syntax, then check if the string matches a defined name in the nexus file. Otherwise it is an error. The string is referred to one partition at a call, but can be multiple blocks, such as d.charset("2-457\3 660-896\3").
- **dataType**
- get the data type of this alignment.
- **getTaxaNames**
- The names of the taxa.
- **length**
- gets the number of taxa.
- **nchar**
- The number of characters/sites.
- **nodeCount**
- the total number of nodes (left + internal) in a binary tree with these taxa.
- **stateCount**
- the number of possible states including ambiguous states in the alignment.
- **states**
- the possible states including ambiguous states.
- **taxa**
- the taxa of the alignment.
- **taxaNames**
- The names of the taxa.
2 changes: 2 additions & 0 deletions phylonco-lphy/doc/types/TimeTree.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,8 @@ TimeTree
Methods
-------

- **branchCount**
- the total number of branches in the tree (returns nodeCount() - 1)
- **directAncestorCount**
- the total number of nodes in the tree that are direct ancestors (i.e. have a single parent and a single child, or have one child that is a zero-branch-length leaf).
- **extantCount**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,13 +8,19 @@
import org.apache.commons.math3.random.RandomGenerator;
import phylonco.lphy.evolution.datatype.PhasedGenotype;

import java.util.Objects;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.*;

/**
* @author Alexei Drummond
* @author Kylie Chen
*/
@Citation(
value = "Alexey Kozlov, Joao Alves, Alexandros Stamatakis, David Posada (2021). CellPhy: accurate and fast probabilistic inference of single-cell phylogenies from scDNA-seq data. bioRxiv 2020.07.31.230292.",
title = "CellPhy: accurate and fast probabilistic inference of single-cell phylogenies from scDNA-seq data",
authors = {"Kozlov", "Alves", "Stamatakis", "Posada"},
year = 2021,
DOI = "https://doi.org/10.1101/2020.07.31.230292"
)
public class GT16ErrorModel implements GenerativeDistribution<Alignment> {

Value<Double> epsilon;
Expand All @@ -27,27 +33,37 @@ public class GT16ErrorModel implements GenerativeDistribution<Alignment> {

RandomGenerator random;

// Warning: this line below breaks GUI narrative from displaying correctly!
// for testing
public GT16ErrorModel() {}

public GT16ErrorModel(@ParameterInfo(name = epsilonParamName, description = "the sequencing and amplification error probability.") Value<Double> epsilon,
@ParameterInfo(name = deltaParamName, description = "the allelic drop out probability.") Value<Double> delta,
@ParameterInfo(name = alignmentParamName, description = "the genotype alignment.") Value<Alignment> alignment) {

this.epsilon = epsilon;
this.delta = delta;
// public GT16ErrorModel() {}

public GT16ErrorModel(
@ParameterInfo(name = epsilonParamName,
narrativeName = "sequencing and amplification error probability",
description = "the sequencing and amplification error probability.")
Value<Double> epsilon,
@ParameterInfo(name = deltaParamName,
narrativeName = "allelic dropout probability",
description = "the allelic drop out probability.")
Value<Double> delta,
@ParameterInfo(name = alignmentParamName,
narrativeName = "genotype alignment",
description = "the genotype alignment.")
Value<Alignment> alignment) {

if (alignment.value().getSequenceType() != phylonco.lphy.evolution.datatype.PhasedGenotype.INSTANCE) {
throw new RuntimeException("GT16ErrorModel can only be applied alignments of type " + PhasedGenotype.NAME);
}

this.epsilon = epsilon;
this.delta = delta;
this.alignment = alignment;
this.random = RandomUtils.getRandom();
}

@Override
public SortedMap<String, Value> getParams() {
SortedMap<String, Value> map = new TreeMap<>();
public Map<String, Value> getParams() {
Map<String, Value> map = new TreeMap<>();
map.put(epsilonParamName, epsilon);
map.put(deltaParamName, delta);
map.put(alignmentParamName, alignment);
Expand All @@ -58,12 +74,20 @@ public SortedMap<String, Value> getParams() {
public void setParam(String paramName, Value value) {
if (paramName.equals(epsilonParamName)) {
epsilon = value;
} else if (paramName.equals(deltaParamName)) delta = value;
else if (paramName.equals(alignmentParamName)) alignment = value;
} else if (paramName.equals(deltaParamName)) {
delta = value;
} else if (paramName.equals(alignmentParamName)) {
alignment = value;
}
else throw new RuntimeException("Unrecognised parameter name: " + paramName);
}

@GeneratorInfo(name = "GT16ErrorModel", description = "An error model distribution on an phased genotype alignment.")
@GeneratorInfo(
name = "GT16ErrorModel",
verbClause = "has",
narrativeName = "error model",
category = GeneratorCategory.TAXA_ALIGNMENT,
description = "An error model distribution on an phased genotype alignment.")
public RandomVariable<Alignment> sample() {

Alignment original = alignment.value();
Expand All @@ -84,11 +108,11 @@ public RandomVariable<Alignment> sample() {
}

public Value<Double> getEpsilon() {
return Objects.requireNonNull(epsilon);
return getParams().get(epsilonParamName);
}

public Value<Double> getDelta() {
return Objects.requireNonNull(delta);
return getParams().get(deltaParamName);
}

public Alignment getOriginalAlignment() {
Expand Down Expand Up @@ -142,4 +166,4 @@ public double[][] errorMatrix(double epsilon, double delta) {
};
return errorMatrix;
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
import lphy.graphicalModel.types.DoubleArray2DValue;

/**
* Created by adru001 on 2/02/20.
* @author Alexei Drummond
*/
@Citation(
value = "Alexey Kozlov, Joao Alves, Alexandros Stamatakis, David Posada (2021). CellPhy: accurate and fast probabilistic inference of single-cell phylogenies from scDNA-seq data. bioRxiv 2020.07.31.230292.",
Expand All @@ -28,7 +28,9 @@ public GT16(@ParameterInfo(name = ratesParamName, narrativeName = "relative rate

super(meanRate);

if (rates.value().length != 6) throw new IllegalArgumentException("Rates must have 6 dimensions.");
if (rates.value().length != 6) {
throw new IllegalArgumentException("Rates must have 6 dimensions.");
}

setParam(ratesParamName, rates);
setParam(freqParamName, freq);
Expand Down Expand Up @@ -114,4 +116,4 @@ public Value<Double[]> getRates() {
public Value<Double[]> getFreq() {
return getParams().get(freqParamName);
}
}
}
Original file line number Diff line number Diff line change
@@ -1,12 +1,20 @@
package phylonco.lphy.evolution.alignment;

import lphy.evolution.Taxa;
import lphy.evolution.alignment.Alignment;
import lphy.evolution.alignment.SimpleAlignment;
import lphy.graphicalModel.Value;
import org.junit.Test;
import phylonco.lphy.evolution.datatype.PhasedGenotype;

import java.util.Arrays;

import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertEquals;

/**
* @author Kylie Chen
*/
public class GT16ErrorModelTest {

private static double DELTA = 1e-15;
Expand Down Expand Up @@ -46,12 +54,25 @@ public double[][] getExpectedMatrix(double epsilon, double delta) {
return expectedMatrix;
}

private Value<Alignment> getDummySequence() {
int numTaxa = 10;
int numChar = 200;
Taxa taxa = Taxa.createTaxa(numTaxa);
PhasedGenotype seqType = PhasedGenotype.INSTANCE;
Value<Alignment> alignment = new Value<>("alignment", new SimpleAlignment(taxa, numChar, seqType));
return alignment;
}

@Test
public void errorMatrixTestSmallErrors() {
double epsilon = 0.01;
double delta = 0.02;
Double epsilon = 0.01;
Double delta = 0.02;

Value<Double> ep = new Value<>("epsilon", epsilon);
Value<Double> dt = new Value<>("delta", delta);
GT16ErrorModel errorModel = new GT16ErrorModel(ep, dt, getDummySequence());

double[][] observedMatrix = new GT16ErrorModel().errorMatrix(epsilon, delta);
double[][] observedMatrix = errorModel.errorMatrix(epsilon, delta);
double[][] expectedMatrix = getExpectedMatrix(epsilon, delta);

double[] observed = Arrays.stream(observedMatrix).flatMapToDouble(Arrays::stream).toArray();
Expand All @@ -65,7 +86,11 @@ public void errorMatrixTestLargeErrors() {
double epsilon = 0.1;
double delta = 0.5;

double[][] observedMatrix = new GT16ErrorModel().errorMatrix(epsilon, delta);
Value<Double> ep = new Value<>("epsilon", epsilon);
Value<Double> dt = new Value<>("delta", delta);
GT16ErrorModel errorModel = new GT16ErrorModel(ep, dt, getDummySequence());

double[][] observedMatrix = errorModel.errorMatrix(epsilon, delta);
double[][] expectedMatrix = getExpectedMatrix(epsilon, delta);

double[] observed = Arrays.stream(observedMatrix).flatMapToDouble(Arrays::stream).toArray();
Expand All @@ -78,8 +103,13 @@ public void errorMatrixTestLargeErrors() {
public void errorMatrixRowSumsToOne() {
double delta = 0.1;
double epsilon = 0.2;

Value<Double> ep = new Value<>("epsilon", epsilon);
Value<Double> dt = new Value<>("delta", delta);
GT16ErrorModel errorModel = new GT16ErrorModel(ep, dt, getDummySequence());

double expected = 1.0;
double[][] observedMatrix = new GT16ErrorModel().errorMatrix(epsilon, delta);
double[][] observedMatrix = errorModel.errorMatrix(epsilon, delta);
for (int row = 0; row < observedMatrix.length; row++) {
double sum = 0.0;
for (int col = 0; col < observedMatrix[row].length; col++) {
Expand All @@ -88,4 +118,4 @@ public void errorMatrixRowSumsToOne() {
assertEquals(expected, sum, DELTA);
}
}
}
}
2 changes: 1 addition & 1 deletion phylonco-lphybeast/build.gradle.kts
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ plugins {
}

// version has to be manually adjusted to keep same between version.xml and here
//version = "0.0.6-SNAPSHOT"
version = "0.0.7-SNAPSHOT"

java {
sourceCompatibility = JavaVersion.VERSION_17
Expand Down

0 comments on commit 40d3df3

Please sign in to comment.