This repo hosts a minimal implementation of GradME
- `bme_jax/`: Balanced Minimum Evolution and distance-based optimisation with Phylo2Vec in JAX
- `cfg/`: Example configuration files
- `utils/`: Utility functions for manipulation of sequence and tree data
- Install R (version used here: 4.2.2) if needed. The latest version of R should also work.
- Set up the `gradme` environment using conda/mamba and activate it:
conda env create -f env.yml
conda activate gradme
- Optional: if you have GPUs/TPUs, you might need to update your installation of JAX. Follow the instructions at https://github.com/google/jax
- Install `phangorn` in R (4.2.2 or above):
install.packages("phangorn")
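Once the environment is active, it can help to check which backend JAX actually sees before running anything. The snippet below is a minimal sketch rather than part of this repo: `describe_jax_backend` is a hypothetical helper name, and it only relies on the public `jax.devices()` and `jax.default_backend()` calls.

```python
def describe_jax_backend():
    """Return a one-line summary of the JAX backend, or a hint if JAX is missing."""
    try:
        import jax  # imported lazily so the check also works without JAX installed
    except ImportError:
        return "JAX is not installed in this environment"
    # jax.devices() lists the devices of the default backend (cpu, gpu or tpu)
    platforms = [device.platform for device in jax.devices()]
    return f"backend={jax.default_backend()}, devices={platforms}"


print(describe_jax_backend())
```

If only `cpu` is reported on a machine with GPUs/TPUs, the JAX installation probably needs updating as described above.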
The following datasets were used:
Dataset | Sites | Taxa | Type | Taxonomic rank | Access | TreeBASE ID |
---|---|---|---|---|---|---|
DS1 | 1,949 | 27 | rRNA (18S) | Tetrapods | [1] | M2017 |
DS2 | 2,520 | 29 | rRNA (18S) | Acanthocephalans | [1] | M2131 |
DS3 | 1,812 | 36 | mtDNA | Mammals; mainly Lemurs | [1] | M127 |
DS4 | 1,137 | 41 | rDNA (18S) | Fungi; mainly Ascomycota | [1] | M487 |
DS5 | 378 | 50 | DNA | Lepidoptera | [1] | M2907 |
DS6 | 1,133 | 50 | rDNA (28S) | Fungi; mainly Diaporthales | [1] | M220 |
DS7 | 1,824 | 59 | mtDNA | Mammals; mainly Lemurs | [1] | M2449 |
DS8 | 1,008 | 64 | rDNA (28S) | Fungi; mainly Hypocreales | [1] | M2261 |
DS9 | 955 | 67 | DNA | Poaceae (grasses) | [1] | M2389 |
DS10 | 1,098 | 67 | DNA | Fungi; mainly Ascomycota | [1] | M2152 |
DS11 | 1,082 | 71 | DNA | Lichen | [1] | M2274 |
Eutherian | 1,338,678 | 37 | DNA | Eutherian Mammals | [2] | |
Jawed | 1,460-18,406 | 99 | AA | Gnathostomata (jawed vertebrates) | [3] | |
Primates | 232 | 14 | mtDNA | Mammals; mainly Primates | [4] | |
- [1] https://bitbucket.org/XavMeyer/coevrj/src/master/data/adaptiveTreeProp/alignments/TreeBase/
- [2] https://datadryad.org/stash/dataset/doi:10.5061/dryad.3629v
- [3] https://datadryad.org/stash/dataset/doi:10.5061/dryad.r2n70
- [4] https://evolution.gs.washington.edu/book/datasets.html
  - Note: We provide the FASTA conversion directly here in `data/primates.fa`

DS1-DS8 are also available at: https://github.com/zcrabbit/vbpi-gnn/tree/main/data/hohna_datasets_fasta

DS1-DS11 should also be available on TreeBASE using the TreeBASE ID, but the site was down on June 9, 2023.
Sources: see manuscript
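As a quick sanity check on a downloaded file, the sketch below counts the taxa and sites in an aligned FASTA file so the numbers can be compared with the table above. It uses only the Python standard library; `read_fasta` and `summarise_alignment` are illustrative helper names, not functions from this repo.

```python
from pathlib import Path


def read_fasta(path):
    """Return a dict mapping each header (without '>') to its full sequence."""
    chunks = {}
    name = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)  # sequences may span several lines
    return {key: "".join(parts) for key, parts in chunks.items()}


def summarise_alignment(path):
    """Return (number of taxa, number of sites) for an aligned FASTA file."""
    records = read_fasta(path)
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        raise ValueError(f"sequences have unequal lengths: {sorted(lengths)}")
    n_sites = lengths.pop() if lengths else 0
    return len(records), n_sites


# Example call (path taken from the note above):
# n_taxa, n_sites = summarise_alignment("data/primates.fa")
```

A `ValueError` here usually means the file is not an alignment (or was truncated during download), which is worth catching before starting an optimisation run.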
- Download the datasets (in the FASTA format) mentioned above and place them in a `data/` folder (e.g., in the repo)
- Update the configuration file `cfg/bme_config_v3.yml`, especially `fasta_path` (the path to the FASTA file you want to analyse)
  - (You can also create your own configuration file based on the given template)
- Run the main optimisation script:
python -m bme_jax.main --config-path cfg/name_of_your_config_file.yml
or use the `demo.ipynb` notebook
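
The configuration and run steps can also be scripted, for instance to launch one run per dataset. The following is a minimal sketch, not part of the repo: `write_config_for`, `build_command` and `run_optimisation` are hypothetical helper names, and it assumes `fasta_path` sits on its own line in the YAML file. Only the `python -m bme_jax.main --config-path ...` invocation is taken from the instructions above.

```python
import re
import subprocess
import sys
from pathlib import Path


def write_config_for(template_path, fasta_path, out_path):
    """Copy a YAML config, replacing only the value of the `fasta_path` key."""
    text = Path(template_path).read_text()
    # Assumption: `fasta_path: ...` occupies exactly one line; indentation is kept.
    new_text, n_hits = re.subn(
        r"(?m)^([ \t]*)fasta_path:.*$",
        lambda match: f"{match.group(1)}fasta_path: {fasta_path}",
        text,
    )
    if n_hits != 1:
        raise ValueError(f"expected one fasta_path line, found {n_hits}")
    Path(out_path).write_text(new_text)
    return Path(out_path)


def build_command(config_path):
    """Build the command line documented above for a given config file."""
    return [sys.executable, "-m", "bme_jax.main", "--config-path", str(config_path)]


def run_optimisation(config_path):
    """Run the main script; call this from the repo root so bme_jax is importable."""
    return subprocess.run(build_command(config_path), check=True)
```

For example, `write_config_for("cfg/bme_config_v3.yml", "data/primates.fa", "cfg/primates.yml")` followed by `run_optimisation("cfg/primates.yml")` would start a run on the Primates dataset, where `cfg/primates.yml` is just an illustrative output name.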