- Install environment:

      conda env create --file environment.yaml

- Activate environment:

      conda activate snakemake

- Run analysis:

      cd workflows/{analysis of choice}
      snakemake --forceall --cores
CSV tables containing the similarities between all samples are written to the workflows/{comparison}/aggregated_sim folder.
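
To inspect these tables after a run, a small loader along the following lines may help. This is only a sketch: the helper name `load_similarity_tables` is hypothetical, and it assumes that the tables are plain comma-separated files with a header row and a `.csv` extension, which the workflow does not guarantee.

```python
import csv
from pathlib import Path


def load_similarity_tables(folder):
    """Read every .csv file in `folder` into a list of row dicts.

    Hypothetical helper: assumes comma-separated files with a header row.
    Returns a dict mapping file name -> list of rows (one dict per row).
    """
    tables = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            tables[path.name] = list(csv.DictReader(handle))
    return tables
```

Calling `load_similarity_tables("workflows/{comparison}/aggregated_sim")`, with `{comparison}` replaced by the comparison that was run, returns one entry per CSV file found in that folder.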
The following comparisons are conducted:
- MegaGO on real-world data
- string-based comparison on real-world data
- MegaGO on random sets of GO terms of various sizes
Input data is from here, according to the methods in the preprint of the metaproteomics tool survey.
GO terms are sampled from SwissProt to resemble real-world data. For each sample size, two sets of equal size are drawn n times.
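
The sampling scheme can be sketched roughly as follows. This is an illustration, not the workflow's actual code: the function name `draw_set_pairs`, the placeholder term pool, and the chosen sample sizes are all hypothetical. It also assumes that terms are drawn uniformly without replacement and that the two sets of a pair are drawn independently, so they may overlap. The real workflow may instead weight terms by their frequency in SwissProt.

```python
import random


def draw_set_pairs(go_pool, sample_sizes, n_repeats, seed=0):
    """Draw `n_repeats` pairs of equally sized GO term sets per sample size.

    Assumption: uniform sampling without replacement within each set;
    the two sets of a pair are drawn independently of each other.
    Returns a list of (size, repeat_index, set_a, set_b) tuples.
    """
    rng = random.Random(seed)  # fixed seed keeps the draws reproducible
    pairs = []
    for size in sample_sizes:
        for repeat in range(n_repeats):
            set_a = rng.sample(go_pool, size)
            set_b = rng.sample(go_pool, size)
            pairs.append((size, repeat, set_a, set_b))
    return pairs


# Placeholder pool standing in for GO terms extracted from SwissProt.
go_pool = [f"GO:{i:07d}" for i in range(1, 501)]
pairs = draw_set_pairs(go_pool, sample_sizes=[10, 50, 100], n_repeats=5)
```

Each resulting pair can then be passed to the comparison tool, giving a distribution of similarity scores per sample size.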