Analysis for Debelius et al., "The local tumor microbiome is associated with survival in late-stage colorectal cancer patients". medRxiv. (doi: 10.1101/2022.09.16.22279353)
Raw sequence data and metadata can be found on ENA under accession PRJEB57580.
The Stata script is in the `stata` folder. The code should be runnable on the `metadata_paired.tsv` file in the `ipynb/data/` folder.
Microbiome data was prepared and processed through Jupyter notebooks.
This code was initially run in a qiime2-2021.11 environment with plugins for gemelli, DEICODE, and Empress.
To replicate the qiime2 environment, install qiime2 2021.11 following the installation instructions most appropriate for your environment. (Click through when it warns you that there is a newer version!)
Then, activate the environment and install the qiime2 plugins according to their installation instructions:
pip install gemelli
pip install deicode
pip install git+https://github.com/biocore/empress
qiime dev refresh-cache
JupyterLab may optionally be installed to make working with the Jupyter notebooks easier, although it should not be required.
Differential ranking requires pystan; we used pystan 3.4. Installation is operating-system dependent: our group was able to get pystan working on Mac OS 15.2 but could not get it to work on a Linux system. (If you don't want to experiment with the pystan run, the output files are included.)
pip install pystan==3.4
The microbiome data was denoised using the standardized pipeline in the [CTMR bio Amplicon workflow](https://github.com/ctmrbio/Amplicon_workflows). The DADA2-denoised text table was used; the original table can be found in the `ipynb/data/raw_data` directory.
The notebook parses the DADA2 table for use in qiime2. It also filters the table to match the metadata and removes features with undefined depths.
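As a rough illustration of the parse-and-filter step, here is a minimal, dependency-free sketch. It is not the notebook's code. It assumes the DADA2 text table is tab-separated with samples as rows, ASV sequences as columns, and sample IDs in the first column, and that the metadata has a sample ID column named `sample_id` (a hypothetical name). It drops features with no reads left after sample filtering; the notebook's "undefined depths" filter is not reproduced here.

```python
import csv
import io

def read_dada2_table(handle):
    """Parse a tab-separated DADA2 count table.

    Assumes samples are rows, ASV sequences are columns, and the first
    column holds sample IDs. Returns {sample_id: {feature: count}}.
    """
    reader = csv.reader(handle, delimiter="\t")
    features = next(reader)[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        sample, counts = row[0], row[1:]
        table[sample] = {f: int(float(c)) for f, c in zip(features, counts)}
    return table

def read_metadata_ids(handle, id_column="sample_id"):
    """Collect sample IDs from a tab-separated metadata file (column name is hypothetical)."""
    return {row[id_column] for row in csv.DictReader(handle, delimiter="\t")}

def filter_to_metadata(table, sample_ids):
    """Keep only samples present in the metadata, then drop features with no reads left."""
    kept = {s: c for s, c in table.items() if s in sample_ids}
    present = {f for c in kept.values() for f, n in c.items() if n > 0}
    return {s: {f: n for f, n in c.items() if f in present} for s, c in kept.items()}

# Toy example: S3 is not in the metadata, and TTGA only occurs in S3.
dada2_txt = "sample\tACGT\tTTGA\tGGCC\nS1\t10\t0\t5\nS2\t0\t0\t7\nS3\t4\t3\t0\n"
meta_txt = "sample_id\tgroup\nS1\tA\nS2\tB\n"
table = filter_to_metadata(
    read_dada2_table(io.StringIO(dada2_txt)),
    read_metadata_ids(io.StringIO(meta_txt)),
)
```

In the toy example, sample S3 and feature TTGA are removed, leaving S1 and S2 with ACGT and GGCC.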
The notebook performs fragment insertion into the Silva 128 fragment insertion backbone and uses the phylogenetic tree to construct a feature table.
The feature table was rarefied to 2500 sequences/sample. Alpha diversity (observed features, Shannon, and Simpson diversity) and beta diversity (Bray-Curtis, Jaccard, weighted UniFrac, and unweighted UniFrac) were calculated on the rarefied table. Beta diversity measured through Aitchison distance was calculated using an unrarefied table with a pseudocount of 1.
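To make the metrics concrete, the sketch below shows what rarefaction and the non-phylogenetic diversity metrics compute, assuming each sample is a dict mapping feature to count. It is an illustration only, not the QIIME 2 implementation, whose defaults (for example the Shannon log base, base 2 here, or random seeding) may differ. UniFrac is omitted because it requires the phylogenetic tree. Aitchison distance takes an explicit feature list because the clr centering depends on every feature in the table, not just those observed in the two samples.

```python
import math
import random

def rarefy(counts, depth, seed=0):
    """Subsample a sample's reads to exactly `depth` without replacement.

    Returns None when the sample has fewer than `depth` reads (it would be dropped).
    """
    if sum(counts.values()) < depth:
        return None
    pool = [f for f, n in sorted(counts.items()) for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    out = {f: 0 for f in counts}
    for f in picked:
        out[f] += 1
    return out

def observed_features(counts):
    return sum(1 for n in counts.values() if n > 0)

def shannon(counts, base=2):
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total, base) for n in counts.values() if n > 0)

def simpson(counts):
    """Gini-Simpson form: 1 - sum(p_i^2)."""
    total = sum(counts.values())
    return 1 - sum((n / total) ** 2 for n in counts.values())

def bray_curtis(a, b):
    feats = set(a) | set(b)
    num = sum(abs(a.get(f, 0) - b.get(f, 0)) for f in feats)
    den = sum(a.get(f, 0) + b.get(f, 0) for f in feats)
    return num / den

def jaccard(a, b):
    """Presence/absence distance: 1 - |shared| / |union| of observed features."""
    sa = {f for f, n in a.items() if n > 0}
    sb = {f for f, n in b.items() if n > 0}
    return 1 - len(sa & sb) / len(sa | sb)

def aitchison(a, b, features, pseudocount=1):
    """Euclidean distance between clr-transformed samples, after adding a pseudocount."""
    def clr(x):
        logs = [math.log(x.get(f, 0) + pseudocount) for f in features]
        mean = sum(logs) / len(logs)
        return [v - mean for v in logs]
    return math.dist(clr(a), clr(b))

a, b = {"x": 2, "y": 2}, {"x": 4, "y": 0}
rarefied = rarefy({"x": 5, "y": 5}, depth=4)
```

The pseudocount is what makes Aitchison distance defined for zero counts, which is why it can be applied to the unrarefied table directly.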
We also generated the compositional tensor factorization (CTF) via Gemelli and robust Aitchison PCA (rPCA) via DEICODE.
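Both rPCA and CTF start from a robust centered log-ratio (rclr) transform, which avoids a pseudocount by treating zeros as missing. The sketch below shows only that preprocessing step under that understanding. It omits the low-rank matrix (or tensor) completion that DEICODE and gemelli then run to fill the missing entries and produce the ordination. It is not the plugins' code, and the input layout (dict of sample to feature counts) is an assumption.

```python
import math

def rclr(table):
    """Robust centered log-ratio (rclr) transform.

    For each sample: take logs of the nonzero counts, subtract their mean,
    and leave zero entries as missing (None) instead of imputing them.
    """
    out = {}
    for sample, counts in table.items():
        logs = {f: math.log(n) for f, n in counts.items() if n > 0}
        center = sum(logs.values()) / len(logs)
        out[sample] = {f: (logs[f] - center) if f in logs else None for f in counts}
    return out

transformed = rclr({"S1": {"a": 2, "b": 8, "c": 0}})
```

Because the centering uses only observed features, a zero count never enters a logarithm and the transform needs no pseudocount, unlike the Aitchison distance described above.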
Stan differential rankings are calculated using a linear mixed effects model. If you are unable to get Stan working on your system (or just don't like sleeping with your laptop overnight), the output files are in the `data/differential_ranking` folder.
To standardize display, we defined a set of colors for taxa.
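The actual taxon colors are defined in the notebooks and are not listed here. As a hedged sketch of one way to keep a taxon-to-color mapping fixed across figures, the example below uses a hypothetical palette and hypothetical taxon names (`TaxonA`, `TaxonB`).

```python
# Hypothetical palette and taxon names; the colors used in the notebooks are not listed here.
PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e", "#e6ab02"]
OTHER_COLOR = "#bdbdbd"

def taxa_colors(taxa, palette=PALETTE, other=OTHER_COLOR):
    """Map each distinct taxon to a fixed color, assigned in sorted-name order.

    Taxa beyond the length of the palette all share the `other` color.
    """
    ordered = sorted(set(taxa))
    return {t: palette[i] if i < len(palette) else other for i, t in enumerate(ordered)}

colors = taxa_colors(["TaxonB", "TaxonA", "TaxonB"])
```

Sorting by name makes the assignment independent of input order, so a taxon keeps the same color in every plot as long as the same full taxon list is passed each time.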