Novel taxonomy-independent deep learning microbiome approach allows for accurate classification of human epithelial materials
Celia Díez López (a), Athina Vidaki (a), Arwin Ralf (a), Diego Montiel González (a), Djawad Radjabzadeh (b), Robert Kraaij (b,c), André G. Uitterlinden (b,c), Cordula Haas (d), Oscar Lao (e,f), and Manfred Kayser (a)
- a Department of Genetic Identification, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands
- b Department of Internal Medicine, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands
- c Department of Epidemiology, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands
- d Zurich Institute of Forensic Medicine, University of Zurich, Zurich, Switzerland
- e CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- f Universitat Pompeu Fabra (UPF), Barcelona, Spain
Operating system: Linux only. Tested on Ubuntu 16.04 LTS; it should also work on newer versions of Ubuntu, and it should be easy to make it work on other Linux distributions.
Install the following dependencies
apt-get install bwa
SAMtools: We recommend a recent version of SAMtools (1.4.1 or later).
wget https://github.com/samtools/samtools/releases/download/1.4.1/samtools-1.4.1.tar.bz2 -O samtools.tar.bz2
tar -xjvf samtools.tar.bz2
cd samtools-1.4.1/
./configure
make
make install
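After installing, it can help to confirm that both tools are reachable before running the classifier. The following is a minimal sketch, not part of the distributed software: the function names (`parse_version`, `samtools_version`, `check_tools`) are hypothetical helpers, and the 1.4.1 threshold is taken from the recommendation above.

```python
import re
import shutil
import subprocess

# Minimum SAMtools version, taken from the recommendation above.
MIN_SAMTOOLS = (1, 4, 1)


def parse_version(text):
    """Return the first dotted version number in text as a tuple of ints, or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def samtools_version():
    """Return the installed SAMtools version as a tuple, or None if unavailable."""
    if shutil.which("samtools") is None:
        return None
    result = subprocess.run(
        ["samtools", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_version(result.stdout)


def check_tools():
    """Return a list of problems; an empty list means both tools look usable."""
    problems = []
    if shutil.which("bwa") is None:
        problems.append("bwa not found on PATH")
    version = samtools_version()
    if version is None:
        problems.append("samtools not found on PATH or version unreadable")
    elif version < MIN_SAMTOOLS:
        found = ".".join(str(part) for part in version)
        problems.append("samtools %s is older than 1.4.1" % found)
    return problems


for problem in check_tools():
    print("WARNING:", problem)
```

If nothing is printed, both `bwa` and a sufficiently recent `samtools` were found on the PATH.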
Requires Python 3 and Anaconda with the following packages (skip any that are already installed):
conda install -c conda-forge pandas==0.23.4;
conda install -c conda-forge scikit-learn==0.20.0;
conda install -c conda-forge tensorflow==1.10.0;
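Because the packages are pinned to specific versions, a quick check of the active environment can catch mismatches early. This is a sketch under assumptions: it relies on each package exposing a `__version__` attribute and on scikit-learn being importable as `sklearn`; the helper names are hypothetical and not part of TissueID.

```python
import importlib

# Versions pinned by the conda commands above, keyed by import name
# (scikit-learn is imported as "sklearn").
PINNED = {
    "pandas": "0.23.4",
    "sklearn": "0.20.0",
    "tensorflow": "1.10.0",
}


def installed_version(module_name):
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # A missing or broken installation is treated the same way here.
        return None
    return getattr(module, "__version__", None)


def compare_versions(pinned, lookup=installed_version):
    """Map each module name to a tuple (expected, found, matches)."""
    report = {}
    for name, expected in pinned.items():
        found = lookup(name)
        report[name] = (expected, found, found == expected)
    return report


if __name__ == "__main__":
    for name, (expected, found, ok) in compare_versions(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print("%-12s expected %-8s found %-8s %s" % (name, expected, found, status))
```

A `MISMATCH` line does not necessarily mean the software will fail, only that the environment differs from the versions listed above.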
python TissueID.py
[-fasta sample.fasta] \ file or path directory with one or more samples
[-fastq Sample.fastq] \ file or path directory with one or more samples
-out output.tsv \ output file including probabilities in tsv format
-model Model/ \ folder containing the 50 trained ENSEMBLE models
-pos pos_file.bed \ relevant positions based on E. coli K12
-ref ref/Ecoli_K12_ref.fasta \ reference genome of E. coli K12
[-t 4] \ number of CPUs to use during alignment
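The options above can also be assembled programmatically, for example when processing several sample folders from a script. The sketch below uses only the flags listed above; `build_command` is a hypothetical helper, and the requirement that at least one of `-fasta` or `-fastq` is given is an assumption, since the usage lists both as optional.

```python
import sys


def build_command(out, model, pos, ref, fasta=None, fastq=None, threads=None):
    """Assemble the TissueID.py argument list from the options listed above."""
    # Assumption: the tool needs at least one input (FASTA or FASTQ).
    if fasta is None and fastq is None:
        raise ValueError("provide a FASTA or FASTQ input")
    cmd = [sys.executable, "TissueID.py"]
    if fasta is not None:
        cmd += ["-fasta", fasta]
    if fastq is not None:
        cmd += ["-fastq", fastq]
    cmd += ["-out", out, "-model", model, "-pos", pos, "-ref", ref]
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd


command = build_command(
    out="output.tsv",
    model="Model/",
    pos="pos_file.bed",
    ref="ref/Ecoli_K12_ref.fasta",
    fastq="Sample.fastq",
    threads=4,
)
print(" ".join(command))

# To run it from the directory containing TissueID.py:
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file paths contain spaces.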
See complete manual at the website: https://www.erasmusmc.nl/genetic_identification/resources/
Please send an email to d.montielgonzalez@erasmusmc.nl with any comments, or if you have problems getting the software up and running.