Machine learning models for 16S rRNA sequence classification

Lab-Vankerschaver/16S-ML-models

16S-ML-models

Machine learning models for 16S rRNA sequence classification

This repository contains the code and comparative analyses of five machine learning models, evaluated on different classification tasks and with various preprocessing methods. The following models are used for bacterial taxonomy classification based on curated 16S rRNA gene sequences:

  • Ribosomal Database Project (RDP) Classifier with k-mer frequency classification

    This model was developed by Wang, Q. et al. (2007). Access the GitHub repository and the paper

  • Convolutional Neural Network (CNN) with k-mer frequency classification

    This model is based on an architecture developed by Fiannaca, A. et al. (2018). Access the GitHub repository and the paper

  • Bidirectional Long Short-Term Memory network (BiLSTM) with one-hot-encoded sequence classification

    This model is based on an architecture developed by Philipp Münch. Access the GitHub repository

  • Combined Convolutional BiLSTM (ConvBiLSTM) with one-hot-encoded sequence classification

    This model is based on an architecture developed by Desai, P. et al. (2020). Access the paper

  • Attention-based ConvBiLSTM (Read2Pheno) with one-hot-encoded sequence classification

    This model is based on an architecture developed by Zhao, Z. et al. (2021). Access the GitHub repository and the paper
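
The RDP Classifier and the CNN in the list above both take k-mer frequencies as input. As a rough illustration of that representation, the sketch below turns a sequence into a normalised k-mer frequency vector. It is a minimal example and not the repository's own preprocessing code: the function name `kmer_frequencies`, the default `k=3`, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration.

```python
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGT"):
    """Return normalised k-mer frequencies in lexicographic k-mer order.

    Hypothetical helper for illustration; the notebook may differ.
    """
    # Enumerate all possible k-mers so every sequence maps to the same layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)

    seq = sequence.upper()
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if window in index:
            counts[index[window]] += 1

    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [count / total for count in counts]


features = kmer_frequencies("ACGTACGT", k=2)
print(len(features))  # 4**2 = 16 possible 2-mers
```

The vector length grows as 4^k, so the choice of k trades resolution against feature-vector size.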

All five models are combined in a single Jupyter notebook (models_notebook.ipynb). This notebook also contains the scripts required for preprocessing the data and labels, compiling and running the models, and saving and visualising the results.
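
For the models that take one-hot-encoded sequences (BiLSTM, ConvBiLSTM, and Read2Pheno), preprocessing has to turn each sequence into a fixed-size binary matrix. The sketch below shows one common way to do this. It is an illustration under stated assumptions, not the notebook's actual code: the function name `one_hot_encode`, the A/C/G/T column order, the all-zero rows for ambiguous bases, and the truncate-or-pad strategy are all hypothetical choices.

```python
BASES = "ACGT"  # assumed column order


def one_hot_encode(sequence, length, alphabet=BASES):
    """Encode a sequence as a `length` x len(alphabet) matrix of 0/1 values.

    Hypothetical helper for illustration; the notebook may differ.
    """
    index = {base: i for i, base in enumerate(alphabet)}
    matrix = []
    # Truncate sequences longer than the target length.
    for base in sequence.upper()[:length]:
        row = [0] * len(alphabet)
        # Assumption: ambiguous bases (e.g. N) are left as all-zero rows.
        if base in index:
            row[index[base]] = 1
        matrix.append(row)

    # Pad shorter sequences with all-zero rows up to the target length.
    padding = length - len(matrix)
    matrix.extend([0] * len(alphabet) for _ in range(padding))
    return matrix


encoded = one_hot_encode("ACGNT", length=6)
for row in encoded:
    print(row)
```

A fixed `length` is needed because the sequences in a batch must share one shape. Where this is memory-bound, the matrices can be generated batch by batch rather than all at once.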

The separate data-preprocessing and model-training scripts can be used instead of the full notebook when its memory requirements are too high for the user's system.
