deep-vaccine

Predict multi-epitope vaccine subunit candidates using NLP.

Data

Immune Epitope Database (IEDB)

Build the datasets with data/preprocess.py, which is written in the notebook format supported by mainstream editors.

Environment

Usage

To train, run python train.py or use LSTM.ipynb on Colab. Trained models are saved in ./runs.

Predict a list of sequences using a model saved at path_to_model as follows:

from api import Predictor

seqs = """
PVAGAAIAAPVAGQQGPQRR
IAADFVEDQEVCKNYTGTVVGFASMVA
ADGAYRFLSGTAAVLAAAETAEAKAAAAAE
GDNLKGIVVIKDRNIGVLGENGSHMPDRCN
""".split()

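# path_to_model is a placeholder for the path to a saved model (training writes models to ./runs)
predictor = Predictor(path_to_model)
probs = predictor.predict_proba(seqs)  # predicted probabilities for seqs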
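
To pick out the strongest candidates, the scores can be paired with their sequences and sorted. This is a minimal sketch, assuming predict_proba returns one probability per input sequence (the return shape is not documented here); rank_candidates is a hypothetical helper and not part of api:

```python
def rank_candidates(seqs, scores, top_k=None):
    """Pair each sequence with its score and sort from highest to lowest score.

    Assumes scores holds exactly one number per sequence (an assumption
    about predict_proba's output, not something the README states).
    """
    if len(seqs) != len(scores):
        raise ValueError("seqs and scores must have the same length")
    # float() also accepts NumPy scalars and zero-dimensional tensors
    ranked = sorted(
        ((seq, float(score)) for seq, score in zip(seqs, scores)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked if top_k is None else ranked[:top_k]
```

Under that assumption, rank_candidates(seqs, predictor.predict_proba(seqs), top_k=2) would return the two highest-scoring sequences with their scores.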

An application to the SARS-CoV-2 spike protein is in example.py.
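
example.py is not reproduced here. One common way to turn a full-length protein into inputs for predict_proba is to slide a fixed-length window along it. The sketch below assumes that approach; sliding_windows, length, and stride are hypothetical names and are not taken from the repository:

```python
def sliding_windows(protein, length=20, stride=1):
    """Yield (start, peptide) for every full-length window over the protein.

    A protein shorter than `length` yields nothing.
    """
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    for start in range(0, len(protein) - length + 1, stride):
        yield start, protein[start:start + length]
```

The start offsets make it possible to map each scored peptide back to its position in the original protein.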

Misc.

  • Models: (CNN+) LSTM/GRU, Transformer
  • Different tokenizers and pooling
  • Visualize models, data, training: tensorboard --logdir=runs
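
The repository's tokenizers are not described here. As an illustration of what the tokenizer choice can mean for protein sequences, the sketch below contrasts per-residue tokens with overlapping k-mers; both functions are hypothetical and may differ from the ones the project uses:

```python
def char_tokenize(seq):
    """One token per amino-acid residue."""
    return list(seq)


def kmer_tokenize(seq, k=3):
    """Overlapping k-mers; a sequence shorter than k yields no tokens."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]
```

Per-residue tokens keep the vocabulary small (roughly the 20 standard amino acids), while k-mers capture local context at the cost of a much larger vocabulary.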

If you find this helpful, please consider citing (bib).