GitHub - abhinadduri/MASPR: Modeling A-Domain Specificity using Protein Language Models

MASPR

Code for the paper "Modeling A-Domain Specificity using Protein Language Models".

This repository is under active development. Major TODOs include letting users generate their own training data for new A-domains and supporting batch processing at inference time.

Installing Dependencies

To use MASPR, you need to have conda installed. Once conda is installed, create an environment and install the required packages:

conda create --name maspr 
conda activate maspr
conda install pip
pip install -r requirements.txt

Benchmarking a MASPR model

To benchmark a MASPR model, you will first need to download the ESM embeddings for the training data (or generate them yourself). You can download these embeddings here.

To reproduce the numbers in the paper:

python train_maspr.py --task ttsplit

To reproduce the generalization benchmark (train on bacteria and test on fungi):

python train_maspr.py --task bacfung

To reproduce the zero-shot learning benchmark (leave-one-substrate-out cross-validation):

python train_maspr.py --task substrate
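
The three benchmark commands above can also be run back to back from a single script. The following is a minimal sketch, assuming it is run from the repository root with the maspr environment active and the embeddings already in place. The helper names `build_command` and `run_benchmarks` are hypothetical and not part of this repository.

```python
import subprocess
import sys

# Task names taken from the benchmark commands in this README.
BENCHMARK_TASKS = ["ttsplit", "bacfung", "substrate"]


def build_command(task):
    """Return the argument list for one benchmark run (hypothetical helper)."""
    return [sys.executable, "train_maspr.py", "--task", task]


def run_benchmarks(tasks=BENCHMARK_TASKS):
    """Run each benchmark in turn, stopping at the first failure."""
    for task in tasks:
        print(f"Running benchmark: {task}")
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(build_command(task), check=True)
```

Calling `run_benchmarks()` launches the three tasks sequentially; pass a shorter list to run a subset.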

To train a MASPR model using all the data:

python train_maspr.py --task train --model_path <MODEL_PATH>

Making predictions with MASPR

To predict the specificity for all A-domains in a given gene sequence:

python process_adomain.py -i <GENE_SEQUENCE>

Gene sequences are typically obtained from a source like MIBiG by clicking on a gene and then clicking "Copy AA Sequence".
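
A copied amino-acid sequence can pick up line breaks, spaces, or a trailing stop symbol. The sketch below is an optional pre-check, not part of MASPR: `clean_sequence` and `invalid_residues` are hypothetical helpers, and this README does not state whether process_adomain.py already tolerates such input.

```python
# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw):
    """Remove all whitespace, upper-case, and drop any trailing '*' stop symbol."""
    seq = "".join(raw.split()).upper()
    return seq.rstrip("*")


def invalid_residues(seq):
    """Return the sorted characters in seq that are not standard amino acids."""
    return sorted(set(seq) - STANDARD_AMINO_ACIDS)
```

An empty list from `invalid_residues(clean_sequence(text))` means the sequence contains only the 20 standard residues.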

Zero-shot prediction with MASPR

MASPR can consider novel substrates during inference even if they were not in its training data. To enable this feature, add your desired substrates to the sub_to_smiles dictionary in substrate_smiles.py.
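
As an illustration of the step above, the sketch below adds one entry to a dictionary shaped like sub_to_smiles. It assumes the dictionary maps a substrate name to a SMILES string, as its name suggests. The entries shown and the `add_substrate` helper are hypothetical, so check substrate_smiles.py for the naming convention actually used.

```python
# Stand-in for the dictionary in substrate_smiles.py (assumed shape:
# substrate name -> SMILES string). The real file's entries may differ.
sub_to_smiles = {
    "gly": "NCC(=O)O",  # glycine
}


def add_substrate(table, name, smiles):
    """Add a new substrate, refusing to overwrite an existing entry."""
    smiles = smiles.strip()
    if not smiles:
        raise ValueError("SMILES string must not be empty")
    if name in table:
        raise KeyError(f"substrate {name!r} is already defined")
    table[name] = smiles


# Example: register beta-alanine as a candidate substrate.
add_substrate(sub_to_smiles, "beta-ala", "NCCC(=O)O")
```

In practice the same effect is achieved by editing the dictionary literal in substrate_smiles.py directly; the helper only guards against accidental overwrites.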

