MLSB2021

This project requires ProteinGNN to parse PDB files into a PyG-compatible format. Follow the ProteinGNN installation instructions before continuing.
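As a rough illustration of what "PyG-compatible" means for a structure, the sketch below turns residue coordinates into an edge list in the two-row source/target layout that PyG uses for `edge_index`. It is not ProteinGNN's code. It assumes that the `--radii` option in the build command further down is a distance cutoff in angstroms between residues; the function name `radius_graph` and the toy coordinates are made up for illustration.

```python
import math


def radius_graph(coords, cutoff=6.0):
    """Connect every ordered pair of distinct residues within `cutoff`.

    Returns [sources, targets], the two-row layout PyG expects for
    edge_index. Hypothetical helper, not part of ProteinGNN or this repo.
    """
    src, dst = [], []
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            # Skip self-loops; keep both directions of each contact.
            if i != j and math.dist(a, b) <= cutoff:
                src.append(i)
                dst.append(j)
    return [src, dst]


# Toy C-alpha coordinates in angstroms (illustrative values only).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
edge_index = radius_graph(coords, cutoff=6.0)
print(edge_index)  # [[0, 1, 1, 2], [1, 0, 2, 1]]
```

The last residue sits more than 6 angstroms from every other one, so it ends up as an isolated node. In a real pipeline the per-residue embeddings would be attached as node features alongside this edge list.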

To build the datasets, place all AlphaFold2 structures under data/alphafold2/your_dataset, FASTA files under data/fasta, and experiment CSV files under data/csv, then run

python build_dataset.py --embedding esm --radii 6 --dataset your_dataset --n_processes N_PROCESSES
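Before launching the build, it can help to confirm that the input directories described above are in place. The following is a minimal sketch, assuming the repository root is the working directory; `check_layout` is a hypothetical helper and not part of this repository.

```python
from pathlib import Path


def check_layout(root, dataset):
    """Return the names of expected input directories that are missing.

    The paths mirror the layout described in this README; the helper
    itself is hypothetical.
    """
    root = Path(root)
    expected = {
        "structures": root / "data" / "alphafold2" / dataset,
        "fasta": root / "data" / "fasta",
        "csv": root / "data" / "csv",
    }
    return [name for name, path in expected.items() if not path.is_dir()]


missing = check_layout(".", "your_dataset")
if missing:
    print("Missing input directories:", ", ".join(missing))
else:
    print("Input layout looks complete.")
```

The check only verifies that the directories exist. It does not inspect file names or contents, which build_dataset.py may constrain further.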

To train the sequence-only and geometric models and visualize their performance, run

bash batch_train.sh
python compare_supervised.py --rootdir esm-6

To further compare with unsupervised predictions, place PSSM files under data/pssm and run

python ESM.py --stage preprocess --esm_install_path ESM_INSTALL_DIR
bash benchmark_esm.sh
python ESM.py --stage postprocess
python compare_unsupervised.py
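The four commands above are meant to run in order, and each stage presumably depends on the output of the previous one. A small wrapper that stops at the first failure could look like the sketch below. `run_steps` and its `dry_run` flag are hypothetical, and `ESM_INSTALL_DIR` remains a placeholder to replace with your own path.

```python
import subprocess

# The commands from this README, in the order they are listed.
STEPS = [
    ["python", "ESM.py", "--stage", "preprocess",
     "--esm_install_path", "ESM_INSTALL_DIR"],
    ["bash", "benchmark_esm.sh"],
    ["python", "ESM.py", "--stage", "postprocess"],
    ["python", "compare_unsupervised.py"],
]


def run_steps(steps, dry_run=True):
    """Run each command in order and return the command lines handled.

    With dry_run=True nothing is executed. With dry_run=False,
    check=True makes subprocess.run raise CalledProcessError on a
    non-zero exit code, so later stages never run on incomplete output.
    """
    handled = []
    for cmd in steps:
        handled.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return handled


for line in run_steps(STEPS, dry_run=True):
    print(line)
```

Call `run_steps(STEPS, dry_run=False)` from the repository root to execute the stages for real.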

For embedding locality analysis, run

python locality.py

SimonKitSangChu/MLSB2021