BARNACLE

RNA contact prediction with deep learning

Running scripts

  • Scripts for running different methods are in the scripts folder
    • NOTE: these scripts will likely take a very long time on a desktop computer without a datacenter-grade GPU.
    • train.py runs the self-supervised pre-training. On four A100 GPUs, it runs for about 48 hours.
    • train_downstream.py runs the downstream regression training. It runs in less than one hour.
    • xgboost_downstream.py runs the downstream XGBoost training. It runs in less than 15 minutes.
  • Source code is in selbstaufsicht

Installation guide

  • Set up a venv: python -m venv venv_name
  • Clone this repo: git clone .....
  • Change directory into the newly cloned repository
  • Install dependencies from this repo: pip install -e . (a consolidated example follows below)
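
Taken together, a typical installation might look like this (the clone URL below assumes the KIT-MBS/selbstaufsicht repository on GitHub; substitute the actual URL and your preferred venv name):

python -m venv venv_name
source venv_name/bin/activate
git clone https://github.com/KIT-MBS/selbstaufsicht.git
cd selbstaufsicht
pip install -e .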

Installation shouldn't take more than 5 minutes (depending on internet connection speed). Please note that a GPU is highly recommended for running the scripts in this repository. For more information, see the hardware requirements below.

Software dependencies

  • reproduced from setup.py
python >= 3.9
protobuf < 4.21.0  # higher versions lead to errors with pytorch-lightning (https://github.com/PyTorchLightning/pytorch-lightning/issues/13159)
torch >= 1.10, <= 1.13.1  # range supported by pytorch-lightning 1.8.x
torchvision
tensorboard
tqdm
biopython
axial_attention #get rid of this and replace with optimized custom stuff
pytorch-lightning == 1.8.6  # lightning-bolts breaks for higher versions
lightning-bolts
cvxpy
seaborn
matplotlib
scikit-learn
pandas
xgboost
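
To verify that the pinned versions were picked up (for example torch <= 1.13.1 and pytorch-lightning == 1.8.6), a quick sanity check from the command line can help; this is only a convenience sketch:

python -c "import torch, pytorch_lightning; print(torch.__version__, pytorch_lightning.__version__)"
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"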

Singularity Container

We provide a Singularity container here for those who prefer to use one.
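
Assuming the container image has been saved as selbstaufsicht.sif (the file name here is only a placeholder), the scripts can be run inside it with GPU support, for example:

singularity exec --nv selbstaufsicht.sif python train.py <arguments as in the Workflows section below>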

Hardware requirements

  • It is best to use multiple GPUs for training; with a single GPU alone, training may take a long time.
  • It is further recommended to use a GPU with at least 40 GB of VRAM to avoid out-of-memory (OOM) errors when training (see the quick check below).
  • Running the trained models without validation requires less VRAM.
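
To check how much VRAM is available before launching a run, nvidia-smi (or an equivalent torch query) can be used; this is only a convenience check:

nvidia-smi --query-gpu=name,memory.total --format=csv
python -c "import torch; p = torch.cuda.get_device_properties(0); print(p.name, p.total_memory // 2**30, 'GiB')"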

Workflows

Training

Export a DATA_PATH environment variable pointing to the directory where the training and test data will be stored.
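
For example (the path is only a placeholder):

export DATA_PATH=/path/to/data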

For self-supervised pre-training, use one of:

python train.py --dataset combined --batch-size 1 --num-workers 8 --num-epochs 10000 --learning-rate 0.00003 --learning-rate-warmup 400 --dropout 0.3 --subsampling-depth 50 --precision 16 --disable-emb-grad-freq-scale --disable-progress-bar --log-every 500 --num-gpus 4 --task-inpainting --num-blocks 10 --num-heads 12 --feature-dim-head 64 --inpainting-masking-p-static 0.0 --inpainting-masking-p-nonstatic 1.0 --inpainting-masking-p-unchanged 0.0 --log-run-name inpainting


python train.py --dataset combined --batch-size 1 --num-workers 8 --num-epochs 10000 --learning-rate 0.00003 --learning-rate-warmup 400 --dropout 0.3 --subsampling-depth 50 --precision 16 --disable-emb-grad-freq-scale --disable-progress-bar --log-every 500 --num-gpus 4 --task-inpainting --task-jigsaw --num-blocks 10 --num-heads 12 --feature-dim-head 64 --inpainting-masking-p-static 0.0 --inpainting-masking-p-nonstatic 1.0 --inpainting-masking-p-unchanged 0.0 --log-run-name jigsaw

python train.py --dataset combined --batch-size 1 --num-workers 8 --num-epochs 10000 --learning-rate 0.00003 --learning-rate-warmup 400 --dropout 0.3 --subsampling-depth 50 --precision 16 --disable-emb-grad-freq-scale --disable-progress-bar --log-every 500 --num-gpus 4 --task-inpainting --task-contrastive --num-blocks 10 --num-heads 12 --feature-dim-head 64 --inpainting-masking-p-static 0.0 --inpainting-masking-p-nonstatic 1.0 --inpainting-masking-p-unchanged 0.0 --log-run-name contrastive_100 --contrastive-temperature 100.0

python train.py --dataset combined --batch-size 1 --num-workers 8 --num-epochs 10000 --learning-rate 0.00003 --learning-rate-warmup 400 --dropout 0.3 --subsampling-depth 50 --precision 16 --disable-emb-grad-freq-scale --disable-progress-bar --log-every 500 --num-gpus 4 --task-inpainting --task-jigsaw-boot --boot-per-token --num-blocks 10 --num-heads 12 --feature-dim-head 64 --inpainting-masking-p-static 0.0 --inpainting-masking-p-nonstatic 1.0 --inpainting-masking-p-unchanged 0.0 --log-run-name bootstrap

For regression downstream training, use:

python train_downstream.py --loss-contact-weight 0.95 --loss focal --precision 16 --num-epochs 10000 --learning-rate-warmup 0 --dropout 0 --cv-num-folds 1 --validation-ratio 0.2 --subsampling-mode diversity --checkpoint <backbone_checkpoint_path> --log-run-name <run name> --disable-progress-bar (--freeze-backbone)

For XGBoost downstream training, use:

python xgboost_downstream.py --subsampling-mode diversity --num-round 300 --learning-rate 1.0 --gamma 0.7 --max-depth 16 --min-child-weight 1 --colsample-bytree 0.7 --colsample-bylevel 0.7 --xgb-subsampling-rate 0.9 --xgb-subsampling-mode gradient_based --dart-dropout 0.1 --top-l-prec-coeff 1.0 --cv-num-folds 1 --checkpoint <checkpoint_path> --log-run-name <run_name> --disable-progress-bar --monitor-metric <['toplprec', 'toplprecpos', 'f1', 'matthews']>

The checkpoint path given is either the pre-trained model checkpoint or a checkpoint fine-tuned without --freeze-backbone (see the sketch below).
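
As a sketch of the two options (the checkpoint paths are placeholders; the remaining flags are as shown above):

python xgboost_downstream.py --checkpoint <pretraining_checkpoint> ...  # backbone taken directly from pre-training
python xgboost_downstream.py --checkpoint <finetuned_downstream_checkpoint> ...  # backbone from a train_downstream.py run without --freeze-backbone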

Testing

For regression:

python test_downstream.py --subsampling-mode diversity --checkpoint <checkpoint_path> --disable-progress-bar

For XGBoost:

python test_xgboost_downstream.py --subsampling-mode diversity --num-k 1 --min-k 1 --checkpoint <checkpoint_path> --xgboost-checkpoint <xgboost_checkpoint>

The checkpoint path given is the one to the final downstream model checkpoint. XGBoost requires one torch model checkpoint and one XGBoost model checkpoint.
