This is the official codebase for the BERT-backboned model in BIOFORMERS: A SCALABLE FRAMEWORK FOR EXPLORING BIOSTATES USING TRANSFORMERS. The model is trained for the gene expression modeling task using the PBMC 4k + 8k datasets and the Adamson Perturbation dataset.
We recommend using `venv` and `pip` to install the required packages for Bioformers-BERT:

- Create a Python >= 3.9 virtual environment and activate it;
- Clone the repository and `cd` inside;
- Install the required packages with `pip3 install -r requirements.txt`.
Before running the scripts, adjust `settings.json` to configure the run:

- To use the Adamson Perturbation dataset, set `"dataset_name": "adamson"` and `"log_transform": false`.
- To use the PBMC datasets, set `"dataset_name": "PBMC"` and `"log_transform": true`.
- Other settings, such as normalization, tokenization binning, the nonzero gene ratio in the mask, model dimensions, and training details, can be adjusted by editing the remaining variables. All results reported in the paper for the BERT-backboned model are reproducible with these settings.
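As a concrete illustration, a minimal `settings.json` for an Adamson run might look like the fragment below. Only `dataset_name` and `log_transform` are documented above; the names of the remaining keys (normalization, binning, masking, model, and training options) are defined in the repository's own `settings.json` and are not reproduced here.

```json
{
  "dataset_name": "adamson",
  "log_transform": false
}
```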
Then run the following commands for preprocessing, training, and evaluation:

```shell
python3 data-processing.py
python3 train-random-mask.py
python3 eval-random-mask.py /path/to/saved/checkpoint
```
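To make the "nonzero gene ratio in the mask" setting concrete, here is a minimal sketch of BERT-style random masking over a gene-expression vector, where a chosen fraction of the masked positions is drawn from genes with nonzero expression. The function name, parameters, and logic below are illustrative assumptions, not the actual API of `train-random-mask.py`.

```python
import numpy as np

def random_expression_mask(expr, mask_ratio=0.15, nonzero_ratio=0.5, seed=0):
    """Sketch: pick mask positions for one cell's expression vector.

    Roughly `nonzero_ratio` of the masked positions come from genes with
    nonzero expression; the remainder come from zero-expression genes.
    Illustrative only -- not the repository's actual implementation.
    """
    rng = np.random.default_rng(seed)
    expr = np.asarray(expr)
    n_mask = max(1, int(round(mask_ratio * expr.size)))

    nonzero = np.flatnonzero(expr != 0)  # indices of expressed genes
    zero = np.flatnonzero(expr == 0)     # indices of unexpressed genes

    # Split the mask budget between the two groups, capped by availability.
    n_nz = min(len(nonzero), int(round(nonzero_ratio * n_mask)))
    n_z = min(len(zero), n_mask - n_nz)

    picked = np.concatenate([
        rng.choice(nonzero, size=n_nz, replace=False),
        rng.choice(zero, size=n_z, replace=False),
    ])
    mask = np.zeros(expr.size, dtype=bool)
    mask[picked] = True
    return mask
```

During training, positions where the mask is `True` would have their expression values replaced by a mask token, and the model is trained to reconstruct them.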
We would like to express our gratitude to the developers of the open-source projects we utilized. If you use this code, please cite:
```bibtex
@article{Amara-Belgadi2023.11.29.569320,
  author = {Siham Amara-Belgadi and Orion Li and David Yu Zhang and Ashwin Gopinath},
  title = {BIOFORMERS: A SCALABLE FRAMEWORK FOR EXPLORING BIOSTATES USING TRANSFORMERS},
  year = {2023},
  doi = {10.1101/2023.11.29.569320},
  publisher = {Cold Spring Harbor Laboratory},
  url = {https://www.biorxiv.org/content/early/2023/12/01/2023.11.29.569320},
  eprint = {https://www.biorxiv.org/content/early/2023/12/01/2023.11.29.569320.full.pdf},
  journal = {bioRxiv}
}
```