GET: a foundation model of transcription across human cell types
Check out `scripts/setup_env.sh` to set up the environment:

```bash
bash scripts/setup_env.sh /path/to/project/root
```
We provide a tutorial on how to prepare the data, fine-tune the model, and run interpretation analysis here.
To run a basic training job from the command line:

```bash
python get_model/debug/debug_run_region.py --config-name finetune_tutorial stage=fit
```
GET uses a transformer-based architecture with several key components:
- Motif Scanner
- ATAC Attention
- Region Embedding
- Transformer Encoder
- Task-specific heads (Expression, Hi-C, etc.)
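The data flow through these components can be sketched in plain Python. Everything below (class names, shapes, and the placeholder math inside each component) is an illustrative assumption based on the component list above, not the actual implementation:

```python
# Illustrative sketch of the GET forward pass; all operations are
# stand-ins chosen only to show how the components compose.

class MotifScanner:
    """Scores each region against a fixed set of TF motifs (placeholder scan)."""
    def __call__(self, regions):
        return [[float(len(seq) % (m + 1)) for m in range(4)] for seq in regions]

class ATACAttention:
    """Weights motif scores by chromatin accessibility (ATAC signal)."""
    def __call__(self, motif_scores, atac):
        return [[s * a for s in row] for row, a in zip(motif_scores, atac)]

class RegionEmbedding:
    """Projects per-region motif features into an embedding (sum as a stand-in)."""
    def __call__(self, feats):
        return [sum(row) for row in feats]

class TransformerEncoder:
    """Mixes information across regions (mean-centering as a stand-in)."""
    def __call__(self, emb):
        mu = sum(emb) / len(emb)
        return [e - mu for e in emb]

class ExpressionHead:
    """Task-specific head mapping encoded regions to expression values."""
    def __call__(self, enc):
        return [max(0.0, e) for e in enc]  # ReLU-like readout

def forward(regions, atac):
    x = MotifScanner()(regions)
    x = ATACAttention()(x, atac)
    x = RegionEmbedding()(x)
    x = TransformerEncoder()(x)
    return ExpressionHead()(x)

preds = forward(["ACGTACGT", "TTGACA"], atac=[0.9, 0.1])
```

The point is the pipeline shape: per-region motif features, modulated by accessibility, embedded, mixed across regions, then read out by a task head such as Expression or Hi-C.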
For more details, see the Schematic or the Model Architecture documentation.
To fine-tune a pre-trained model, see the Fine-tuning Tutorial for more information.
To evaluate a trained model:

```bash
python get_model/debug/debug_run_region.py --config-name finetune_tutorial stage=validate
```
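The `stage` flag selects which phase the runner executes. Assuming it dispatches to Lightning-style trainer methods (an assumption about the runner's internals, illustrated here with a stub rather than the real entry point), the dispatch can be sketched as:

```python
# Hypothetical sketch of dispatching on a `stage` flag; the real
# GET entry point may be structured differently.

class TrainerStub:
    """Stand-in for a Lightning-style trainer, for illustration only."""
    def fit(self, model):
        return f"fit:{model}"
    def validate(self, model):
        return f"validate:{model}"

def run(stage, model, trainer=None):
    trainer = trainer or TrainerStub()
    dispatch = {"fit": trainer.fit, "validate": trainer.validate}
    if stage not in dispatch:
        raise ValueError(f"unknown stage: {stage!r}")
    return dispatch[stage](model)

result = run("validate", "get-model")
```

Under this reading, `stage=fit` trains and `stage=validate` evaluates the same configured model.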
GET uses Hydra for configuration management. Key configuration files:
- Base config: `get_model/config/config.py`
- Model configs: `get_model/config/model/*.yaml`
- Dataset configs: `get_model/config/dataset/*.yaml`
See Configuration Guide for more details.
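With Hydra, command-line tokens such as `stage=fit` are dotted-key overrides merged onto the composed config. The following self-contained sketch mimics that merge in plain Python; it does not call Hydra itself, and the base-config keys shown are hypothetical:

```python
# Minimal sketch of Hydra-style "key=value" override merging.
# Illustrative only; Hydra's real resolution (config groups,
# interpolation, type coercion) is richer than this.

def apply_overrides(config, overrides):
    """Merge dotted-key overrides (e.g. 'trainer.max_epochs=10') into a dict."""
    for token in overrides:
        dotted, value = token.split("=", 1)
        *path, leaf = dotted.split(".")
        node = config
        for key in path:
            node = node.setdefault(key, {})
        node[leaf] = value
    return config

# Stand-in for a composed base config (keys are hypothetical)
base = {"stage": "predict", "trainer": {"max_epochs": "1"}}
cfg = apply_overrides(base, ["stage=fit", "trainer.max_epochs=10"])
```

This is why the tutorial commands above can flip between training and evaluation by overriding a single key instead of editing config files.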
We welcome contributions! Please see our Contributing Guidelines for more information.
This project is licensed under the CC BY-NC 4.0 License.
If you use GET in your research, please cite our paper:
For questions or support, please open an issue or contact fuxialexander@gmail.com.