This repository provides a simple setup for training a Masked Autoencoder (MAE) on single-cell genomics data with a random masking strategy. The code is designed for a small AnnData (adata) object that fits into memory.
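As a rough illustration of what random masking means here, the sketch below hides a random fraction of the entries in a cells-by-genes matrix by setting them to zero. It is a minimal NumPy example, not the code from Masking.ipynb or models.py: the function name `random_mask`, the `mask_ratio` parameter, the zero fill value, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np


def random_mask(x, mask_ratio=0.5, seed=None):
    """Hide a random subset of entries in a cells-by-genes matrix.

    Hypothetical helper: returns the masked matrix and a boolean mask
    where True marks an entry that was hidden (set to zero).
    """
    rng = np.random.default_rng(seed)
    # Each entry is masked independently with probability mask_ratio.
    mask = rng.random(x.shape) < mask_ratio
    # Masked entries are replaced by zero; visible entries are unchanged.
    x_masked = np.where(mask, 0.0, x)
    return x_masked, mask


# Toy expression matrix: 4 cells x 6 genes (made-up values).
x = np.arange(24, dtype=np.float32).reshape(4, 6) + 1.0
x_masked, mask = random_mask(x, mask_ratio=0.5, seed=0)
```

The autoencoder would then receive `x_masked` as input and be asked to reconstruct `x`; the boolean `mask` records which entries were hidden.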
- data.py: Module for loading and preprocessing single-cell genomics data.
- Masking.ipynb: Jupyter notebook demonstrating the random masking strategy.
- models.py: Contains the implementation of the Masked Autoencoder.
- train.py: Script for training the Masked Autoencoder model.
- train.sh: Bash script for executing the training process.
- Python 3.10
- Dependencies listed in requirements.txt
- Clone the repository:
git clone https://github.com/theislab/sc_mae.git
- Install the required dependencies:
pip install -r requirements.txt
- Prepare the data: download the sample data from the publication mentioned in the citation section, or use your own processed adata object.
- Execute the training script:
bash train.sh
To apply this code, follow these steps:
- Download Sample Data: You can download the adata object from the publication mentioned in the citation section or use your own processed h5ad object.
- Prepare Data: If you are using your own data, make sure it is preprocessed and compatible with the provided code. Otherwise, follow the data loading and preprocessing steps in data.py.
- Train the Model: Execute the training script train.py by running bash train.sh. Adjust the hyperparameters and configurations as needed in the script.
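For orientation, a common training objective for a masked autoencoder is a reconstruction error computed only on the entries that were hidden from the model. The sketch below shows that idea in NumPy. It is an assumption for illustration, not a description of what train.py does: the helper name `masked_mse`, the choice of mean squared error, and the toy arrays are all hypothetical.

```python
import numpy as np


def masked_mse(x_true, x_pred, mask):
    """Mean squared error over masked entries only.

    Hypothetical helper: mask is a boolean array where True marks an
    entry that was hidden from the model during the forward pass.
    """
    if mask.sum() == 0:
        # Nothing was masked, so there is nothing to score.
        return 0.0
    sq_err = (x_pred - x_true) ** 2
    return float(sq_err[mask].mean())


# Toy example: 2 cells x 2 genes, second gene masked in both cells.
x_true = np.array([[1.0, 2.0], [3.0, 4.0]])
x_pred = np.array([[1.0, 0.0], [3.0, 5.0]])
mask = np.array([[False, True], [False, True]])
loss = masked_mse(x_true, x_pred, mask)
```

Restricting the loss to masked entries means the model gets no credit for copying visible inputs; whether to also score visible entries is a design choice that varies between implementations.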
This repository is part of a larger project and serves as a simplified demo. If you use this code in your research, please cite the following paper:
Delineating the Effective Use of Self-Supervised Learning in Single-Cell Genomics
If you use the sample data in your research, please cite the following paper:
COMBATdb: a database for the COVID-19 Multi-Omics Blood ATlas
- The sample data used in this project is sourced from COMBATdb.
Contributions to improve this codebase are welcome. Please fork the repository and submit a pull request with your changes.
This project is licensed under the MIT License.
Please refer to the main repository for more detailed information and a more elaborate analysis.
sc_mae was written by Till Richter.