CellPLM

This is the official codebase for CellPLM: Pre-training of Cell Language Model Beyond Single Cells. The paper has been accepted at ICLR 2024.

CellPLM is the first single-Cell Pre-trained Language Model that encodes cell-cell relations. It consistently outperforms existing pre-trained and non-pre-trained models in diverse downstream tasks, with 100x higher inference speed than existing pre-trained models. You can also find a brilliant blog about the idea of CellPLM here.

Installation

We recommend PyPI for quick installation. We recommend Python 3.9 and CUDA >= 11.7, but both are adjustable.

Quick Installation with PyPI

Make sure a GPU version of PyTorch (>=1.13.0) is installed before installing CellPLM.

pip install cellplm
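To check the PyTorch prerequisite before installing, something like the following sketch may help. It is not part of CellPLM: the helper names `parse_version`, `torch_is_ready`, and `check_environment` are made up for this illustration, and the only threshold used is the `>=1.13.0` requirement stated above.

```python
import re

# Minimum PyTorch version stated in this README.
MIN_TORCH = (1, 13, 0)


def parse_version(version: str) -> tuple:
    """Turn a string such as '2.0.1+cu117' into (2, 0, 1), ignoring build suffixes."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def torch_is_ready(version: str, cuda_available: bool) -> bool:
    """True when the version meets the minimum and a CUDA device is visible."""
    return parse_version(version) >= MIN_TORCH and cuda_available


def check_environment() -> str:
    """Hypothetical helper: report whether the local PyTorch looks usable."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    version = str(torch.__version__)
    cuda_ok = torch.cuda.is_available()
    if torch_is_ready(version, cuda_ok):
        return f"OK: torch {version} with CUDA"
    return f"Check setup: torch {version}, CUDA available: {cuda_ok}"


if __name__ == "__main__":
    print(check_environment())
```

If the script reports that CUDA is unavailable, the installed PyTorch is probably a CPU-only build.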

Full Installation (recommended for HPC users and developers)

conda create -n cellplm python=3.9 -y && conda activate cellplm
conda install cudatoolkit=11.7 -c pytorch -c nvidia
pip install -r requirements.txt

The full installation reproduces the environment we used during development, including RAPIDS, which is used to accelerate evaluation.

Tutorials

We offer several notebooks for various downstream tasks as introductory tutorials. Our latest studies demonstrate that CellPLM is competitive with other SOTA methods and pre-trained models on cell-type annotation tasks. The results are shown in the table below:

Method	PBMC12K	Pancreas	HLCA	Immune	Brain	Liver
SingleCellNet	0.845±0.0064	0.644±0.0006	0.811±0.0046	0.775±0.0009	0.877±0.0033	0.872±0.0023
ACTINN	0.614±0.0709	0.528±0.0926	0.218±0.0440	0.236±0.0300	0.695±0.0624	0.614±0.0349
scANVI	0.930±0.0148	0.963±0.0083	0.708±0.0183	0.851±0.0133	0.933±0.0010	0.908±0.0144
CellTypist	0.883±0.0055	0.882±0.0011	0.776±0.0079	0.822±0.0020	0.901±0.0031	0.764±0.0132
scDiff	0.967±0.0042	0.968±0.0143	0.893±0.0070	0.844±0.0076	0.947±0.0074	0.844±0.0042
scGPT	0.963	0.954	0.863	0.907	0.950	0.864
Geneformer	0.979	-	0.833	0.856	0.934	0.871
CellPLM	0.975	0.983	0.929	0.902	0.967	0.913

(The evaluation follows the setting in the scDiff paper.)
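As a quick way to read the table, the sketch below finds the top-scoring method for each dataset. The scores are copied from the table above with the spread values dropped; the function names `parse_rows` and `best_per_dataset` are made up for this illustration and are not part of CellPLM.

```python
# Mean scores copied from the results table in this README.
# A "-" marks a result that is not reported.
DATASETS = ["PBMC12K", "Pancreas", "HLCA", "Immune", "Brain", "Liver"]
ROWS = """\
SingleCellNet 0.845 0.644 0.811 0.775 0.877 0.872
ACTINN 0.614 0.528 0.218 0.236 0.695 0.614
scANVI 0.930 0.963 0.708 0.851 0.933 0.908
CellTypist 0.883 0.882 0.776 0.822 0.901 0.764
scDiff 0.967 0.968 0.893 0.844 0.947 0.844
scGPT 0.963 0.954 0.863 0.907 0.950 0.864
Geneformer 0.979 - 0.833 0.856 0.934 0.871
CellPLM 0.975 0.983 0.929 0.902 0.967 0.913
"""


def parse_rows(text: str) -> dict:
    """Map each method name to its list of scores (None where not reported)."""
    scores = {}
    for line in text.strip().splitlines():
        method, *cells = line.split()
        scores[method] = [None if cell == "-" else float(cell) for cell in cells]
    return scores


def best_per_dataset(scores: dict) -> dict:
    """Return the highest-scoring method for every dataset column."""
    best = {}
    for index, dataset in enumerate(DATASETS):
        reported = {m: v[index] for m, v in scores.items() if v[index] is not None}
        best[dataset] = max(reported, key=reported.get)
    return best


if __name__ == "__main__":
    for dataset, method in best_per_dataset(parse_rows(ROWS)).items():
        print(f"{dataset}: {method}")
```

On these numbers CellPLM ranks first on Pancreas, HLCA, Brain, and Liver, while Geneformer leads on PBMC12K and scGPT on Immune.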

Pretrained CellPLM Model Checkpoints

Checkpoints can be downloaded from our Dropbox. We may update them from time to time.

[10/10/2023] The latest version is 20230926_85M.
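After downloading, a small check like the one below can confirm that the files for a given version are in place. This is only a sketch under assumptions: it assumes the downloaded files are placed in a local `ckpt` directory and that their names begin with the version string (for example `20230926_85M`). The helper `find_checkpoint_files` is hypothetical and not part of CellPLM.

```python
from pathlib import Path


def find_checkpoint_files(ckpt_dir: str, version: str) -> list:
    """List file names in ckpt_dir that start with the given version string.

    Assumption: checkpoint files are named with the version as a prefix.
    Returns an empty list when the directory does not exist.
    """
    directory = Path(ckpt_dir)
    if not directory.is_dir():
        return []
    return sorted(
        path.name
        for path in directory.iterdir()
        if path.is_file() and path.name.startswith(version)
    )


if __name__ == "__main__":
    # "ckpt" is an assumed location; adjust it to wherever the files were saved.
    files = find_checkpoint_files("ckpt", "20230926_85M")
    print(files if files else "No files found for this version")
```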

Citation

@article{wen2023cellplm,
  title={CellPLM: Pre-training of Cell Language Model Beyond Single Cells},
  author={Wen, Hongzhi and Tang, Wenzhuo and Dai, Xinnan and Ding, Jiayuan and Jin, Wei and Xie, Yuying and Tang, Jiliang},
  journal={bioRxiv},
  pages={2023--10},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}
