snmCT-seq Bioinformatics Pipeline

snmCT-seq is a technique to simultaneously profile the DNA methylome (mC) and transcriptome from a single nucleus. Individual nuclei (or cells) are sorted into 384-well plates where the snmCT-seq reaction adds cell barcodes and generates Illumina-compatible multi-modal libraries without any physical DNA/RNA separation.

What's this repo?

Scripts to go from plate-level .fastqs → cell-level alignments (.bam), quality control metrics, and analyzable mC & RNA features.
For the methylome, the primary features are the methylation counts and coverage for each cytosine in the genome (.allc file), which can then be aggregated across different genomic intervals (e.g., 100kb-bins, genes; .mcds file).
For the transcriptome, the features are gene- and exon-based read counts (full-length transcript coverage, versus 3'/5'-UMI).
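To make the ".allc → genomic interval" aggregation concrete, here is a minimal sketch of fixed-width binning. It is not how this pipeline builds .mcds files (allcools handles that); it only illustrates the idea. It assumes an uncompressed, tab-separated, header-less .allc stream with columns chrom, pos (1-based), strand, context, mc, cov, methylated, and the function name allc_to_bins is hypothetical.

```shell
# Hypothetical helper: sum methylated counts (mc) and coverage (cov) per fixed-width bin.
# Assumed input columns (tab-separated, no header):
#   chrom, pos (1-based), strand, context, mc, cov, methylated
# Output columns: chrom, 0-based bin start, summed mc, summed cov.
allc_to_bins() {
  bin_size="${1:-100000}"   # default: 100kb bins
  awk -v OFS='\t' -v bin="$bin_size" '
    {
      start = int(($2 - 1) / bin) * bin   # bin start for a 1-based position
      key = $1 OFS start
      mc[key]  += $5
      cov[key] += $6
    }
    END {
      for (key in cov) print key, mc[key], cov[key]
    }' | sort -k1,1 -k2,2n
}
```

A call might look like `gzip -dc cell.allc.tsv.gz | allc_to_bins 100000`, where the file name is a placeholder. Dividing the summed mc by the summed cov gives the bin-level methylation fraction.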

What isn't this repo?

Cell-level QC steps, which can be celltype- and study-specific. (Some suggestions on the Detailed Overview page and within past manuscripts.)
Downstream analysis of mC and RNA (e.g., feature selection, clustering, hypothesis testing). allcools and Seurat/scanpy are good starting places for exploratory data analysis.

Getting Started

Clone this repo (git clone https://github.com/chooliu/snmCTseq_Pipeline.git) or download via the Releases page. Rename folder to an informative "project directory" name.
Install dependencies listed in Documentation/snmCTseq.yml. Installation and environment management via conda highly recommended.

module load anaconda3 # or otherwise activate conda
conda env create -f Documentation/snmCTseq.yml

Customize snmCT_parameters.env and scheduler submission scripts (Scripts/*.sub, especially A00* and A01*) via a text editor of your choice*, paying special attention to:
(i) compatibility with your compute/scheduler infrastructure and sequencing depth,
(ii) genome/reference organism,
(iii) job array ranges based on the number of 384-well plates profiled. The range is -t 1-Nplates when one job is submitted per plate (e.g., plate → well demultiplexing), but -t 1-Nbatches for more intensive tasks that process a small set of wells per job (e.g., alignment; 24 wells per batch by default, so Nbatches = Ncellstotal/24).
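As a worked example of the array-range arithmetic in (iii), the sketch below derives both ranges from the plate count. It assumes every well of every plate is processed (so Ncellstotal = Nplates × 384) and rounds up so that a final partial batch still gets its own task; the function name array_ranges and the example value of 4 plates are illustrative only.

```shell
# Sketch: derive job-array ranges from the number of 384-well plates.
# Assumes all wells are processed; adjust if your plates are partially filled.
array_ranges() {
  n_plates="$1"
  wells_per_plate="${2:-384}"
  wells_per_batch="${3:-24}"    # default batch size described above
  n_wells=$(( n_plates * wells_per_plate ))
  # Ceiling division, so leftover wells still form a batch.
  n_batches=$(( (n_wells + wells_per_batch - 1) / wells_per_batch ))
  echo "plate-level: -t 1-${n_plates}"
  echo "batch-level: -t 1-${n_batches}"
}

array_ranges 4
# plate-level: -t 1-4
# batch-level: -t 1-64
```

With the defaults, each plate contributes 384/24 = 16 batches, so the batch-level range is simply 16 × Nplates.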

* Alternatively, if you can access your server via Jupyter, run each .ipynb in the Notebooks folder sequentially for organized script editing & access to extra in-line comments (also viewable in this repo's Notebooks folder on GitHub).
Submit each submission script (.sub extension) in order: A00a, A00b, A00c, A01a, A01b, ... I usually qsub all A00* scripts at once, all A01* at once, etc. For convenience, the full list of submission commands is in Documentation/submission_helper.txt
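If you prefer to preview a stage before submitting it, a dry-run loop like the one below can help. This is only a sketch: the function name print_stage_submissions is hypothetical, it prints qsub commands rather than running them, and it does not check that the previous stage's jobs have finished. Documentation/submission_helper.txt remains the reference list of commands.

```shell
# Hypothetical dry-run helper: print the qsub command for every submission
# script of one stage (e.g., A00), in alphabetical glob order.
# Nothing is submitted; pipe the output to sh once you are happy with it.
print_stage_submissions() {
  scripts_dir="$1"
  stage="$2"
  for script in "$scripts_dir"/"$stage"*.sub; do
    [ -e "$script" ] || continue   # skip the literal pattern when nothing matches
    echo "qsub $script"
  done
}
```

For example, `print_stage_submissions Scripts A00` lists the A00* submissions; repeat with A01 once those jobs have completed.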

Links

Repo Resources

Detailed Overview (rationale for steps, FAQs, common pitfalls)
Notebooks
Revision History
list of qsub commands

Related Pipelines

This pipeline originated from my experimental updates for processing snmCT-seq (and closely related methylation-only snmC-seq3) data using paired-end alignment and quantification. Its current construction is thus a set of scripts tailored to the UCLA Hoffman2 computing server for accessibility.

Related work to consider:

YAP (Yet Another Pipeline): supports snmCT-seq and additional related assays (e.g., mC, m3C, mCAT-seq). Snakemake. Developed by the Ecker Lab/Dr. Hanqing Liu (Salk Institute).
allcools: Also by Hanqing Liu and typically used by our group for mC downstream analysis. Helpful to review for .allc and .mcds descriptions.
WARP/CEMBA: Broad Institute, WDL-based snmC-seq (methylation-only) pipeline compiled as part of the BRAIN Initiative.

Technology References

Our library structure is described in the Detailed Overview and a seqspec.
Flagship assay paper (where "mCAT" = mCT plus additional NOMe-seq for chromatin accessibility profiling; this is also our preferred citation for snmCT-seq)
- snmCAT-seq: Luo, C. et al. Single nucleus multi-omics identifies human cortical cell regulatory genome diversity. Cell Genomics 2, 100107 (2022).
Understanding the underlying mC and RNA reactions:
- snmC-seq2: Luo, C. et al. Robust single-cell DNA methylome profiling with snmC-seq2. Nat. Commun. 9, 7–12 (2018). [Note: we are now on snmC-seq3 but only a wet lab protocol citation exists.]
- Smart-seq2: Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9, 171–181 (2014).

Acknowledgements

Dr. Chongyuan Luo (original workflow, demultiplexing/filtering scripts)
Dr. Hanqing Liu (@lhqing) for updating allcools for paired-end processing
Luo Lab collaborators/members for pipeline testing and feedback, namely Dr. Katie Eyring (Geschwind Lab), Nasser Elhajjaoui, Kevin Abuhanna, and Terence Li.
