scRNA-seq normalization method, based on the Simple Good-Turing estimator

Martin-Fahrenberger/GTestimate

GTestimate

GTestimate is a scRNA-seq normalization method. Unlike other methods, it uses the Simple Good-Turing estimator to estimate per-cell relative gene expression. This lets GTestimate account for unobserved genes and avoid overestimating the expression of observed genes. With default settings, it serves as a drop-in replacement for Seurat's NormalizeData().
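
To make the idea of "accounting for unobserved genes" concrete, here is a minimal base-R sketch of the basic Good-Turing missing-mass estimate on a toy cell. It is not GTestimate's implementation: the counts and all variable names are hypothetical, and the uniform rescaling at the end is a simplification. The Simple Good-Turing estimator adjusts each count individually using smoothed frequency-of-frequency values.

```r
# Toy UMI counts for a single cell (hypothetical values, for illustration only).
counts <- c(GeneA = 5, GeneB = 2, GeneC = 1, GeneD = 1, GeneE = 1,
            GeneF = 0, GeneG = 0)

total <- sum(counts)       # N: total number of counts in the cell
n1    <- sum(counts == 1)  # N_1: number of genes observed exactly once

# Good-Turing estimate of the total probability mass of unobserved genes.
p_unobserved <- n1 / total

# Relative-frequency (maximum-likelihood) estimates: these sum to 1 over the
# observed genes, leaving no probability mass for unobserved ones.
ml_estimate <- counts / total

# Simplified correction (an assumption of this sketch, not the package's
# method): shrink the observed genes so they share the remaining mass.
rescaled <- ml_estimate * (1 - p_unobserved)

print(p_unobserved)
print(round(rescaled, 3))
```

In this toy cell, three of the ten counts come from genes seen only once, so a sizeable share of the probability mass is assigned to genes that were not detected, and every observed gene's estimate is lower than its raw relative frequency.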

Preprint

The accompanying preprint is available on bioRxiv: https://doi.org/10.1101/2024.07.02.601501

Installation

You can install the development version of GTestimate like so:

if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")
if (!requireNamespace("sparseMatrixStats", quietly = TRUE))
  devtools::install_github("const-ae/sparseMatrixStats")

devtools::install_github("Martin-Fahrenberger/GTestimate")

Seurat Example

This is a basic example of how to use GTestimate to normalize scRNA-seq data in a Seurat workflow.

library(GTestimate)
library(Seurat)
data('pbmc_small')

pbmc_small <- GTestimate(pbmc_small) # Instead of NormalizeData(pbmc_small)
pbmc_small <- FindVariableFeatures(pbmc_small)
pbmc_small <- ScaleData(pbmc_small)
pbmc_small <- RunPCA(pbmc_small)
# and so on

SingleCellExperiment Example

This is a basic example of how to use GTestimate to normalize scRNA-seq data in a SingleCellExperiment workflow using size factors.

library(GTestimate)
library(Seurat)
library(scran)
data('pbmc_small')

pbmc_sce <- as.SingleCellExperiment(pbmc_small)

pbmc_sce <- computeSumFactors(pbmc_sce)
pbmc_sce <- GTestimate(pbmc_sce, size.factors = sizeFactors(pbmc_sce)) # Instead of logNormCounts(pbmc_sce)
# and so on

Credit

The core C++ implementation of the Simple Good-Turing estimator has been adapted from Aaron Lun's implementation for the edgeR R package. His implementation was in turn based on Geoffrey Sampson's C code, accessible at https://www.grsampson.net/D_SGT.c
