2. Input preparation
The input dataset for BASiCS must be stored as a SingleCellExperiment object (see the SingleCellExperiment Bioconductor package). To use BASiCS on an existing SingleCellExperiment object, please read the section below.
The newBASiCS_Data function can be used to create the required SingleCellExperiment object based on the following information:
- Counts: a matrix of raw expression counts with dimensions $q$ times $n$. Within this matrix, $q_0$ rows must correspond to biological genes and $q-q_0$ rows must correspond to technical spike-in genes. Gene names must be stored as rownames(Counts).
- Tech: a vector of TRUE/FALSE elements with length $q$. If Tech[i] = FALSE, gene i is biological; otherwise the gene is spike-in. This vector must list the genes in the same order as the Counts matrix.
- SpikeInfo: a data.frame with $q-q_0$ rows. The first column must contain the names of the spike-in genes (as in rownames(Counts)). The second column must contain the input number of molecules for the spike-in genes (amount per cell).
- BatchInfo (optional argument): a vector of length $n$ indicating the batch structure when cells have been processed in multiple batches.
For example, the following code simulates a dataset with 50 genes (40 biological and 10 spike-in) and 40 cells.
set.seed(1)
Counts = matrix(rpois(50*40, 2), ncol = 40)
rownames(Counts) <- c(paste0("Gene", 1:40), paste0("Spike", 1:10))
Tech = c(rep(FALSE,40),rep(TRUE,10))
set.seed(2)
SpikeInput = rgamma(10,1,1)
SpikeInfo <- data.frame("SpikeID" = paste0("Spike", 1:10), "SpikeInput" = SpikeInput)
# No batch effect
DataExample = newBASiCS_Data(Counts, Tech, SpikeInfo)
# With batch effect
DataExample = newBASiCS_Data(Counts, Tech, SpikeInfo,
BatchInfo = rep(c(1,2), each = 20))
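
Before calling newBASiCS_Data, it can help to confirm that the inputs satisfy the dimension and naming requirements listed above. The following is a minimal sketch in base R; check_basics_inputs is a hypothetical helper written for illustration and is not part of BASiCS.

```r
# Hypothetical helper (not part of BASiCS): checks that the inputs are
# mutually consistent, following the requirements described above.
check_basics_inputs <- function(Counts, Tech, SpikeInfo, BatchInfo = NULL) {
  stopifnot(
    is.matrix(Counts),
    !is.null(rownames(Counts)),            # gene names stored as rownames
    is.logical(Tech),
    length(Tech) == nrow(Counts),          # one TRUE/FALSE flag per gene
    is.data.frame(SpikeInfo),
    nrow(SpikeInfo) == sum(Tech),          # one row per spike-in gene
    setequal(as.character(SpikeInfo[[1]]), rownames(Counts)[Tech]),
    is.numeric(SpikeInfo[[2]])             # input number of molecules
  )
  if (!is.null(BatchInfo)) {
    stopifnot(length(BatchInfo) == ncol(Counts))  # one batch label per cell
  }
  invisible(TRUE)
}

# Same simulated inputs as in the example above
set.seed(1)
Counts <- matrix(rpois(50 * 40, 2), ncol = 40)
rownames(Counts) <- c(paste0("Gene", 1:40), paste0("Spike", 1:10))
Tech <- c(rep(FALSE, 40), rep(TRUE, 10))
set.seed(2)
SpikeInfo <- data.frame(SpikeID = paste0("Spike", 1:10),
                        SpikeInput = rgamma(10, 1, 1))
BatchInfo <- rep(c(1, 2), each = 20)

check_basics_inputs(Counts, Tech, SpikeInfo, BatchInfo)
```

The helper stops with an error at the first violated condition, so a silent return means the inputs are at least structurally consistent.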
Single-cell RNA sequencing data typically require filtering (quality control) before analysis, in order to remove cells and/or transcripts with very low expression counts. The function BASiCS_Filter can be used to perform this filtering. For examples, refer to help(BASiCS_Filter). Additional tools for this purpose can also be found in the scater Bioconductor package.
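
To illustrate the kind of filtering meant here, the sketch below removes low-count cells and genes using base R only. It is not the implementation of BASiCS_Filter; the threshold names and values are hypothetical and chosen purely for illustration.

```r
# Simulated counts, with one empty cell and one unexpressed gene added
# so that the filter has something to remove.
set.seed(1)
Counts <- matrix(rpois(50 * 40, 2), ncol = 40)
rownames(Counts) <- c(paste0("Gene", 1:40), paste0("Spike", 1:10))
Counts[1, ] <- 0
Counts[, 1] <- 0

# Hypothetical thresholds (illustrative values only)
min_total_per_cell  <- 2
min_total_per_gene  <- 2
min_cells_with_expr <- 2

# Keep cells with enough total counts
keep_cells <- colSums(Counts) >= min_total_per_cell

# Among the retained cells, keep genes with enough total counts
# and expressed in enough cells
Retained <- Counts[, keep_cells, drop = FALSE]
keep_genes <- rowSums(Retained) >= min_total_per_gene &
  rowSums(Retained > 0) >= min_cells_with_expr

CountsFiltered <- Counts[keep_genes, keep_cells, drop = FALSE]
dim(CountsFiltered)
```

If genes are removed this way, the Tech vector and the rows of SpikeInfo need the same subsetting so that they still match the filtered matrix.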
NOTE: The input number of molecules for the spike-ins should be calculated using experimental information. For each spike-in gene $i$, it is given by the product
$$C_i \times 10^{-18} \times (6.022 \times 10^{23}) \times V \times D$$
where,
- $C_i$ is the concentration of spike-in $i$ in the ERCC mix
- $10^{-18}$ converts attomoles to moles
- $6.022 \times 10^{23}$ is Avogadro's number (moles $\rightarrow$ molecules)
- $V$ is the volume added into each chamber (in nL)
- $D$ is a dilution factor
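
As a rough sketch of this calculation, the code below multiplies the terms defined in the list above. The helper spike_input_molecules is hypothetical, and all concentrations, the volume and the dilution factor are made-up values, not taken from any real protocol. It also assumes that the concentration is first expressed per nL, so that its volume unit matches $V$.

```r
# Hypothetical helper: product of the terms defined in the list above
spike_input_molecules <- function(conc, volume, dilution) {
  attomol_to_mol <- 1e-18     # attomoles to moles
  avogadro <- 6.022e23        # moles to molecules
  conc * attomol_to_mol * avogadro * volume * dilution
}

# Illustrative values only
conc_attomol_per_ul <- c(SpikeA = 1000, SpikeB = 250, SpikeC = 62.5)

# Assumption: concentrations are given per microlitre, so they are
# converted to per nanolitre to match the unit of the volume
conc_attomol_per_nl <- conc_attomol_per_ul / 1000

volume_nl <- 10       # V (hypothetical)
dilution  <- 1 / 100  # D (hypothetical)

SpikeInput <- spike_input_molecules(conc_attomol_per_nl, volume_nl, dilution)

# Arrange the result in the two-column layout expected for SpikeInfo
SpikeInfo <- data.frame(SpikeID = names(SpikeInput),
                        SpikeInput = unname(SpikeInput))
SpikeInfo
```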
For example, for the 96-well plate in the Fluidigm C1 system,
To convert an existing SingleCellExperiment object (Data) into one that can be used within BASiCS, the following meta-information must be stored in the object:
- SingleCellExperiment::isSpike(Data, SpikeType) <- Tech: the logical vector indicating biological/technical genes (see above) must be stored in the isSpike slot. SpikeType is a string containing the name of the spike-ins used (e.g. "ERCC"). Note: if Data contains more than one type of spike-in (length(SingleCellExperiment::spikeNames(Data)) > 1), unused spike-in types should be removed (see help(isSpike, package = SingleCellExperiment)).
- colData(Data)$BatchInfo <- BatchInfo: the vector indicating the batch structure (see above) must be stored in the colData slot.
- metadata(Data): the SpikeInfo object is stored in the metadata slot of the SummarizedExperiment object: metadata(Data) <- list(SpikeInput = SpikeInfo[,2]).
Once the additional information is included, the object can be used within BASiCS.
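
The pieces of meta-information described above can be prepared in base R before they are assigned to the object. The sketch below uses made-up gene names and spike-in amounts. It assumes that spike-in genes can be recognised by an "ERCC-" prefix and that the spike-in amounts should follow the row order of the spike-in genes in the object; neither assumption is stated in the text above.

```r
# Hypothetical gene names, standing in for rownames(Data)
gene_names <- c("GeneA", "GeneB", "ERCC-00002", "ERCC-00003")

# Logical vector flagging spike-in genes
# (assumption: spike-ins are identified by the "ERCC-" prefix)
Tech <- grepl("^ERCC-", gene_names)

# Hypothetical spike-in table, deliberately listed in a different order
SpikeInfo <- data.frame(SpikeID = c("ERCC-00003", "ERCC-00002"),
                        SpikeInput = c(5, 12))

# Reorder the table so that its rows follow the spike-in genes
# in gene_names (assumption: matching order is wanted)
idx <- match(gene_names[Tech], as.character(SpikeInfo$SpikeID))
stopifnot(!anyNA(idx))  # every spike-in gene needs an entry in SpikeInfo
SpikeInfo <- SpikeInfo[idx, ]

# List in the form used for the metadata slot in the text above
spike_metadata <- list(SpikeInput = SpikeInfo[, 2])
spike_metadata
```

Tech and spike_metadata would then be assigned to the object with the accessors listed above.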
In many cases (e.g. droplet-based scRNA-Seq data), spike-in genes are not present. To run BASiCS on such data, only the BatchInfo meta-information needs to be stored in the SingleCellExperiment object. Here is an example of how to create a SingleCellExperiment object that does not contain spike-in genes:
set.seed(1)
Counts <- matrix(rpois(50*40, 2), ncol = 40)
rownames(Counts) <- c(paste0("Gene", 1:50))
# Create SingleCellExperiment object containing batch information
library(SingleCellExperiment)
DataExample <- SingleCellExperiment(assays = list(counts = Counts),
colData = data.frame(BatchInfo = rep(c(1,2), each = 20)))