-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path0-Seurat_PBMC3k_tutorial_TENxPBMCData.Rmd
223 lines (178 loc) · 7.41 KB
/
0-Seurat_PBMC3k_tutorial_TENxPBMCData.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
---
title: "Seurat PBMC 3k tutorial using TENxPBMCData"
author: "Kevin Rue-Albrecht"
date: "`r BiocStyle::doc_date()`"
editor_options:
chunk_output_type: inline
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=TRUE)
knitr::opts_chunk$set(cache=TRUE)
stopifnot({
require(BiocStyle)
})
```
# Overview
In this example, we use count data for 2,700 peripheral blood mononuclear cells (PBMC) obtained using the [10X Genomics](www.10xgenomics.com) platform, and process it following the [Guided Clustering Tutorial](https://satijalab.org/seurat/pbmc3k_tutorial_1_4.html) of the `r CRANpkg("Seurat")` package.
# Getting the data
First, fetch the data as a `r Biocpkg("SingleCellExperiment")` object using the `r Biocpkg("TENxPBMCData")` package.
The first time that the following code chunk is run, users should expect it to take additional time as it downloads data from the web and caches it on their local machine; subsequent evaluations of the same code chunk should only take a few seconds as the data set is then loaded from the local cache.
```{r, message=FALSE}
library(TENxPBMCData)
tenx_pbmc3k <- TENxPBMCData(dataset="pbmc3k")
colnames(tenx_pbmc3k) <- paste0("Cell", seq_len(ncol(tenx_pbmc3k)))
tenx_pbmc3k
```
# Preparing the data
Next, prepare a sparse matrix that emulates the first section of the [Guided Clustering Tutorial](https://satijalab.org/seurat/pbmc3k_tutorial_1_4.html).
```{r, message=FALSE}
library(Matrix)
pbmc.data <- as(counts(tenx_pbmc3k), "Matrix")
pbmc.data <- as(pbmc.data, "dgTMatrix")
```
# Seurat - Guided Clustering Tutorial
From here on, follow the [Guided Clustering Tutorial](https://satijalab.org/seurat/pbmc3k_tutorial_1_4.html) to the letter (code obtained on 2018-11-24).
```{r, message=FALSE}
library(Seurat)
library(dplyr)
# Initialize the Seurat object with the raw (non-normalized data). Keep all
# genes expressed in >= 3 cells (~0.1% of the data). Keep all cells with at
# least 200 detected genes
pbmc <- CreateSeuratObject(raw.data=pbmc.data, min.cells=3, min.genes=200, project="10X_PBMC")
pbmc
```
```{r}
# The number of genes and UMIs (nGene and nUMI) are automatically calculated
# for every object by Seurat. For non-UMI data, nUMI represents the sum of
# the non-normalized values within a cell We calculate the percentage of
# mitochondrial genes here and store it in percent.mito using AddMetaData.
# We use object@raw.data since this represents non-transformed and
# non-log-normalized counts The % of UMI mapping to MT-genes is a common
# scRNA-seq QC metric.
mito.genes <- subset(rowData(tenx_pbmc3k), grepl("^MT-", Symbol_TENx), "ENSEMBL_ID", drop=TRUE)
mito.genes <- intersect(mito.genes, rownames(pbmc@raw.data))
percent.mito <- Matrix::colSums(pbmc@raw.data[mito.genes, ])/Matrix::colSums(pbmc@raw.data)
# AddMetaData adds columns to object@meta.data, and is a great place to
# stash QC stats
pbmc <- AddMetaData(object=pbmc, metadata=percent.mito, col.name="percent.mito")
VlnPlot(object=pbmc, features.plot=c("nGene", "nUMI", "percent.mito"), nCol=3)
```
```{r}
# GenePlot is typically used to visualize gene-gene relationships, but can
# be used for anything calculated by the object, i.e. columns in
# object@meta.data, PC scores etc. Since there is a rare subset of cells
# with an outlier level of high mitochondrial percentage and also low UMI
# content, we filter these as well
par(mfrow=c(1, 2))
GenePlot(object=pbmc, gene1="nUMI", gene2="percent.mito")
GenePlot(object=pbmc, gene1="nUMI", gene2="nGene")
```
```{r}
# We filter out cells that have unique gene counts over 2,500 or less than
# 200 Note that low.thresholds and high.thresholds are used to define a
# 'gate'. -Inf and Inf should be used if you don't want a lower or upper
# threshold.
pbmc <- FilterCells(object=pbmc, subset.names=c("nGene", "percent.mito"),
low.thresholds=c(200, -Inf), high.thresholds=c(2500, 0.05))
```
```{r}
pbmc <- NormalizeData(object=pbmc, normalization.method="LogNormalize",
scale.factor=10000)
```
```{r}
pbmc <- FindVariableGenes(object=pbmc, mean.function=ExpMean, dispersion.function=LogVMR,
x.low.cutoff=0.0125, x.high.cutoff=3, y.cutoff=0.5)
```
```{r}
length(x=pbmc@var.genes)
```
```{r}
pbmc <- ScaleData(object=pbmc, vars.to.regress=c("nUMI", "percent.mito"))
```
```{r}
pbmc <- RunPCA(object=pbmc, pc.genes=pbmc@var.genes, do.print=TRUE, pcs.print=1:5,
genes.print=5)
```
```{r}
# Examine and visualize PCA results a few different ways
PrintPCA(object=pbmc, pcs.print=1:5, genes.print=5, use.full=FALSE)
```
```{r}
VizPCA(object=pbmc, pcs.use=1:2)
```
```{r}
PCAPlot(object=pbmc, dim.1=1, dim.2=2)
```
```{r}
# ProjectPCA scores each gene in the dataset (including genes not included
# in the PCA) based on their correlation with the calculated components.
# Though we don't use this further here, it can be used to identify markers
# that are strongly correlated with cellular heterogeneity, but may not have
# passed through variable gene selection. The results of the projected PCA
# can be explored by setting use.full=T in the functions above
pbmc <- ProjectPCA(object=pbmc, do.print=FALSE)
```
```{r}
PCHeatmap(object=pbmc, pc.use=1, cells.use=500, do.balanced=TRUE, label.columns=FALSE)
```
```{r}
PCHeatmap(object=pbmc, pc.use=1:12, cells.use=500, do.balanced=TRUE,
label.columns=FALSE, use.full=FALSE)
```
Small deviation from the tutorial. Skip the lengthy JackStraw computation.
```{r}
# NOTE: This process can take a long time for big datasets, comment out for
# expediency. More approximate techniques such as those implemented in
# PCElbowPlot() can be used to reduce computation time
if (FALSE) {
pbmc <- JackStraw(object=pbmc, num.replicate=100, display.progress=FALSE)
JackStrawPlot(object=pbmc, PCs=1:12)
}
```
```{r}
PCElbowPlot(object=pbmc)
```
```{r}
# save.SNN=T saves the SNN so that the clustering algorithm can be rerun
# using the same graph but with a different resolution value (see docs for
# full details)
pbmc <- FindClusters(object=pbmc, reduction.type="pca", dims.use=1:10,
resolution=0.6, print.output=0, save.SNN=TRUE)
```
```{r}
PrintFindClustersParams(object=pbmc)
```
```{r}
pbmc <- RunTSNE(object=pbmc, dims.use=1:10, do.fast=TRUE)
```
```{r}
# note that you can set do.label=T to help label individual clusters
TSNEPlot(object=pbmc)
```
Save the `seurat` object
```{r}
saveRDS(pbmc, file="pbmc3k_tutorial.rds")
```
Save the original `SingleCellExperiment` object, after:
- removing the cells excluded by quality metrics during the Seurat workflow
- adding the cluster assignments
- adding the PCA and t-SNE dimensionality reduction results
```{r}
tenx_pbmc3k <- tenx_pbmc3k[, pbmc@cell.names]
colData(tenx_pbmc3k)[["seurat.ident"]] <- as.factor(pbmc@ident)
reducedDim(tenx_pbmc3k, "PCA") <- pbmc@dr$pca@cell.embeddings
reducedDim(tenx_pbmc3k, "TSNE") <- pbmc@dr$tsne@cell.embeddings
colnames(tenx_pbmc3k) <- NULL # remove the dummy cell names before exporting
saveRDS(tenx_pbmc3k, file="pbmc3k_tutorial.sce.rds")
```
Export the cluster identity vector.
It will be included in the `r Githubpkg("kevinrue/hancock")` package, and mapped using the dummy cell names defined at the start of this notebook.
```{r}
ident <- pbmc@ident
names(ident) <- pbmc@cell.names
saveRDS(ident, "pbmc3k.ident.rds")
```
# See also
- [Applying Gene Signatures to the Seurat PBMC 3k Tutorial](1-proportion_signature.html)
- [Learning Gene Signatures from the Seurat PBMC 3k Tutorial](2-learn-signatures.html)
<button onclick="topFunction()" id="myBtn" title="Go to top">Back to top</button>