This repository has been archived by the owner on Aug 12, 2021. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 9
/
12-enrichplot.Rmd
369 lines (227 loc) · 14.1 KB
/
12-enrichplot.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
# Visualization of Functional Enrichment Result {#chapter12}
The `r Biocpkg("enrichplot")` package implements several visualization methods
to help interpreting enrichment results. It supports visualizing enrichment
results obtained from `r Biocpkg("DOSE")` [@yu_dose_2015],
`r Biocpkg("clusterProfiler")` [@yu2012],
`r Biocpkg("ReactomePA")` [@yu_reactomepa_2016] and `r Biocpkg("meshes")`. Both
over representation analysis (ORA) and gene set enrichment analysis (GSEA) are
supported.
Many of these visualization methods were first implemented in `r Biocpkg("DOSE")` and rewrote from scratch using `ggplot2`. If you want to use old methods^[<https://www.biostars.org/p/375555>], you can use the [doseplot](https://github.com/GuangchuangYu/doseplot) package.
## Bar Plot
Bar plot is the most widely used method to visualize enriched terms. It depicts
the enrichment scores (*e.g.* p values) and gene count or ratio as bar height
and color.
```{r}
library(DOSE)
data(geneList)
de <- names(geneList)[abs(geneList) > 2]
edo <- enrichDGN(de)
```
(ref:Barplotscap) Bar plot of enriched terms.
(ref:Barplotcap) **Bar plot of enriched terms.**
```{r Barplot, fig.height=5, fig.width=8, fig.cap="(ref:Barplotcap)", fig.scap="(ref:Barplotscap)"}
library(enrichplot)
barplot(edo, showCategory=20)
```
## Dot plot
Dot plot is similar to bar plot with the capability to encode another score as
dot size.
(ref:Dotplotscap) Dot plot of enriched terms.
(ref:Dotplotcap) **Dot plot of enriched terms.**
```{r Dotplotcap, fig.width=12, fig.height=8, fig.cap="(ref:Dotplotcap)", fig.scap="(ref:Dotplotscap)"}
edo2 <- gseNCG(geneList, nPerm=10000)
p1 <- dotplot(edo, showCategory=30) + ggtitle("dotplot for ORA")
p2 <- dotplot(edo2, showCategory=30) + ggtitle("dotplot for GSEA")
plot_grid(p1, p2, ncol=2)
```
## Gene-Concept Network
Both the `barplot` and `dotplot` only displayed most significant enriched terms,
while users may want to know which genes are involved in these significant
terms.
In order to consider the potentially biological complexities in which a gene may belong to multiple annotation categories and provide information of numeric changes if available, we developed `cnetplot` function to extract the complex association.
The `cnetplot` depicts the linkages of genes and biological concepts (*e.g.* GO terms or KEGG pathways) as a network. GSEA result is also supported
with only core enriched genes displayed.
(ref:Networkplotscap) Network plot of enriched terms.
(ref:Networkplotcap) **Network plot of enriched terms.**
```{r Networkplot, fig.width=20, fig.height=8, fig.cap="(ref:Networkplotcap)", fig.scap="(ref:Networkplotscap)"}
## convert gene ID to Symbol
edox <- setReadable(edo, 'org.Hs.eg.db', 'ENTREZID')
p1 <- cnetplot(edox, foldChange=geneList)
## categorySize can be scaled by 'pvalue' or 'geneNum'
p2 <- cnetplot(edox, categorySize="pvalue", foldChange=geneList)
p3 <- cnetplot(edox, foldChange=geneList, circular = TRUE, colorEdge = TRUE)
cowplot::plot_grid(p1, p2, p3, ncol=3, labels=LETTERS[1:3], rel_widths=c(.8, .8, 1.2))
```
If you would like label subset of the nodes, you can use the `node_label` parameter, which supports 4 possible selections (i.e. "category", "gene", "all" and "none"), as demonstrated in Figure \@ref(fig:cnetNodeLabel).
(ref:cnetNodeLabelscap) Labelling nodes by selected subset.
(ref:cnetNodeLabelcap) **Labelling nodes by selected subset.** gene category (A), gene name (B), both gene category and gene name (C, default) and not to label at all (D).
```{r cnetNodeLabel, fig.height=12, fig.width=16, fig.cap="(ref:cnetNodeLabelcap)", fig.scap="(ref:cnetNodeLabelscap)"}
p1 <- cnetplot(edox, node_label="category")
p2 <- cnetplot(edox, node_label="gene")
p3 <- cnetplot(edox, node_label="all")
p4 <- cnetplot(edox, node_label="none")
cowplot::plot_grid(p1, p2, p3, p4, ncol=2, labels=LETTERS[1:4])
```
## Heatmap-like functional classification
The `heatplot` is similar to `cnetplot`, while displaying the relationships as a
heatmap. The gene-concept network may become too complicated if user want to
show a large number significant terms. The `heatplot` can simplify the result
and more easy to identify expression patterns.
(ref:Heatplotscap) Heatmap plot of enriched terms.
(ref:Heatplotcap) **Heatmap plot of enriched terms.** default (A), `foldChange=geneList` (B)
```{r Heatplot, fig.width=16, fig.height=8, fig.cap="(ref:Heatplotcap)", fig.scap="(ref:Heatplotscap)"}
p1 <- heatplot(edox)
p2 <- heatplot(edox, foldChange=geneList)
cowplot::plot_grid(p1, p2, ncol=1, labels=LETTERS[1:2])
```
## Enrichment Map
Enrichment map organizes enriched terms into a network with edges connecting
overlapping gene sets. In this way, mutually overlapping gene sets are tend to
cluster together, making it easy to identify functional module.
The `emapplot` function supports results obtained from hypergeometric test and gene set enrichment analysis. The `pie_scale` parameter can be used to resize nodes, as demonstrated in Figure \@ref(fig:Enrichment) B, and the `layout` parameter can adjust the layout, as demonstrated in Figure \@ref(fig:Enrichment) C and D.
(ref:Enrichmentscap) Plot for results obtained from hypergeometric test and gene set enrichment analysis.
(ref:Enrichmentcap) **Plot for results obtained from hypergeometric test and gene set enrichment analysis.** default (A), `pie_scale=1.5` (B), `layout="kk"` (C) and `pie_scale=1.5,layout="kk"` (D).
```{r Enrichment, fig.height=12, fig.width=16, fig.cap="(ref:Enrichmentcap)", fig.scap="(ref:Enrichmentscap)"}
p1 <- emapplot(edo)
p2 <- emapplot(edo, pie_scale=1.5)
p3 <- emapplot(edo,layout="kk")
p4 <- emapplot(edo, pie_scale=1.5,layout="kk")
cowplot::plot_grid(p1, p2, p3, p4, ncol=2, labels=LETTERS[1:4])
```
The `emapplot` function also supports results obtained from `compareCluster` function of `clusterProfiler` package. In addition to `pie_scale` and `layout` parameters, the number of circles in the bottom left corner can be adjusted using the `legend_n` parameteras, as demonstrated in Figure \@ref(fig:Enrichment2) B. And proportion of clusters in the pie chart can be adjusted using the `pie` parameter, when `pie="count"`, the proportion of clusters in the pie chart is determined by the number of genes, as demonstrated in Figure \@ref(fig:Enrichment2) C and D.
(ref:Enrichment2scap) Plot for results obtained from `compareCluster` function of `clusterProfiler` package.
(ref:Enrichment2cap) **Plot for results obtained from `compareCluster` function of `clusterProfiler` package.** default (A), `legend_n=2` (B), `pie="count"` (C) and `pie="count", pie_scale=1.5, layout="kk"` (D).
```{r Enrichment2, fig.height=12, fig.width=16, fig.cap="(ref:Enrichment2cap)", fig.scap="(ref:Enrichment2scap)"}
library(clusterProfiler)
data(gcSample)
xx <- compareCluster(gcSample, fun="enrichKEGG",
organism="hsa", pvalueCutoff=0.05)
p1 <- emapplot(xx)
p2 <- emapplot(xx,legend_n=2)
p3 <- emapplot(xx,pie="count")
p4 <- emapplot(xx,pie="count", pie_scale=1.5, layout="kk")
cowplot::plot_grid(p1, p2, p3, p4, ncol=2, labels=LETTERS[1:4])
```
## UpSet Plot
The `upsetplot` is an alternative to `cnetplot` for visualizing the complex
association between genes and gene sets. It emphasizes the gene overlapping
among different gene sets.
(ref:upsetORAscap) Upsetplot for over-representation analysis.
(ref:upsetORAcap) **Upsetplot for over-representation analysis.**
```{r upsetORA, fig.width=12, fig.height=5, fig.cap="(ref:upsetORAcap)", fig.scap="(ref:upsetORAscap)"}
upsetplot(edo)
```
For over-representation analysis, `upsetplot` will calculate the overlaps among different gene sets as demonstrated in Figure \@ref(fig:upsetORA). For GSEA result, it will plot the fold change distributions of different categories (e.g. unique to pathway, overlaps among different pathways).
(ref:upsetGSEAscap) Upsetplot for gene set enrichment analysis.
(ref:upsetGSEAcap) **Upsetplot for gene set enrichment analysis.**
```{r upsetGSEA, fig.height=5, fig.width=12, fig.cap="(ref:upsetGSEAcap)", fig.scap="(ref:upsetGSEAscap)"}
upsetplot(kk2)
```
## ridgeline plot for expression distribution of GSEA result
The `ridgeplot` will visualize expression distributions of core enriched genes
for GSEA enriched categories. It helps users to interpret up/down-regulated pathways.
(ref:ridgeplotscap) Ridgeplot for gene set enrichment analysis.
(ref:ridgeplotcap) **Ridgeplot for gene set enrichment analysis.**
```{r ridgeplot, fig.width=12, fig.height=8, message=FALSE, fig.cap="(ref:ridgeplotcap)", fig.scap="(ref:ridgeplotscap)"}
ridgeplot(edo2)
```
## running score and preranked list of GSEA result
Running score and preranked list are traditional methods for visualizing GSEA
result. The `r Biocpkg("enrichplot")` package supports both of them to visualize
the distribution of the gene set and the enrichment score.
(ref:gseaplotscap) gseaplot for GSEA result(`by = "runningScore"`).
(ref:gseaplotcap) **gseaplot for GSEA result(`by = "runningScore"`).** `by = "runningScore"` (A), `by = "preranked"` (B), default (C)
```{r gseaplot, fig.width=12, fig.height=16, fig.cap="(ref:gseaplotcap)", fig.scap="(ref:gseaplotscap)"}
p1 <- gseaplot(edo2, geneSetID = 1, by = "runningScore", title = edo2$Description[1])
p2 <- gseaplot(edo2, geneSetID = 1, by = "preranked", title = edo2$Description[1])
p3 <- gseaplot(edo2, geneSetID = 1, title = edo2$Description[1])
cowplot::plot_grid(p1, p2, p3, ncol=1, labels=LETTERS[1:3])
```
Another method to plot GSEA result is the `gseaplot2` function:
(ref:gseaplot2scap) Gseaplot2 for GSEA result.
(ref:gseaplot2cap) **Gseaplot2 for GSEA result.**
```{r gseaplot2, fig.width=12, fig.height=8, fig.cap="(ref:gseaplot2cap)", fig.scap="(ref:gseaplot2scap)"}
gseaplot2(edo2, geneSetID = 1, title = edo2$Description[1])
```
The `gseaplot2` also supports multile gene sets to be displayed on the same figure:
(ref:gseaplot22scap) Gseaplot2 for GSEA result of multile gene sets.
(ref:gseaplot22cap) **Gseaplot2 for GSEA result of multile gene sets.**
```{r gseaplot22, fig.width=12, fig.height=8, fig.cap="(ref:gseaplot22cap)", fig.scap="(ref:gseaplot22scap)"}
gseaplot2(edo2, geneSetID = 1:3)
```
User can also displaying the pvalue table on the plot via `pvalue_table`
parameter:
(ref:gseaplot23scap) Gseaplot2 for GSEA result of multile gene sets(add pvalue_table).
(ref:gseaplot23cap) **Gseaplot2 for GSEA result of multile gene sets(add pvalue_table).**
```{r gseaplot23, fig.width=12, fig.height=8, fig.cap="(ref:gseaplot23cap)", fig.scap="(ref:gseaplot23scap)"}
gseaplot2(edo2, geneSetID = 1:3, pvalue_table = TRUE,
color = c("#E495A5", "#86B875", "#7DB0DD"), ES_geom = "dot")
```
User can specify `subplots` to only display a subset of plots:
(ref:gseaplot24scap) Gseaplot2 for GSEA result of multile gene sets(add subplots).
(ref:gseaplot24cap) **Gseaplot2 for GSEA result of multile gene sets(add subplots).** `subplots = 1` (A),`subplots = 1:2` (B)
```{r gseaplot24, fig.width=12, fig.height=8, fig.cap="(ref:gseaplot24cap)", fig.scap="(ref:gseaplot24scap)"}
p1 <- gseaplot2(edo2, geneSetID = 1:3, subplots = 1)
p2 <- gseaplot2(edo2, geneSetID = 1:3, subplots = 1:2)
cowplot::plot_grid(p1, p2, ncol=1, labels=LETTERS[1:2])
```
The `gsearank` function plot the ranked list of genes belong to the specific
gene set.
(ref:gsearankscap) Ranked list of genes belong to the specific gene set.
(ref:gsearankcap) **Ranked list of genes belong to the specific gene set.**
```{r gsearank, fig.width=8, fig.height=4, fig.cap="(ref:gsearankcap)", fig.scap="(ref:gsearankscap)"}
gsearank(edo2, 1, title = edo2[1, "Description"])
```
Multiple gene sets can be aligned using `cowplot`:
(ref:gsearank2scap) Gsearank for multiple gene sets.
(ref:gsearank2cap) **Gsearank for multiple gene sets.**
```{r gsearank2, fig.width=8, fig.height=12, fig.cap="(ref:gsearank2cap)", fig.scap="(ref:gsearank2scap)"}
library(ggplot2)
library(cowplot)
pp <- lapply(1:3, function(i) {
anno <- edo2[i, c("NES", "pvalue", "p.adjust")]
lab <- paste0(names(anno), "=", round(anno, 3), collapse="\n")
gsearank(edo2, i, edo2[i, 2]) + xlab(NULL) +ylab(NULL) +
annotate("text", 0, edo2[i, "enrichmentScore"] * .9, label = lab, hjust=0, vjust=0)
})
plot_grid(plotlist=pp, ncol=1)
```
## pubmed trend of enriched terms
One of the problem of enrichment analysis is to find pathways for further
investigation. Here, we provide `pmcplot` function to plot the number/proportion
of publications trend based on the query result from PubMed Central. Of course,
users can use `pmcplot` in other scenarios. All text that can be queried on PMC
is valid as input of `pmcplot`.
(ref:pmcplotscap) Pmcplot of enrichment analysis.
(ref:pmcplotcap) **Pmcplot of enrichment analysis.**
```{r pmcplot, fig.width=14, fig.height=6, fig.cap="(ref:pmcplotcap)", fig.scap="(ref:pmcplotscap)"}
terms <- edo$Description[1:3]
p <- pmcplot(terms, 2010:2017)
p2 <- pmcplot(terms, 2010:2017, proportion=FALSE)
plot_grid(p, p2, ncol=2)
```
## goplot
`goplot` can accept output of `enrichGO` and visualized the enriched GO induced graph.
(ref:goplotscap) Goplot of enrichment analysis.
(ref:goplotcap) **Goplot of enrichment analysis.**
```{r goplot, fig.height=12, fig.width=8, eval=FALSE, fig.cap="(ref:goplotcap)", fig.scap="(ref:goplotscap)"}
goplot(ego)
```
## browseKEGG
To view the KEGG pathway, user can use `browseKEGG` function, which will open web browser and highlight enriched genes.
```{r browseKEGG, eval=FALSE}
browseKEGG(kk, 'hsa04110')
```
![](figures/browseKEGG.png)
## pathview from pathview package
[clusterProfiler](https://www.bioconductor.org/packages/clusterProfiler) users can also use `pathview` from the [pathview](https://www.bioconductor.org/packages/pathview)[@luo_pathview] to visualize KEGG pathway.
The following example illustrate how to visualize "hsa04110" pathway, which was enriched in our previous analysis.
```{r eval=FALSE}
library("pathview")
hsa04110 <- pathview(gene.data = geneList,
pathway.id = "hsa04110",
species = "hsa",
limit = list(gene=max(abs(geneList)), cpd=1))
```
![](figures/hsa04110_pathview.png)
For further information, please refer to the vignette of [pathview](https://www.bioconductor.org/packages/pathview)[@luo_pathview].