<!DOCTYPE html>
<html lang="" xml:lang="">
<head>
<title>BCB420 - Computational Systems Biology</title>
<meta charset="utf-8" />
<meta name="author" content="Ruth Isserlin" />
<meta name="date" content="2020-02-24" />
<link href="libs/remark-css-0.0.1/default.css" rel="stylesheet" />
<link rel="stylesheet" href="libs/example.css" type="text/css" />
</head>
<body>
<textarea id="source">
class: center, middle, inverse, title-slide
# BCB420 - Computational Systems Biology
## Lecture 7 - Recap and GSEA
### Ruth Isserlin
### 2020-02-24
---
## Before we start
It is not too late to fill out the mid-course feedback form. If you haven't done so yet, please do:
[<font size=8>Mid course Feedback : https://forms.gle/maGA529V7pxvgBXC6</font>](https://forms.gle/maGA529V7pxvgBXC6)
---
class: left
# Journal (Clarifications)
* **Main purpose** : to develop good habits.
* Things to remember:
* Often the person we are writing notes for is our future self, when we revisit a project or need to write up all its details for publication.
* Data transformations, parameters, and code versions are good details to include.
* Errors that you encountered and how you fixed them! **They will come up again. I guarantee it!**
* See the [journal course preparatory material](https://bcb420-2020.github.io/General_course_prep/journal.html) for more details and a template for a journal entry.
* What should be in your journal? (minimally:)
* Plagiarism unit
* work associated with Assignment
* attempts at using docker
* annotation source homework
* gprofiler homework
* any future homework
* If there are assigned readings - enter your notes on the article as a journal entry.
---
<img src=./images/img_lecture7/plots_plots.png>
---
<img src=./images/img_lecture7/course_overview_chart_expanded.png>
---
<img src=./images/img_lecture7/data_exploration.png>
---
<img src=./images/img_lecture7/data_clean1.png>
---
<img src=./images/img_lecture7/data_clean2.png>
---
<img src=./images/img_lecture7/data_normalize_density.png>
---
<img src=./images/img_lecture7/data_normalize_boxplot.png>
---
<img src=./images/img_lecture7/data_score1.png>
---
<img src=./images/img_lecture7/data_score2.png>
---
<img src=./images/img_lecture7/data_score3.png>
---
<img src=./images/img_lecture7/data_score4.png>
---
<img src=./images/img_lecture7/data_score5.png>
---
<img src=./images/img_lecture7/plots_explained.png>
---
<img src=./images/img_lecture7/gprofiler1.png>
---
<img src="./images/img_lecture6/waldron_ora_methods.png">
<font size=2>Geistlinger L, Csaba G, Santarelli M, Ramos M, Schiffer L, Turaga N, Law C, Davis S, Carey V, Morgan M, Zimmer R, Waldron L. Toward a gold standard for benchmarking gene set enrichment analysis. Brief Bioinform. 2020 Feb 6. [PMID](https://www.ncbi.nlm.nih.gov/pubmed/32026945)</font>
---
## Homework from last week
Use this list of genes: [genelist.txt](https://github.com/bcb420-2020/Student_Wiki/blob/master/genelist.txt) as your query set and run a [g:profiler](https://biit.cs.ut.ee/gprofiler/gost) enrichment analysis with the following parameters:
1. Data sources: Reactome, GO biological process, and WikiPathways
1. Multiple hypothesis testing: Benjamini-Hochberg
Answer the questions below:
1. What is the top term returned in each data source?
1. How many genes are in each of the above genesets returned?
1. How many genes from our query are found in the above genesets?
1. Change g:profiler settings so that you limit the size of the returned genesets. Make sure the returned genesets are between 5 and 200 genes in size. Did that change the results?
1. Which of the 4 ovarian cancer expression subtypes do you think this list represents?
1. **Bonus**: The top gene returned for this comparison is TFEC (Ensembl gene ID: ENSG00000105967). Is it annotated in any of the pathways returned by g:profiler for our query? What terms is it associated with in g:profiler?
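---
To build intuition for what an over-representation analysis computes for each term, here is a minimal sketch in base R. All counts and the extra p-values are invented for illustration and are not taken from the homework gene list. It scores one term with a hypergeometric test and then applies the Benjamini-Hochberg correction; the exact statistics and background used by g:profiler may differ.

```r
# Hypothetical counts for a single term (not from the homework data)
universe_size <- 20000  # annotated genes in the background
geneset_size  <- 150    # genes annotated to the term
query_size    <- 100    # genes in the query list
overlap       <- 12     # query genes that are also annotated to the term

# P(X >= overlap) when query_size genes are drawn at random from the universe
p_ora <- phyper(overlap - 1, geneset_size, universe_size - geneset_size,
                query_size, lower.tail = FALSE)

# The same one-sided test written as Fisher's exact test on a 2x2 table
tab <- matrix(c(overlap, query_size - overlap,
                geneset_size - overlap,
                universe_size - geneset_size - query_size + overlap),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Benjamini-Hochberg correction across several terms (other p-values made up)
p_values <- c(term_a = p_ora, term_b = 0.01, term_c = 0.04, term_d = 0.30)
p_adjusted <- p.adjust(p_values, method = "BH")
```

By chance we would expect fewer than one query gene in this term (100 * 150 / 20000 = 0.75), so an overlap of 12 gives a very small p-value. Restricting geneset sizes changes how many terms are tested, which in turn changes the corrected values.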
---
# Let's go through the answers
[<font size=8>www.kahoot.it</font>](https://www.kahoot.it)
---
**Bonus**: The top gene returned for this comparison is TFEC (Ensembl gene ID: ENSG00000105967). Is it annotated in any of the pathways returned by g:profiler for our query? What terms is it associated with in g:profiler?
---
<img src=./images/img_lecture7/gsea1.png>
---
<img src=./images/img_lecture7/gsea2.png>
---
<img src=./images/img_lecture7/gsea3.png>
---
<img src=./images/img_lecture7/gsea4.png>
---
<img src=./images/img_lecture7/gsea5.png>
---
<img src=./images/img_lecture7/gsea6.png>
---
<img src=./images/img_lecture7/gsea7.png>
---
<img src=./images/img_lecture7/gsea8.png>
---
<img src=./images/img_lecture7/gsea9.png>
---
<img src=./images/img_lecture7/msigdb.png>
---
<img src=./images/img_lecture7/msigdb2.png>
---
<img src=./images/img_lecture7/genesets1.png>
---
## Bader lab genesets
[http://download.baderlab.org/EM_Genesets/](http://download.baderlab.org/EM_Genesets/)
* Automatically download the latest geneset file for your analysis
---
```r
library(RCurl)  # provides getURL()

# directory where the gmt file will be saved (assumed here; change as needed)
data_dir <- "data"
if (!dir.exists(data_dir)) dir.create(data_dir)

gmt_url = "http://download.baderlab.org/EM_Genesets/current_release/Human/symbol/"
# list all the files on the server
filenames = getURL(gmt_url)
tc = textConnection(filenames)
contents = readLines(tc)
close(tc)
# get the gmt that has all the pathways and does not include terms inferred from
# electronic annotations (IEA); start with the gmt file that has pathways only
rx = gregexpr("(?<=<a href=\")(.*.GOBP_AllPathways_no_GO_iea.*.)(\\.gmt)(?=\">)", contents,
    perl = TRUE)
gmt_file = unlist(regmatches(contents, rx))
dest_gmt_file <- file.path(data_dir, gmt_file)
download.file(paste(gmt_url, gmt_file, sep = ""), destfile = dest_gmt_file)
```
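---
Once a geneset file is on disk it can be read into R. The sketch below assumes the usual GMT layout (one geneset per line: name, description, then genes, all tab-separated) and uses a tiny made-up file so that it runs without a download. The set names, gene names, and size limits are hypothetical.

```r
# Write a small, made-up GMT file so the example is self-contained
gmt_path <- tempfile(fileext = ".gmt")
writeLines(c(
  "SET_A\tdescription of set A\tGENE1\tGENE2\tGENE3",
  "SET_B\tdescription of set B\tGENE2\tGENE4"
), gmt_path)

# Each line is: name <tab> description <tab> gene <tab> gene ...
fields <- strsplit(readLines(gmt_path), "\t", fixed = TRUE)
genesets <- lapply(fields, function(x) x[-(1:2)])          # drop name and description
names(genesets) <- vapply(fields, function(x) x[1], character(1))

# Keep only genesets within a size range (toy limits for the toy data)
sizes <- lengths(genesets)
kept <- genesets[sizes >= 3 & sizes <= 200]
```

To try this on a real file, point `gmt_path` at the downloaded gmt file instead of the temporary one.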
---
<img src="./images/img_lecture6/waldron_enrichment_methods.png">
<font size=2>Geistlinger L, Csaba G, Santarelli M, Ramos M, Schiffer L, Turaga N, Law C, Davis S, Carey V, Morgan M, Zimmer R, Waldron L. Toward a gold standard for benchmarking gene set enrichment analysis. Brief Bioinform. 2020 Feb 6. [PMID](https://www.ncbi.nlm.nih.gov/pubmed/32026945)</font>
---
## Running and Exploring GSEA
---
## Assignment #2
* differential gene expression and preliminary ORA
* <font size=5> Due March 3, 2020! @ 20:00 </font>
## What to hand in?
* **HTML-rendered R Notebook** - you should submit this through Quercus.
* Make sure the notebook and all associated code are checked into your GitHub repo. I will pull all the repos at the deadline and use them to compile your code, so your checked-in code must reproduce the handed-in notebook.
* Document your work and your code directly in the notebook.
* **Reference the paper associated with your data!**
* **Introduce your paper and your data again**
* You are allowed to use helper functions or methods, but make sure that the paths you use to source those files are relative and that the files are checked into your repo as well.
---
## Homework for next week
Practise using GSEA.
Given the ranked list comparing mesenchymal and immunoreactive ovarian cancer (mesenchymal genes have positive scores, immunoreactive genes have negative scores), perform a GSEA preranked analysis using the following parameters:
* genesets from the Bader Lab geneset collection from February 1, 2020, containing GO biological process (no IEA) and pathways.
* maximum geneset size of 200
* minimum geneset size of 15
* gene set permutation
and answer the following questions in your journal:
1. What is the top gene set returned for the Mesenchymal subtype? What is the top gene set returned for the Immunoreactive subtype?
1. What are the p-value, ES, NES and FDR associated with it?
1. How many genes are in its leading edge?
1. What is the top gene associated with this geneset?
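---
To help interpret the ES and leading edge that GSEA reports, here is a minimal base R sketch of the weighted running-sum statistic for a single geneset. The ranked list and the geneset are invented, and `running_es` is a hypothetical helper, not part of the GSEA software. NES and FDR come from permutations and are not computed here.

```r
# Made-up ranked list: positive scores at the top, negative at the bottom
ranks <- c(G1 = 3.2, G2 = 2.5, G3 = 1.9, G4 = 0.7, G5 = -0.4,
           G6 = -1.1, G7 = -2.0, G8 = -2.8)
ranks <- sort(ranks, decreasing = TRUE)
geneset <- c("G1", "G3", "G6")  # hypothetical geneset

running_es <- function(ranks, geneset, p = 1) {
  in_set <- names(ranks) %in% geneset
  hit  <- ifelse(in_set, abs(ranks)^p, 0)
  hit  <- hit / sum(hit)                        # hits step up, weighted by |score|^p
  miss <- ifelse(in_set, 0, 1 / sum(!in_set))   # misses step down by a fixed amount
  cumsum(hit - miss)
}

walk <- running_es(ranks, geneset)
peak <- which.max(abs(walk))
es <- walk[[peak]]                              # ES = largest deviation from zero

# Leading edge for a positive ES: geneset members at or before the peak
leading_edge <- intersect(names(ranks)[seq_len(peak)], geneset)
```

In this toy example the walk peaks at the third gene, so the leading edge is G1 and G3. For a geneset with a negative ES, the leading edge would instead be the members at or after the peak.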
</textarea>
<style data-target="print-only">@media screen {.remark-slide-container{display:block;}.remark-slide-scaler{box-shadow:none;}}</style>
<script src="https://remarkjs.com/downloads/remark-latest.min.js"></script>
<script>var slideshow = remark.create({
"highlightStyle": "github",
"highlightLines": true,
"highlightSpans": true,
"countIncrementalSlides": false
});
if (window.HTMLWidgets) slideshow.on('afterShowSlide', function (slide) {
window.dispatchEvent(new Event('resize'));
});
(function(d) {
var s = d.createElement("style"), r = d.querySelector(".remark-slide-scaler");
if (!r) return;
s.type = "text/css"; s.innerHTML = "@page {size: " + r.style.width + " " + r.style.height +"; }";
d.head.appendChild(s);
})(document);
(function(d) {
var el = d.getElementsByClassName("remark-slides-area");
if (!el) return;
var slide, slides = slideshow.getSlides(), els = el[0].children;
for (var i = 1; i < slides.length; i++) {
slide = slides[i];
if (slide.properties.continued === "true" || slide.properties.count === "false") {
els[i - 1].className += ' has-continuation';
}
}
var s = d.createElement("style");
s.type = "text/css"; s.innerHTML = "@media print { .has-continuation { display: none; } }";
d.head.appendChild(s);
})(document);
// delete the temporary CSS (for displaying all slides initially) when the user
// starts to view slides
(function() {
var deleted = false;
slideshow.on('beforeShowSlide', function(slide) {
if (deleted) return;
var sheets = document.styleSheets, node;
for (var i = 0; i < sheets.length; i++) {
node = sheets[i].ownerNode;
if (node.dataset["target"] !== "print-only") continue;
node.parentNode.removeChild(node);
}
deleted = true;
});
})();
// adds .remark-code-has-line-highlighted class to <pre> parent elements
// of code chunks containing highlighted lines with class .remark-code-line-highlighted
(function(d) {
const hlines = d.querySelectorAll('.remark-code-line-highlighted');
const preParents = [];
const findPreParent = function(line, p = 0) {
if (p > 1) return null; // traverse up no further than grandparent
const el = line.parentElement;
return el.tagName === "PRE" ? el : findPreParent(el, ++p);
};
for (let line of hlines) {
let pre = findPreParent(line);
if (pre && !preParents.includes(pre)) preParents.push(pre);
}
preParents.forEach(p => p.classList.add("remark-code-has-line-highlighted"));
})(document);</script>
<script>
(function() {
var links = document.getElementsByTagName('a');
for (var i = 0; i < links.length; i++) {
if (/^(https?:)?\/\//.test(links[i].getAttribute('href'))) {
links[i].target = '_blank';
}
}
})();
</script>
<script>
slideshow._releaseMath = function(el) {
var i, text, code, codes = el.getElementsByTagName('code');
for (i = 0; i < codes.length;) {
code = codes[i];
if (code.parentNode.tagName !== 'PRE' && code.childElementCount === 0) {
text = code.textContent;
if (/^\\\((.|\s)+\\\)$/.test(text) || /^\\\[(.|\s)+\\\]$/.test(text) ||
/^\$\$(.|\s)+\$\$$/.test(text) ||
/^\\begin\{([^}]+)\}(.|\s)+\\end\{[^}]+\}$/.test(text)) {
code.outerHTML = code.innerHTML; // remove <code></code>
continue;
}
}
i++;
}
};
slideshow._releaseMath(document);
</script>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement('script');
script.type = 'text/javascript';
script.src = 'https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML';
if (location.protocol !== 'file:' && /^https?:/.test(script.src))
script.src = script.src.replace(/^https?:/, '');
document.getElementsByTagName('head')[0].appendChild(script);
})();
</script>
</body>
</html>