Molecular subtypes (MB) summary notebook #743

komalsrathi · 2020-08-21T18:28:53Z

Purpose/implementation Section

What scientific question is your analysis addressing?

Summarizes the following:

Medulloblastoma subtype classification using two classifiers, MM2S and medulloPackage, on the batch-corrected and uncorrected expression matrices, alongside the expected subtypes from pathology reports.
Associated performance metric in terms of % accuracy for each classifier.
Assignment of consensus subtype between the two classifiers as the molecular_subtype.

What was your approach?

As suggested in #742, I have added some details to the notebook:

% accuracy is currently calculated by comparing observed and expected subtypes for samples where expected subtype information is available. When the expected subtype is ambiguous, we count it as a match if the observed subtype matches any one of the expected subtypes.
Consensus tabs: Molecular subtype labels are assigned only when the two classifiers agree, leaving all other samples as unclassified.
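
The two rules above can be sketched in base R roughly as follows. This is a minimal illustration rather than the notebook's code: the data frame `calls`, its column names, the sample IDs, and the helper names `matches_expected` and `percent_accuracy` are all hypothetical, and ambiguous expected labels are assumed to be stored as comma-separated strings without spaces.

```r
# Hypothetical toy data: expected (pathology) labels and two classifier calls.
calls <- data.frame(
  sample   = c("S1", "S2", "S3", "S4"),
  expected = c("WNT", "Group3,Group4", NA, "SHH"),
  mm2s     = c("WNT", "Group4", "SHH", "Group3"),
  medullo  = c("WNT", "Group4", "SHH", "SHH"),
  stringsAsFactors = FALSE
)

# TRUE if the observed label equals any one of the comma-separated expected
# labels; NA when either label is missing.
matches_expected <- function(observed, expected) {
  mapply(function(obs_i, exp_i) {
    if (is.na(obs_i) || is.na(exp_i)) return(NA)
    obs_i %in% strsplit(exp_i, ",", fixed = TRUE)[[1]]
  }, observed, expected, USE.NAMES = FALSE)
}

# Percent accuracy over the samples that have an expected label.
percent_accuracy <- function(observed, expected) {
  hit <- matches_expected(observed, expected)
  100 * sum(hit, na.rm = TRUE) / sum(!is.na(expected))
}

# Consensus: keep a label only when both classifiers agree, otherwise NA.
calls$molecular_subtype <- ifelse(calls$mm2s == calls$medullo, calls$mm2s, NA)

mm2s_accuracy <- percent_accuracy(calls$mm2s, calls$expected)
```

In this sketch, a sample with an expected label but no consensus call stays in the denominator, so it lowers the consensus accuracy instead of being silently dropped.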

What GitHub issue does your pull request address?

#742

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Accuracy calculation and consensus molecular subtype assignment.

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

.html output containing summary tables

What is your summary of the results?

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Documentation Checklist

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

jaclyn-taroni · 2020-08-21T20:22:49Z

Looks like the total accuracy for the consensus calls is the same on the corrected vs. uncorrected matrices. Was the single poly-A sample one of the samples where the classifiers disagreed?

komalsrathi · 2020-08-22T19:54:48Z

Some more details:

# 25 samples with consensus classes matching pathology (corrected matrix)
corrected.diff <- consensus.corrected[which(consensus.corrected$match  == TRUE),'Kids_First_Biospecimen_ID']
length(corrected.diff)
[1] 25

# 25 samples with consensus classes matching pathology (uncorrected matrix)
uncorrected.diff <- consensus.uncorrected[which(consensus.uncorrected$match == TRUE),'Kids_First_Biospecimen_ID']
length(uncorrected.diff)
[1] 25

# what's the difference? in each case, a different sample does not agree
# consensus corrected matrix has a match but uncorrected does not
sample1 <- setdiff(corrected.diff, uncorrected.diff) # BS_HB03GSHF

# check sample1 in consensus uncorrected output
consensus.uncorrected %>%
  filter(Kids_First_Biospecimen_ID %in% sample1) %>%
  dplyr::select(Kids_First_Biospecimen_ID, pathology_subtype,  MM2S_best_fit, medulloPackage_best_fit, match)
  Kids_First_Biospecimen_ID pathology_subtype MM2S_best_fit medulloPackage_best_fit match
1               BS_HB03GSHF               WNT           SHH                     WNT    NA

# consensus uncorrected matrix has a match but corrected does not
sample2 <- setdiff(uncorrected.diff, corrected.diff) # BS_V96WVE3Z

# check sample2 in consensus corrected output
consensus.corrected %>%
  filter(Kids_First_Biospecimen_ID %in% sample2) %>%
  dplyr::select(Kids_First_Biospecimen_ID, pathology_subtype,  MM2S_best_fit, medulloPackage_best_fit, match)
  Kids_First_Biospecimen_ID pathology_subtype MM2S_best_fit medulloPackage_best_fit match
1               BS_V96WVE3Z               SHH        Group3                     SHH    NA

In both cases, the medulloPackage classification matches the pathology report but MM2S does not. Both samples are stranded.

jaclyn-taroni

Hi @komalsrathi, thanks for sending this and providing a bit more background information! I had a few comments in service of making these results easier to revisit in the future if necessary, preparing the subtype labels for consumption elsewhere, and DRYing up the notebook a bit.

analyses/molecular-subtyping-MB/02-compare-classes.Rmd

jaclyn-taroni · 2020-08-24T14:47:36Z

analyses/molecular-subtyping-MB/02-compare-classes.Rmd

+  mutate(pathology_subtype = replace(pathology_subtype, 
+                                     pathology_subtype == "Group 3 or 4", "Group3, Group4")) %>%
+  mutate(pathology_subtype = gsub(" ", "", pathology_subtype))


I think this would accomplish the same thing, but I'm not sure whether there are other subtypes separated by "," in pathology_subtype.

Suggested change

-  mutate(pathology_subtype = replace(pathology_subtype,
-                                     pathology_subtype == "Group 3 or 4", "Group3, Group4")) %>%
-  mutate(pathology_subtype = gsub(" ", "", pathology_subtype))
+  mutate(pathology_subtype = replace(pathology_subtype,
+                                     pathology_subtype == "Group 3 or 4", "Group3,Group4"))

So for this, the path reports contain "Group 3" and "Group 4", whereas the classifiers output Group3 and Group4, which is why the gsub is there.

Ah, so pathology_subtype can have the values "Group 3 or 4" or "Group 3, Group 4" before you do any mutating - is that correct?

These are the unique values for expected types (pathology):

unique(dat$pathology_subtype)
[1] NA "WNT" "Group 3 or 4" "SHH" "non-WNT"
[6] "Group 4"

So I convert "Group 3 or 4" to "Group3, Group4" using replace (so that either a Group3 or a Group4 predicted subtype counts as a match), and gsub then strips the spaces, which also converts "Group 4" to "Group4" (the observed types are SHH, WNT, Group3 and Group4, i.e. no spaces).
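
The two-step normalization described above can be reproduced on the unique values listed in this thread. This is a base R sketch under the assumption that the labels are a plain character vector; the notebook itself does this inside a dplyr `mutate`, and the variable name `normalized` is only illustrative.

```r
# Unique expected labels as reported in the thread.
pathology_subtype <- c(NA, "WNT", "Group 3 or 4", "SHH", "non-WNT", "Group 4")

# Step 1: expand the ambiguous label into a comma-separated pair.
# which() drops the NA comparison result so only real matches are replaced.
normalized <- replace(pathology_subtype,
                      which(pathology_subtype == "Group 3 or 4"),
                      "Group3, Group4")

# Step 2: strip all spaces so "Group 4" becomes "Group4" and
# "Group3, Group4" becomes "Group3,Group4", matching the classifier labels.
normalized <- gsub(" ", "", normalized, fixed = TRUE)

print(normalized)
```

Dropping step 2 would leave "Group 4" with its space, and it would then never equal the classifier label Group4.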

jaclyn-taroni · 2020-08-24T15:30:49Z

analyses/molecular-subtyping-MB/02-compare-classes.Rmd

+
+```{r, echo = TRUE, warning = FALSE, message = FALSE}
+# merge observed and expected subtypes
+mm2s.corrected <- obs.class[[1]]


Using an index here means we're relying on an analyst to know or remember what order the observed results come back in. There are no names in obs.class at the moment. I'd recommend making alterations to the upstream step (01-classify-mb.R) such that there is information about the method and dataset used in this object. That way someone who did not author the module could use this object "off-the-shelf" without much digging.

Yes totally makes sense - will add names to the list items.

Added names to the list items:
https://github.com/komalsrathi/OpenPBTA-analysis/blob/mb-class-nb/analyses/molecular-subtyping-MB/01-classify-mb.R#L49
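
As an illustration of the named-list approach discussed in this thread, here is a small base R sketch. The element names, the helper `make_result`, and the toy labels are hypothetical and are not necessarily the names used in 01-classify-mb.R.

```r
# Hypothetical stand-in for a classifier output table produced upstream.
make_result <- function(labels) {
  data.frame(sample = c("S1", "S2"), best.fit = labels, stringsAsFactors = FALSE)
}

# Name each element by method and dataset instead of relying on position.
obs.class <- list(
  MM2S_corrected             = make_result(c("WNT", "SHH")),
  MM2S_uncorrected           = make_result(c("WNT", "Group3")),
  medulloPackage_corrected   = make_result(c("WNT", "SHH")),
  medulloPackage_uncorrected = make_result(c("WNT", "SHH"))
)

# Access by name: the intent is explicit, and reordering the list upstream
# cannot silently change which result is used.
mm2s.corrected <- obs.class[["MM2S_corrected"]]
```

Compared with `obs.class[[1]]`, a lookup by name does not return the wrong table if the upstream order changes: `[[` returns NULL for a name that is not in the list, and the next use of the table fails.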

jaclyn-taroni · 2020-08-24T15:33:40Z

analyses/molecular-subtyping-MB/02-compare-classes.Rmd

+}
+```
+
+#### Details:


Can you add information about how many samples have pathology subtype labels and (briefly) what the process is for obtaining those labels (for you, not what goes into that for pathology! 🙂 ) please?

I would 'mirror' this information in the module README as you've done with these other points, too.

Updated README with breakdown of samples and subtypes from path report:
https://github.com/komalsrathi/OpenPBTA-analysis/tree/mb-class-nb/analyses/molecular-subtyping-MB#02-compare-classesrmd

Just unsure of what the second part means: process of obtaining those labels. I am using a file that @jharenza provided as input.

I basically meant how do we get the labels from what I assume is a pathology report to the file you got and does it come from another database, for example. But you state it's from a pathology report, so I think that's sufficient for now. Thank you for the update!

jaclyn-taroni · 2020-08-24T15:39:20Z

analyses/molecular-subtyping-MB/02-compare-classes.Rmd

+```{r, echo = TRUE, warning = FALSE, message = FALSE}
+# merge observed and expected subtypes
+medullo.classifier.corrected <- obs.class[[3]]
+medullo.classifier.corrected <- exp.class %>%
+  inner_join(medullo.classifier.corrected, by = c('Kids_First_Biospecimen_ID' = 'sample')) %>%
+  mutate(match = str_detect(pathology_subtype, best.fit))
+
+# % accuracy
+medullo.classifier.corrected.acc <- medullo.classifier.corrected %>%
+  filter(!is.na(pathology_subtype)) %>%
+  group_by(match) %>%
+  summarise(n = n()) %>%
+  mutate(Accuracy = paste0(n/sum(n)*100, '%')) %>%
+  filter(match) %>%
+  .$Accuracy
+print(paste0("Accuracy: ", medullo.classifier.corrected.acc))
+
+# output table
+viewDataTable(medullo.classifier.corrected)


A lot of this code is repeated between different classifier-dataset combinations. That suggests to me that it may be useful to wrap up joining with the expected subtype information and accuracy calculation in a custom function.

I have added individual functions here:
https://github.com/komalsrathi/OpenPBTA-analysis/blob/mb-class-nb/analyses/molecular-subtyping-MB/02-compare-classes.Rmd#L18
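
One possible shape for such a helper is sketched below in base R. It is not the function added in the linked notebook: the name `compare_classes`, the toy IDs, and the choice of exact token comparison are assumptions. Exact comparison is used here because a substring test would count an observed "WNT" as matching an expected "non-WNT".

```r
# Hypothetical helper: join expected and observed labels, flag matches, and
# report percent accuracy over samples that have an expected label.
compare_classes <- function(exp.class, obs) {
  merged <- merge(exp.class, obs,
                  by.x = "Kids_First_Biospecimen_ID", by.y = "sample")
  # Compare against each comma-separated expected label exactly.
  merged$match <- mapply(function(exp_i, fit_i) {
    if (is.na(exp_i) || is.na(fit_i)) return(NA)
    fit_i %in% strsplit(exp_i, ",", fixed = TRUE)[[1]]
  }, merged$pathology_subtype, merged$best.fit, USE.NAMES = FALSE)
  labelled <- !is.na(merged$pathology_subtype)
  accuracy <- 100 * sum(merged$match[labelled], na.rm = TRUE) / sum(labelled)
  list(table = merged, accuracy = accuracy)
}

# Toy inputs with made-up biospecimen IDs.
exp.class <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_1", "BS_2", "BS_3", "BS_4"),
  pathology_subtype = c("WNT", "Group3,Group4", "non-WNT", NA),
  stringsAsFactors = FALSE
)
obs <- data.frame(
  sample = c("BS_1", "BS_2", "BS_3", "BS_4"),
  best.fit = c("WNT", "Group4", "WNT", "SHH"),
  stringsAsFactors = FALSE
)

res <- compare_classes(exp.class, obs)
print(paste0("Accuracy: ", round(res$accuracy, 1), "%"))
```

The same call can then be reused for each classifier and dataset combination, which removes the repeated join and accuracy code.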

jaclyn-taroni · 2020-08-24T15:45:48Z

analyses/molecular-subtyping-MB/02-compare-classes.Rmd

+
+# output table
+viewDataTable(consensus.uncorrected)
+```


I would expect the output from this notebook to include a table of the consensus labels (preferably as a TSV).

In modules for subtyping other histologies, we've joined all the identifiers (DNA and RNA) into a single table: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/e8fbc7dc6aa8a36d7236fbe10a038657e3782e09/analyses/molecular-subtyping-HGG/results/HGG_molecular_subtype.tsv This particular step can come in a subsequent notebook if you'd prefer. But I do not expect this notebook to be overly long, particularly with the custom function changes.

Updated markdown to save consensus subtype output merged with RNA+DNA id to tsv files: https://github.com/komalsrathi/OpenPBTA-analysis/blob/mb-class-nb/analyses/molecular-subtyping-MB/02-compare-classes.Rmd#L256
and
https://github.com/komalsrathi/OpenPBTA-analysis/blob/mb-class-nb/analyses/molecular-subtyping-MB/02-compare-classes.Rmd#L279

Also updated the README at the analyses and module level.
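
For readers who want to see the TSV step in isolation, a base R sketch follows. The table contents, the output file name, and the temporary directory are hypothetical; the actual notebook also merges in RNA and DNA identifiers, which is not shown here.

```r
# Hypothetical consensus table; the molecular_subtype column is filled only
# where the two classifiers agree.
consensus <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_AAAA", "BS_BBBB"),
  MM2S_best_fit             = c("WNT", "SHH"),
  medulloPackage_best_fit   = c("WNT", "Group3"),
  stringsAsFactors = FALSE
)
consensus$molecular_subtype <- ifelse(
  consensus$MM2S_best_fit == consensus$medulloPackage_best_fit,
  consensus$MM2S_best_fit, NA
)

# Write a tab-separated file without quotes or row names.
out_file <- file.path(tempdir(), "MB_consensus_subtype.tsv")  # made-up name
write.table(consensus, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm the labels survive the round trip.
round_trip <- read.delim(out_file, stringsAsFactors = FALSE)
```

Unclassified samples are written as NA, so downstream consumers can tell "no consensus" apart from a real subtype label.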

komalsrathi · 2020-08-24T17:05:17Z

Hi @jaclyn-taroni thanks again, ~~I will incorporate these and update latest by tomorrow.~~ was just handed over some high priority work, will try to update asap (latest by end of this week).

komalsrathi · 2020-08-27T13:41:21Z

@jaclyn-taroni I am doing a bit of a QC just to make sure what I have is correct - I will update you shortly when this is ready to review.

komalsrathi · 2020-08-27T16:05:03Z

@jaclyn-taroni I think this is ready now. I just added another tab with consensus comparison with some details. Total accuracy of consensus output remains the same for batch corrected and uncorrected consensus output with 25/32 matches to the reported pathology subtypes. Between the two consensus outputs, there are 24/25 matches. BS_V96WVE3Z is correctly predicted by consensus uncorrected output but not by batch-corrected output and BS_HB03GSHF is correctly predicted by consensus batch-corrected output but not by uncorrected output.

Added the above info to module README as well.
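
The overlap arithmetic above (25 matches in each output, 24 of them shared) can be checked with set operations. The sketch below uses made-up IDs and a smaller toy case, so the counts are illustrative only.

```r
# Hypothetical IDs of samples whose consensus call matched pathology.
corrected.match   <- c("BS_01", "BS_02", "BS_03")
uncorrected.match <- c("BS_01", "BS_02", "BS_04")

# Samples matched in both outputs, and those matched in only one of them.
shared           <- intersect(corrected.match, uncorrected.match)
only.corrected   <- setdiff(corrected.match, uncorrected.match)
only.uncorrected <- setdiff(uncorrected.match, corrected.match)

# Equal totals do not imply identical sample sets.
same.total <- length(corrected.match) == length(uncorrected.match)
```

Here both outputs have the same number of matches, yet each contains one sample the other lacks, which mirrors the BS_HB03GSHF and BS_V96WVE3Z situation described above.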

The only remaining question is how we pick batch-corrected vs. uncorrected. I am not sure on what basis, because the total accuracy is the same.

jaclyn-taroni

Looks good to me! Thanks for the updates! I agree that we have to make a somewhat arbitrary choice between uncorrected and batch-corrected here, but hopefully the review of the pathology reports may inform that decision in the future (related: #746, #747).

komalsrathi and others added 30 commits August 18, 2020 09:07

add molecular subtype classification for MB

c85fcdc

Merge branch 'master' into mb-subtypes

31de9e5

remove old output

ca3fcef

add analyses README

77a1cee

add MM2S as alt method

fc91cc8

update path file

041dd95

log transform input

5f95637

move classify function to util

b0b44c6

add packages to dockerfile

194a20e

update analyses/README

8b22f79

update analyses/README

a8e4a32

add scripts to ci

098fccc

Merge remote-tracking branch 'komalsrathi/mb-subtypes' into mb-subtypes

1ecf7f7

Changes to get the ComBat step to run in Docker

9b4d7f9

Add idea for filter and ComBat script

adf3287

Merge pull request #1 from jaclyn-taroni/mb-subtypes

880736a

CI related edits to MB subtype steps

create single script

9a85941

create single script

e2e8378

add output files to gitignore

f30ec54

remove exprs data

ad5ce8b

Update Dockerfile

f888160

Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

add input data and move module up

97d7d35

Merge remote-tracking branch 'komalsrathi/mb-subtypes' into skip-mb-f…

05c4544

…ilter

Skip filtering step in CI

2be30c6

Merge pull request #2 from jaclyn-taroni/skip-mb-filter

9dd13c8

Skip filtering and batch correction. in CI

Merge branch 'master' into mb-subtypes

ae26569

Merge remote-tracking branch 'komalsrathi/mb-subtypes' into mb-sva-devel

51891e7

Install sva-devel on Docker image

0ce2bdc

Rerun module in Docker container

48c2bab

Set upgrade = FALSE for medullo classifier install

245529a

komalsrathi and others added 6 commits August 21, 2020 08:07

Merge pull request #3 from jaclyn-taroni/mb-sva-devel

6fc4333

Update sva on Docker image

update README and remove rmd

2aedf68

update analyses README

a6a30a3

add notebook with details on classifier performance

d8d6a6e

merge with master

eb4079f

updated output

078bc0b

komalsrathi changed the title ~~Mb class nb~~ Molecular subtypes (MB) summary notebook Aug 21, 2020

jaclyn-taroni reviewed Aug 24, 2020

komalsrathi and others added 6 commits August 26, 2020 14:25

Update analyses/molecular-subtyping-MB/02-compare-classes.Rmd

b3d5d07

Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

refactor markdown code

1868a69

update html

7617862

save output to tsv files

e4857e0

update analyses README

c5fa846

Merge branch 'master' into mb-class-nb

5ff3df5

komalsrathi added 2 commits August 27, 2020 11:58

add consensus comparison

0758f14

Merge branch 'mb-class-nb' of github.com:komalsrathi/OpenPBTA-analysi…

c11066a

…s into mb-class-nb

Alphabetical order

0244ee2

jaclyn-taroni approved these changes Aug 27, 2020

komalsrathi mentioned this pull request Aug 27, 2020

Updated analysis: medulloblastoma consensus subtypes #747

Closed

jaclyn-taroni merged commit 9efbddf into AlexsLemonade:master Aug 27, 2020

This was referenced Aug 28, 2020

Planned Data Release: V17 #732

Closed

Medulloblastoma pathology subtypes: file for data release #746

Closed

jaclyn-taroni mentioned this pull request Aug 28, 2020

Updated analysis: Take consensus of two classifiers for medulloblastoma subtype labels #742

Closed
