Proposed Analysis: medulloblastoma subtyping #731
Comments
@jharenza wasn't this done already? Also - please make this 1 week instead of 1-2 days. |
@komalsrathi - yes, it was done early on, and since it was not within the repository, in discussions with @jaclyn-taroni, we thought we should add it as an official analysis PR to add transparency to the subtyping that ended up in the histologies file. Thank you - not a rush, but something we want to add! |
Ok makes sense. I'll finish it this week, thank you. |
@jharenza @jaclyn-taroni Just want to share the results for review before finalizing the code and creating a PR: I am using the following filter to get 122 MB samples (1 polyA and 121 rRNA depleted) as input for the classification:
The classifier was developed to take at least 2 samples as input, so I only have results for the 121 rRNA depleted samples, which I merged with the clinical findings: |
@komalsrathi thanks for this. I think if possible, we would like to add the poly-A sample's subtype. Out of curiosity, why are two samples required if they are analyzed independently? I know @adamcresnick will prefer to have complete information when we can. A way to get around this could be to duplicate the sample (so long as each entry is independently analyzed) or run the classifier on the entire poly-A matrix and only retain that sample. Neither is ideal, but both would give us the result. I do have two comments on the output:
|
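To make the duplication workaround concrete, here is a minimal Python sketch. It is not the medullo-classifier-package API: `toy_batch_classifier` is a hypothetical stand-in that, like the real classifier, refuses fewer than 2 samples, and the per-subtype scores are made up. The wrapper only makes sense under the assumption stated above, that each entry is analyzed independently, so it also checks that the copy gets the same label as the original.

```python
from typing import Callable, Dict

Profile = Dict[str, float]   # feature name -> value (toy scores here)
Batch = Dict[str, Profile]   # sample ID -> profile


def toy_batch_classifier(batch: Batch) -> Dict[str, str]:
    """Hypothetical stand-in for a classifier that needs at least 2 samples.

    Each sample is labelled on its own: the label is the key with the
    highest value in that sample's profile.
    """
    if len(batch) < 2:
        raise ValueError("classifier needs at least 2 samples")
    return {sid: max(profile, key=profile.get) for sid, profile in batch.items()}


def classify_single(
    sample_id: str,
    profile: Profile,
    classify_batch: Callable[[Batch], Dict[str, str]],
) -> str:
    """Classify one sample by submitting it together with a copy of itself."""
    dup_id = f"{sample_id}__dup"
    results = classify_batch({sample_id: profile, dup_id: dict(profile)})
    # If the copy is labelled differently, samples are not scored independently
    # and the workaround should not be trusted.
    if results[sample_id] != results[dup_id]:
        raise RuntimeError("original and duplicate were classified differently")
    return results[sample_id]


# Toy usage with made-up scores for a single poly-A sample.
polya_profile = {"SHH": 5.0, "WNT": 0.2, "Group3": 1.1, "Group4": 0.7}
label = classify_single("polyA_1", polya_profile, toy_batch_classifier)
```

The other option mentioned above (running on the entire poly-A matrix and keeping one row) needs no wrapper, only a lookup of that sample's result afterwards.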
Hi,
What I will do is try a couple of additional approaches and compare the
output:
1. Batch correct and merge the polyA and stranded Medullo data into a
single dataset to be used as input.
2. Use the entire matrix (non-MB samples as well) as input and do the
classification individually on polyA and stranded datasets.
I’ll also convert the 0s to <2e-16 because that’s what they represent.
If we can get a “clean” version of the clinical findings, we can also do a
statistical test between the two categorical values of observed and
expected subtypes.
I’ll update you soon.
Thanks!!
On Thu, Jul 23, 2020 at 2:47 PM Jo Lynne Rokita ***@***.***> wrote:
@komalsrathi <https://github.com/komalsrathi> thanks for this. I think if
possible, we would like to add the poly-A sample's subtype. Out of
curiosity, why are two samples required if they are analyzed independently?
I know @adamcresnick <https://github.com/adamcresnick> will prefer to
have complete information when we can. A way to get around this could be to
duplicate it (so long as each entry is independently analyzed) or perform
on the entire poly-A matrix and only retain that sample. Neither are ideal,
but both would give us the result.
I do have two comments on the output:
1. Can you change the "0" p-values to scientific notation so they are
not zero? They are probably 2e-16?
2. Regarding the clinical data - this was put together by @jenn0307
<https://github.com/jenn0307> 's team and I harmonized the fields a
bit. If this is going into the PR, @jaclyn-taroni
<https://github.com/jaclyn-taroni>, do we want to release this as a
file in the data release or @allisonheath
<https://github.com/allisonheath> capture this somehow in the
histologies file? It is still a bit messy (free text) because it was
information pulled from pathology reports, but the comparison is definitely
worth doing, as I can see multiple instances of subtypes changed due to the
classifier. cc @adamcresnick <https://github.com/adamcresnick> (input)
and @yuankunzhu <https://github.com/yuankunzhu> (data release) as well.
--
*Komal S Rathi* | Bioinformatics Scientist II, DBHi, The Children's
Hospital of Philadelphia | rathik@email.chop.edu
|
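The p-value change requested above can be sketched in a few lines of Python. The `2e-16` floor is taken from this thread; the function name and the choice to return strings are assumptions for illustration, not part of the classifier's output format.

```python
def format_p_value(p: float, floor: float = 2e-16) -> str:
    """Render a p-value for a report, never printing an exact zero.

    Values below `floor` (including 0.0 from numerical underflow) are shown
    as "<floor"; everything else is shown in scientific notation.
    """
    if p < floor:
        return f"<{floor:.0e}"
    return f"{p:.2e}"


# Toy usage: 0.0 stands for a p-value that underflowed in the output table.
formatted = [format_p_value(p) for p in (0.0, 1e-20, 0.003)]
# formatted == ["<2e-16", "<2e-16", "3.00e-03"]
```

Keeping the raw numeric column alongside the formatted one would let downstream steps still sort or filter on the value.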
@jharenza @jaclyn-taroni I did the following:
The results are exactly the same for rRNA depleted samples in 1 and 2. Also, the polyA sample from batch corrected input was correctly classified as: SHH (pathology comment: Here is the output: I have added a column called Matches at the end of the output which informs if the best fit matches the clinical data.
If you have concerns regarding batch correction: I did some QC on housekeeping genes, and the t-SNE plots follow. Just to reiterate: only MB samples (polyA = 1, stranded = 121) were combined and batch corrected, and I used 6 housekeeping genes for the t-SNE:
Please let me know your thoughts. |
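For readers wondering how a Matches-style column could be derived from free-text pathology notes, here is a rough Python sketch. It is not the code used for the output above: the subtype spellings, the normalization rule, and the function name are all assumptions, and simple substring matching will miss or misread some free-text entries.

```python
import re
from typing import Optional

# Assumed label spellings; the actual output may spell these differently.
SUBTYPES = ("SHH", "WNT", "Group3", "Group4")


def _normalize(text: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def matches_clinical(best_fit: str, clinical_text: Optional[str]) -> Optional[bool]:
    """Compare the classifier's best fit with a free-text clinical finding.

    Returns True or False when the text names at least one subtype, and
    None when there is no text or no subtype can be recognized in it.
    """
    if not clinical_text or not clinical_text.strip():
        return None
    haystack = _normalize(clinical_text)
    mentioned = {_normalize(s) for s in SUBTYPES if _normalize(s) in haystack}
    if not mentioned:
        return None
    return _normalize(best_fit) in mentioned


# Toy usage with made-up notes.
rows = [
    ("Group3", "Group 3 medulloblastoma"),
    ("SHH", "WNT activated"),
    ("WNT", ""),
]
matches = [matches_clinical(fit, note) for fit, note in rows]
# matches == [True, False, None]
```

Returning None rather than False for unreadable notes keeps "no clinical subtype reported" separate from a real disagreement.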
@komalsrathi - this looks great, and I am glad the results were the same with batch correction. Regarding:
I would not say we mis-classified those samples - I think what this means is we have to go back to the pathologists and have them re-review. I am more inclined to think that the initial clinical information could have been wrong or ambiguous, or that pathology shows one thing and expression shows another, which would make these good cases to focus on in the paper (cc: @adamcresnick). I am OK with this method - I just want to be sure @jaclyn-taroni is also OK with it, since batch correction is something we have not yet added to any of the analyses. I would also like her advice on how to handle it: whether it should be a separate PR (I think we discussed that as a future goal, but that there might still be center batch issues not corrected yet) or lumped into this MB classifier PR. |
Any idea how those putative misclassifications (based on clinical data) might square (or not) with some of what I saw with unsupervised analysis (#730)? I can provide some output of the training materials if that's helpful. |
Ok, I just reran the classification on the combined polyA and rRNA depleted MB samples without batch correction (n = 122), and the results are identical to those from the batch-corrected data. So in case there are reservations about batch correction, we don't have to use it. |
@komalsrathi - would you be able to see if those 11 are some of the samples that don't cluster as expected in the unsupervised analysis? |
@jharenza would it be easier for the person who performed the unsupervised analysis to check that? the attached file has information on which ones were misclassified (
|
For ticket [here](AlexsLemonade/OpenPBTA-analysis#731)
What are the scientific goals of the analysis?
Subtype medulloblastoma samples into SHH, WNT, Group 3, and Group 4
What methods do you plan to use to accomplish the scientific goals?
https://github.com/d3b-center/medullo-classifier-package
Additionally, summarize whether the subtypes agree with clinical pathology where reported here.
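One simple way to produce such a summary is a cross-tabulation of reported versus predicted subtype plus an overall agreement rate. The Python sketch below assumes the reported subtype has already been reduced to one of the four labels (or None where pathology did not report one); the function name and input shape are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def summarize_agreement(
    pairs: Iterable[Tuple[str, Optional[str]]],
) -> Tuple[Counter, int, int, float]:
    """Cross-tabulate (predicted, reported) subtype pairs.

    Samples with no reported subtype (None) are left out of the comparison.
    Returns the table keyed by (reported, predicted), the number of agreeing
    samples, the number of compared samples, and the agreement rate.
    """
    table: Counter = Counter()
    n_compared = 0
    n_agree = 0
    for predicted, reported in pairs:
        if reported is None:
            continue
        table[(reported, predicted)] += 1
        n_compared += 1
        n_agree += int(predicted == reported)
    rate = n_agree / n_compared if n_compared else float("nan")
    return table, n_agree, n_compared, rate


# Toy usage with made-up calls.
toy_pairs = [
    ("SHH", "SHH"),
    ("WNT", "WNT"),
    ("Group3", "Group4"),
    ("Group4", None),
]
table, n_agree, n_compared, rate = summarize_agreement(toy_pairs)
```

Off-diagonal cells of the table, where reported and predicted differ, are the discordant cases.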
What input data are required for this analysis?
RNA-Seq FPKM
How long do you expect is needed to complete the analysis? Will it be a multi-step analysis?
1 week
Who will complete the analysis (please add a GitHub handle here if relevant)?
@komalsrathi has completed the analysis. @komalsrathi will you please create a PR with the analysis when you have a chance?
What relevant scientific literature relates to this analysis?
4 medulloblastoma subgroups: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4334443/