Proposed Analysis: Molecularly subtype ependymoma tumors #245
Comments
I will work on this. I will do this after the release of v13 (next week). |
I will be working on this ticket starting next week. |
@cansavvy there is the following note above:
Is there any output you can include (e.g., summary statistic) as part of |
I can file a PR that saves total chromosomal breakpoint numbers per biospecimen to TSV file. But perhaps to make it more interpretable we need it to be divided by the size of the effectively surveyed region of the genome (i.e. WGS vs WXS)? |
Moving discussion about the specifics over to #394 to keep this focused, but to close the loop - yes, we will want this information, divided by the size of the effectively surveyed genome, saved as a TSV. |
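A minimal sketch of the normalization discussed above: total breakpoints per biospecimen divided by the size of the effectively surveyed genome, written out as TSV. The column names, the sample IDs, and especially the per-strategy sizes in `SURVEYED_MB` are placeholders assumed for illustration, not values from the project; the actual specifics are tracked in #394.

```python
import csv
import io

# Hypothetical sizes (in megabases) of the effectively surveyed genome for
# each experimental strategy. These are placeholders, not project values.
SURVEYED_MB = {"WGS": 3000.0, "WXS": 50.0}


def breakpoint_density(rows, surveyed_mb=SURVEYED_MB):
    """Turn (biospecimen_id, strategy, n_breakpoints) tuples into records
    that also carry breakpoints per megabase surveyed."""
    records = []
    for biospecimen_id, strategy, n_breakpoints in rows:
        density = n_breakpoints / surveyed_mb[strategy]
        records.append((biospecimen_id, strategy, n_breakpoints, density))
    return records


def to_tsv(records):
    """Serialize the records as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(
        ["biospecimen_id", "experimental_strategy",
         "n_breakpoints", "breakpoints_per_mb"]
    )
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    example = [("sample_A", "WGS", 300), ("sample_B", "WXS", 5)]
    print(to_tsv(breakpoint_density(example)))
```

Dividing by the surveyed size puts WGS and WXS samples on the same scale, which is the interpretability concern raised in the comment above.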
Hi @tkoganti, as we discussed in person I am filling in a bit of the how behind this ticket. Now that I have revisited this ticket, I think
Continuous integration
Continuous integration (CI) has some special considerations when we are working in the context of the molecular subtyping tickets. For continuous integration, we use a set of files that only contains a limited number of participants to save on download time and the amount of RAM needed to run the analyses (Continuous Integration (CI) section of the README). What this means for this issue (and any other subtyping issue) is that the files used in continuous integration will often contain few, or sometimes none, of the relevant samples, and that will cause things to fail in continuous integration.
To get around this, I suggest that the first thing you add is a script that subsets all the files you will need for subtyping to only the ependymoma samples you are interested in subtyping. You will need to add and commit these files to the repository (perhaps in The
Here's the ATRT example following the passing variables only in CI instructions: OpenPBTA-analysis/analyses/molecular-subtyping-ATRT/run-molecular-subtyping-ATRT.sh Line 13 in d143e1a
And then subsetting will be run by default ( OpenPBTA-analysis/analyses/molecular-subtyping-ATRT/run-molecular-subtyping-ATRT.sh Line 22 in b4b5230
But it is skipped in CI with OpenPBTA-analysis/.circleci/config.yml Line 102 in d143e1a
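As a rough illustration of the pattern described above (subsetting runs by default, and is skipped when a variable is passed only in CI), here is a minimal Python sketch. The variable name `RUN_SUBSET_STEP`, the `sample_id` column, and the function names are hypothetical and are not taken from the ATRT module or the CI configuration.

```python
import os


def should_subset(env=os.environ, var="RUN_SUBSET_STEP"):
    """Subsetting runs by default; setting the (hypothetical) variable
    to "0" skips it, which is what a CI job would do."""
    return env.get(var, "1") != "0"


def subset_to_samples(rows, sample_ids, id_column="sample_id"):
    """Keep only the rows whose ID is among the samples of interest."""
    keep = set(sample_ids)
    return [row for row in rows if row[id_column] in keep]


def load_inputs(full_rows, committed_subset, sample_ids, env=os.environ):
    """Outside CI, regenerate the subset from the full data. In CI, fall
    back to the subset files already committed to the repository."""
    if should_subset(env):
        return subset_to_samples(full_rows, sample_ids)
    return committed_subset
```

The default of running the subsetting step means a local run always regenerates the subset from the full files, while CI relies on the committed copies and never needs the relevant samples to be present in its reduced data.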
Where to find specific features
Now we need to figure out which files you will need to satisfy the goals of this issue. Some of these files you will need to subset; others you will not. More on that below.
Files you will not need to subset
Some files in CI are small enough that we don't generate OR they are committed in the repository, so they are available in full.
Files you will need to subset
I think the main information you will need to generate subset files for are the instances that say
So you will need to use the expression files (the collapsed expression files to be consistent with other subtyping modules):
Other gotchas that come to mind
If you want to see an example of what the output of one of these subtyping tickets looks like in terms of the final table and presentation, see what is linked in #435 (comment). Let me know if you have any questions! |
@jaclyn-taroni Thank you so much for the information!!! This really helps to get me started on this ticket. I am listing all the files I am going to be using along with the exact filters. Please review this and correct anything that needs to change. Also @jharenza if you could answer the questions I wrote here, that would be helpful!
ST-EPN-RELA subtype criterion -
ST-EPN-YAP1 subtype criterion (Either needs one of the fusion or the CNA in 11q??) -
PF-EPN-A
PF-EPN-B
|
Hi @tkoganti - I'll clarify a couple things I'm able to comment on below.
There should be a row for each sample ( For
I would use the collapsed files:
The 749 BSIDs correspond to more than the ependymoma samples. (There is a conversation over on #410 about that.) What you will want to do to identify ependymoma samples is to filter the For things like:
I would caution that the GISTIC results that are available were run on the output of one CNV method (CNVkit) and not on the consensus, so there may be instances right now where GISTIC doesn't call something as neutral but will when using the consensus file (#453). Something to keep in mind. |
Hi @jaclyn-taroni and @jharenza! There are 93 ependymoma samples in total (from BS_0BXY0F9N', |
@tkoganti can you check if those samples are in the Arriba and STARFusion files? I believe they would need to be in both of the original files to make it to the summary file. If they are not in both, I would consider that data to be missing, rather than the absence of fusions in those samples. |
I checked for a few samples and they are present in both |
Can you file a data issue please @tkoganti and describe what you found? We should dig into whether there's an issue with the fusion summary file. I suspect what happened is that these samples are not represented in the putative oncogenic file, i.e., they have 0 fusions that meet the filtering criteria, so it's the equivalent of having a zero for EPN-relevant fusions if my suspicions are correct. |
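The distinction drawn in the last few comments (missing data versus zero relevant fusions) can be sketched as below. This only encodes the reasoning suggested in the thread, which is itself stated as a suspicion; the function name, the status labels, and the idea of passing ID sets are assumptions for illustration.

```python
def fusion_status(sample_id, arriba_ids, starfusion_ids, summary_ids):
    """Classify a sample that may be absent from the fusion summary file.

    - "in_summary": the sample has a row in the summary file.
    - "zero_relevant_fusions": called by both Arriba and STARFusion but
      absent from the summary, so treat the relevant fusions as 0.
    - "missing_data": not present in both original files, so the absence
      from the summary says nothing about fusions.
    """
    if sample_id in summary_ids:
        return "in_summary"
    if sample_id in arriba_ids and sample_id in starfusion_ids:
        return "zero_relevant_fusions"
    return "missing_data"


if __name__ == "__main__":
    # Made-up IDs, purely for illustration.
    arriba = {"s1", "s2", "s3"}
    starfusion = {"s1", "s2"}
    summary = {"s1"}
    for sample in ["s1", "s2", "s3", "s4"]:
        print(sample, fusion_status(sample, arriba, starfusion, summary))
```

Keeping "missing" separate from "zero" avoids assigning a fusion-negative status to samples that were never assessed by both callers.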
Hi @tkoganti, now that #478 is merged, if you update the branch you are working on to be in sync with this master branch (see the command line instructions here) you can use the |
@jharenza I am not finding |
AHhh, yes, it is |
Upon exploration of fusion data for another analysis @kgaonkar6 and I found that the meningioma sample |
Scientific goals
What are the scientific goals of the analysis?
Subtype ependymomas into ST-EPN-RELA, ST-EPN-YAP1, PF-EPN-A, and PF-EPN-B. Note: The publication listed below contains 9 subtypes of ependymomas, but only the 4 listed here are relevant to the OpenPBTA dataset (due to age at diagnosis) and will be discussed below.
Proposed methods
What methods do you plan to use to accomplish the scientific goals?
ST-EPN-RELA (Supratentorial Ependymoma, RELA fused)
ST-EPN-YAP1 (Supratentorial Ependymoma, YAP1 fused)
PF-EPN-A (Posterior Fossa Ependymoma, Type A)
PF-EPN-B (Posterior Fossa Ependymoma, Type B)
May be able to determine brain regions using the primary_site from pbta_histologies.tsv - Ref:
Required input data
What input data will you use for this analysis?
RNA fusions, RNA expression, copy number, SVs, histologies file
Proposed timeline
What is the timeline for the analysis?
1 week
Relevant literature
If there is relevant scientific literature, put links to those items here.
Link to Molecular Classification of Ependymal Tumors across All CNS Compartments, Histopathological Grades, and Age Groups
Link to C11orf95-RELA fusions drive oncogenic NF-κB signalling in ependymoma.