Planned Analysis: Copy number plot showing recurrently amplified/deleted regions in different PBTA cancers #8
Comments
@jharenza : is this still blocked? It seems like something that could be done now and if the CNV calls get improved, then this would just get re-run. |
I think I may have added it that way thinking consensus calls @gonzolgarcia creates would go into this, but perhaps consensus for CNV and SV would be separate issues? |
Ok! |
I have begun to tackle this issue. Thus far, I believe the package |
Related: #128 |
Hi @cbethell! Thanks for working on this! I also just created issue #128 for the creation of consensus CN calls, to improve our accuracy for downstream analyses like this one. I imagine the output would still be a SEG file, so you can start with any of the SEGs we provide. I have previously used GISTIC, but it does not work well for small cohorts (i.e., you need a large N to call recurrence within a cancer). We may want to focus this issue on histologies with a specified >= N so that we have enough power to detect recurrent changes already known in the literature. For example, we have a large cohort of low-grade gliomas, a good number of medulloblastomas (we could see how many of each subtype we have, since we know each subtype has different CNAs - Figure 2), and high-grade gliomas. I found this recent paper describing |
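A minimal sketch of the ">= N" filter suggested above, assuming the histologies table is tab-separated with one row per sample. The column name `short_histology` and the default threshold of 20 are placeholders for illustration, not values agreed on in this thread.

```python
import csv
import io
from collections import Counter


def histologies_with_min_n(tsv_text, histology_col="short_histology", min_n=20):
    """Return {histology: row count} for histologies with at least min_n rows.

    Assumes one row per sample; the column name is a hypothetical default.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter(row[histology_col] for row in reader)
    return {hist: n for hist, n in counts.items() if n >= min_n}
```

If a patient contributes several biospecimens, the rows would need to be de-duplicated first, since this counts rows rather than patients.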
Summarizing my understanding of the status of this issue:
Also I would note if anyone is going to continue development prior to #128, probably use CNVkit and not ControlFREEC (see caveat in README and the caveat in action in this plot). |
GISTIC 2.0 is available as a GenePattern module. This requires a reference file:
That description makes it seem like it might not be the most straightforward to obtain/derive? Also requires:
It could be good to run this in something like GenePattern Notebook, so that it is public facing and we can link to it in a README. Alternatively, we could use something public facing on CAVATICA that can be linked to in the same way. This requires a little more research before we know if it's feasible, but I wanted to report what I know so far. |
@jaclyn-taroni I may be able to run GISTIC tomorrow and provide in the data release, once I get the updated seg files. |
I am working on getting this GISTIC plot set up in preparation for when the GISTIC data are added in release v12. |
Did we want this plot to be exactly like the example above? Here is what I have so far: |
@jharenza , do you have thoughts about this rough draft GISTIC plot? What changes would you like to see? |
### release-v12-20191217
- release date: 2019-12-17
- status: available
- changes:
  - Add `data-file-descriptions.md` with data release to better track file types, origins, and workflows per [#334](#334) and [#336](#336)
  - Add stranded RNA-Seq for 23 PNOC samples and 21 CBTTC samples previously sequenced using a polyA library prep. Files updated:
    - pbta-fusion-arriba.tsv.gz
    - pbta-fusion-starfusion.tsv.gz
    - pbta-gene-expression-rsem-tpm.stranded.rds
    - pbta-gene-expression-rsem-fpkm.stranded.rds
    - pbta-isoform-expression-rsem-tpm.stranded.rds
    - pbta-isoform-counts-rsem-expected_count.stranded.rds
    - pbta-gene-counts-rsem-expected_count.stranded.rds
    - pbta-gene-expression-kallisto.stranded.rds
    - pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
  - Add recurrently-fused genes by histology and matrix of recurrently-fused genes by biospecimen from [fusion filtering and prioritization analysis](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/fusion_filtering)
  - Update consensus TMB files and MAF [#333](#333)
  - Add RNA-Seq [collapsed matrices](#287) - wrong files (tables of transcripts removed) were included with [V10](#273)
  - Rename `WGS.hg38.mutect2.unpadded.bed` to `WGS.hg38.mutect2.vardict.unpadded.bed` to better reflect usage
  - Update `pbta-histologies.tsv` to add new RNA-Seq samples listed above, [#222 harmonize disease separators](#222), and reran [medulloblastoma classifier](https://github.com/d3b-center/medullo-classifier-package) using V12 RSEM fpkm collapsed files
    - BS_2Z1MKS84, BS_5VQP0E6K re-classified from Group4 to WNT and BS_3BDAG9YN, BS_8T7DZV2F, and BS_JTMXAMB7 re-classified from Group3 to WNT
  - Add CNVkit GISTIC results focal CN analyses, e.g., [#244](#244) and [#8](#8)
* Release V12 data
* Update release-notes.md fix link
* Update data-files-description.md fix GISTIC table sectioning
* Update data-files-description.md fix spacing on data description table
* Update data-files-description.md fix more spacing in data file description file
* Update download-data.sh add new release date to download script
* Update the TMB file descriptions
* Update TMB file formats section
* Update fusion section of data formats Also more specific description of the by sample file
* Add GISTIC file to data-formats
* Update download-data.sh
* Update download-data.sh
* data description md is also included in md5sum
* TMB exon -> coding sequence
* Coding TMB CDS, not exon
Per @jharenza , I will update this graph to be split up by specific histologies and annotate the chromosomes more clearly, perhaps like this example: |
@jharenza It doesn't look like the GISTIC results are split up by histology or have per-sample scores, so I won't be able to do the individual plots by histology unless you run it separately. That being said, here's a re-done version of the GISTIC plot with the other fixes. I can functionalize this and apply it to histology groups if we get that data. |
@cansavvy yeah, that's right - I can't really do it by histology because there will be too low of an N for many histologies. For that reason, what I had done in the past is just subset the samples from the seg file per each histology and plot LRR as in the NBL figure above, pointing out key regions of amp/del, so there was no statistical test for those plots. I wonder if there is a way to use GISTIC The plot looks better! Is there a way to scale by chr size such that chr 1 is the largest and 22 the smallest? Are 23 and 24 X and Y? |
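For reference, the per-histology subsetting described above could be sketched roughly as follows. This assumes each segment is a dict keyed by the usual SEG column names (with `ID` holding the sample identifier) and that a sample-to-histology lookup has already been built; both the function name and those inputs are assumptions for illustration.

```python
from collections import defaultdict


def split_segments_by_histology(segments, sample_to_histology):
    """Group SEG rows by the histology of the sample they belong to.

    segments: list of dicts with at least an "ID" key (assumed SEG layout).
    sample_to_histology: dict mapping sample ID -> histology label.
    Rows whose sample has no histology label are skipped.
    """
    grouped = defaultdict(list)
    for seg in segments:
        histology = sample_to_histology.get(seg["ID"])
        if histology is not None:
            grouped[histology].append(seg)
    return dict(grouped)
```

Each resulting group can then be plotted on its own; as noted above, this is descriptive only and involves no statistical test.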
Ah good point, I’ll try to make the chromosomes scaled and change 23 and 24 to X and Y. I’ll look into making LRR plots by histology. |
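A rough sketch of both fixes, assuming chromosome lengths are supplied by the caller in plotting order (for example, from the reference genome's chromosome sizes file). The function names are hypothetical, and the lengths used in the usage note are toy values, not real genome sizes.

```python
def chrom_label(chrom):
    """Normalize a chromosome name: strip a 'chr' prefix, map 23/24 to X/Y."""
    text = str(chrom)
    if text.startswith("chr"):
        text = text[3:]
    return {"23": "X", "24": "Y"}.get(text, text)


def cumulative_offsets(chrom_lengths):
    """Return {chrom: genome-wide start offset} for an ordered {chrom: length} dict."""
    offsets = {}
    running = 0
    for chrom, length in chrom_lengths.items():
        offsets[chrom] = running
        running += length
    return offsets


def genome_position(chrom, pos, offsets):
    """Convert a within-chromosome position to a genome-wide x coordinate."""
    return offsets[chrom_label(chrom)] + pos
```

With toy lengths `{"1": 100, "2": 80, "X": 50}`, position 5 on chromosome 23 maps to x = 185. Plotting against these coordinates gives each chromosome a width proportional to its length.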
@jharenza , do you have code associated with the graph above that you would be able to send me so I can see how the y axis data was calculated? |
Hi @cansavvy - I added it here but could not get the |
@jharenza , Per our discussion in Slack, it appears the difference between the code you linked above (which used ControlFREEC) and the data we are using here (CNVkit derived) is that we do not have I am not familiar with what is standard practice for reporting these data, so if there is a preferred alternative to median copy number, let me know and I should be able to switch out the calculation fairly easily. I will look at some papers and see if I can figure out something better as well. |
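To make the "median copy number" idea concrete, here is one possible reading, sketched under assumptions: the genome is cut into fixed-width bins, each sample contributes the copy number of the segment covering the bin midpoint, and the median is taken across samples. The tuple layout, the binning, and the function name are illustrative choices, not necessarily the calculation used for the plot.

```python
from statistics import median


def median_copy_number_by_bin(segments, chrom, chrom_length, bin_size):
    """Median copy number across samples for each fixed-width bin of one chromosome.

    segments: list of (sample, chrom, start, end, copy_number) tuples
              (an assumed layout; end is treated as exclusive).
    Returns a list of (bin_start, median or None if no sample covers the bin).
    """
    result = []
    for bin_start in range(0, chrom_length, bin_size):
        midpoint = bin_start + bin_size // 2
        per_sample = {}
        for sample, seg_chrom, start, end, copy_number in segments:
            if seg_chrom == chrom and start <= midpoint < end:
                per_sample[sample] = copy_number
        value = median(per_sample.values()) if per_sample else None
        result.append((bin_start, value))
    return result
```

Swapping in a different summary (mean, or fraction of samples gained/lost) would only require changing the `median` call.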
It appears that That being said I'm going to move forward with using |
We have a consensus SEG file that includes the The Updating the GISTIC plots portion of that module depends on #453. |
I believe this issue can be closed now? |
The idea would be to visualize trends of deletions / amplifications in each of the cancer types (where relevant). Pointing out highly amplified/deleted regions with oncogenes/tumor suppressors may be interesting.