
Molecular Subtyping non-MB, non-ATRT Embryonal Tumors

Module authors: Chante Bethell (@cbethell), Stephanie J. Spielman (@sjspielman) and Jaclyn Taroni (@jaclyn-taroni)

Note: The files in the subset-files directory were generated via 02-generate-subset-files.R using the files in the version 17 data release.


Usage

When re-running this module, you may want to regenerate the subset files using the most recent data release. By default, the files are regenerated from the symlinked files in data when the module is run from the command line as follows:

bash run-embryonal-subtyping.sh

00-embryonal-select-pathology-dx.Rmd is not run via this module's shell script because it should be run locally, is tied to release-v17-20200908, and should not be re-rendered when the underlying pbta-histologies.tsv file changes in future releases (see Folder Content and #748).

Molecular subtyping embryonal workflow

[Figure: embryonal molecular subtyping workflow]

Molecular subtyping criteria

First, samples are selected based on the pathology_diagnosis and pathology_free_text_diagnosis columns using this script. To molecularly subtype embryonal tumors, the following criteria, based on the WHO cancer guidebook, are used:

  • ETMR, C19MC-altered: samples contain a chromosome 19 amplification or overexpression of LIN28A and contain a TTYH1 gene fusion.
  • ETMR, NOS: samples do NOT contain a TTYH1 gene fusion and overexpress LIN28A.
  • CNS HGNET-MN1: samples contain an MN1 gene fusion.
  • CNS NB-FOXR2: samples contain a FOXR2 gene fusion or overexpress FOXR2.
  • CNS Embryonal, NOS: samples have a pathology diagnosis of "Neuroblastoma" and do NOT have a high-confidence methylation subtype.
  • All other samples with a pathology diagnosis of "Neuroblastoma" that do have a high-confidence methylation subtype are assigned their methylation molecular subtype.
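As a rough illustration, the criteria above can be expressed as a single rule function. This is a minimal Python sketch, not the module's R implementation (the subtype assignments are produced in 04-table-prep.Rmd). It assumes boolean inputs already derived from the fusion, expression, copy number, and methylation data; it checks the rules in the order listed, which is itself an assumption; and it reads the first criterion as (chromosome 19 amplification OR LIN28A overexpression) AND a TTYH1 fusion. The function name and parameters are hypothetical.

```python
from typing import Optional


def embryonal_subtype(
    chr19_amplification: bool,
    lin28a_overexpressed: bool,
    ttyh1_fusion: bool,
    mn1_fusion: bool,
    foxr2_fusion: bool,
    foxr2_overexpressed: bool,
    pathology_diagnosis: str,
    methyl_subtype: Optional[str] = None,
    methyl_high_confidence: bool = False,
) -> Optional[str]:
    """Hypothetical rule function; returns a subtype label or None.

    Rules are checked in the order the README lists them (an assumption).
    """
    # Assumed precedence: (chr19 amplification OR LIN28A overexpression) AND TTYH1 fusion.
    if (chr19_amplification or lin28a_overexpressed) and ttyh1_fusion:
        return "ETMR, C19MC-altered"
    # LIN28A overexpression without a TTYH1 fusion.
    if lin28a_overexpressed and not ttyh1_fusion:
        return "ETMR, NOS"
    if mn1_fusion:
        return "CNS HGNET-MN1"
    if foxr2_fusion or foxr2_overexpressed:
        return "CNS NB-FOXR2"
    if pathology_diagnosis == "Neuroblastoma":
        # A high-confidence methylation subtype is used when present; otherwise NOS.
        if methyl_high_confidence and methyl_subtype:
            return methyl_subtype
        return "CNS Embryonal, NOS"
    # No criterion matched (e.g. chr19 amplification without a TTYH1 fusion
    # under the assumed precedence).
    return None
```

Checking the rules in a fixed order makes overlaps explicit: for example, a "Neuroblastoma" sample with a FOXR2 fusion is labeled CNS NB-FOXR2 here. Whether the module resolves such overlaps the same way should be confirmed against 04-table-prep.Rmd.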

Folder Content

00-embryonal-select-pathology-dx.Rmd is a notebook in which we gather the relevant diagnosis strings from the summarized results and the histology updates review, and save them in subset-files/embryonal_subtyping_path_dx_strings.json, which is used downstream in 01-samples-to-subset.Rmd to identify the samples to include in the subset files.
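To illustrate the JSON hand-off between the two notebooks, here is a minimal Python sketch that collects diagnosis strings, writes them to a file such as subset-files/embryonal_subtyping_path_dx_strings.json, and reads them back. The key names exact_path_dx and path_free_text_exact, and the lower-casing of free-text strings, are assumptions for illustration, not the file's documented schema.

```python
import json
from typing import Dict, Iterable, List


def build_path_dx_strings(
    path_dx: Iterable[str], free_text_dx: Iterable[str]
) -> Dict[str, List[str]]:
    """Collect unique diagnosis strings under hypothetical key names."""
    return {
        # Exact pathology_diagnosis values, deduplicated and sorted.
        "exact_path_dx": sorted(set(path_dx)),
        # Free-text values lower-cased (an assumption) so matching is case-insensitive.
        "path_free_text_exact": sorted({s.lower() for s in free_text_dx}),
    }


def write_path_dx_strings(strings: Dict[str, List[str]], path: str) -> None:
    """Save the strings so a downstream step can load them."""
    with open(path, "w") as fh:
        json.dump(strings, fh, indent=2)


def read_path_dx_strings(path: str) -> Dict[str, List[str]]:
    """Load the strings saved by write_path_dx_strings."""
    with open(path) as fh:
        return json.load(fh)
```

Storing the strings in a versioned file, rather than recomputing them, is what lets the downstream selection stay tied to one release even if the histology file changes later.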

01-samples-to-subset.Rmd is a notebook written to identify samples to include in subset files for the purpose of molecularly subtyping non-MB and non-ATRT embryonal tumors. The samples are identified using the following criteria:

  1. An RNA-seq biospecimen sample includes a TTYH1 fusion (5' partner) per this comment.
  2. An RNA-seq biospecimen sample includes an MN1 fusion (5' partner) per this comment. Note that the MN1--PATZ1 fusion is excluded, as it is an entity separate from CNS HGNET-MN1 tumors per this comment.
  3. Any sample with "Supratentorial or Spinal Cord PNET" or "Embryonal Tumor with Multilayered Rosettes" in the pathology_diagnosis column of the metadata file pbta-histologies.tsv, per this comment and #1030.
  4. Any sample with "Neuroblastoma" in the pathology_diagnosis column, where primary_site does not contain "Other locations NOS" and pathology_free_text_diagnosis does not contain "peripheral" or "metastatic", per the same comment as above and this comment.
  5. Any sample with "Other" in the pathology_diagnosis column of the metadata and with "embryonal tumor with multilayer rosettes, ros (who grade iv)", "embryonal tumor, nos, congenital type", "ependymoblastoma", or "medulloepithelioma" in the pathology_free_text_diagnosis column, per this comment.

The output of this notebook is a TSV file named biospecimen_ids_embryonal_subtyping.tsv, containing the biospecimen IDs identified using the above criteria (stored in the results directory of this module).
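The five selection criteria can be sketched in Python as follows. This is an illustration, not the notebook's R code. It assumes histology rows as dictionaries keyed by column name (including an assumed Kids_First_Biospecimen_ID column), RNA-seq fusions as (biospecimen ID, "5PRIME--3PRIME") pairs, exact matching on pathology_diagnosis, and case-insensitive matching on pathology_free_text_diagnosis. Function names are hypothetical.

```python
from typing import AbstractSet, Dict, Iterable, Set, Tuple

# Diagnosis strings copied from criteria 3 and 5 above.
PNET_ETMR_DX = {
    "Supratentorial or Spinal Cord PNET",
    "Embryonal Tumor with Multilayered Rosettes",
}
OTHER_FREE_TEXT_DX = {
    "embryonal tumor with multilayer rosettes, ros (who grade iv)",
    "embryonal tumor, nos, congenital type",
    "ependymoblastoma",
    "medulloepithelioma",
}


def five_prime_fusion_ids(
    fusions: Iterable[Tuple[str, str]], gene: str, exclude: AbstractSet[str] = frozenset()
) -> Set[str]:
    """IDs whose fusion (named "5PRIME--3PRIME") has `gene` as the 5' partner."""
    return {
        bsid
        for bsid, name in fusions
        if name not in exclude and name.split("--")[0] == gene
    }


def select_embryonal_ids(
    histologies: Iterable[Dict[str, str]], rna_fusions: Iterable[Tuple[str, str]]
) -> Set[str]:
    rna_fusions = list(rna_fusions)  # iterated twice below
    # Criteria 1 and 2: TTYH1 or MN1 as the 5' partner, excluding MN1--PATZ1.
    selected = five_prime_fusion_ids(rna_fusions, "TTYH1") | five_prime_fusion_ids(
        rna_fusions, "MN1", exclude={"MN1--PATZ1"}
    )
    for row in histologies:
        dx = row["pathology_diagnosis"]
        free_text = row["pathology_free_text_diagnosis"].lower()
        if dx in PNET_ETMR_DX:  # criterion 3
            selected.add(row["Kids_First_Biospecimen_ID"])
        elif (
            dx == "Neuroblastoma"  # criterion 4
            and "Other locations NOS" not in row["primary_site"]
            and "peripheral" not in free_text
            and "metastatic" not in free_text
        ):
            selected.add(row["Kids_First_Biospecimen_ID"])
        elif dx == "Other" and free_text in OTHER_FREE_TEXT_DX:  # criterion 5
            selected.add(row["Kids_First_Biospecimen_ID"])
    return selected
```

Splitting on "--" and comparing only the first token is what restricts criteria 1 and 2 to the 5' partner: a fusion such as GENEX--MN1 is not selected.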

02-generate-subset-files.R is a script written to subset the files required for the subtyping of non-MB and non-ATRT embryonal tumors, using the output of 01-samples-to-subset.Rmd. The output of this script includes the subsets of the structural variant, poly-A RNA-seq, and stranded RNA-seq data files (stored in the subset-files directory of this module).
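02-generate-subset-files.R is an R script; purely as a sketch of the idea, the subsetting step amounts to a row filter on long tables (such as structural variant calls) and a column filter on gene-by-sample expression matrices. The Python below assumes TSV text with a caller-chosen ID column and an expression matrix held as nested dictionaries (gene to biospecimen ID to z-score). Neither mirrors the real file layouts or the .rds format, and the function names are hypothetical.

```python
import csv
import io
from typing import AbstractSet, Dict


def subset_tsv_text(tsv_text: str, id_column: str, keep_ids: AbstractSet[str]) -> str:
    """Keep the header plus rows whose `id_column` value is in `keep_ids`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        if row[id_column] in keep_ids:
            writer.writerow(row)
    return out.getvalue()


def subset_matrix_columns(
    matrix: Dict[str, Dict[str, float]], keep_ids: AbstractSet[str]
) -> Dict[str, Dict[str, float]]:
    """Keep only the sample columns in `keep_ids` for each gene row."""
    return {
        gene: {bsid: value for bsid, value in samples.items() if bsid in keep_ids}
        for gene, samples in matrix.items()
    }
```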

03-clean-c19mc-data.Rmd is a notebook written to clean copy number data related to C19MC amplifications in non-MB, non-ATRT embryonal tumors. Specifically, the goal of this notebook is to identify embryonal tumors with multilayered rosettes (ETMR), C19MC-altered tumors. This is done by filtering the consensus copy number calls (found at analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg.gz) to segments on chromosome 19 with a positive seg.mean. We are interested in focal amplifications here because amplification of C19MC, the miRNA cluster on chromosome 19, is a suggested characteristic of ETMRs, as noted here. In this notebook, we also visualize the widths of the focal amplifications found to check that they overlap, because there is disagreement about the genomic location of C19MC per this comment. The output of this notebook is a cleaned TSV file named cleaned_chr19_cn.tsv, containing a binary column indicating whether or not each biospecimen ID identified in 01-samples-to-subset.Rmd is associated with a chromosome 19 amplification (stored in the results directory of this module).
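The chromosome 19 filter and binary flag described above can be sketched in Python. This assumes SEG-style tab-separated text with columns named ID, chrom, loc.start, loc.end, and seg.mean, and treats "NA" or empty seg.mean values as missing; the real consensus file's columns and any additional focal-amplification thresholds in the notebook should be checked before relying on it. Function names are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, Tuple

CHR19_NAMES = {"chr19", "19"}  # accept either chromosome naming style


def _positive_chr19_rows(seg_text: str) -> Iterator[Dict[str, str]]:
    """Yield chr19 segments with a numeric, positive seg.mean."""
    for row in csv.DictReader(io.StringIO(seg_text), delimiter="\t"):
        seg_mean = row["seg.mean"]
        if (
            row["chrom"] in CHR19_NAMES
            and seg_mean not in ("", "NA")
            and float(seg_mean) > 0
        ):
            yield row


def chr19_amplification_flags(
    seg_text: str, biospecimen_ids: Iterable[str]
) -> Dict[str, int]:
    """1 if the biospecimen has any positive chr19 segment, else 0."""
    amplified = {row["ID"] for row in _positive_chr19_rows(seg_text)}
    return {bsid: int(bsid in amplified) for bsid in biospecimen_ids}


def chr19_gain_widths(seg_text: str) -> List[Tuple[str, int]]:
    """(biospecimen ID, segment width in bp) pairs for overlap checks or plotting."""
    return [
        (row["ID"], int(row["loc.end"]) - int(row["loc.start"]))
        for row in _positive_chr19_rows(seg_text)
    ]
```

For the gzipped consensus file, the text could be read with gzip.open(path, "rt").read() before calling these functions. Passing every selected biospecimen ID to chr19_amplification_flags gives each one an explicit 0 or 1, matching the binary column described for cleaned_chr19_cn.tsv.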

04-table-prep.Rmd is a notebook written to construct tables that summarize the data relevant to the molecular subtyping of non-MB and non-ATRT embryonal tumors, following the information provided in the reference GitHub issue #251. The output of this notebook includes two TSV files, both found in the results directory of this module. The first output file, embryonal_tumor_subtyping_relevant_data.tsv, contains a summary table of the data subsetted in 02-generate-subset-files.R, as well as relevant fusion and copy number data for the purpose of molecularly subtyping embryonal tumors. The second output file, embryonal_tumor_molecular_subtypes.tsv, contains the molecular subtype of each identified biospecimen ID, assigned from the summarized relevant data as described in the original comment and in this comment, both found on the reference GitHub issue. This file is a table with the following columns:

  • Kids_First_Participant_ID
  • sample_id
  • Kids_First_Biospecimen_ID_DNA
  • Kids_First_Biospecimen_ID_RNA
  • Kids_First_Biospecimen_ID_Methyl
  • molecular_subtype
  • molecular_subtype_methyl

Folder Structure

├── 00-embryonal-select-pathology-dx.Rmd
├── 00-embryonal-select-pathology-dx.nb.html
├── 01-samples-to-subset.Rmd
├── 01-samples-to-subset.nb.html
├── 02-generate-subset-files.R
├── 03-clean-c19mc-data.Rmd
├── 03-clean-c19mc-data.nb.html
├── 04-table-prep.Rmd
├── 04-table-prep.nb.html
├── README.md
├── results
│   ├── biospecimen_ids_embryonal_subtyping.tsv
│   ├── cleaned_chr19_cn.tsv
│   ├── embryonal_tumor_molecular_subtypes.tsv
│   ├── embryonal_tumor_subtyping_relevant_data.tsv
│   └── methyl_embryonal_subtyping.tsv
├── run-embryonal-subtyping.sh
└── subset-files
    ├── embryonal_manta_sv.tsv
    ├── embryonal_subtyping_path_dx_strings.json
    ├── embryonal_zscored_exp.polya.rds
    └── embryonal_zscored_exp.stranded.rds