This directory contains various analysis modules in the OpenPBTA project. See the README of an individual analysis module for more information about that module.
The table below is intended to help project organizers quickly get an idea of what files (and therefore types of data) are consumed by each analysis module, what the module does, and what output files it produces that can be consumed by other analysis modules.
This is in service of documenting interdependent analyses.
Note that nearly all modules use the harmonized clinical data file (`pbta-histologies.tsv`) even when it is not explicitly included in the table below.
Module | Input Files | Brief Description | Output Files Consumed by Other Analyses |
---|---|---|---|
`chromosomal-instability` | `pbta-histologies.tsv` <br> `pbta-sv-manta.tsv.gz` <br> `pbta-cnv-cnvkit.seg.gz` | Evaluates chromosomal instability by calculating chromosomal breakpoint densities and by creating circular plot visuals | `analyses/chromosomal-instability/breakpoint-data/cnv_breaks_densities.tsv` <br> `analyses/chromosomal-instability/breakpoint-data/sv_breaks_densities.tsv` |
`chromothripsis` | `pbta-sv-manta.tsv.gz` <br> `pbta-cnv-consensus.seg.gz` <br> `independent-specimens.wgs.primary-plus.tsv` <br> `figures/palettes/histology_label_color_table.tsv` <br> `analyses/chromosomal-instability/breakpoint-data/cnv_breaks_densities.tsv` <br> `analyses/chromosomal-instability/breakpoint-data/sv_breaks_densities.tsv` | In progress; chromothripsis analysis per #1007 | N/A |
`cnv-chrom-plot` | `pbta-cnv-consensus-gistic.zip` <br> `analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg` | Plots genome-wide visualizations relating to copy number results | N/A |
`cnv-comparison` | Earlier versions of SEG files | Deprecated; compared earlier versions of the CNV methods | N/A |
`collapse-rnaseq` | `pbta-gene-expression-rsem-fpkm.polya.rds` <br> `pbta-gene-expression-rsem-fpkm.stranded.rds` <br> `gencode.v27.primary_assembly.annotation.gtf.gz` | Collapses RSEM FPKM matrices such that gene symbols are de-duplicated | `results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` (included in data download; too large for tracking via GitHub) <br> `results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` (included in data download; too large for tracking via GitHub) |
`comparative-RNASeq-analysis` | `pbta-gene-expression-rsem-tpm.polya.rds` <br> `pbta-gene-expression-rsem-tpm.stranded.rds` <br> `pbta-histologies.tsv` <br> `pbta-mend-qc-manifest.tsv` <br> `pbta-mend-qc-results.tar.gz` | In progress; will produce expression outlier profiles per #229 | N/A |
`compare-gistic` | `analyses/run-gistic/results/pbta-cnv-consensus-gistic.zip` <br> `analyses/run-gistic/results/pbta-cnv-consensus-hgat-gistic.zip` <br> `analyses/run-gistic/results/pbta-cnv-consensus-lgat-gistic.zip` <br> `analyses/run-gistic/results/pbta-cnv-consensus-medulloblastoma-gistic.zip` | Compares the GISTIC results of the entire cohort with the GISTIC results of three individual histologies, namely LGAT, HGAT, and medulloblastoma (#547) | N/A |
`copy_number_consensus_call` | `pbta-cnv-cnvkit.seg.gz` <br> `pbta-cnv-controlfreec.tsv.gz` <br> `pbta-sv-manta.tsv.gz` | Produces consensus copy number calls per #128 and a set of excluded regions where CNV calls are not made | `results/cnv_consensus.tsv` <br> `results/pbta-cnv-consensus.seg.gz` (included in data download) <br> `ref/cnv_excluded_regions.bed` <br> `ref/cnv_callable.bed` |
`create-subset-files` | All files | Creates the subset files used in continuous integration | All subset files for continuous integration |
`focal-cn-file-preparation` | `pbta-cnv-cnvkit.seg.gz` <br> `pbta-cnv-controlfreec.tsv.gz` <br> `pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg.gz` | Maps from copy number variant caller segments to gene identifiers; will be updated to take into account changes that affect entire cytobands or chromosome arms (#186) | `results/cnvkit_annotated_cn_autosomes.tsv.gz` <br> `results/cnvkit_annotated_cn_x_and_y.tsv.gz` <br> `results/controlfreec_annotated_cn_autosomes.tsv.gz` <br> `results/controlfreec_annotated_cn_x_and_y.tsv.gz` <br> `results/consensus_seg_annotated_cn_autosomes.tsv.gz` (included in data download) <br> `results/consensus_seg_annotated_cn_x_and_y.tsv.gz` (included in data download) |
`fusion_filtering` | `pbta-fusion-arriba.tsv.gz` <br> `pbta-fusion-starfusion.tsv.gz` | Standardizes, filters, and prioritizes fusion calls | `results/pbta-fusion-putative-oncogenic.tsv` (included in data download) <br> `results/pbta-fusion-recurrent-fusion-byhistology.tsv` (included in data download) <br> `results/pbta-fusion-recurrent-fusion-bysample.tsv` (included in data download) |
`fusion-summary` | `pbta-histologies.tsv` <br> `pbta-fusion-putative-oncogenic.tsv` <br> `pbta-fusion-arriba.tsv.gz` <br> `pbta-fusion-starfusion.tsv.gz` | Generates summary tables from fusion files (#398; #623) | `results/fusion_summary_embryonal_foi.tsv` (included in data download) <br> `results/fusion_summary_ependymoma_foi.tsv` (included in data download) <br> `results/fusion_summary_ewings_foi.tsv` |
`gene-set-enrichment-analysis` | `analyses/collapse-rnaseq/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `analyses/collapse-rnaseq/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` | In progress; updated gene set enrichment analysis with appropriate RNA-seq expression data | `results/gsva_scores_stranded.tsv` <br> `results/gsva_scores_polya.tsv` (for stranded and poly-A expression data, respectively) |
`hotspot-detection` | `pbta-snv-strelka2.vep.maf.gz` <br> `pbta-snv-mutect2.vep.maf.gz` <br> `pbta-snv-vardict.vep.maf.gz` <br> `pbta-snv-lancet.vep.maf.gz` | Scavenges calls at cancer hotspots from each caller and merges them with the consensus (3/3) calls if they were missed in the `snv-callers` workflow | `pbta-snv-hotspots-mutation.maf.tsv.gz` |
`immune-deconv` | `pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` | Immune/stroma characterization across PBTA (part of #15) | `results/deconv-output.RData` |
`independent-samples` | `pbta-histologies.tsv` | Generates independent specimen lists for WGS/WXS samples | `results/independent-specimens.wgs.primary.tsv` (included in data download) <br> `results/independent-specimens.wgs.primary-plus.tsv` (included in data download) <br> `results/independent-specimens.wgswxs.primary.tsv` (included in data download) <br> `results/independent-specimens.wgswxs.primary-plus.tsv` (included in data download) |
`interaction-plots` | `independent-specimens.wgs.primary-plus.tsv` <br> `pbta-snv-consensus-mutation.maf.tsv.gz` | Creates interaction plots for mutation mutual exclusivity/co-occurrence (#13); may be updated to include other data types (e.g., fusions) | N/A |
`molecular-subtyping-ATRT` | `analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz` <br> `pbta-snv-consensus-mutation-tmb-all.tsv` <br> `pbta-cnv-consensus-gistic.zip` | Summarizes data into tabular format to molecularly subtype ATRT samples (#244); this analysis did not work | N/A |
`molecular-subtyping-CRANIO` | `pbta-histologies-base.tsv` <br> `pbta-snv-consensus-mutation.maf.tsv.gz` <br> `pbta-snv-scavenged-hotspots.maf.tsv.gz` | Molecular subtyping of craniopharyngioma samples (#810) | `results/CRANIO_molecular_subtype.tsv` |
`molecular-subtyping-EPN` | `pbta-histologies-base.tsv` <br> `analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `pbta-cnv-consensus-gistic.zip` <br> `analyses/chromosomal-instability/breakpoint-data/union_of_breaks_densities.tsv` <br> `analyses/fusion-summary/results/fusion_summary_ependymoma_foi.tsv` <br> `analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv` | In progress; molecular subtyping of ependymoma tumors | `results/EPN_all_data_withsubgroup.tsv` |
`molecular-subtyping-EWS` | `pbta-histologies-base.tsv` <br> `analyses/fusion-summary/results/fusion_summary_ewings_foi.tsv` | Reclassification of tumors based on the presence of defining fusions for Ewing sarcoma per #623 | `results/EWS_samples.tsv` |
`molecular-subtyping-HGG` | `pbta-histologies-base.tsv` <br> `pbta-snv-consensus-mutation.maf.tsv.gz` <br> `pbta-snv-scavenged-hotspots.maf.tsv.gz` <br> `analyses/focal-cn-file-preparation/results/cnvkit_annotated_cn_autosomes.tsv.gz` <br> `analyses/fusion_filtering/results/pbta-fusion-putative-oncogenic.tsv` <br> `pbta-cnv-consensus-gistic.zip` <br> `analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` | Molecular subtyping of high-grade glioma samples (#249) | `results/HGG_molecular_subtype.tsv` |
`molecular-subtyping-LGAT` | `pbta-histologies-base.tsv` <br> `pbta-snv-consensus-mutation.maf.tsv.gz` <br> `pbta-snv-scavenged-hotspots.maf.tsv.gz` <br> `analyses/fusion_filtering/results/pbta-fusion-putative-oncogenic.tsv` <br> `pbta-fusion-recurrently-fused-genes-bysample.tsv` | Molecular subtyping of low-grade astrocytic tumor samples (#631) | `results/lgat_subtyping.tsv` |
`molecular-subtyping-MB` | `pbta-histologies-base.tsv` <br> `analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` <br> `analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` | Molecular classification of medulloblastoma subtypes (part of #731) | `results/MB_molecular_subtype.tsv` <br> `results/MB_batchcorrected_molecular_subtype.tsv` (for the uncorrected and batch-corrected input matrices, respectively) |
`molecular-subtyping-SHH-tp53` | `pbta-histologies` <br> `pbta-snv-consensus-mutation.maf.tsv.gz` | Deprecated; identified the SHH-classified medulloblastoma samples that have TP53 mutations (#247) | N/A |
`molecular-subtyping-chordoma` | `analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` | In progress; identifying poorly-differentiated chordoma samples per #250 | N/A |
`molecular-subtyping-embryonal` | `pbta-histologies-base.tsv` <br> `analyses/fusion-summary/fusion_summary_embryonal_foi.tsv` <br> `pbta-sv-manta.tsv.gz` <br> `analyses/focal-cn-file-preparation/consensus_seg_annotated_cn_x_and_y.tsv.gz` <br> `analyses/focal-cn-file-preparation/cnvkit_annotated_cn_x_and_y.tsv.gz` <br> `analyses/focal-cn-file-preparation/controlfreec_annotated_cn_x_and_y.tsv.gz` <br> `analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` | Molecular subtyping of non-medulloblastoma, non-ATRT embryonal tumors (#251) | `results/embryonal_tumor_molecular_subtypes.tsv` |
`molecular-subtyping-integrate` | `pbta-histologies-base.tsv` <br> `results/compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv` | Adds molecular subtype information to the base histology file | `results/pbta-histologies.tsv` |
`molecular-subtyping-neurocytoma` | `pbta-histologies-base.tsv` | Molecular subtyping of neurocytoma samples (#805) | `results/neurocytoma_subtyping.tsv` |
`molecular-subtyping-pathology` | `analyses/molecular-subtyping-CRANIO/results/CRANIO_molecular_subtype.tsv` <br> `analyses/molecular-subtyping-EPN/results/EPN_all_data_withsubgroup.tsv` <br> `analyses/molecular-subtyping-MB/results/MB_molecular_subtype.tsv` <br> `analyses/molecular-subtyping-neurocytoma/results/neurocytoma_subtyping.tsv` <br> `analyses/molecular-subtyping-EWS/results/EWS_samples.tsv` <br> `analyses/molecular-subtyping-HGG/results/HGG_molecular_subtype.tsv` <br> `analyses/molecular-subtyping-LGAT/results/lgat_subtyping.tsv` <br> `analyses/molecular-subtyping-embryonal/results/embryonal_tumor_molecular_subtypes.tsv` <br> `analyses/molecular-subtyping-chordoma/results/chordoma_smarcb1_status.tsv` | Compiles output from other molecular subtyping modules and incorporates pathology feedback (#645) | `results/compiled_molecular_subtyping_with_clinical_feedback.tsv` <br> `results/compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv` <br> `results/compiled_molecular_subtypes_with_clinical_pathology_feedback_and_report_info.tsv` |
`mutational-signatures` | `pbta-snv-consensus-mutation.maf.tsv.gz` | Performs three separate analyses of mutational signatures: (1) analyzes COSMIC and Alexandrov et al. mutational signatures using the consensus SNV data; (2) performs de novo signature extraction using only the WGS samples from the consensus SNV data; (3) fits known CNS signatures to the WGS samples from the consensus SNV data | N/A |
`mutect2-vs-strelka2` | `pbta-snv-mutect2.vep.maf.gz` <br> `pbta-snv-strelka2.vep.maf.gz` | Deprecated; comparison of only two SNV callers, subsumed by `snv-callers` | N/A |
`oncoprint-landscape` | `pbta-snv-consensus-mutation.maf.tsv.gz` <br> `pbta-fusion-putative-oncogenic.tsv` <br> `analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz` <br> `analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_x_and_y.tsv.gz` <br> `independent-specimens.*` | Combines mutation, copy number, and fusion data into an OncoPrint plot (#6); will need to be updated as all data types are refined | N/A |
`rna-seq-composition` | `pbta-gene-expression-rsem-tpm.stranded.rds` <br> `pbta-histologies.tsv` <br> `pbta-mend-qc-results.tar.gz` <br> `pbta-mend-qc-manifest.tsv` <br> `pbta-star-log-manifest.tsv` <br> `pbta-star-log-final.tar.gz` | Analyzes the fraction of read types that comprise each RNA-seq sample; flags samples with unusual composition | N/A |
`run-gistic` | `pbta-histologies.tsv` <br> `pbta-cnv-consensus.seg.gz` | Runs GISTIC 2.0 on SEG files | `pbta-cnv-consensus-gistic.zip` (included in data download) |
`sample-distribution-analysis` | `pbta-histologies.tsv` | Produces plots and tables that illustrate the distribution of different histologies in the PBTA data | N/A |
`selection-strategy-comparison` | `pbta-gene-expression-rsem-fpkm.polya.rds` <br> `pbta-gene-expression-rsem-fpkm.stranded.rds` | Deprecated; comparison of RNA-seq data from different selection strategies | N/A |
`sex-prediction-from-RNASeq` | `pbta-gene-expression-kallisto.stranded.rds` <br> `pbta-histologies.tsv` | In progress; predicts genetic sex using RNA-seq data (#84) | N/A |
`snv-callers` | `pbta-snv-lancet.vep.maf.gz` <br> `pbta-snv-mutect2.vep.maf.gz` <br> `pbta-snv-strelka2.vep.maf.gz` <br> `pbta-snv-vardict.vep.maf.gz` <br> `tcga-snv-lancet.vep.maf.gz` <br> `tcga-snv-mutect2.vep.maf.gz` <br> `tcga-snv-strelka2.vep.maf.gz` | Generates consensus SNV and indel calls for PBTA and TCGA data; calculates tumor mutation burden using the consensus calls | `results/consensus/pbta-snv-consensus-mutation.maf.tsv.gz` (included in data download; too large for tracking via GitHub) <br> `results/consensus/pbta-snv-consensus-mutation-tmb-all.tsv` <br> `results/consensus/pbta-snv-consensus-mutation-tmb-coding.tsv` (included in data download; too large for tracking via GitHub) <br> `results/consensus/tcga-snv-consensus-mutation.maf.tsv.gz` <br> `results/consensus/tcga-snv-mutation-tmb.tsv` <br> `results/consensus/tcga-snv-mutation-tmb-coding.tsv` |
`ssgsea-hallmark` | `pbta-gene-counts-rsem-expected_count.stranded.rds` | Deprecated; performs GSVA using Hallmark gene sets | N/A |
`survival-analysis` | TBD | In progress; will eventually contain functions for various types of survival analysis (#18) | N/A |
`telomerase-activity-prediction` | `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` <br> `pbta-gene-counts-rsem-expected_count.stranded.rds` <br> `pbta-gene-counts-rsem-expected_count.polya.rds` | Quantifies telomerase activity across pediatric brain tumors (part of #148) | `results/TelomeraseScores_PTBAPolya_counts` <br> `results/TelomeraseScores_PTBAPolya_FPKM.txt` <br> `results/TelomeraseScores_PTBAStranded_counts.txt` <br> `results/TelomeraseScores_PTBAStranded_FPKM.txt` <br> `results/EXTENDScores_{broad_histology}.tsv` |
`tmb-compare` | `pbta-snv-consensus-mutation-tmb-coding.tsv` | Compares PBTA tumor mutation burden to adult TCGA data. The D3B TMB calculations (`TMB_d3b_code`) and their comparison notebook (`compare-tmb-calculations.Rmd`) are deprecated. | N/A |
`tp53_nf1_score` | `pbta-snv-consensus-mutation.maf.tsv.gz` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` | Applies TP53 inactivation, NF1 inactivation, and Ras activation classifiers to RNA-seq data (#165) | N/A |
`transcriptomic-dimension-reduction` | `pbta-gene-expression-rsem-fpkm.polya.rds` <br> `pbta-gene-expression-rsem-fpkm.stranded.rds` <br> `pbta-gene-expression-kallisto.polya.rds` <br> `pbta-gene-expression-kallisto.stranded.rds` | Dimension reduction and visualization of RNA-seq data (part of #9) | N/A |
`tcga-capture-kit-investigation` | `pbta-snv-lancet.vep.maf.gz` <br> `pbta-snv-mutect2.vep.maf.gz` <br> `pbta-snv-strelka2.vep.maf.gz` <br> `tcga-snv-lancet.vep.maf.gz` <br> `tcga-snv-mutect2.vep.maf.gz` <br> `tcga-snv-strelka2.vep.maf.gz` <br> `pbta-histologies.tsv` <br> `pbta-tcga-manifest.tsv` <br> `WGS.hg38.lancet.unpadded.bed` <br> `WGS.hg38.strelka2.unpadded.bed` <br> `WGS.hg38.mutect2.vardict.unpadded.bed` | Investigation of the TMB discrepancy between PBTA and TCGA data | `results/*.bed` |
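
Because the output files of some modules appear as input files of others, the table implicitly defines a run order. As a minimal sketch, assuming a hand-transcribed subset of the rows above (the `DEPENDS_ON` mapping and `run_order` helper are hypothetical and not part of the repository), Python's standard-library `graphlib.TopologicalSorter` can derive an order in which every upstream module runs before the modules that consume its outputs. The subset deliberately leaves out the harmonized `pbta-histologies.tsv`: `molecular-subtyping-integrate` produces it and `run-gistic` reads it, so adding that edge naively would create a cycle (`run-gistic` → `molecular-subtyping-HGG` → `molecular-subtyping-pathology` → `molecular-subtyping-integrate` → `run-gistic`).

```python
from graphlib import TopologicalSorter

# Hypothetical, hand-transcribed subset of the table: each module maps to the
# set of modules whose output files it lists as inputs. Not exhaustive.
DEPENDS_ON = {
    "collapse-rnaseq": set(),
    "copy_number_consensus_call": set(),
    "fusion_filtering": set(),
    "snv-callers": set(),
    "run-gistic": {"copy_number_consensus_call"},
    "focal-cn-file-preparation": {"copy_number_consensus_call", "collapse-rnaseq"},
    "fusion-summary": {"fusion_filtering"},
    "molecular-subtyping-EWS": {"fusion-summary"},
    "molecular-subtyping-HGG": {
        "snv-callers",
        "focal-cn-file-preparation",
        "fusion_filtering",
        "run-gistic",
        "collapse-rnaseq",
    },
    "molecular-subtyping-pathology": {
        "molecular-subtyping-EWS",
        "molecular-subtyping-HGG",
    },
    "molecular-subtyping-integrate": {"molecular-subtyping-pathology"},
}


def run_order(depends_on):
    """Return module names so that each module follows all modules it depends on.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for module in run_order(DEPENDS_ON):
        print(module)
```

Any order printed is valid, but modules with no dependency between them (for example, `snv-callers` and `fusion_filtering`) may appear in either relative order.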