
Analysis Modules

This directory contains the analysis modules of the OpenPBTA project. See the README of an individual analysis module for more information about that module.

Modules at a glance

The table below is intended to help project organizers quickly get an idea of what files (and therefore types of data) are consumed by each analysis module, what the module does, and what output files it produces that can be consumed by other analysis modules. This is in service of documenting interdependent analyses. Note that nearly all modules use the harmonized clinical data file (pbta-histologies.tsv) even when it is not explicitly included in the table below.

Module Input Files Brief Description Output Files Consumed by Other Analyses
chromosomal-instability pbta-histologies.tsv
pbta-sv-manta.tsv.gz
pbta-cnv-cnvkit.seg.gz
Evaluates chromosomal instability by calculating chromosomal breakpoint densities and by creating circular plot visuals analyses/chromosomal-instability/breakpoint-data/cnv_breaks_densities.tsv
analyses/chromosomal-instability/breakpoint-data/sv_breaks_densities.tsv
chromothripsis pbta-sv-manta.tsv.gz
pbta-cnv-consensus.seg.gz
independent-specimens.wgs.primary-plus.tsv
figures/palettes/histology_label_color_table.tsv
analyses/chromosomal-instability/breakpoint-data/cnv_breaks_densities.tsv
analyses/chromosomal-instability/breakpoint-data/sv_breaks_densities.tsv
In progress; chromothripsis analysis per #1007 N/A
cnv-chrom-plot pbta-cnv-consensus-gistic.zip
analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg
Plots genome-wide visualizations relating to copy number results N/A
cnv-comparison Earlier version of SEG files Deprecated; compared earlier versions of the CNV methods. N/A
collapse-rnaseq pbta-gene-expression-rsem-fpkm.polya.rds
pbta-gene-expression-rsem-fpkm.stranded.rds
gencode.v27.primary_assembly.annotation.gtf.gz
Collapses RSEM FPKM matrices such that gene symbols are de-duplicated. results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds (included in data download; too large for tracking via GitHub)
results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds (included in data download; too large for tracking via GitHub)
comparative-RNASeq-analysis pbta-gene-expression-rsem-tpm.polya.rds
pbta-gene-expression-rsem-tpm.stranded.rds
pbta-histologies.tsv
pbta-mend-qc-manifest.tsv
pbta-mend-qc-results.tar.gz
In progress; will produce expression outlier profiles per #229 N/A
compare-gistic analyses/run-gistic/results/pbta-cnv-consensus-gistic.zip
analyses/run-gistic/results/pbta-cnv-consensus-hgat-gistic.zip
analyses/run-gistic/results/pbta-cnv-consensus-lgat-gistic.zip
analyses/run-gistic/results/pbta-cnv-consensus-medulloblastoma-gistic.zip
Comparison of the GISTIC results of the entire cohort with the GISTIC results of three individual histologies, namely, LGAT, HGAT and medulloblastoma (#547) N/A
copy_number_consensus_call pbta-cnv-cnvkit.seg.gz
pbta-cnv-controlfreec.tsv.gz
pbta-sv-manta.tsv.gz
Produces consensus copy number calls per #128 and a set of excluded regions where CNV calls are not made results/cnv_consensus.tsv
results/pbta-cnv-consensus.seg.gz (included in data download)
ref/cnv_excluded_regions.bed
ref/cnv_callable.bed
create-subset-files All files This module contains the code to create the subset files used in continuous integration All subset files for continuous integration
focal-cn-file-preparation pbta-cnv-cnvkit.seg.gz
pbta-cnv-controlfreec.tsv.gz
pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg.gz
Maps from copy number variant caller segments to gene identifiers; will be updated to take into account changes that affect entire cytobands, chromosome arms (#186) results/cnvkit_annotated_cn_autosomes.tsv.gz
results/cnvkit_annotated_cn_x_and_y.tsv.gz
results/controlfreec_annotated_cn_autosomes.tsv.gz
results/controlfreec_annotated_cn_x_and_y.tsv.gz
results/consensus_seg_annotated_cn_autosomes.tsv.gz (included in data download)
results/consensus_seg_annotated_cn_x_and_y.tsv.gz (included in data download)
fusion_filtering pbta-fusion-arriba.tsv.gz
pbta-fusion-starfusion.tsv.gz
Standardizes, filters, and prioritizes fusion calls results/pbta-fusion-putative-oncogenic.tsv (included in data download)
results/pbta-fusion-recurrent-fusion-byhistology.tsv (included in data download)
results/pbta-fusion-recurrent-fusion-bysample.tsv (included in data download)
fusion-summary pbta-histologies.tsv
pbta-fusion-putative-oncogenic.tsv
pbta-fusion-arriba.tsv.gz
pbta-fusion-starfusion.tsv.gz
Generate summary tables from fusion files (#398; #623) results/fusion_summary_embryonal_foi.tsv (included in data download)
results/fusion_summary_ependymoma_foi.tsv (included in data download)
results/fusion_summary_ewings_foi.tsv
gene-set-enrichment-analysis analyses/collapse-rnaseq/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
analyses/collapse-rnaseq/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
In progress. Updated gene set enrichment analysis with appropriate RNA-seq expression data results/gsva_scores_stranded.tsv
results/gsva_scores_polya.tsv
for stranded and polya expression data, respectively
hotspot-detection pbta-snv-strelka2.vep.maf.gz
pbta-snv-mutect2.vep.maf.gz
pbta-snv-vardict.vep.maf.gz
pbta-snv-lancet.vep.maf.gz
Scavenges any cancer hotspot calls from each caller and merges them with the consensus (3/3) calls if they were missed in the snv-callers workflow. pbta-snv-hotspots-mutation.maf.tsv.gz
immune-deconv pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
Immune/Stroma characterization across PBTA (part of #15) results/deconv-output.RData
independent-samples pbta-histologies.tsv Generates independent specimen lists for WGS/WXS samples results/independent-specimens.wgs.primary.tsv (included in data download)
results/independent-specimens.wgs.primary-plus.tsv (included in data download)
results/independent-specimens.wgswxs.primary.tsv (included in data download)
results/independent-specimens.wgswxs.primary-plus.tsv (included in data download)
interaction-plots independent-specimens.wgs.primary-plus.tsv
pbta-snv-consensus-mutation.maf.tsv.gz
Creates interaction plots for mutation mutual exclusivity/co-occurrence #13; may be updated to include other data types (e.g., fusions) N/A
molecular-subtyping-ATRT analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz
pbta-snv-consensus-mutation-tmb-all.tsv
pbta-cnv-consensus-gistic.zip
Summarizing data into tabular format in order to molecularly subtype ATRT samples #244; this analysis did not work N/A
molecular-subtyping-CRANIO pbta-histologies-base.tsv
pbta-snv-consensus-mutation.maf.tsv.gz
pbta-snv-scavenged-hotspots.maf.tsv.gz
Molecular subtyping of craniopharyngioma samples #810 results/CRANIO_molecular_subtype.tsv
molecular-subtyping-EPN pbta-histologies-base.tsv
analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
pbta-cnv-consensus-gistic.zip
analyses/chromosomal-instability/breakpoint-data/union_of_breaks_densities.tsv
analyses/fusion-summary/results/fusion_summary_ependymoma_foi.tsv
analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv
In progress; molecular subtyping of ependymoma tumors results/EPN_all_data_withsubgroup.tsv
molecular-subtyping-EWS pbta-histologies-base.tsv
analyses/fusion-summary/results/fusion_summary_ewings_foi.tsv
Reclassification of tumors based on the presence of defining fusions for Ewing Sarcoma per #623 results/EWS_samples.tsv
molecular-subtyping-HGG pbta-histologies-base.tsv
pbta-snv-consensus-mutation.maf.tsv.gz
pbta-snv-scavenged-hotspots.maf.tsv.gz
analyses/focal-cn-file-preparation/results/cnvkit_annotated_cn_autosomes.tsv.gz
analyses/fusion_filtering/results/pbta-fusion-putative-oncogenic.tsv
pbta-cnv-consensus-gistic.zip
analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
Molecular subtyping of high-grade glioma samples #249 results/HGG_molecular_subtype.tsv
molecular-subtyping-LGAT pbta-histologies-base.tsv
pbta-snv-consensus-mutation.maf.tsv.gz
pbta-snv-scavenged-hotspots.maf.tsv.gz
analyses/fusion_filtering/results/pbta-fusion-putative-oncogenic.tsv
pbta-fusion-recurrently-fused-genes-bysample.tsv
Molecular subtyping of Low-grade astrocytic tumor samples #631 results/lgat_subtyping.tsv
molecular-subtyping-MB pbta-histologies-base.tsv
analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
Molecular classification of Medulloblastoma subtypes (part of #731) results/MB_molecular_subtype.tsv
results/MB_batchcorrected_molecular_subtype.tsv
for the uncorrected and batch-corrected input matrices, respectively
molecular-subtyping-SHH-tp53 pbta-histologies
pbta-snv-consensus-mutation.maf.tsv.gz
Deprecated; Identify the SHH-classified medulloblastoma samples that have TP53 mutations #247 N/A
molecular-subtyping-chordoma analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
In progress; identifying poorly-differentiated chordoma samples per #250 N/A
molecular-subtyping-embryonal pbta-histologies-base.tsv
analyses/fusion-summary/fusion_summary_embryonal_foi.tsv
pbta-sv-manta.tsv.gz
analyses/focal-cn-file-preparation/consensus_seg_annotated_cn_x_and_y.tsv.gz
analyses/focal-cn-file-preparation/cnvkit_annotated_cn_x_and_y.tsv.gz
analyses/focal-cn-file-preparation/controlfreec_annotated_cn_x_and_y.tsv.gz
analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
Molecular subtyping of non-medulloblastoma, non-ATRT embryonal tumors #251 results/embryonal_tumor_molecular_subtypes.tsv
molecular-subtyping-integrate pbta-histologies-base.tsv
results/compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv
Add molecular subtype information to base histology results/pbta-histologies.tsv
molecular-subtyping-neurocytoma pbta-histologies-base.tsv Molecular subtyping of Neurocytoma samples #805 results/neurocytoma_subtyping.tsv
molecular-subtyping-pathology analyses/molecular-subtyping-CRANIO/results/CRANIO_molecular_subtype.tsv
analyses/molecular-subtyping-EPN/results/EPN_all_data_withsubgroup.tsv
analyses/molecular-subtyping-MB/results/MB_molecular_subtype.tsv
analyses/molecular-subtyping-neurocytoma/results/neurocytoma_subtyping.tsv
analyses/molecular-subtyping-EWS/results/EWS_samples.tsv
analyses/molecular-subtyping-HGG/results/HGG_molecular_subtype.tsv
analyses/molecular-subtyping-LGAT/results/lgat_subtyping.tsv
analyses/molecular-subtyping-embryonal/results/embryonal_tumor_molecular_subtypes.tsv
analyses/molecular-subtyping-chordoma/results/chordoma_smarcb1_status.tsv
Compile output from other molecular subtyping modules and incorporate pathology feedback #645 results/compiled_molecular_subtyping_with_clinical_feedback.tsv
results/compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv
results/compiled_molecular_subtypes_with_clinical_pathology_feedback_and_report_info.tsv
mutational-signatures pbta-snv-consensus-mutation.maf.tsv.gz Performs three separate analyses of mutational signatures: 1) Analyzes COSMIC and Alexandrov et al. mutational signatures using the consensus SNV data; 2) Performs de novo signature extraction using only the WGS samples from the consensus SNV data; 3) Fits known CNS signatures to the WGS samples from the consensus SNV data N/A
mutect2-vs-strelka2 pbta-snv-mutect2.vep.maf.gz
pbta-snv-strelka2.vep.maf.gz
Deprecated; comparison of only two SNV callers, subsumed by snv-callers N/A
oncoprint-landscape pbta-snv-consensus-mutation.maf.tsv.gz
pbta-fusion-putative-oncogenic.tsv
analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz
analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_x_and_y.tsv.gz
independent-specimens.*
Combines mutation, copy number, and fusion data into an OncoPrint plot (#6); will need to be updated as all data types are refined N/A
rna-seq-composition pbta-gene-expression-rsem-tpm.stranded.rds
pbta-histologies.tsv
pbta-mend-qc-results.tar.gz
pbta-mend-qc-manifest.tsv
pbta-star-log-manifest.tsv
pbta-star-log-final.tar.gz
Analyzes the fraction of read types that comprise each RNA-Seq sample; flags samples with unusual composition N/A
run-gistic pbta-histologies.tsv
pbta-cnv-consensus.seg.gz
Runs GISTIC 2.0 on SEG files pbta-cnv-consensus-gistic.zip (included in data download)
sample-distribution-analysis pbta-histologies.tsv Produces plots and tables that illustrate the distribution of different histologies in the PBTA data N/A
selection-strategy-comparison pbta-gene-expression-rsem-fpkm.polya.rds
pbta-gene-expression-rsem-fpkm.stranded.rds
Deprecated; Comparison of RNA-seq data from different selection strategies N/A
sex-prediction-from-RNASeq pbta-gene-expression-kallisto.stranded.rds
pbta-histologies.tsv
In progress; predicts genetic sex using RNA-seq data (#84) N/A
snv-callers pbta-snv-lancet.vep.maf.gz
pbta-snv-mutect2.vep.maf.gz
pbta-snv-strelka2.vep.maf.gz
pbta-snv-vardict.vep.maf.gz
tcga-snv-lancet.vep.maf.gz
tcga-snv-mutect2.vep.maf.gz
tcga-snv-strelka2.vep.maf.gz
Generates consensus SNV and indel calls for PBTA and TCGA data; calculates tumor mutation burden using the consensus calls results/consensus/pbta-snv-consensus-mutation.maf.tsv.gz (included in data download; too large for tracking via GitHub)
results/consensus/pbta-snv-consensus-mutation-tmb-all.tsv
results/consensus/pbta-snv-consensus-mutation-tmb-coding.tsv (included in data download; too large for tracking via GitHub)
results/consensus/tcga-snv-consensus-mutation.maf.tsv.gz
results/consensus/tcga-snv-mutation-tmb.tsv
results/consensus/tcga-snv-mutation-tmb-coding.tsv
ssgsea-hallmark pbta-gene-counts-rsem-expected_count.stranded.rds Deprecated; performs GSVA using Hallmark gene sets N/A
survival-analysis TBD In progress; will eventually contain functions for various types of survival analysis (#18) N/A
telomerase-activity-prediction pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
pbta-gene-counts-rsem-expected_count.stranded.rds
pbta-gene-counts-rsem-expected_count.polya.rds
Quantify telomerase activity across pediatric brain tumors (part of #148) results/TelomeraseScores_PTBAPolya_counts
results/TelomeraseScores_PTBAPolya_FPKM.txt
results/TelomeraseScores_PTBAStranded_counts.txt
results/TelomeraseScores_PTBAStranded_FPKM.txt
results/EXTENDScores_{broad_histology}.tsv
tmb-compare pbta-snv-consensus-mutation-tmb-coding.tsv Compares PBTA tumor mutation burden to adult TCGA data. The D3B TMB calculations (TMB_d3b_code) and their comparison notebook (compare-tmb-calculations.Rmd) are deprecated. N/A
tp53_nf1_score pbta-snv-consensus-mutation.maf.tsv.gz
pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds
pbta-gene-expression-rsem-fpkm-collapsed.polya.rds
Applies TP53 inactivation, NF1 inactivation, and Ras activation classifiers to RNA-seq data #165 N/A
transcriptomic-dimension-reduction pbta-gene-expression-rsem-fpkm.polya.rds
pbta-gene-expression-rsem-fpkm.stranded.rds
pbta-gene-expression-kallisto.polya.rds
pbta-gene-expression-kallisto.stranded.rds
Dimension reduction and visualization of RNA-seq data (part of #9) N/A
tcga-capture-kit-investigation pbta-snv-lancet.vep.maf.gz
pbta-snv-mutect2.vep.maf.gz
pbta-snv-strelka2.vep.maf.gz
tcga-snv-lancet.vep.maf.gz
tcga-snv-mutect2.vep.maf.gz
tcga-snv-strelka2.vep.maf.gz
pbta-histologies.tsv
pbta-tcga-manifest.tsv
WGS.hg38.lancet.unpadded.bed
WGS.hg38.strelka2.unpadded.bed
WGS.hg38.mutect2.vardict.unpadded.bed
Investigation of the TMB discrepancy between PBTA and TCGA data results/*.bed
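
Because the table above exists to document interdependent analyses, it can help to write a few of those relationships down as a small dependency graph. The following is a minimal sketch, not part of the OpenPBTA codebase: the `DEPENDS_ON` mapping is a hand-picked subset read off the table (a module depends on another when it lists that module's output file as an input), and the helper names `run_order` and `downstream_of` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, hand-picked subset of the table above: each module maps to
# the set of modules whose output files it lists as inputs.
DEPENDS_ON = {
    "chromosomal-instability": set(),
    "collapse-rnaseq": set(),
    "copy_number_consensus_call": set(),
    "fusion_filtering": set(),
    "chromothripsis": {"chromosomal-instability"},
    "focal-cn-file-preparation": {"collapse-rnaseq", "copy_number_consensus_call"},
    "fusion-summary": {"fusion_filtering"},
    "gene-set-enrichment-analysis": {"collapse-rnaseq"},
    "molecular-subtyping-EWS": {"fusion-summary"},
    "molecular-subtyping-EPN": {
        "collapse-rnaseq",
        "chromosomal-instability",
        "fusion-summary",
        "gene-set-enrichment-analysis",
    },
}


def run_order(depends_on):
    """Return modules in an order where each one follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def downstream_of(module, depends_on):
    """Return every module that directly or transitively consumes `module`'s output."""
    affected = set()
    frontier = [module]
    while frontier:
        current = frontier.pop()
        for name, deps in depends_on.items():
            if current in deps and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


if __name__ == "__main__":
    print(run_order(DEPENDS_ON))
    # Modules to re-run if the fusion_filtering output changes.
    print(sorted(downstream_of("fusion_filtering", DEPENDS_ON)))
```

Under these assumptions, `downstream_of("fusion_filtering", DEPENDS_ON)` lists the modules that would need to be re-run after the filtered fusion calls change. Extending the mapping to the full table would give a complete picture; the subset here is only illustrative.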