Molecular subtyping nbl #264

adilahiri · 2022-10-06T20:06:33Z

Purpose/implementation Section

What scientific question is your analysis addressing?

To molecularly subtype neuroblastoma, ganglioneuroblastoma, and ganglioneuroma samples as either MYCN amplified or MYCN non-amplified.

What was your approach?

To obtain the NBL samples, we first filtered the histology file on pathology_free_text_diagnosis, sample_type, and experimental_strategy, keeping only the following values in each column:

Pathology_Diagnosis: Neuroblastoma; Ganglioneuroblastoma; Ganglioneuroblastoma, nodular; Ganglioneuroblastoma, intermixed; Ganglioneuroma, maturing subtype OR Ganglioneuroblastoma, well differentiated
sample_type: Tumor
experimental_strategy: WGS; WXS; Targeted Sequencing; RNA-Seq
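A minimal Python sketch of this filter, assuming each histology row is a dict keyed by column name (the module's own code is in the R Markdown notebooks). The allowed values are copied verbatim from the lists above. The diagnosis column is a parameter, with an assumed default of pathology_diagnosis, because the prose and the table header name that column differently.

```python
# Allowed values copied from the lists above.
NBL_DIAGNOSES = {
    "Neuroblastoma",
    "Ganglioneuroblastoma",
    "Ganglioneuroblastoma, nodular",
    "Ganglioneuroblastoma, intermixed",
    "Ganglioneuroma, maturing subtype OR Ganglioneuroblastoma, well differentiated",
}
NBL_STRATEGIES = {"WGS", "WXS", "Targeted Sequencing", "RNA-Seq"}


def filter_nbl(rows, diagnosis_col="pathology_diagnosis"):
    """Return tumor rows with an NBL diagnosis and an accepted sequencing strategy.

    diagnosis_col is an assumption: pass the column name the histology file uses.
    """
    return [
        row for row in rows
        if row.get(diagnosis_col) in NBL_DIAGNOSES
        and row.get("sample_type") == "Tumor"
        and row.get("experimental_strategy") in NBL_STRATEGIES
    ]
```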

Next, we filter consensus_wgs_plus_cnvkit_wxs.tsv.gz and gene-expression-rsem-tpm-collapsed.rds for the gene symbol MYCN and join both with the filtered histology file. From this composite table we obtain the DNA and RNA biospecimen IDs for each record, and then subtype the records according to the following criteria:

Subtyping criteria:

case 1:
If pathology_free_text_diagnosis is amplified and status is amplified, assign the subtype NBL, MYCN amplified.

case 2:
If pathology_free_text_diagnosis is non-amp and status is amplified, assign the subtype NBL, MYCN amplified.

case 3:
If pathology_free_text_diagnosis is non-amp and status is non-amp, assign the subtype NBL, MYCN non-amplified.

case 4:
If pathology_free_text_diagnosis is amplified and status is non-amp:
For such samples, if a TPM value exists, compare it with the Suggested_Cutoff established in results/TPM_Biospecimen_All_Samples_With_TMP.png and assign NBL, MYCN amplified if the TPM is above the cutoff or NBL, MYCN non-amplified if it is below. If there is no TPM value, assign the subtype NA.

case 5:
If a sample has not yet been subtyped but has a TPM value, assign its subtype based on the Suggested_Cutoff.

case 6:
All remaining samples are not subtyped, and their subtype field is left as NA.
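The MYCN join and the six cases above can be sketched as follows. This is a hypothetical Python illustration, not the module's R code: it assumes the column names given in the docstrings, pairs DNA and RNA biospecimens through a shared sample_id (the module may use a different matching key), and expects pathology and status to be pre-normalized to "amp", "non-amp", or None. The numeric Suggested_Cutoff is passed in as cutoff because its value is read off the plot; treating a TPM exactly at the cutoff as amplified is also an assumption.

```python
from collections import defaultdict

AMP = "NBL, MYCN amplified"
NON_AMP = "NBL, MYCN non-amplified"


def build_composite(histology, cnv_rows, mycn_tpm):
    """Pair DNA and RNA biospecimens that share a sample_id and attach MYCN data.

    histology: dicts with Kids_First_Biospecimen_ID, sample_id, experimental_strategy
    cnv_rows:  consensus CNV dicts with Kids_First_Biospecimen_ID, gene_symbol,
               copy_number, status
    mycn_tpm:  dict mapping RNA biospecimen ID -> MYCN TPM
    """
    mycn = {r["Kids_First_Biospecimen_ID"]: r for r in cnv_rows if r["gene_symbol"] == "MYCN"}
    by_sample = defaultdict(lambda: {"dna": [], "rna": []})
    for h in histology:
        kind = "rna" if h["experimental_strategy"] == "RNA-Seq" else "dna"
        by_sample[h["sample_id"]][kind].append(h["Kids_First_Biospecimen_ID"])
    records = []
    for sample_id, ids in by_sample.items():
        for dna in ids["dna"] or [None]:  # None when the sample has no DNA biospecimen
            for rna in ids["rna"] or [None]:  # None when the sample has no RNA biospecimen
                cnv = mycn.get(dna, {})
                records.append({"sample_id": sample_id, "DNA_ID": dna, "RNA_ID": rna,
                                "copy_number": cnv.get("copy_number"),
                                "status": cnv.get("status"),
                                "MYCN_TPM": mycn_tpm.get(rna)})
    return records


def tpm_call(tpm, cutoff):
    """Subtype from expression alone; returns None (NA) when there is no TPM."""
    if tpm is None:
        return None
    return AMP if tpm >= cutoff else NON_AMP


def assign_subtype(pathology, status, tpm, cutoff):
    """Apply cases 1 to 6 to normalized pathology/status values."""
    if status == "amp" and pathology in ("amp", "non-amp"):  # cases 1 and 2
        return AMP
    if pathology == "non-amp" and status == "non-amp":  # case 3
        return NON_AMP
    # Case 4 (pathology amp, status non-amp), case 5 (any other sample with a TPM)
    # and case 6 (no TPM) all reduce to the TPM call.
    return tpm_call(tpm, cutoff)
```

Written this way, cases 4 to 6 collapse into one fallback: once cases 1 to 3 are ruled out, the subtype depends only on whether a TPM value exists and where it falls relative to the cutoff.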

What GitHub issue does your pull request address?

d3b-center/ticket-tracker-OPC#417

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

The code and logic are explained throughout the 00-Analysis-RMD file, and additional information is provided in the module README. Please review the data filtering steps in lines 77-224 and 257-329; in particular, we want to make sure we are not missing any NBL samples.

In the plot plot/TPM_Biospecimen_All_Samples_With_TMP.png, we establish a Suggested_Cutoff for TPM values, which we use to subtype samples that fall under cases 4 and 5. Please check whether this cutoff is appropriate.

Also review the results in the table NBL_MYCN_Subtype.tsv and the QC results in QC_table.tsv.

Is there anything that you want to discuss further?

When identifying samples that have both DNA and RNA IDs, we encountered the following two issues with duplicated records:

1. Some biospecimens have the same DNA and RNA IDs but differing copy numbers and status calls, as described in Issue #436 and the linked comment. For the cases mentioned in that issue, we retained the record with the higher copy number.
2. We also found two DNA/RNA ID pairs that each appear twice with a differing aliquot_id. These records are:

DNA_ID	RNA_ID	aliquot_id
BS_0XC02E11	BS_2HM4AE24	ET_FD9T78QE_DGD_STNGS_29
BS_0XC02E11	BS_2HM4AE24	ET_FD9T78QE_DGD_STNGS_64
BS_1F8J25Q1	BS_2HM4AE24	ET_FD9T78QE_DGD_STNGS_29
BS_1F8J25Q1	BS_2HM4AE24	ET_FD9T78QE_DGD_STNGS_64

To resolve these duplicates, we retained the copies with aliquot_id ET_FD9T78QE_DGD_STNGS_64. Please review this choice and provide feedback. Issues 1 and 2 are handled in lines 196-224 of the 00-Analysis-RMD file.
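Both de-duplication rules can be expressed in one pass, shown here as a hypothetical Python sketch over the composite records: group on (DNA_ID, RNA_ID), prefer the ET_FD9T78QE_DGD_STNGS_64 aliquot wherever it is present, and otherwise keep the record with the highest copy_number. The field names and the order in which the two rules are applied are assumptions, not taken from the notebook.

```python
def _copy_number(rec):
    """Sort key: missing copy numbers rank below any real value."""
    cn = rec.get("copy_number")
    return float("-inf") if cn is None else cn


def deduplicate(records, preferred_aliquot="ET_FD9T78QE_DGD_STNGS_64"):
    """Keep one record per (DNA_ID, RNA_ID) pair."""
    groups = {}
    for rec in records:
        groups.setdefault((rec["DNA_ID"], rec["RNA_ID"]), []).append(rec)
    kept = []
    for recs in groups.values():
        # Issue 2: restrict to the preferred aliquot when the group contains it.
        preferred = [r for r in recs if r.get("aliquot_id") == preferred_aliquot]
        candidates = preferred or recs
        # Issue 1: among the remaining duplicates, keep the higher copy number.
        kept.append(max(candidates, key=_copy_number))
    return kept
```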

results/Alteration_Table.tsv: This table is similar to NBL_MYCN_Subtype.tsv but has additional columns containing MYCN_TPM, copy_number, status, and pathology_free_text_diagnosis. In addition, its subtype column gives more detail on the samples in NBL_MYCN_Subtype.tsv whose subtype is NA (samples that could not be subtyped). Samples that fell into case 4 but had no TPM value are labeled Pathology-amp,Status-non-amp,TPM-NA, and samples that did not fall into any of the above cases are labeled Unclassified due to insufficient info.
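The fallback labels in Alteration_Table.tsv can be sketched as a small hypothetical helper, assuming pathology and status are normalized to "amp", "non-amp", or None and that subtype is None for every sample left as NA in NBL_MYCN_Subtype.tsv:

```python
def detailed_label(pathology, status, tpm, subtype):
    """Explain why a sample has no subtype; pass real subtypes through unchanged."""
    if subtype is not None:
        return subtype
    # Case 4 without a TPM value.
    if pathology == "amp" and status == "non-amp" and tpm is None:
        return "Pathology-amp,Status-non-amp,TPM-NA"
    # Everything else that could not be subtyped.
    return "Unclassified due to insufficient info"
```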

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

Tables and Figures

What is your summary of the results?

The table NBL_MYCN_Subtype.tsv has 1168 samples, of which only 509 were assigned a subtype: 107 are MYCN non-amplified and 402 are MYCN amplified. The remaining 659 samples could not be subtyped due to missing information.
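As a quick consistency check on these counts, here is a hypothetical tally over the subtype column, with None standing in for NA:

```python
from collections import Counter


def summarize(subtypes):
    """Count assigned and unassigned subtypes in a list of labels (None = NA)."""
    counts = Counter("NA" if s is None else s for s in subtypes)
    total = len(subtypes)
    return {
        "total": total,
        "assigned": total - counts["NA"],
        "amplified": counts["NBL, MYCN amplified"],
        "non_amplified": counts["NBL, MYCN non-amplified"],
        "unassigned": counts["NA"],
    }
```

Fed 402 amplified, 107 non-amplified, and 659 None entries, it returns a total of 1168 with 509 assigned, matching the numbers in the summary.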

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
[X] This analysis has been added to continuous integration.

Documentation Checklist

[X] This analysis module has a README and it is up to date.
[X] This analysis is recorded in the table in analyses/README.md and the entry is up to date.
[X] The analytical code is documented and contains comments.


adilahiri · 2023-01-12T18:23:53Z

@jharenza: Thank you for your feedback. I have cleaned up the results folder and moved some of the intermediate files (tables) to the input directory; these intermediate files can be deleted if required. I have also updated the module README and the individual scripts to include information on MYCN being located on chromosome 2p, the TPM cutoff, and the QC checks. The NA subtypes have been relabeled as "NBL, to be classified".

@ewafula: Thank you for pointing out the GA errors and for your feedback.


ewafula · 2023-02-09T08:50:15Z

@jharenza, @afarrel, this is ready to review.

afarrel

Thanks for working on this. The code looks much cleaner, the results look reasonable after the recent changes, and the added code to fix the previously discussed issues looks okay. The code ran well on EC2.

adilahiri added 3 commits September 30, 2022 01:09

Filtering MYCN with respect to diagnosis

cd47b1e

Adding script

18818da

exploring the input files and independent_rna_samples

c910129

adilahiri added the work in progress label Oct 6, 2022

adilahiri added 9 commits October 7, 2022 03:17

Working on matching DNA/RNA

e8dcf40

initial README

9bd9ed3

Creating matched samples

f3046c2

cleaned the function for matching

fb764e9

IDs not yielding match with cnv datasets

0af5091

Issue with match IDs persist

a9e8ecc

No RNA-Seq data ??

e2e402f

No RNA SEQ Samples

fe0d707


update the branch

e77326e

adilahiri linked an issue Oct 17, 2022 that may be closed by this pull request

Proposed Analysis: Neuroblastoma (NBL) molecular subtyping d3b-center/ticket-tracker-OPC#417

Closed

adilahiri added 16 commits October 17, 2022 17:03

Adding the html file

46bbfed

updated the issue with replicates,and created plot

6347fa9

Added some of the subtypes

9555988

Used Histology-base file

836c743

add focal amp plots

4d4ba69

Added code to handle case 2 of step 4

d03bdfe

Added code to handle loss and gain

ee822a1

Add the code for handling pathology NA cases and Non amplified status call

d4733da

generate new alteration table for cases when pathology text is NA

bf8d874

Update the image

8cf09a2

group image by status calls

0d121e6

Rearranged the barplot by asc, added extra info from input files

816f838

Print cases when clinical file says amp but status call says non amp

c953c30

Print plots for case when clincal is amp and status is non amp

b7f4561

Arranged the final plot in asc

5244882

Add the plot with horizontal line and its corresponding code, write the table

f5fba38

adilahiri and others added 10 commits January 11, 2023 07:35

split ggplt cmd and ggsave

2182590

remove ggsave and check if CI works

94b22cc

Split code to further analyze GA errors

11e6052

Check for empty rows of data in GA

4844bfe

Adding the code chunk to check length for CI

1c1d1dc

update the readme

a7af0b7

regorgarnize module content

cada427

clean script 03 and 04.

c6df805

rerun module

876ae55

Merge branch 'dev' into molecular-subtyping-NBL

55257b0

ewafula and others added 2 commits January 26, 2023 13:59

Merge branch 'dev' into molecular-subtyping-NBL

c70efbc

Merge branch 'dev' into molecular-subtyping-NBL

b811693

jharenza reviewed Feb 1, 2023


analyses/molecular-subtyping-NBL/00-subset-for-NBL.Rmd (outdated)

jharenza added 3 commits February 1, 2023 21:43

separate subset script to use terms and renumber scripts

1f93517

- also condensed code in script 01-subset-for-NBL

condense code in scripts 01/02, rerun

9e2d232

Merge branch 'dev' into molecular-subtyping-NBL

70dfd8b

jharenza mentioned this pull request Feb 4, 2023

Updated analysis: Update mol subtyping integrate to include ATRT + NBL d3b-center/ticket-tracker-OPC#504

Closed

ewafula added 3 commits February 5, 2023 01:37

clean up and reorganize Rmd 05-04

1b2b783

clean up and reorganize Rmd 05-04

72ad08e

clean up, updated and reorganize Rmd 01-03

a83f4ad

ewafula and others added 2 commits February 9, 2023 10:12

Merge branch 'dev' into molecular-subtyping-NBL

9faaf02

replaced pivot_longer with deprecated gather to work in OPC docker

36e87d2

zzgeng mentioned this pull request Feb 9, 2023

Updated analysis: Update mol subtyping integrate to include ATRT + NBL #315

Merged


afarrel approved these changes Feb 10, 2023


jharenza merged commit 6b30d21 into dev Feb 10, 2023

This was referenced Feb 13, 2023

Proposed Analysis: MYCN AMP/Gain Thresholding using SNP Array data d3b-center/ticket-tracker-OPC#233

Closed

Updated analysis: assess CNV amplification and deep deletion threshold in TARGET and GMKF data d3b-center/ticket-tracker-OPC#113

Closed

jharenza deleted the molecular-subtyping-NBL branch February 19, 2023 02:11
