Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update tp53_nf1_score module (3/11) #106

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

Conversation

komalsrathi
Copy link
Collaborator

@komalsrathi komalsrathi commented Jun 20, 2024

Created this branch off of update-merge branch.

Data Preparation

The tp53_nf1_score module uses input files from data folder so I manually copied files that I generated by merge-files module to data/v3 and then soft-linked those files under data/ . So, these files point to v3:

Hope-cnv-controlfreec-tumor-only.rds -> v3/Hope-cnv-controlfreec-tumor-only.rds
Hope-cnv-controlfreec.rds -> v3/Hope-cnv-controlfreec.rds
Hope-fusion-putative-oncogenic.rds -> v3/Hope-fusion-putative-oncogenic.rds
Hope-gene-expression-rsem-tpm-collapsed.rds -> v3/Hope-gene-expression-rsem-tpm-collapsed.rds
Hope-gene-expression-rsem-tpm.rds -> v3/Hope-gene-expression-rsem-tpm.rds
Hope-tumor-only-snv-mutect2.maf.tsv.gz -> v3/Hope-tumor-only-snv-mutect2.maf.tsv.gz
Hope-gene-counts-rsem-expected_count-collapsed.rds -> v3/Hope-gene-counts-rsem-expected_count-collapsed.rds
Hope-gene-counts-rsem-expected_count.rds -> v3/Hope-gene-counts-rsem-expected_count.rds
Hope-snv-consensus-plus-hotspots.maf.tsv.gz -> v3/Hope-snv-consensus-plus-hotspots.maf.tsv.gz 
Hope-GBM-histologies-base.tsv -> v3/Hope-GBM-histologies-base.tsv

And, these point to v2:

release-notes.md -> v2/release-notes.md
Hope-sv-manta.tsv.gz -> v2/Hope-sv-manta.tsv.gz
Hope-methyl-m-values.rds -> v2/Hope-methyl-m-values.rds
Hope-methyl-beta-values.rds -> v2/Hope-methyl-beta-values.rds
Hope-and-CPTAC-GBM.splice-events-rmats.tsv.gz -> v2/Hope-and-CPTAC-GBM.splice-events-rmats.tsv.gz
Hope-and-CPTAC-GBM-gene-expression-rsem-tpm-collapsed.rds -> v2/Hope-and-CPTAC-GBM-gene-expression-rsem-tpm-collapsed.rds
Hope-GBM-histologies.tsv -> v2/Hope-GBM-histologies.tsv

Debugging scripts

When I was re-running the module using docker, I got a ValueError for the roc_auc_score function within the 06-evaluate-classifier.py script as follows:

ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.

So I added a try-catch block and assigned 0 where this error is encountered. This fixed the error but could you check the outputs if it makes sense?

Additionally, there is no file results/gene-expression-rsem-tpm-collapsed-poly-A_classifier_scores.tsv so I had to remove the command that calls 06-evaluate-classifier.py on this file in the run_classifier.sh script. I replaced it with the command to run on results/gene-expression-rsem-tpm-collapsed-poly-A-stranded_classifier_scores.tsv because that file is present.

Output files

There were a lot of files not updated when running the bash script, and I am unsure why:

# updated files
-rw-r--r--  1 rathik  1768498755  11920 Jun 20 17:05 stranded_TP53_roc_threshold_results_shuffled.tsv
-rw-r--r--  1 rathik  1768498755  11911 Jun 20 17:05 stranded_TP53_roc_threshold_results.tsv
-rw-r--r--  1 rathik  1768498755    219 Jun 20 17:05 exome_capture_TP53_roc_threshold_results_shuffled.tsv
-rw-r--r--  1 rathik  1768498755    225 Jun 20 17:05 exome_capture_TP53_roc_threshold_results.tsv
-rw-r--r--  1 rathik  1768498755    683 Jun 20 17:05 polya_stranded_TP53_roc_threshold_results_shuffled.tsv
-rw-r--r--  1 rathik  1768498755    698 Jun 20 17:05 polya_stranded_TP53_roc_threshold_results.tsv
-rw-r--r--  1 rathik  1768498755  16848 Jun 20 17:05 tp53_altered_status.tsv
-rw-r--r--  1 rathik  1768498755     99 Jun 20 17:04 sv_overlap_tp53.tsv
-rw-r--r--  1 rathik  1768498755    503 Jun 20 17:04 fusion_bk_tp53_loss.tsv
-rw-r--r--  1 rathik  1768498755    112 Jun 20 17:00 loss_overlap_domains_tp53.tsv
-rw-r--r--  1 rathik  1768498755  10939 Jun 20 16:58 combined_scores.tsv
-rw-r--r--  1 rathik  1768498755   2109 Jun 20 16:58 gene-expression-rsem-tpm-collapsed-poly-A-stranded_classifier_scores.tsv
-rw-r--r--  1 rathik  1768498755    590 Jun 20 16:58 gene-expression-rsem-tpm-collapsed-exome_capture_classifier_scores.tsv
-rw-r--r--  1 rathik  1768498755  20711 Jun 20 16:57 gene-expression-rsem-tpm-collapsed-stranded_classifier_scores.tsv
-rw-r--r--  1 rathik  1768498755   9338 Jun 20 16:56 TP53_NF1_snv_alteration.tsv

# not updated
-rw-r--r--  1 rathik  1768498755  57694 Jan 10 10:47 tp53_score+reported_gender+mol_wo_tp53+binned_age.RDS
-rw-r--r--  1 rathik  1768498755  52425 Jan 10 10:47 tp53_score+reported_gender+mol_subtype_without_tp53+binned_age.RDS
-rw-r--r--  1 rathik  1768498755  58122 Jan 10 10:47 tp53_score+reported_gender+mol_subtype+binned_age.RDS
-rw-r--r--  1 rathik  1768498755  50948 Jan 10 10:47 tp53_score+reported_gender+broad_mol+binned_age.RDS
-rw-r--r--  1 rathik  1768498755  30707 Jan 10 10:47 HGG_H3WT_score+reported_gender+molecular_subtype+binned_age.RDS
-rw-r--r--  1 rathik  1768498755  23594 Jan 10 10:47 DGM_score+reported_gender+molecular_subtype+binned_age.RDS
-rw-r--r--  1 rathik  1768498755  24808 Jan 10 10:47 DGM_score+reported_gender+molecular_subtype*tp53+binned_age.RDS

Similarly, a lot of plots did not get updated as well:

# updated
-rw-r--r--  1 rathik  1768498755   30531 Jun 20 17:05 stranded_TP53_roc.png
-rw-r--r--  1 rathik  1768498755   24199 Jun 20 17:05 exome_capture_TP53_roc.png
-rw-r--r--  1 rathik  1768498755   24199 Jun 20 17:05 polya_stranded_TP53_roc.png

# not updated
-rw-r--r--  1 rathik  1768498755    6286 Jan 10 10:47 KM_reported_gender.pdf
-rw-r--r--  1 rathik  1768498755    7122 Jan 10 10:47 KM_race.pdf
-rw-r--r--  1 rathik  1768498755    7278 Jan 10 10:47 KM_mol_subtype_without_tp53.pdf
-rw-r--r--  1 rathik  1768498755    6783 Jan 10 10:47 KM_broad_mol.pdf
-rw-r--r--  1 rathik  1768498755    6860 Jan 10 10:47 KM_binned_tp53.pdf
-rw-r--r--  1 rathik  1768498755    6633 Jan 10 10:47 KM_binned_age.pdf
-rw-r--r--  1 rathik  1768498755  231538 Jan 10 10:47 HR_molecular_subtype.png
-rw-r--r--  1 rathik  1768498755  201062 Jan 10 10:47 HR_mol_subtype_without_tp53.png
-rw-r--r--  1 rathik  1768498755  176529 Jan 10 10:47 HR_broad_mol_subtype.png
-rw-r--r--  1 rathik  1768498755  159420 Jan 10 10:47 HR_HGG_WT.png
-rw-r--r--  1 rathik  1768498755  188041 Jan 10 10:47 HR_DGM_tp53_interaction.png
-rw-r--r--  1 rathik  1768498755  143568 Jan 10 10:47 HR_DGM.png
-rw-r--r--  1 rathik  1768498755   31620 Sep 11  2023 polya_TP53_roc.png

@komalsrathi komalsrathi self-assigned this Jun 20, 2024
@komalsrathi komalsrathi changed the base branch from master to update-merge July 15, 2024 17:40
@komalsrathi komalsrathi changed the title Update tp53_nf1_score module Update tp53_nf1_score module (3/11) Jul 15, 2024
@jharenza jharenza requested a review from naqvia August 16, 2024 16:57
Copy link

@naqvia naqvia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to manually install this library txdbmaker. May need to add to our Dockerfile and image. After installation I was able to generate the same files as @komalsrathi.

I am not sure what we expect, but polya_stranded and exome_capture plots indicate an AUC = 0, which means theres some flaw in the classification method. I am not too familiar with the classifier, but I suspect its happening because we are assigning 0 in the modified code (06-evaluate-classifier.py). I tried to do some digging, but it seems the error that occurs indicates that there only one class present (y_true). This would make it impossible to compute ROC AUC. To handle single-class I found this:

Handle Single-Class Cases:
If you encounter a single-class situation due to an imbalanced dataset, consider using other metrics that do not require both classes, such as Precision-Recall AUC (average_precision_score), or focusing on metrics that can handle imbalance better, such as F1-score or accuracy.

@jharenza can you advise what to do in this case?

@komalsrathi
Copy link
Collaborator Author

I had to manually install this library txdbmaker

That's interesting. I think I ran all the modules on Docker and didn't face any issues. Let me check.

@komalsrathi
Copy link
Collaborator Author

I also reran this with the data.table::fread's tmpdir fix, luckily doesn't change any output files.

Copy link
Member

@jharenza jharenza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 127-129 script 01, Loss needs to be lowercase here @komalsrathi - can you update and rerun?

Copy link
Member

@jharenza jharenza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@komalsrathi this looks perfect now, same results as before, bringing in those two samples. Can you now merge this down the line into your other prs and rerun those? Thanks!

@jharenza jharenza changed the base branch from update-merge to master August 20, 2024 18:19
@jharenza
Copy link
Member

jharenza commented Aug 20, 2024

Output created: 03-tp53-cnv-loss-domain.nb.html
03 tp53


processing file: 04-tp53-sv-loss.Rmd
  |............                                        |  24% [unnamed-chunk-2]Error in `load_package_gracefully()`:
! Could not load package txdbmaker. Is it installed?

  Note that starting with BioC 3.19, calling makeTxDbFromGFF() requires
  the txdbmaker package. Please install it with:

    BiocManager::install("txdbmaker")
Backtrace:
 1. GenomicFeatures::makeTxDbFromGFF(...)
 2. GenomicFeatures:::call_fun_in_txdbmaker("makeTxDbFromGFF", ...)
 3. GenomicFeatures:::load_package_gracefully(...)

Quitting from lines 42-74 [unnamed-chunk-2] (04-tp53-sv-loss.Rmd)
                                                                                                            
Execution halted

I did get the same thing as @naqvia - I think we do need to add this new package to the docker image since we did the 4.4 update.

@komalsrathi komalsrathi mentioned this pull request Aug 20, 2024
@komalsrathi
Copy link
Collaborator Author

Will update once #118 is merged.

@jharenza
Copy link
Member

@komalsrathi this now runs to completion with the new docker image, however I did notice that the sv_overlap_tp53.tsv file is now empty. This should come from the manta sv file, can you check the script pulling SVs?

@komalsrathi
Copy link
Collaborator Author

@jharenza
Copy link
Member

Are you talking about this file: https://github.com/d3b-center/hope-cohort-analysis/blob/update-nf1-score/analyses/tp53_nf1_score/results/sv_overlap_tp53.tsv ?

yes, seems it is empty and affecting some losses

@komalsrathi
Copy link
Collaborator Author

Could be the same data.table issue, checking now.

@komalsrathi
Copy link
Collaborator Author

I deleted my v3 folder -> merged the master branch -> ran download script to get v3 files. As expected, only 1 file fails:

Checking MD5 hashes...
Hope-and-CPTAC-GBM-gene-expression-rsem-tpm-collapsed.rds: OK
Hope-and-CPTAC-GBM.splice-events-rmats.tsv.gz: OK
Hope-GBM-histologies.tsv: FAILED
Hope-methyl-beta-values.rds: OK
Hope-methyl-m-values.rds: OK
release-notes.md: OK
Hope-sv-manta.tsv.gz: OK
Hope-GBM-histologies-base.tsv: OK
Hope-cnv-controlfreec-tumor-only.rds: OK
Hope-cnv-controlfreec.rds: OK
Hope-fusion-putative-oncogenic.rds: OK
Hope-gene-counts-rsem-expected_count-collapsed.rds: OK
Hope-gene-counts-rsem-expected_count.rds: OK
Hope-gene-expression-rsem-tpm-collapsed.rds: OK
Hope-gene-expression-rsem-tpm.rds: OK
Hope-snv-consensus-plus-hotspots.maf.tsv.gz: OK
Hope-tumor-only-snv-mutect2.maf.tsv.gz: OK
md5sum: WARNING: 1 of 17 computed checksums did NOT match

Next, I manually soft-linked v3 files to data folder. Here is my data folder:

(base) ➜  d3b-hope-cohort-analysis git:(update-nf1-score) ls -lt data 
total 356736
lrwxr-xr-x   1 rathik  1768498755         30 Aug 21 15:52 Hope-methyl-beta-values.rds -> v3/Hope-methyl-beta-values.rds
lrwxr-xr-x   1 rathik  1768498755         28 Aug 21 15:52 Hope-cnv-controlfreec.rds -> v3/Hope-cnv-controlfreec.rds
lrwxr-xr-x   1 rathik  1768498755         36 Aug 21 15:52 Hope-gene-expression-rsem-tpm.rds -> v3/Hope-gene-expression-rsem-tpm.rds
lrwxr-xr-x   1 rathik  1768498755         39 Aug 21 15:52 Hope-cnv-controlfreec-tumor-only.rds -> v3/Hope-cnv-controlfreec-tumor-only.rds
lrwxr-xr-x   1 rathik  1768498755         41 Aug 21 15:52 Hope-tumor-only-snv-mutect2.maf.tsv.gz -> v3/Hope-tumor-only-snv-mutect2.maf.tsv.gz
lrwxr-xr-x   1 rathik  1768498755         46 Aug 21 15:52 Hope-gene-expression-rsem-tpm-collapsed.rds -> v3/Hope-gene-expression-rsem-tpm-collapsed.rds
lrwxr-xr-x   1 rathik  1768498755         48 Aug 21 15:52 Hope-and-CPTAC-GBM.splice-events-rmats.tsv.gz -> v3/Hope-and-CPTAC-GBM.splice-events-rmats.tsv.gz
lrwxr-xr-x   1 rathik  1768498755         23 Aug 21 15:52 Hope-sv-manta.tsv.gz -> v3/Hope-sv-manta.tsv.gz
lrwxr-xr-x   1 rathik  1768498755         43 Aug 21 15:52 Hope-gene-counts-rsem-expected_count.rds -> v3/Hope-gene-counts-rsem-expected_count.rds
lrwxr-xr-x   1 rathik  1768498755         60 Aug 21 15:52 Hope-and-CPTAC-GBM-gene-expression-rsem-tpm-collapsed.rds -> v3/Hope-and-CPTAC-GBM-gene-expression-rsem-tpm-collapsed.rds
lrwxr-xr-x   1 rathik  1768498755         46 Aug 21 15:52 Hope-snv-consensus-plus-hotspots.maf.tsv.gz -> v3/Hope-snv-consensus-plus-hotspots.maf.tsv.gz
lrwxr-xr-x   1 rathik  1768498755         53 Aug 21 15:52 Hope-gene-counts-rsem-expected_count-collapsed.rds -> v3/Hope-gene-counts-rsem-expected_count-collapsed.rds
lrwxr-xr-x   1 rathik  1768498755         27 Aug 21 15:52 Hope-GBM-histologies.tsv -> v3/Hope-GBM-histologies.tsv
lrwxr-xr-x   1 rathik  1768498755         27 Aug 21 15:52 Hope-methyl-m-values.rds -> v3/Hope-methyl-m-values.rds
lrwxr-xr-x   1 rathik  1768498755         37 Aug 21 15:52 Hope-fusion-putative-oncogenic.rds -> v3/Hope-fusion-putative-oncogenic.rds
lrwxr-xr-x   1 rathik  1768498755         32 Aug 21 15:52 Hope-GBM-histologies-base.tsv -> v3/Hope-GBM-histologies-base.tsv
drwxr-xr-x  20 rathik  1768498755        640 Aug 21 15:47 v3
-rw-r--r--   1 rathik  1768498755  131420160 Aug 20 19:36 txdb_from_gencode.v39.gtf.db
drwxr-xr-x  21 rathik  1768498755        672 Aug 17 08:40 tmp
drwxr-xr-x  20 rathik  1768498755        640 Aug 16 16:21 v2
-rw-r--r--   1 rathik  1768498755   47572349 Sep 15  2023 gencode.v39.primary_assembly.annotation.gtf.gz
drwxr-xr-x  20 rathik  1768498755        640 Sep 15  2023 v1

Finally, deleted all results and plots from this module -> reran classifier -> a couple files changed but I don't see the specific file you mentioned changing. Also deleted obsolete results and plots.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants