Update tp53_nf1_score module (3/11) #106
base: master
I had to manually install the txdbmaker library. We may need to add it to our Dockerfile and image. After installation I was able to generate the same files as @komalsrathi.
I am not sure what we expect, but the polya_stranded and exome_capture plots show an AUC of 0, which suggests a flaw in the classification step. I am not too familiar with the classifier, but I suspect it happens because we assign 0 in the modified code (06-evaluate-classifier.py). I did some digging, and the underlying error indicates that only one class is present in y_true, which makes it impossible to compute ROC AUC. For handling the single-class case I found this:
Handle Single-Class Cases:
If you encounter a single-class situation due to an imbalanced dataset, consider using other metrics that do not require both classes, such as Precision-Recall AUC (average_precision_score), or focusing on metrics that can handle imbalance better, such as F1-score or accuracy.
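As a minimal sketch of one alternative to assigning 0, the single-class case could be reported as NaN so that an undefined AUC is not mistaken for a worst-possible score. The helper name safe_roc_auc is hypothetical and is not part of 06-evaluate-classifier.py; the only library call assumed is scikit-learn's roc_auc_score.

```python
import math


def safe_roc_auc(y_true, y_score):
    """Return ROC AUC, or NaN when y_true contains fewer than two classes."""
    if len(set(y_true)) < 2:
        # ROC AUC is undefined with a single class; NaN marks the value as
        # "not computable" instead of looking like a real score of 0.
        return math.nan
    # Imported lazily so the guard above works even without scikit-learn.
    from sklearn.metrics import roc_auc_score

    return roc_auc_score(y_true, y_score)
```

Plotting code would then need to skip or annotate NaN values, which keeps those panels visibly distinct from a classifier that truly scores 0.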
@jharenza can you advise what to do in this case?
That's interesting. I think I ran all the modules on Docker and didn't face any issues. Let me check.
I also reran this with the
In script 01, lines 127-129, "Loss" needs to be lowercase. @komalsrathi, can you update and rerun?
@komalsrathi this looks perfect now: same results as before, bringing in those two samples. Can you now merge this down the line into your other PRs and rerun those? Thanks!
Output created: 03-tp53-cnv-loss-domain.nb.html
03 tp53
processing file: 04-tp53-sv-loss.Rmd
|............ | 24% [unnamed-chunk-2]Error in `load_package_gracefully()`:
! Could not load package txdbmaker. Is it installed?
Note that starting with BioC 3.19, calling makeTxDbFromGFF() requires
the txdbmaker package. Please install it with:
BiocManager::install("txdbmaker")
Backtrace:
1. GenomicFeatures::makeTxDbFromGFF(...)
2. GenomicFeatures:::call_fun_in_txdbmaker("makeTxDbFromGFF", ...)
3. GenomicFeatures:::load_package_gracefully(...)
Quitting from lines 42-74 [unnamed-chunk-2] (04-tp53-sv-loss.Rmd)
Execution halted
I got the same thing as @naqvia. I think we do need to add this new package to the Docker image since we did the 4.4 update.
Will update once #118 is merged.
@komalsrathi this now runs to completion with the new Docker image; however, I did notice that the
Are you talking about this file: https://github.com/d3b-center/hope-cohort-analysis/blob/update-nf1-score/analyses/tp53_nf1_score/results/sv_overlap_tp53.tsv?
Yes, it seems to be empty and is affecting some losses.
Could be the same data.table issue, checking now.
I deleted my v3 folder -> merged the master branch -> ran download script to get v3 files. As expected, only 1 file fails:
Next, I manually soft-linked v3 files to data folder. Here is my data folder:
Finally, I deleted all results and plots from this module -> reran the classifier -> a couple of files changed, but I don't see the specific file you mentioned changing. I also deleted obsolete results and plots.
Created this branch off of update-merge branch.
Data Preparation
The tp53_nf1_score module uses input files from the data folder, so I manually copied the files that I generated with the merge-files module to data/v3 and then soft-linked those files under data/.
So, these files point to v3:
And, these point to v2:
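The soft-linking step described in this section could be sketched in Python roughly as follows. This is an illustration only, not the procedure actually used here: the function name link_release_files is hypothetical, and it assumes the release folder (for example data/v3) sits directly inside the data folder so that relative link targets resolve.

```python
from pathlib import Path


def link_release_files(release_dir, data_dir):
    """Create or refresh symlinks in data_dir for each file in release_dir."""
    release_dir, data_dir = Path(release_dir), Path(data_dir)
    linked = []
    for src in sorted(release_dir.iterdir()):
        if not src.is_file():
            continue
        dest = data_dir / src.name
        if dest.is_symlink():
            dest.unlink()  # drop a stale link, e.g. one still pointing to v2
        elif dest.exists():
            continue  # never overwrite a real file or directory
        # Relative target (e.g. "v3/name.tsv"), resolved from data_dir.
        dest.symlink_to(Path(release_dir.name) / src.name)
        linked.append(dest.name)
    return linked
```

Calling link_release_files("data/v3", "data") would repoint only the files present in data/v3, leaving any links to v2-only files untouched.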
Debugging scripts
When I was re-running the module using Docker, I got a ValueError from the roc_auc_score function within the 06-evaluate-classifier.py script, as follows:
So I added a try/except block and assigned 0 wherever this error is encountered. This fixed the error, but could you check whether the outputs make sense?
Additionally, there is no file results/gene-expression-rsem-tpm-collapsed-poly-A_classifier_scores.tsv, so I had to remove the command that calls 06-evaluate-classifier.py on this file in the run_classifier.sh script. I replaced it with a command that runs on results/gene-expression-rsem-tpm-collapsed-poly-A-stranded_classifier_scores.tsv, because that file is present.
Output files
Many files were not updated when running the bash script, and I am unsure why:
Similarly, many plots were not updated:
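One way to narrow down which outputs stayed the same is to compare content checksums taken before and after a rerun. The sketch below is an assumption about how that could be done, not something this PR contains; snapshot and unchanged_files are hypothetical helper names. Note that identical checksums cannot distinguish "the script never rewrote the file" from "the script rewrote it with the same content".

```python
import hashlib
from pathlib import Path


def snapshot(directory):
    """Map each file's relative path to the SHA-256 of its contents."""
    root = Path(directory)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def unchanged_files(before, after):
    """Files present in both snapshots whose contents are byte-identical."""
    return sorted(name for name in before if after.get(name) == before[name])
```

Taking a snapshot of the results and plots folders before running run_classifier.sh and another afterwards, then passing both to unchanged_files, would list the outputs that did not change.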