
Updated GSVA module to include cancer_group comparison #299

Merged: 9 commits merged Jan 10, 2023


@sangeetashukla (Collaborator) commented Dec 19, 2022

Purpose/implementation Section

Update gene-set-enrichment-analysis module

What scientific question is your analysis addressing?

  • Exclude harmonized_diagnosis and include broad_histology, in addition to cancer_group, as predictor variables
  • Rectify incorrect naming of output files for harmonized diagnosis Tukey statistics
  • Include only tumor samples for all cohorts
  • Exclude TCGA and GTEx samples
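The last two items amount to a row filter on the sample metadata applied before GSVA scores are tested. A minimal sketch, assuming (hypothetically, not taken from the module) that each sample carries a "sample_type" field with the value "Tumor" and a "cohort" field; "sample_id" and "CohortX" are likewise made-up illustration values. The module itself is written in R and may do this differently.

```python
from typing import Dict, List

# Assumed field names and values (hypothetical): "sample_type" == "Tumor"
# marks tumor samples, "cohort" names the source cohort.
EXCLUDED_COHORTS = {"TCGA", "GTEx"}


def select_tumor_samples(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep tumor samples only, and drop any sample from TCGA or GTEx."""
    return [
        row
        for row in rows
        if row["sample_type"] == "Tumor" and row["cohort"] not in EXCLUDED_COHORTS
    ]


# Toy metadata rows; "sample_id" is a hypothetical identifier field.
histologies = [
    {"sample_id": "S1", "sample_type": "Tumor", "cohort": "PBTA"},
    {"sample_id": "S2", "sample_type": "Normal", "cohort": "PBTA"},
    {"sample_id": "S3", "sample_type": "Normal", "cohort": "GTEx"},
    {"sample_id": "S4", "sample_type": "Tumor", "cohort": "TCGA"},
    {"sample_id": "S5", "sample_type": "Tumor", "cohort": "CohortX"},
]

kept = select_tumor_samples(histologies)
print([row["sample_id"] for row in kept])  # ['S1', 'S5']
```

Filtering on sample type and on cohort in one pass keeps tumor samples from every remaining cohort, which matches "include only tumor samples for all cohorts" rather than restricting to PBTA.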

What GitHub issue does your pull request address?

Issue 455

Is there anything that you want to discuss further?

No.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

gsva_anova_exome_capture_cancer_group.tsv
gsva_anova_polya_broad_histology.tsv
gsva_anova_polya_cancer_group.tsv
gsva_anova_polya_stranded_broad_histology.tsv
gsva_anova_polya_stranded_cancer_group.tsv
gsva_anova_stranded_broad_histology.tsv
gsva_anova_stranded_cancer_group.tsv
gsva_scores.tsv
gsva_tukey_exome_capture_cancer_group.tsv
gsva_tukey_polya_broad_histology.tsv
gsva_tukey_polya_cancer_group.tsv
gsva_tukey_polya_stranded_broad_histology.tsv
gsva_tukey_polya_stranded_cancer_group.tsv
gsva_tukey_stranded_broad_histology.tsv
gsva_tukey_stranded_cancer_group.tsv
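Every results file above follows one pattern, gsva_<statistic>_<RNA library>_<predictor>.tsv, which is the pattern the "Turkey" naming fix restores. A sketch of building names from that pattern with a hypothetical helper, gsva_output_filename (not a function from the module). Checking the statistic against a fixed set makes a misspelling fail loudly instead of producing a misnamed file.

```python
# Hypothetical helper; the allowed statistic names are taken from the file list.
VALID_STATISTICS = {"anova", "tukey"}


def gsva_output_filename(statistic: str, rna_library: str, predictor: str) -> str:
    """Build a file name of the form gsva_<statistic>_<library>_<predictor>.tsv."""
    if statistic not in VALID_STATISTICS:
        # e.g. "turkey" raises here rather than being written to disk
        raise ValueError(f"unknown statistic: {statistic!r}")
    return f"gsva_{statistic}_{rna_library}_{predictor}.tsv"


print(gsva_output_filename("tukey", "polya_stranded", "broad_histology"))
# gsva_tukey_polya_stranded_broad_histology.tsv
```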

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

Review thread on the line `if(harmonized_diagnosis_n>=2){`, changed in this PR to `#if(harmonized_diagnosis_n>=2){`:
@ewafula commented Jan 3, 2023

@sangeetashukla, Why are you removing >2 cutoffs?

@jharenza (Member):

To be honest, we can probably remove harmonized diagnosis from this altogether... just keep broad histology and the cancer groups?

@sangeetashukla (Collaborator, Author) replied:

@ewafula The issue description requires "Check to confirm that ... all harmonized diagnosis ANOVA statistics don't meet the >2 cutoffs". Am I missing something? On the other hand, as @jharenza suggests, I can go ahead and remove harm_dx altogether, keeping broad_hist and cancer_group.
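A one-way ANOVA is only defined when there are at least two groups to compare, which is presumably what a guard like `harmonized_diagnosis_n>=2` protects. A stdlib Python sketch of the F statistic with such a guard, assuming a mapping from predictor level (e.g. a cancer group) to GSVA scores for one gene set; p-values and the TukeyHSD post-hoc step are omitted, and `one_way_anova_f` is a hypothetical name, not module code.

```python
from statistics import mean
from typing import Dict, List, Optional


def one_way_anova_f(groups: Dict[str, List[float]]) -> Optional[float]:
    """Return the one-way ANOVA F statistic, or None when it is not defined.

    groups maps each predictor level to the GSVA scores of its samples
    for a single gene set.
    """
    # Ignore levels left with no samples after filtering.
    groups = {level: scores for level, scores in groups.items() if scores}
    k = len(groups)
    n_total = sum(len(scores) for scores in groups.values())

    # Need at least two levels, and more samples than levels so that
    # the within-group degrees of freedom (n_total - k) are positive.
    if k < 2 or n_total <= k:
        return None

    grand_mean = mean(x for scores in groups.values() for x in scores)
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
    ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)
    if ss_within == 0:
        return None  # no within-group spread, so F is not finite

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


print(one_way_anova_f({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}))  # 13.5
print(one_way_anova_f({"A": [1.0, 2.0, 3.0]}))  # None
```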

@ewafula commented Jan 4, 2023 via email

@ewafula left a review comment:

Thank you, @sangeetashukla, for working on this!
@jharenza, I have updated the PR to exclude harmonized_diagnosis and include `broad_histology`.
Questions:

  • Should the module be updated to include other cohorts? It currently subsets for PBTA only.
  • PBTA's broad_histology column includes terms such as Non-tumor and Other, which are being used as levels of the predictor variable (broad_histology) in the ANOVA and the post-hoc TukeyHSD tests. Should they be filtered out of the histology files before the statistical tests are run?
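Removing those terms means dropping the affected samples before the tests run, so that Non-tumor and Other never become groups in the ANOVA or pairs in the TukeyHSD comparisons. A stdlib Python sketch using hypothetical names (`drop_excluded_levels`, the "sample_id" field, and the toy rows are made up for illustration):

```python
from collections import Counter
from typing import Dict, List

# Levels to remove before ANOVA/TukeyHSD, as agreed in this thread.
EXCLUDED_LEVELS = {"Non-tumor", "Other"}


def drop_excluded_levels(
    rows: List[Dict[str, str]], predictor: str = "broad_histology"
) -> List[Dict[str, str]]:
    """Remove samples whose predictor value is one of the excluded levels."""
    return [row for row in rows if row[predictor] not in EXCLUDED_LEVELS]


samples = [
    {"sample_id": "S1", "broad_histology": "Embryonal tumor"},
    {"sample_id": "S2", "broad_histology": "Non-tumor"},
    {"sample_id": "S3", "broad_histology": "Ependymal tumor"},
    {"sample_id": "S4", "broad_histology": "Other"},
    {"sample_id": "S5", "broad_histology": "Embryonal tumor"},
]

filtered = drop_excluded_levels(samples)
levels = Counter(row["broad_histology"] for row in filtered)
print(sorted(levels.items()))
# [('Embryonal tumor', 2), ('Ependymal tumor', 1)]
```

In R, subsetting rows keeps the removed values as unused factor levels, so calling `droplevels()` after filtering may be worthwhile before fitting the model.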

@jharenza (Member) commented Jan 4, 2023

Should the module be updated to include other cohorts? Currently subsetting for only PBTA

Yes, should do all here

PBTA's broad_histology column includes terms such as Non-tumor and Other, which are being used as levels of the predictor variable (broad_histology) in the ANOVA and the post-hoc TukeyHSD tests. Should they be filtered out of the histology files before the statistical tests are run?

Yes, we can remove

@ewafula commented Jan 4, 2023


@jharenza, I am assuming we need to filter for tumor samples only, which is currently not implemented in the code. Do we also include adult tumors in the TCGA cohort?

@jharenza (Member) commented Jan 4, 2023


Yes, tumor only (exclude GTEx; there should not be much normal RNA-Seq in the other cohorts). We can leave TCGA out for now.

@ewafula merged commit ee4965d into dev on Jan 10, 2023
@jharenza deleted the ss_update_gsea branch on February 19, 2023 02:11