update create subset files to enrich for rnaseq batch correction #297

Merged
merged 8 commits into from
Jan 7, 2023

Conversation

@ewafula ewafula commented Dec 6, 2022

Purpose/implementation Section

What scientific question is your analysis addressing?

Update create-subset-files module to enrich expression RDS files with samples suitable to fully test the rnaseq batch correction module

What was your approach?

1) Create a script, 00-enrich-batch-correction-examples.Rmd, to determine the list of bs_ids required for the tumor-only and tumor-normal RNA-Seq batch correction analyses, as described in the issue ticket and discussion.
2) Include randomly selected samples from each category (subtypes, cancer groups, and normal subgroups) in the RNA-Seq RDS subset files.
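
As a rough illustration of step 2, the sketch below merges an existing list of subset sample IDs with the enrichment bs_ids and then keeps only those columns of a counts matrix. The module itself is written in R and works on RDS files; this is a Python stand-in, and the function names, the nested-dict matrix layout, and the `BS_*` IDs are all hypothetical.

```python
def enrich_subset_ids(subset_ids, enrichment_ids, available_ids):
    """Merge subset and enrichment IDs, keeping order and dropping
    duplicates and any ID that is absent from the matrix."""
    available = set(available_ids)
    seen = set()
    merged = []
    for bs_id in list(subset_ids) + list(enrichment_ids):
        if bs_id in available and bs_id not in seen:
            seen.add(bs_id)
            merged.append(bs_id)
    return merged


def subset_columns(matrix, keep_ids):
    """Keep only the requested sample columns.

    matrix is assumed to be a dict of gene -> {sample_id: count}.
    """
    return {gene: {s: row[s] for s in keep_ids} for gene, row in matrix.items()}


# Hypothetical toy matrix with three samples.
matrix = {
    "GENE_A": {"BS_1": 10, "BS_2": 0, "BS_3": 7},
    "GENE_B": {"BS_1": 3, "BS_2": 5, "BS_3": 1},
}
keep = enrich_subset_ids(["BS_1"], ["BS_3", "BS_1", "BS_9"], matrix["GENE_A"].keys())
subset = subset_columns(matrix, keep)
print(keep)  # ['BS_1', 'BS_3']
```

Dropping IDs that are missing from the matrix (here `BS_9`) is a design choice of this sketch, not something the PR states; a stricter version could raise an error instead.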

What GitHub issue does your pull request address?

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

  • Please use the updated v11 histologies files available in the D3b-code repo to test the 00-enrich-batch-correction-examples.Rmd script in the module. The updated histologies files include MYCN status values in the molecular_subtype column (mostly from the clinical data, while the mycn-molecular-subtyping module is under review), which are required for selecting Neuroblastoma samples for batch correction.
  • @aadamk, please use the updated RDS count matrix to test the batch correction module (still in PR) before we upload the matrix to the S3 bucket for CI testing with GitHub Actions.

Is there anything that you want to discuss further?

Samples selected:

  • TARGET NBL samples with a MYCN status are all poly-A RNA-Seq libraries.
    • poly-A stranded and stranded library types don't have a clinical MYCN status in v11
    • randomly selected 5 MYCN amp and 5 MYCN non-amp
  • PBTA DMG cancer_group samples have both poly-A and stranded RNA-Seq libraries
    • randomly selected 5 poly-A and 5 stranded
  • PBTA HGG cancer_group samples have both poly-A and stranded RNA-Seq libraries
    • randomly selected 5 poly-A and 5 stranded
  • GTEx samples are poly-A RNA-Seq libraries
    • randomly selected by gtex_subgroup: 10 Brain Cortex and 10 Brain Cerebellum
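
The per-category random selection listed above can be sketched as a seeded, stratified draw. This is a minimal Python illustration under stated assumptions, not the code in 00-enrich-batch-correction-examples.Rmd (which is R): the function name, the `bs_id` and `library` field names, the seed value, and the sample records are all made up for the example.

```python
import random
from collections import defaultdict


def sample_per_group(records, group_key, n, seed=2022):
    """Randomly pick up to n sample IDs from each value of group_key."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec["bs_id"])
    selected = {}
    for name in sorted(groups):
        ids = sorted(groups[name])  # sort so input order does not change the draw
        selected[name] = rng.sample(ids, min(n, len(ids)))
    return selected


# Hypothetical DMG records: 8 poly-A and 7 stranded libraries.
dmg_records = (
    [{"bs_id": f"BS_POLYA_{i}", "library": "poly-A"} for i in range(8)]
    + [{"bs_id": f"BS_STRANDED_{i}", "library": "stranded"} for i in range(7)]
)
dmg_selected = sample_per_group(dmg_records, "library", n=5)
for library, ids in dmg_selected.items():
    print(library, len(ids))
```

The same call with a different grouping field and `n=10` would cover the two GTEx subgroups. Fixing the seed matters for CI, since the subset files should not change between runs unless the inputs do.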

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What is your summary of the results?

  • OpenPedCan subset files for CI testing
  • all the subset files should be available in ../../data/testing/v11

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

ewafula commented Dec 6, 2022

@aadamk, I have attached the subset counts matrix so you can quickly test the batch correction module without rerunning the updated code in this PR, which takes a while to complete and only runs successfully on an EC2 instance with sufficient resources.

gene-counts-rsem-expected_count-collapsed.rds.zip

aadamk commented Dec 12, 2022

Thank you, @ewafula. I am out of office this week but will test when I return next Monday.

@aadamk aadamk left a comment

Just tested the above file in the batch correction module and confirmed that it runs. Looks ready to merge.

@jharenza jharenza merged commit e9087cb into dev Jan 7, 2023
@jharenza jharenza mentioned this pull request Jan 7, 2023
5 tasks
@jharenza jharenza deleted the update-create-subset-files branch February 19, 2023 02:11
3 participants