Update CI testing data to include methyl matrices and gatk cn subsets #310

Merged 7 commits on Jan 31, 2023

@ewafula ewafula commented Jan 28, 2023

Purpose/implementation Section

What scientific question is your analysis addressing?

Update the create-subset-files module to create methylation subset files for beta-values (methyl-beta-values.rds), m-values (methyl-m-values.rds), and cn-values (methyl-cn-values.rds), plus a cnv-gatk subset (cnv-gatk.seg.gz), for the CI testing data.

What was your approach?

1). Created a script that selects a list of sample IDs from all three methylation matrices to create the CI testing subset datasets.
2). Using the histologies file and the independent samples lists, randomly selected 5 methylation sample IDs each from the 850K (CBTN) and 450K (TARGET) arrays, along with the corresponding RNA-Seq samples, for patients who have both data types.
3). Included the 10 selected methylation sample IDs in the subset datasets to enrich for methylation samples.
4). Included the 10 selected RNA-Seq sample IDs in the subset datasets to enrich for RNA-Seq samples with methylation data.
5). Updated the copy_number_consensus_call and methylation-summary modules to read input data files from the release data/ directory.
6). Uploaded the updated CI testing subset files to the s3 bucket: s3://d3b-openaccess-us-east-1-prd-pbta/open-targets/testing/
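
The selection in steps 2) through 4) can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the module's actual script: the record fields (`sample_id`, `patient_id`, `cohort`, `experimental_strategy`), the function name `select_paired_ids`, and the seed are all hypothetical, and filtering against the independent samples lists is omitted.

```python
import random
from collections import defaultdict


def select_paired_ids(records, cohorts=("CBTN", "TARGET"), n_per_cohort=5, seed=2023):
    """Pick n methylation samples per cohort plus one matching RNA-Seq sample each.

    `records` is a list of dicts standing in for rows of a histologies table;
    the field names used here are assumptions made for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    methyl_ids, rnaseq_ids = [], []
    for cohort in cohorts:
        # Group sample IDs by patient and data type within this cohort.
        by_patient = defaultdict(lambda: {"Methylation": [], "RNA-Seq": []})
        for rec in records:
            strategy = rec["experimental_strategy"]
            if rec["cohort"] == cohort and strategy in ("Methylation", "RNA-Seq"):
                by_patient[rec["patient_id"]][strategy].append(rec["sample_id"])
        # Keep only patients that have both data types.
        paired = sorted(
            patient
            for patient, samples in by_patient.items()
            if samples["Methylation"] and samples["RNA-Seq"]
        )
        # Randomly choose patients, then take one sample of each type per patient.
        for patient in rng.sample(paired, min(n_per_cohort, len(paired))):
            methyl_ids.append(sorted(by_patient[patient]["Methylation"])[0])
            rnaseq_ids.append(sorted(by_patient[patient]["RNA-Seq"])[0])
    return methyl_ids, rnaseq_ids
```

With at least five paired patients in each of the two cohorts, this returns 10 methylation IDs and 10 RNA-Seq IDs in matching patient order, which could then be added to the subset ID lists.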

What GitHub issue does your pull request address?

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Is there anything that you want to discuss further?

NA

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

YES

Results

What types of results are included (e.g., table, figure)?

CI test subset data files

What is your summary of the results?

d3b-center/ticket-tracker-OPC#493

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@jharenza

@ewafula I downloaded the testing files and all md5sums check out. Merging dev back in. The failure could have been a wonky thing with GitHub Actions, so let's see what happens with a rerun.
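
A checksum check of this kind can be sketched in Python as below. This is only a sketch under assumptions: it expects a manifest in the usual `md5sum` output format (hash, whitespace, file name) sitting next to the downloaded files, and the function names `md5_of_file` and `verify_md5sums` are hypothetical, not part of the repository.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5sums(manifest_path):
    """Return the names of files whose MD5 does not match the manifest."""
    manifest = Path(manifest_path)
    mismatches = []
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary mode with a leading '*'
        if md5_of_file(manifest.parent / name) != expected:
            mismatches.append(name)
    return mismatches
```

An empty list from `verify_md5sums` means every file listed in the manifest matched its recorded checksum.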

@jharenza jharenza self-requested a review January 31, 2023 01:09

@jharenza jharenza left a comment

I realized I did not push these comments.

@jharenza

CN modules are down to ~5 minutes 🎉, but some other checks failed.

@jharenza jharenza left a comment

🚀


2 participants