Update pre-release data release QC #334

Merged
merged 25 commits into from
May 1, 2023

Conversation


@ewafula ewafula commented Mar 6, 2023

Purpose/implementation Section

What scientific question is your analysis addressing?

Update harmonization data release QC.

What was your approach?

Identify biospecimen IDs that are not concordant between the histologies base file and the harmonized data files released on the v12 S3 bucket.
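
As a rough illustration of the concordance check described above, here is a minimal Python sketch. It is not the module's code (the module is an R Markdown notebook); the column name `Kids_First_Biospecimen_ID`, the helper names, and the toy IDs are all assumptions made for the example.

```python
import csv
import io


def read_ids(handle, id_column="Kids_First_Biospecimen_ID"):
    """Collect the unique, non-empty IDs found in one column of a TSV file.

    The default column name is an assumption for this sketch.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_column] for row in reader if row.get(id_column)}


def id_discordance(histology_ids, data_ids):
    """Return the IDs unique to each side, sorted for stable output."""
    return {
        "histologies_missing_in_data": sorted(histology_ids - data_ids),
        "data_missing_in_histologies": sorted(data_ids - histology_ids),
    }


# Toy in-memory tables with made-up IDs (hypothetical data).
histologies_tsv = io.StringIO(
    "Kids_First_Biospecimen_ID\tsample_type\n"
    "BS_0001\tTumor\n"
    "BS_0002\tTumor\n"
    "BS_0003\tNormal\n"
)
data_tsv = io.StringIO(
    "Kids_First_Biospecimen_ID\tvalue\n"
    "BS_0002\t1.5\n"
    "BS_0003\t0.2\n"
    "BS_0004\t3.1\n"
)

result = id_discordance(read_ids(histologies_tsv), read_ids(data_tsv))
print(result)
```

Using set differences in both directions keeps the two failure modes separate: samples listed in histologies but absent from a data file, and samples present in a data file but absent from histologies.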

What GitHub issue does your pull request address?

NA

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Please check the logic used to determine differences between the histologies base file and the release data files.

Is there anything that you want to discuss further?

  • Only results for checks that produce differences are reported.
  • The module currently QCs the harmonized data files provided by the @zhangb1 group.
  • The module will be updated later to include an optional QC for files generated by the pre-release OPC analysis modules.
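
The "report only when there is a difference" behavior in the first bullet could look roughly like the Python sketch below. This is an illustration under assumptions, not the module's implementation: the function name `write_if_different`, the output column header, the example file names, and the temporary results directory are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def write_if_different(missing_ids, out_path):
    """Write one ID per row to a TSV, but only when there is something to report.

    Returns True if a file was written, False if the check found no differences.
    """
    if not missing_ids:
        return False
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["biospecimen_id"])  # hypothetical column header
        for bs_id in sorted(missing_ids):
            writer.writerow([bs_id])
    return True


# Hypothetical checks: one finds a missing sample, the other finds none.
results_dir = Path(tempfile.mkdtemp())
checks = {
    "histologies-samples-missing-in-example-a.tsv": {"BS_0001"},
    "histologies-samples-missing-in-example-b.tsv": set(),
}
written = {
    name: write_if_different(ids, results_dir / name)
    for name, ids in checks.items()
}
reported = sorted(path.name for path in results_dir.iterdir())
print(reported)
```

With this pattern, the absence of a result file for a given data file means that check passed, which is why only a subset of the compared files shows up in the results.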

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Results

What types of results are included (e.g., table, figure)?

Tables

What is your summary of the results?

NOTE: QC result file names are self-explanatory. Each name contains the string "histologies" and the name of the data file being compared, i.e., histologies-samples-missing-in-<data-file-name>.tsv or <data-file-name>-missing-in-histologies.tsv

fusion-annoFuse-samples-missing-in-histologies.tsv
fusion-starfusion-samples-missing-in-histologies.tsv
genes_not_in_all_gene_expression_matrices.tsv
histologies-samples-missing-in-cnv-gatk.tsv
histologies-samples-missing-in-snv-consensus-plus-hotspots.tsv
snv-column-class-diffs-consensus-vs-dgd.tsv
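
The name genes_not_in_all_gene_expression_matrices.tsv suggests a check for genes that are absent from at least one expression matrix. The PR text does not describe how that check is done, so the Python sketch below is only one plausible reading; the function name, matrix names, and gene sets are hypothetical.

```python
def genes_not_in_all(gene_sets):
    """Map each gene absent from at least one matrix to the matrices lacking it.

    gene_sets: dict of matrix name -> set of gene symbols in that matrix.
    """
    all_genes = set().union(*gene_sets.values())
    report = {}
    for gene in sorted(all_genes):
        absent_from = sorted(
            name for name, genes in gene_sets.items() if gene not in genes
        )
        if absent_from:
            report[gene] = absent_from
    return report


# Hypothetical gene sets standing in for the row names of three matrices.
matrices = {
    "matrix_a": {"TP53", "MYCN", "ALK"},
    "matrix_b": {"TP53", "MYCN"},
    "matrix_c": {"TP53", "ALK", "ATRX"},
}
report = genes_not_in_all(matrices)
print(report)
```

Recording which matrices lack each gene, rather than only listing the genes, makes it easier to tell a single outlier file from a gene that is inconsistently annotated everywhere.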

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

Member

@jharenza jharenza left a comment

@ewafula I went through and requested some changes. As per the ticket, I think the major issues are inclusion of samples not in histologies and the MAF and splicing matrices not having all relevant samples.

analyses/data-pre-release-qc/01-data-harmonization-qc.Rmd (6 review threads, all outdated and resolved)
ewafula and others added 23 commits March 7, 2023 11:12
Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>
@jharenza jharenza merged commit c16ff6f into dev May 1, 2023
@sangeetashukla sangeetashukla deleted the data-pre-release-qc branch May 3, 2023 15:25
This pull request was closed.
3 participants