bugfix: retrieve additional missing data from QC_Report.xlsx #307

Merged
merged 4 commits into from
Jul 11, 2024


Collaborator

@jaamarks jaamarks commented Jul 11, 2024

This PR fixes missing data in the SAMPLE_QC tab of QC_Report.xlsx. Specifically, it implements the logic that populates the "Count_of_QC_Issue" and "Sample Pass QC" columns. It also removes the redundant "Expected Replicate" and "IdatsInProjectDir" columns from the tab.


Fixes #306

jaamarks added 4 commits July 9, 2024 12:18
- Removed "IdatsInProjectDir" from _SAMPLE_QC_COLUMNS and QC_Report.xlsx.
- The "IdatsInProjectDir" column was empty in QC_Report.xlsx because it
  is not present in sample_qc.csv.
- The equivalent column "is_missing_idats" is already included in
  QC_Report.xlsx, containing the same information from cgr_sample_sheet.csv.
- Therefore, the "IdatsInProjectDir" column has been removed to avoid
  redundancy.
Removes "Expected Replicate" from _SAMPLE_QC_COLUMNS and thus from
QC_Report.xlsx. The column duplicates the more informative "Replicate IDs"
column in QC_Report.xlsx.
- Introduces functionality to calculate the total number of QC issues
  present in a sample.
- Addresses missing data in the "Count_of_QC_Issue" column of
  QC_Report.xlsx.
- Calculates the number of "TRUE" occurrences in five boolean columns:
    "Low Call Rate",
    "Contaminated",
    "Expected Replicate Discordance",
    "Unexpected Replicate",
    and "Sex Discordant".
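
For illustration, the counting rule described above could look roughly like the sketch below. The five column names are taken from the commit message; the function name `count_qc_issues`, the constant `QC_FLAG_COLUMNS`, and the representation of a sample as a plain dict are assumptions made for this example and are not taken from the PR diff. The sketch also assumes that a missing flag does not count as an issue.

```python
# Hypothetical sketch, not the code from this PR.
# Column names come from the commit message; everything else is assumed.
QC_FLAG_COLUMNS = (
    "Low Call Rate",
    "Contaminated",
    "Expected Replicate Discordance",
    "Unexpected Replicate",
    "Sex Discordant",
)


def count_qc_issues(row):
    """Return how many of the five QC flag columns are True for one sample."""
    # `is True` means a missing or None flag is not counted as an issue
    # (an assumption; the PR may treat missing values differently).
    return sum(1 for col in QC_FLAG_COLUMNS if row.get(col) is True)


sample = {
    "Low Call Rate": True,
    "Contaminated": False,
    "Expected Replicate Discordance": False,
    "Unexpected Replicate": None,  # flag not available for this sample
    "Sex Discordant": True,
}
print(count_qc_issues(sample))  # 2
```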
- Fills in missing data for "Sample Pass QC" column in QC_Report.xlsx.
- Populates the column with a boolean value indicating if a sample passed all QC metrics:
    "Low Call Rate",
    "Contaminated",
    "Expected Replicate Discordance",
    "Unexpected Replicate",
    and "Sex Discordant".
- A sample passes QC if `Count_of_QC_Issue` is 0 (no QC issues).
- Otherwise, the sample fails and the column is set to False.
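
A minimal sketch of the pass/fail rule, under the same caveats: the column names come from the commit messages, while `add_pass_qc`, `QC_FLAG_COLUMNS`, and the dict-per-sample representation are hypothetical names chosen for this example rather than the PR's actual implementation.

```python
# Hypothetical sketch, not the code from this PR.
QC_FLAG_COLUMNS = (
    "Low Call Rate",
    "Contaminated",
    "Expected Replicate Discordance",
    "Unexpected Replicate",
    "Sex Discordant",
)


def add_pass_qc(row):
    """Return a copy of `row` with the count and pass/fail columns filled in."""
    # Count the flags that are explicitly True (missing flags are assumed
    # not to count as issues).
    issues = sum(1 for col in QC_FLAG_COLUMNS if row.get(col) is True)
    # A sample passes QC only when it has no QC issues at all.
    return {**row, "Count_of_QC_Issue": issues, "Sample Pass QC": issues == 0}


no_issues = {col: False for col in QC_FLAG_COLUMNS}
clean = add_pass_qc(no_issues)
flagged = add_pass_qc({**no_issues, "Contaminated": True})

print(clean["Sample Pass QC"], flagged["Sample Pass QC"])  # True False
```

Deriving "Sample Pass QC" from the count keeps the two columns consistent with each other by construction.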
@jaamarks jaamarks force-pushed the issue-306-handle-missing-data branch from 3f6653c to 16f3586 Compare July 11, 2024 13:22
@jaamarks jaamarks merged commit 2bd2e41 into default Jul 11, 2024
2 checks passed
@jaamarks jaamarks deleted the issue-306-handle-missing-data branch July 11, 2024 18:33
Development

Successfully merging this pull request may close these issues.

Handle Missing Data in QC_Report.xlsx (Follow up to #303)
1 participant