BCF entry point with intensity and contamination checks using BCF for data_catalog usage. #314
Merged
rajwanir commented Aug 16, 2024
- Creates BCF entry point for data catalog.
- Makes BPM manifest optional (relevant for aggregated file inputs e.g. BCF, VCF or BED)
- Adds scripts to compute median intensity and contamination score from BCF.
- Separates the intensity checks from contamination checks.
- No observable difference when using standard GTC input.
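For illustration, a per-sample median intensity could be computed from BCF data roughly as follows. This is a minimal sketch, not the PR's script. It assumes the raw X and Y intensities have already been extracted per sample (for example from the FORMAT/X and FORMAT/Y fields that the bcftools gtc2vcf plugin can write) and that "median intensity" means the median of X + Y across variants. The function names are hypothetical.

```python
import statistics
from typing import Dict, Optional, Sequence, Tuple


def median_intensity(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    """Median total intensity (X + Y) across variants for one sample.

    Hypothetical definition: variants where either channel is missing are skipped.
    """
    totals = [xi + yi for xi, yi in zip(x, y) if xi is not None and yi is not None]
    if not totals:
        return float("nan")  # no usable variants for this sample
    return statistics.median(totals)


def median_intensity_per_sample(
    intensities: Dict[str, Tuple[Sequence[Optional[float]], Sequence[Optional[float]]]],
) -> Dict[str, float]:
    """Apply median_intensity to every sample."""
    return {sample: median_intensity(x, y) for sample, (x, y) in intensities.items()}


print(median_intensity_per_sample({"S1": ([100, 200, None], [50, 50, 10])}))  # {'S1': 200.0}
```

In grouped/cluster mode the same per-sample function would simply be applied over a batch of samples instead of one.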
jaamarks reviewed Sep 11, 2024
rajwanir force-pushed the data_catalog branch from f7a7396 to 270e105 on September 13, 2024 18:09
jaamarks reviewed Sep 13, 2024
Removes the dependency on the BPM manifest for naming the allele B frequency (abf) file.
Separates median IDAT intensity retrieval from verifyIDintensity, which was bundled into contamination.smk.
…cks with vcf entry
Modifies entry_points.smk to create a BCF entry point by converting the BCF to PLINK BED format. Testing and validation are yet to be done.
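The BCF-to-BED conversion amounts to a single PLINK call. Below is a minimal sketch of building that command from Python, assuming PLINK 2 and its `--bcf`, `--make-bed` and `--out` options. The binary and flags actually used in entry_points.smk may differ, and `bcf_to_bed_cmd` is a hypothetical helper.

```python
import shlex
from typing import List


def bcf_to_bed_cmd(bcf_path: str, out_prefix: str, plink: str = "plink2") -> List[str]:
    """Build a PLINK command that writes <out_prefix>.bed/.bim/.fam from a BCF."""
    return [plink, "--bcf", bcf_path, "--make-bed", "--out", out_prefix]


cmd = bcf_to_bed_cmd("samples.bcf", "samples")
print(shlex.join(cmd))  # plink2 --bcf samples.bcf --make-bed --out samples
# To actually run it: subprocess.run(cmd, check=True)
```

Keeping the command as an argument list rather than a shell string avoids quoting problems with unusual paths.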
A previous commit moved them into a separate idat_intensity.smk.
Modifies contamination.smk and grouped_contamination.py to enable the contamination check in cluster mode.
Removes the dependency on IDAT files for calculating median intensity with VCF/BCF input. Adds scripts for both per-sample and grouped/cluster mode, and modifies the intensity workflow to run appropriately with VCF/BCF input.
…kefile and sample_qc subworkflow. In addition to the existing 'use_contamination' checks, adds 'intensity_retreived' and 'contamination_checked' tests, which only check whether the output CSV files were created, regardless of config or entry point, so that they can be fed to sample_qc.
… check is default
Renames vcf_file to bcf_file to explicitly indicate that the input is a BCF.
Snakemake params were previously imported by name. Changed to import all params through a loop without naming them individually, which allows seamless compatibility whether the GTC or BCF entry point is used.
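The loop-based parameter passing can be sketched as follows. The script's entry point accepts every input as an optional keyword argument, and whatever params the rule defines are forwarded unchanged, so the same script serves both entry points. `main`, its parameter names and `run_from_params` are hypothetical; inside a Snakemake script the mapping would come from `snakemake.params`, assuming its `items()` yields (name, value) pairs for named params.

```python
from typing import Any, Mapping, Optional


def main(
    gtc_file: Optional[str] = None,
    bpm_file: Optional[str] = None,
    bcf_file: Optional[str] = None,
    outfile: str = "out.csv",
) -> str:
    """Hypothetical script body: works with either GTC+BPM or BCF inputs."""
    if bcf_file is not None:
        source = "bcf"
    elif gtc_file is not None:
        source = "gtc"
    else:
        raise ValueError("need either bcf_file or gtc_file")
    return f"{source} -> {outfile}"


def run_from_params(params: Mapping[str, Any]) -> str:
    # Forward every named param as a keyword argument instead of importing
    # each one by name; params absent for a given entry point keep their defaults.
    return main(**dict(params.items()))


# In a Snakemake script this might be: run_from_params(dict(snakemake.params.items()))
print(run_from_params({"bcf_file": "cohort.bcf", "outfile": "median.csv"}))  # bcf -> median.csv
```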
The first few lines of entry_points were duplicated during copy/paste. This removes the duplicated lines.
…r contamination checks. Previously, GC_SCORE was added to adpc.bin, which required a cluster EGT file to be used when preparing the VCF/BCF. The IGC score is encoded in the GTC, so it does not depend on the cluster EGT file. The contamination scores should also be closer to those obtained with GTC input.
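To make the GC_SCORE-to-IGC switch concrete, here is a minimal sketch of packing one sample/variant record with IGC as the score field. The record layout is an assumption modelled on Picard's ADPC format (two uint16 raw intensities, three float32 values for normalized X, normalized Y and the score, and one uint8 genotype code), not taken from vcf2adpc.py; `pack_adpc_record` is hypothetical.

```python
import struct

# Assumed little-endian ADPC record: x_raw, y_raw (uint16), x_norm, y_norm,
# genotype score (float32), genotype code (uint8). 17 bytes, no padding.
ADPC_RECORD = struct.Struct("<HHfffB")


def pack_adpc_record(
    x_raw: int, y_raw: int, x_norm: float, y_norm: float, igc: float, genotype: int
) -> bytes:
    """Pack one record, storing the per-sample IGC (from the GTC/BCF) as the score.

    Unlike a cluster-file GC_SCORE, IGC needs no EGT file to be available.
    """
    return ADPC_RECORD.pack(x_raw, y_raw, x_norm, y_norm, igc, genotype)


rec = pack_adpc_record(100, 200, 0.5, 1.25, 0.75, 1)
print(len(rec), ADPC_RECORD.unpack(rec))  # 17 (100, 200, 0.5, 1.25, 0.75, 1)
```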
Consistent with vcf2adpc.py. Both should now use IGC and work with VCF/BCF prepared with the gtc2vcf workflow.
…ontamatination.py
Earlier, the GenTrain score was used to mark AF as NA if the score was negative. This change stops using it, to ensure compatibility with BCFs prepared by the gtc2vcf workflow.
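The behavioural change can be illustrated with a small sketch. It assumes, purely for illustration, that AF is a B allele frequency estimated from called genotypes coded 0/1/2; `b_allele_frequency` and its `use_gentrain` switch are hypothetical. With the switch on, it reproduces the old rule (a negative GenTrain score gives NA); the default matches the new behaviour, which ignores the GenTrain score for compatibility with gtc2vcf-prepared BCFs.

```python
import math
from typing import Optional, Sequence


def b_allele_frequency(
    genotypes: Sequence[Optional[int]],
    gentrain_score: Optional[float] = None,
    use_gentrain: bool = False,
) -> float:
    """B allele frequency for one variant from calls coded 0 (AA), 1 (AB), 2 (BB).

    None marks a no-call. Returns NaN when the frequency is treated as NA.
    """
    if use_gentrain and gentrain_score is not None and gentrain_score < 0:
        return math.nan  # old rule: negative GenTrain score -> NA
    called = [g for g in genotypes if g is not None]
    if not called:
        return math.nan  # no called samples
    return sum(called) / (2 * len(called))  # B alleles / total alleles


print(b_allele_frequency([0, 1, 2, None], gentrain_score=-0.1))  # 0.5
```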
…ing with BCF input.
…processing for both IDAT and BCF input.
rajwanir force-pushed the data_catalog branch from 5c6032c to 4bcf9e6 on September 25, 2024 20:25
jaamarks approved these changes Sep 25, 2024
LGTM