2.0.0-beta.3 #349
Going to run the new release on UKB overnight and test out some things.
The new version now runs correctly on a copy of UKB (Cambridge cluster) and on a local dataset (single-sample) with the fraposa_update. The log is also correct for the scores.
Waiting for bioconda/bioconda-recipes#49916 before merging.
Changelog
Important fix: Fix splitting duplicated variant IDs across multiple scoring files
Background
The `MATCH_COMBINE` step writes new scoring files for input to `plink2 --score`.
Example
When using PGS000039, PGS000040, and PGS000041 in parallel some variants have different effect alleles at the same coordinates, for example:
- 22:40682469:T:C with effect allele T (PGS000041_hmPOS_GRCh38)
- 22:40682469:T:C with effect allele C (PGS000039_hmPOS_GRCh38)
Impact
In versions `v2.0.0-beta`, `beta.1`, and `beta.2` the duplicated variant is written to the same scoring file and ignored by plink2. The duplicated variant doesn't contribute to the final calculated PGS.
In all `v2.0.0-alpha` versions and `beta.3` a second scoring file is correctly written containing the other allele (additional alleles create extra scoring files automatically within the updated `MATCH_COMBINE` process). We have also updated the software tests to ensure this error doesn't occur in future releases.
This problem is more likely to happen when larger scores are calculated in parallel. As more scores are calculated in parallel, it's more likely that variant IDs with different effect alleles will duplicate and be ignored during the score calculation stage.
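The splitting behaviour described above can be illustrated with a minimal Python sketch. This is not the `MATCH_COMBINE` implementation: the function name, the tuple layout, the weights, and the second variant ID are all hypothetical and chosen only for illustration. The idea is that each distinct effect allele seen for a variant ID is assigned to its own scoring file, so no file contains the same ID twice.

```python
from collections import defaultdict


def split_by_effect_allele(rows):
    """Assign each (variant ID, effect allele) pair to a scoring-file index.

    rows: iterable of (variant_id, effect_allele, accession, weight) tuples.
    Returns a list with one dict per output file, mapping
    variant_id -> (effect_allele, {accession: weight}).
    Hypothetical helper, not part of the pipeline.
    """
    allele_slot = defaultdict(dict)  # variant_id -> {effect_allele: file index}
    files = []
    for variant_id, effect_allele, accession, weight in rows:
        slots = allele_slot[variant_id]
        if effect_allele not in slots:
            # First allele seen for this ID goes to file 0, the next to file 1, ...
            slots[effect_allele] = len(slots)
        idx = slots[effect_allele]
        while len(files) <= idx:
            files.append({})
        entry = files[idx].setdefault(variant_id, (effect_allele, {}))
        entry[1][accession] = weight
    return files


# Placeholder weights; "1:1000:A:G" is a made-up variant with no conflict.
rows = [
    ("22:40682469:T:C", "T", "PGS000041_hmPOS_GRCh38", 0.1),
    ("22:40682469:T:C", "C", "PGS000039_hmPOS_GRCh38", 0.2),
    ("1:1000:A:G", "A", "PGS000039_hmPOS_GRCh38", 0.3),
]
files = split_by_effect_allele(rows)
for i, scoring_file in enumerate(files):
    print(i, scoring_file)
```

With this input the conflicting variant lands in two separate files, one per effect allele, while the non-conflicting variant stays in the first file only.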
While the overall impact on the final score is likely to be small, we encourage users to upgrade to beta.3, especially if they calculate larger scores in parallel.
How do I know if my data are affected?
One missing variant appears in the output. This check is now included in the scoring module.
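As a rough, related pre-check (an assumption for illustration, not the check implemented in the scoring module), a set of scores could be screened for variant IDs that carry more than one effect allele. The function name and the `(variant_id, effect_allele)` record layout below are hypothetical, as is the second variant ID.

```python
from collections import defaultdict


def find_conflicting_ids(records):
    """Return the sorted variant IDs that appear with more than one effect allele.

    records: iterable of (variant_id, effect_allele) pairs pooled from all
    scores that will be calculated in parallel. Hypothetical helper.
    """
    alleles = defaultdict(set)
    for variant_id, effect_allele in records:
        alleles[variant_id].add(effect_allele)
    return sorted(vid for vid, seen in alleles.items() if len(seen) > 1)


records = [
    ("22:40682469:T:C", "T"),  # as in PGS000041_hmPOS_GRCh38
    ("22:40682469:T:C", "C"),  # as in PGS000039_hmPOS_GRCh38
    ("1:1000:A:G", "A"),       # made-up variant with a single effect allele
]
conflicts = find_conflicting_ids(records)
print(conflicts)
```

A non-empty result would suggest that the combination of scores is one where the affected beta versions could have dropped a variant.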
Other fixes
- `--keep_ambiguous` parameter (Issue with '--keep_ambiguous' Option and Possible Bug #346)