-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add modified Multi-snp and Docstring #547
Add modified Multi-snp and Docstring #547
Conversation
Adds a modified multivariate version of POE for multiple SNPs Also fixes POESingleSNP's init docstring to match parameters
Fixes logic issue with multiple snp algorithm causing slower iterations
Updates test call for multiplesnp2 to fit updated parameters
…into feature/poe_multisnp
Modifies cov_regularization used in permutation test and calculates maf vals from genotype data instead of having it passed in
WalkthroughThis update enhances the Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- python/python/bystro/parent_of_origin/parent_of_origin.py (4 hunks)
- python/python/bystro/parent_of_origin/tests/test_parent_of_origin.py (2 hunks)
Additional comments not posted (2)
python/python/bystro/parent_of_origin/tests/test_parent_of_origin.py (1)
13-13
: Import statement forPOEMultipleSNP2
is correct.The import statement correctly introduces the new
POEMultipleSNP2
class, ensuring it's available for use in the test suite.python/python/bystro/parent_of_origin/parent_of_origin.py (1)
38-38
: Updated import statement to includeDict
.The inclusion of
Dict
from thetyping
module is appropriate for the new features added in this file, which utilize dictionary types for certain operations.
compute_ci : bool, optional, default=False | ||
Whether to compute confidence intervals. | ||
|
||
store_samples : bool, optional, default=False | ||
Whether to store bootstrap samples. | ||
|
||
pval_method : str, optional, default="rmt4ds" | ||
The method for p-value computation. | ||
|
||
n_permutations_pval : int, optional, default=10000 | ||
The number of permutations for p-value calculation. | ||
|
||
n_permutations_bootstrap : int, optional, default=10000 | ||
The number of permutations for bootstrap confidence intervals. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enhancements to POESingleSNP
initialization.
The addition of new parameters such as compute_ci
, store_samples
, and pval_method
enhances the functionality of the POESingleSNP
class, allowing for more flexible and detailed statistical analysis. Ensure that these parameters are well-documented and tested.
+ # Ensure the new parameters are covered in unit tests
Committable suggestion was skipped due to low confidence.
homo_count = int(homo_prob * n_total) | ||
het_count = n_total - homo_count | ||
|
||
perm_indices = rng.permutation(n_total) | ||
homo_indices = perm_indices[:homo_count] | ||
het_indices = perm_indices[homo_count : homo_count + het_count] | ||
|
||
X_homo = X[homo_indices] | ||
X_het = X[het_indices] | ||
|
||
X_homo = X_homo - np.mean(X_homo, axis=0) | ||
X_het = X_het - np.mean(X_het, axis=0) | ||
cov_reg = self.cov_reg | ||
cov_reg.fit(X_homo) | ||
Sigma_AA = np.array(cov_reg.covariance) | ||
L = la.cholesky(Sigma_AA) | ||
L_inv = la.inv(L) | ||
|
||
X_het_whitened = np.dot(X_het, L_inv.T) | ||
Sigma_AB_white = np.cov(X_het_whitened.T) | ||
|
||
U, s, Vt = la.svd(Sigma_AB_white) | ||
|
||
if self.svd_loss: | ||
s, _ = optimal_shrinkage( | ||
s, self.n_phenotypes / X_het.shape[0], self.svd_loss | ||
) | ||
|
||
norm_a = np.maximum(s[0] - 1, 0) | ||
parent_effect_white = Vt[0] * 2 * np.sqrt(norm_a) | ||
parent_effect = np.dot(parent_effect_white, L.T) | ||
perms.append(np.linalg.norm(parent_effect)) | ||
maf_perms[maf] = np.array(perms) | ||
|
||
for i in range(self.n_genotypes): | ||
current_maf = maf_vals[i] | ||
appropriate_threshold = max( | ||
[t for t in maf_thresholds if t <= current_maf], | ||
default=min(maf_thresholds), | ||
) | ||
relevant_perms = maf_perms[appropriate_threshold] | ||
|
||
model = POESingleSNP( | ||
compute_pvalue=False, | ||
compute_ci=False, | ||
cov_regularization=self.cov_regularization, | ||
svd_loss=self.svd_loss, | ||
) | ||
model.fit(X, Y[:, i], seed=seed) | ||
self.parent_effects_[i] = model.parent_effect_ | ||
norm_effect = np.linalg.norm(model.parent_effect_) | ||
|
||
p_value = (relevant_perms >= norm_effect).mean() | ||
self.p_vals[i] = p_value | ||
|
||
return self | ||
|
||
def transform( | ||
self, X: np.ndarray, return_inner: bool = False | ||
) -> Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]: | ||
""" | ||
This method predicts whether the heterozygote allele came from | ||
a maternal/paternal origin. Note that due to a lack of | ||
identifiability, we can't state whether class 1 is paternal or | ||
maternal | ||
|
||
Parameters | ||
---------- | ||
X : np.array-like, shape=(N, self.phenotypes) | ||
The phenotype data | ||
|
||
return_inner : bool, default=False | ||
Whether to return the inner product classification, a measure | ||
of confidence in the call | ||
|
||
Returns | ||
------- | ||
calls : np.array-like, shape=(N,self.n_genotypes) | ||
A vector of 1s and 0s predicting class | ||
|
||
preds : np.array-like, shape=(N,self.n_genotypes) | ||
The inner product, representing confidence in calls | ||
""" | ||
N = X.shape[0] | ||
calls = np.zeros((N, self.n_genotypes)) | ||
preds = np.zeros((N, self.n_genotypes)) | ||
X_dm = X - np.mean(X, axis=0) | ||
for i in range(self.n_genotypes): | ||
preds[:, i] = np.dot(X_dm, self.parent_effects_[i]) | ||
calls[:, i] = 1.0 * (preds[:, i] > 0) | ||
if return_inner is False: | ||
return calls | ||
return calls, preds | ||
|
||
def _test_inputs(self, X: np.ndarray, Y: np.ndarray) -> None: | ||
if not isinstance(X, np.ndarray): | ||
raise ValueError("X is numpy array") | ||
if not isinstance(Y, np.ndarray): | ||
raise ValueError("y is numpy array") | ||
if X.shape[0] != Y.shape[0]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comprehensive review of POEMultipleSNP2
class.
The POEMultipleSNP2
class is well-implemented with appropriate methods for fitting and transforming data. The use of cov_regularization
and svd_loss
parameters are correctly handled. However, ensure that the exception handling for invalid cov_regularization
values is robust and that all new methods are accompanied by appropriate unit tests.
+ # Add unit tests for new methods in POEMultipleSNP2
Committable suggestion was skipped due to low confidence.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- python/python/bystro/parent_of_origin/tests/test_parent_of_origin.py (2 hunks)
Files skipped from review as they are similar to previous changes (1)
- python/python/bystro/parent_of_origin/tests/test_parent_of_origin.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Adds a modified multivariate version of POE for multiple SNPs
Also fixes POESingleSNP's init docstring to match parameters
Summary by CodeRabbit
New Features
POEMultipleSNP2
class with methods for improved prediction and classification.Tests
test_multi2_fit
to validate the functionality of thePOEMultipleSNP2
class.