Partial correlation when the covariate is identical to x or y #371
m-guggenmos started this conversation in Ideas
Replies: 2 comments
-
Hi @m-guggenmos, thanks for the clear explanation and example code. Yeah, I think your proposal makes sense, so please feel free to work on a PR if you'd like! You can re-use this line for the return statement:
-
Thanks for getting back to me so quickly, @raphaelvallat. At the moment I don't have the time to learn about creating a PR, so for now I'll just post what it would look like, replacing the assertion checks with roughly the following:

    # Fragment for the body of partial_corr. Assumes `import warnings`,
    # `import numpy as np`, `import pandas as pd`, and that `n` and `method`
    # are available from the surrounding function.
    depchecks = []
    if isinstance(covar, list):
        if x in covar:
            depchecks += ["One of the covariates is identical to x"]
        # np.allclose reduces the element-wise comparison to a single bool
        elif any(np.allclose(data[x], data[cv]) for cv in covar):
            depchecks += ["One of the covariates is too similar or identical to x"]
        if y in covar:
            depchecks += ["One of the covariates is identical to y"]
        elif any(np.allclose(data[y], data[cv]) for cv in covar):
            depchecks += ["One of the covariates is too similar or identical to y"]
    else:
        if x == covar:
            depchecks += ["The covariate is identical to x"]
        elif np.allclose(data[x], data[covar]):
            depchecks += ["The covariate is too similar or identical to x"]
        if y == covar:
            depchecks += ["The covariate is identical to y"]
        elif np.allclose(data[y], data[covar]):
            depchecks += ["The covariate is too similar or identical to y"]
    if depchecks:
        warnings.warn(
            "Non-independent covariate(s) detected.\n"
            "Failed checks:\n"
            + "\n".join(f" - {depcheck}" for depcheck in depchecks)
            + "\nr, CI95% and p-val are set to nan."
        )
        # Return early with a nan result instead of raising an AssertionError
        return pd.DataFrame({"n": n, "r": np.nan, "CI95%": np.nan, "p-val": np.nan}, index=[method])

(Untested!)
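A column-level similarity check of this kind can also be sketched as a small standalone helper, which makes it easy to try on a toy DataFrame. This is only an illustration: `dependent_covariates` is a hypothetical name, not a pingouin function, and the data are simulated. It relies on `np.allclose`, which returns a single boolean for two columns, whereas `np.isclose` returns an element-wise array that cannot be used directly as an `if` condition.

```python
import numpy as np
import pandas as pd


def dependent_covariates(data, target, covar):
    """Return the covariates that are identical or numerically
    near-identical to the column `target` (hypothetical helper)."""
    # Accept both a single column name and a list of column names
    covars = covar if isinstance(covar, list) else [covar]
    return [
        cv for cv in covars
        if cv == target or np.allclose(data[target], data[cv])
    ]


rng = np.random.default_rng(42)
df = pd.DataFrame({"x": rng.normal(size=30), "z": rng.normal(size=30)})
df["y"] = df["z"]  # y duplicates the covariate under a different name

flagged_y = dependent_covariates(df, "y", ["z"])  # name differs, values match
flagged_x = dependent_covariates(df, "x", "z")    # independent columns
```

Comparing values rather than only column names is what catches the `df['y'] = df['z']` case, which a name-based check misses.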
-
[I realized while writing this post that this issue is more subtle / less critical than I first thought, but perhaps it is useful anyway.]

I encountered the following situation (this is an MRE from some more complex code):

The use case here is (sort of) a control analysis, in which I use `pg.partial_corr` to correct a correlation matrix for one of its factors (let's say factor `z`). I would have expected that all partial correlations of `z` when controlling for `z` should be nan or possibly 0, but in the above code, the result of `pg.partial_corr(df, 'x', 'y', 'z')` where `df['y'] = df['z']` is nearly identical to `pg.partial_corr(df, 'x', 'y')`, as if `z` is not partialed out at all.

With some testing I realized that had I programmed it as `pg.partial_corr(df, 'x', 'z', 'z')`, I would have received the error message `AssertionError: y and covar must be independent`. So, long story short, I wonder whether instead of only asserting/testing for `x != covar` and `y != covar`, one could also test `not np.allclose(data['y'], data['covar'])` and `not np.allclose(data['x'], data['covar'])`, and return a `nan` correlation with a warning about non-independence.
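To illustrate why `nan` is a natural result here, the sketch below uses the textbook first-order partial correlation formula, `r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))`. It is not a description of what pingouin does internally, and the function name, the tolerance and the simulated data are assumptions made for illustration. When `y` is a copy of `z`, `r_yz` is 1, the denominator vanishes, and the partial correlation is undefined.

```python
import numpy as np


def first_order_partial_corr(x, y, z, tol=1e-6):
    """Partial correlation of x and y given z via the first-order formula.

    Returns nan when the denominator is (numerically) zero, i.e. when
    x or y is perfectly correlated with z. `tol` is an arbitrary choice.
    """
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    denom = np.sqrt((1 - r_xz**2) * (1 - r_yz**2))
    if denom < tol:
        return np.nan
    return (r_xy - r_xz * r_yz) / denom


rng = np.random.default_rng(0)
x = rng.normal(size=50)
z = 0.5 * x + rng.normal(size=50)

# Degenerate case: y is an exact copy of the covariate z
r_degenerate = first_order_partial_corr(x, z.copy(), z)

# Regular case: y is related to z but not identical to it
y = z + rng.normal(size=50)
r_regular = first_order_partial_corr(x, y, z)
```

In the degenerate case the function returns `nan`, while the regular case yields an ordinary finite coefficient.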