Partial correlation when the covariate is identical to x or y #371

m-guggenmos · 2023-08-08T07:59:36Z

[While writing this post I realized that this issue is more subtle / less critical than I first thought, but perhaps it is useful anyway.]

I encountered the following situation (this is an MRE from some more complex code):

import numpy as np
import pingouin as pg
import pandas as pd
np.random.seed(0)

data = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], 1000)
df = pd.DataFrame(dict(x=data[:, 0], y=data[:, 1]))
df['z'] = df['y']

# I expected this correlation to be nan (or maybe 0):
pg.partial_corr(df, 'x', 'y', 'z')
# ..however:
#             n         r         CI95%         p-val
# pearson  1000  0.497099  [0.45, 0.54]  1.803982e-63
# and:
pg.partial_corr(df, 'x', 'y')
#            n         r         CI95%         p-val
# pearson  1000  0.497099  [0.45, 0.54]  1.564525e-63

The use case here is a (sort of control) analysis in which I use pg.partial_corr to correct a correlation matrix for one of its factors (let's say factor z). I would have expected all partial correlations of z when controlling for z to be nan or possibly 0, but in the above code, the result of pg.partial_corr(df, 'x', 'y', 'z'), where df['z'] is a copy of df['y'], is nearly identical to pg.partial_corr(df, 'x', 'y'), as if z were not partialed out at all.
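To illustrate why nan seems like the natural answer, here is a minimal sketch using the textbook residual-based definition of partial correlation (regress x and y on z, then correlate the residuals). This is only an illustration with plain NumPy; it makes no claim about how pingouin computes the value internally, and the helper name `residuals` is made up for this example.

```python
import numpy as np

np.random.seed(0)
data = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], 1000)
x, y = data[:, 0], data[:, 1]
z = y.copy()  # covariate is an exact copy of y


def residuals(target, covariate):
    """Residuals of an ordinary least-squares fit of target on covariate (with intercept)."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


res_x = residuals(x, z)
res_y = residuals(y, z)

# After removing z, y has essentially no variance left: only rounding noise remains,
# so a correlation between res_x and res_y carries no information.
print("relative residual spread of x:", np.std(res_x) / np.std(x))
print("relative residual spread of y:", np.std(res_y) / np.std(y))
```

Because the residuals of y are pure floating-point noise, any correlation computed from them is arbitrary, which is why returning nan (with a warning) seems more honest than returning a number.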

With some testing I realized that had I programmed it as pg.partial_corr(df, 'x', 'z', 'z') I would have received the error message AssertionError: y and covar must be independent. So long story short, I wonder whether instead of only asserting/testing for x != covar and y != covar one could

also test the data themselves, e.g. with np.allclose(data[x], data[covar]) and np.allclose(data[y], data[covar]), and
if one of these checks finds x or y to be (numerically) identical to the covariate, return a nan correlation with a warning about non-independence instead of raising an assertion error.

raphaelvallat (Maintainer) · 2023-08-08T18:12:17Z

Hi @m-guggenmos ,

Thanks for the clear explanation and example code. Yeah I think your proposal makes sense, so please feel free to work on a PR if you'd like! You can re-use this line for the return statement:

https://github.com/raphaelvallat/pingouin/blob/7923141161564b7a065b75f44f5fc75a2c1a1aa2/pingouin/correlation.py#L885C9-L885C101

m-guggenmos (Author) · 2023-08-09T11:03:01Z

Thanks for getting back to me so quickly @raphaelvallat.

At the moment I don't have the time to learn about creating a PR, so for now I'll just post what it would look like, replacing the assertion checks with roughly the following:

# Needs `import warnings` at the top of correlation.py.
# x, y and covar hold column names, so the data are accessed as data[x], data[y], data[cv].
depchecks = []
if isinstance(covar, list):
    if x in covar:
        depchecks += ["One of the covariates is identical to x"]
    # np.allclose returns a single bool (np.isclose would return an element-wise array)
    elif any(np.allclose(data[x], data[cv]) for cv in covar):
        depchecks += ["One of the covariates is too similar or identical to x"]
    if y in covar:
        depchecks += ["One of the covariates is identical to y"]
    elif any(np.allclose(data[y], data[cv]) for cv in covar):
        depchecks += ["One of the covariates is too similar or identical to y"]
else:
    if x == covar:
        depchecks += ["The covariate is identical to x"]
    elif np.allclose(data[x], data[covar]):
        depchecks += ["The covariate is too similar or identical to x"]
    if y == covar:
        depchecks += ["The covariate is identical to y"]
    elif np.allclose(data[y], data[covar]):
        depchecks += ["The covariate is too similar or identical to y"]
if len(depchecks) > 0:
    warnings.warn(
        "Non-independent covariate(s) detected.\n"
        "Failed checks:\n" +
        "\n".join([f"  - {depcheck}" for depcheck in depchecks]) +
        "\nr, CI95% and p-val are set to nan."
    )
    # n and method are assumed to be defined earlier in partial_corr
    return pd.DataFrame({"n": n, "r": np.nan, "CI95%": np.nan, "p-val": np.nan}, index=[method])

(Untested!)
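Since the check is easier to try out in isolation, here is a standalone sketch of the same idea that does not touch pingouin at all. The function name `dependent_covariates` and its messages are hypothetical (they are not part of pingouin), and np.allclose is used with its default tolerances, which may or may not be the right threshold for a given dataset.

```python
import numpy as np


def dependent_covariates(data, x, y, covar):
    """Return one message per covariate that duplicates x or y, by name or by value.

    `data` can be anything indexable by column name (a DataFrame or a dict of arrays).
    Hypothetical helper for illustration only; not part of pingouin.
    """
    covars = covar if isinstance(covar, list) else [covar]
    problems = []
    for label, col in (("x", x), ("y", y)):
        for cv in covars:
            if cv == col:
                problems.append(f"covariate '{cv}' is the same column as {label}")
            elif np.allclose(np.asarray(data[col], dtype=float),
                             np.asarray(data[cv], dtype=float)):
                problems.append(f"covariate '{cv}' is numerically identical to {label}")
    return problems


rng = np.random.default_rng(0)
cols = {"x": rng.normal(size=100), "y": rng.normal(size=100)}
cols["z"] = cols["y"].copy()  # duplicate of y under a different name
cols["w"] = rng.normal(size=100)  # unrelated covariate

problems = dependent_covariates(cols, "x", "y", ["z", "w"])
if problems:
    print("Non-independent covariate(s) detected: " + "; ".join(problems))
```

A caller could run this before pg.partial_corr and skip the call (or record nan) whenever the returned list is non-empty.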
