Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

partial_corr documentation / assertion wrong #387

Open
mkeds opened this issue Nov 8, 2023 · 4 comments
Open

partial_corr documentation / assertion wrong #387

mkeds opened this issue Nov 8, 2023 · 4 comments
Assignees

Comments

@mkeds
Copy link

mkeds commented Nov 8, 2023

The following minimum code example straight from documentation

https://pingouin-stats.org/build/html/generated/pingouin.partial_corr.html#pingouin.partial_corr
Example 1

import pingouin as pg
df = pg.read_dataset('partial_corr')
pg.partial_corr(data=df, x='x', y='y', covar='cv1').round(3)
n r CI95% p-val
pearson 30 0.568 [0.25, 0.77] 0.001

returns:
AssertionError: columns are not in dataframe.

on line 843 from correlation.py:
assert all([c in data for c in col]), "columns are not in dataframe."

which kind of makes sense, because just before in line 842 we have:
col = _flatten_list([x, y, covar, x_covar, y_covar])

and since we don't supply x_covar y_covar when calling the function in the example, they default to None in function header, and None in not in the DataFrame by default:

In[664]: None in df Out[664]: False

Is this expected behaviour or something wrong with my Pandas? I'm using pandas=2.1.2,

@mkeds
Copy link
Author

mkeds commented Nov 9, 2023

partial_corr() assertion checks seem wobbly in general, e.g.:

        assert x != covar, "x and covar must be independent"
        assert y != covar, "y and covar must be independent"

These come where we yet not know whether covar is a list or a string. If it's a list, truth value is ambiguous.

I'd suggest following small changes:
In 835:

    assert x != y, "x and y must be independent"
    if isinstance(covar, list):
        assert x not in covar, "x and covar must be independent"
        assert y not in covar, "y and covar must be independent"
    else:
        assert x != covar, "x and covar must be independent"
        assert y != covar, "y and covar must be independent"

And then in 842:

    # Check that columns exist
    col = _flatten_list([x, y, covar, x_covar, y_covar])
    col = [i for i in col if i is not None] # <-- add this
    assert all([c in data for c in col]), "columns are not in dataframe."

without trimming the col list from None values, the assertion is triggered each time we don't have x_covar and y_covar - which we should not have at the same time, anyway.

Hope this helps somebody.

@mkeds
Copy link
Author

mkeds commented Nov 9, 2023

Also, note that in partial_corr() you drop NA's from your data, while in pcorr() you don't, which at best throws an error / some NA's in the result, and at worst is leading to different results in a situation which should lead to the same results.

@raphaelvallat raphaelvallat self-assigned this Nov 10, 2023
@raphaelvallat
Copy link
Owner

Hi @mkeds

  1. The AssertionError occurs in Python 3.12 and has already been fixed in Fix in flatten_list for Python 3.12 #370, however I did not have the time to release a new version of Pingouin yet.
  2. The proposed update to the assertion check is a good idea. I'll implement it now.
  3. Feel free to open a PR to drop nan values in pingouin.pcorr as well, ideally with a minimal example showing that the output is different between the two functions.

Thanks,
Raphael

@raphaelvallat
Copy link
Owner

Also, note that in partial_corr() you drop NA's from your data, while in pcorr() you don't, which at best throws an error / some NA's in the result, and at worst is leading to different results in a situation which should lead to the same results.

Just a clarification: missing data are dropped with pcorr when calculating the covariance matrix with pandas.DataFrame.cov

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants