ENH: support masked arrays in groupby cython algos #37493

Closed
10 tasks done
jorisvandenbossche opened this issue Oct 29, 2020 · 1 comment · Fixed by #48138
Labels
Enhancement Groupby NA - MaskedArrays Related to pd.NA and nullable extension arrays

Comments

@jorisvandenbossche
Member

jorisvandenbossche commented Oct 29, 2020

As with normal reductions (e.g. #30982), we should investigate adding masked-array-specific support to the groupby algorithms.

Currently, a nullable extension array is converted to a NumPy array before it is passed to the Cython algorithm (e.g. integers with missing values are typically cast to float, with NaN for the missing entries).
Supporting a mask argument in the Cython algos could improve groupby support for nullable dtypes.
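
To make the motivation concrete, here is a minimal sketch of what a mask-aware grouped sum could look like. It is a plain Python loop standing in for a Cython kernel, not the actual pandas implementation: the function name `group_sum_masked` and its signature are hypothetical, and flagging an all-missing group as missing is just one possible convention. The last lines show why the float round trip is lossy: an integer above 2**53 does not survive a cast to float.

```python
def group_sum_masked(values, mask, labels, ngroups):
    """Sum `values` per group, skipping positions where `mask` is True.

    Hypothetical helper for illustration only. Returns (sums, result_mask);
    result_mask[g] is True when group g had no valid (unmasked) values.
    """
    sums = [0] * ngroups
    counts = [0] * ngroups
    for val, is_na, lab in zip(values, mask, labels):
        # A label of -1 marks a row that belongs to no group.
        if lab < 0 or is_na:
            continue
        sums[lab] += val
        counts[lab] += 1
    # Assumed convention: a group without any valid value is reported as missing.
    result_mask = [count == 0 for count in counts]
    return sums, result_mask


big = 2**53 + 1  # not exactly representable as a 64-bit float

values = [big, 0, 1, 5]            # entries under the mask are placeholders
mask = [False, True, False, True]  # True means "missing"
labels = [0, 0, 1, 2]

sums, result_mask = group_sum_masked(values, mask, labels, ngroups=3)

# The masked path keeps the integer exact; a float round trip does not.
exact_with_mask = sums[0] == big
exact_via_float = int(float(big)) == big
```

Because the mask travels alongside the values, the result can stay an integer and still report which groups are missing, instead of relying on NaN in a float array.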

@jorisvandenbossche
Member Author

It seems that some algorithms already use a mask, e.g. group_any_all in groupby.pyx, or group_quantile. In the specific case of any/all, though, we should also add a version that uses Kleene logic for nullable dtypes, as is done for the plain any/all methods.
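
For reference, a rough sketch of a grouped `any` under Kleene logic, for the case where missing values are not skipped. Again this is a plain Python loop with a hypothetical name (`group_any_kleene`), not the code in groupby.pyx. The rule it assumes: a group is True if it contains a valid True, missing if it has no valid True but at least one missing value, and False otherwise.

```python
def group_any_kleene(values, mask, labels, ngroups):
    """Grouped `any` with Kleene (three-valued) logic.

    Hypothetical helper for illustration only. Returns (result, result_mask);
    result_mask[g] is True when the outcome for group g is missing.
    """
    result = [False] * ngroups
    result_mask = [False] * ngroups
    for val, is_na, lab in zip(values, mask, labels):
        # A label of -1 marks a row that belongs to no group.
        if lab < 0:
            continue
        if is_na:
            # A missing value only matters while no valid True has been seen.
            if not result[lab]:
                result_mask[lab] = True
        elif val:
            # A valid True decides the group, whatever else it contains.
            result[lab] = True
            result_mask[lab] = False
    return result, result_mask


values = [True, False, False, True, False, False]  # entries under the mask are placeholders
mask = [False, True, False, True, False, False]    # True means "missing"
labels = [0, 0, 1, 1, 2, 2]

result, result_mask = group_any_kleene(values, mask, labels, ngroups=3)
# group 0: True and NA   -> True
# group 1: False and NA  -> NA
# group 2: False, False  -> False
```

A grouped `all` would be symmetric: a valid False decides the group, and a missing value makes the result missing only if no valid False is present.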
