This repository has been archived by the owner on Apr 10, 2024. It is now read-only.

Make NA/null a first-class citizen in groupby operations #16

Open
wesm opened this issue Sep 7, 2016 · 3 comments

Comments

@wesm
Owner

wesm commented Sep 7, 2016

xref #9

Maybe we can collect a list of the pandas issues that relate to this.

I've found it valuable to be able to consistently compute statistics that include the NA values, especially with multiple group keys. I haven't kept track of how pandas currently handles these in all cases, but it would be nice to come up with a strategy that makes NA behave like any other group in a groupby setting.
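To illustrate the multi-key case, here is a minimal sketch with made-up data (the column names `k1`, `k2`, and `v` are hypothetical). It assumes the default groupby behavior, under which a row is excluded when any of its group keys is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two key columns, each with one missing value.
df = pd.DataFrame({
    "k1": ["a", "a", np.nan, "b", "b"],
    "k2": ["x", np.nan, "x", "y", "y"],
    "v": [1, 2, 3, 4, 5],
})

# Default behavior: a row is left out if either key is missing.
grouped = df.groupby(["k1", "k2"])["v"].sum()
print(grouped)

total_all = int(df["v"].sum())       # sum over every row
total_grouped = int(grouped.sum())   # sum over rows with both keys present
print(total_all, total_grouped)
```

The grouped total comes out smaller than the column total, because the two rows with a missing key do not land in any group.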

@wesm
Owner Author

wesm commented Sep 19, 2016

This problem also extends to other analytics, like value_counts:

In:
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 1, 1, 2, np.nan])
s.value_counts()

Out:
1.0    3
2.0    2
dtype: int64

Here, NA should appear in the result with a count of 2. The same goes for groupby(...).size().
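The groupby(...).size() case can be sketched with the same Series, grouped by its own values. This assumes the default groupby behavior, which leaves NA keys out:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 1, 1, 2, np.nan])

# Group the Series by its own values; NA keys are dropped by default.
sizes = s.groupby(s).size()
print(sizes)

# The group sizes do not add up to the length of the Series,
# because the two NA entries belong to no group.
print(int(sizes.sum()), len(s))
```

The gap between the summed group sizes and `len(s)` is exactly the number of NA values.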

@jorisvandenbossche
Contributor

jorisvandenbossche commented Sep 19, 2016

In the specific case of value_counts, there is the dropna keyword which does this:

In [15]: s.value_counts(dropna=False)
Out[15]: 
 1.0    3
NaN     2
 2.0    2
dtype: int64

But of course that does not address the bigger problem with groupby and other operations (and you could also argue that dropna=False would be a better default).

@chris-b1

It's linked in the top issue, but just for visibility, pandas-dev/pandas#12607 is a WIP PR that would add the dropna keyword arg to groupby.
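For reference, a hedged sketch of what such a keyword looks like in use. It assumes a pandas version in which groupby accepts `dropna` (to my knowledge, 1.1 and later); it is not taken from the linked PR:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 1, 1, 2, np.nan])

# Assumption: this pandas version supports the dropna keyword on groupby.
# With dropna=False, NA is kept as a group key of its own.
sizes_with_na = s.groupby(s, dropna=False).size()
print(sizes_with_na)

# Pick out the size of the NA group without relying on its position.
na_size = int(sizes_with_na[sizes_with_na.index.isna()].iloc[0])
print(na_size, int(sizes_with_na.sum()))
```

With NA kept as a group, the sizes add up to the full length of the Series again.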
