
BUG: Groupby NaT Handling #6992

Closed
sinhrks opened this issue Apr 28, 2014 · 6 comments · Fixed by #6996
Labels
Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

Comments

@sinhrks
Member

sinhrks commented Apr 28, 2014

xref #9236

There seems to be an inconsistency in some GroupBy methods when NaT is included in the group key.

  • GroupBy.groups includes NaT as a key.
  • GroupBy.ngroups doesn't count NaT.
  • GroupBy.__iter__ doesn't return NaT group.
  • GroupBy.get_group fails when NaT is specified.

Judging from the behaviour of other functions such as dropna, I understand NaT should be included in the group key. Is it OK to fix it so that NaT is included?

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({'values': np.random.randn(8),
...                    'dt': [np.nan, pd.Timestamp('2013-01-01'), np.nan, pd.Timestamp('2013-02-01'),
...                           np.nan, pd.Timestamp('2013-02-01'), np.nan, pd.Timestamp('2013-01-01')]})
>>> grouped = df.groupby('dt')

>>> grouped.groups
{numpy.datetime64('NaT'): [0, 2, 4, 6], numpy.datetime64('2013-01-01T09:00:00.000000000+0900'): [1, 7], numpy.datetime64('2013-02-01T09:00:00.000000000+0900'): [3, 5]}

>>> grouped.ngroups
2

>>> grouped.indices
ValueError: DatetimeIndex with NaT cannot be converted to object

>>> grouped.get_group(pd.NaT)
ValueError: DatetimeIndex with NaT cannot be converted to object
@jreback
Contributor

jreback commented Apr 28, 2014

nope, these should be handled like nan groups (excluded). http://pandas-docs.github.io/pandas-docs-travis/groupby.html#na-group-handling

maybe add some tests for this explicitly (and NaT, if it's the only key, should not be a group)
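One way to read this ruling: NaT rows, like NaN rows, are dropped before grouping, and `groups`, `ngroups` and iteration are all derived from that single filtered mapping, so they cannot disagree the way they do in the report. The pure-Python sketch below models that rule together with the suggested "NaT only" test. It is not pandas' implementation; `NAExcludingGroups` and `is_missing` are hypothetical names, and `None` stands in for NaT.

```python
import math
from datetime import date


def is_missing(value):
    """None and float NaN stand in for pandas' NaT and NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


class NAExcludingGroups:
    """Hypothetical groupby-like object whose accessors share one mapping."""

    def __init__(self, keys, values):
        self._values = list(values)
        self._indices = {}
        for pos, key in enumerate(keys):
            if is_missing(key):
                continue  # NaT/NaN keys never form a group
            self._indices.setdefault(key, []).append(pos)

    @property
    def groups(self):
        return {key: list(pos) for key, pos in self._indices.items()}

    @property
    def ngroups(self):
        return len(self._indices)

    def __iter__(self):
        for key, positions in self._indices.items():
            yield key, [self._values[i] for i in positions]


# Mirrors the example in the issue: four NaT keys and two valid dates.
keys = [None, date(2013, 1, 1), None, date(2013, 2, 1),
        None, date(2013, 2, 1), None, date(2013, 1, 1)]
grouped = NAExcludingGroups(keys, range(8))
assert grouped.ngroups == len(grouped.groups) == len(list(grouped)) == 2

# Suggested test: if NaT is the only key, there should be no group at all.
only_nat = NAExcludingGroups([None, None], [10, 20])
assert only_nat.ngroups == 0 and only_nat.groups == {} and list(only_nat) == []
```

Because every accessor reads `_indices`, a NaT key can only ever be absent from all of them at once.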

@jreback
Contributor

jreback commented Apr 28, 2014

see also #5456

@sinhrks
Member Author

sinhrks commented Apr 28, 2014

Thanks. I'm working on #3729 now (hopefully finish it soon), and will handle NaT like nan. I'll add some tests at that time.

@jreback jreback added this to the 0.15.0 milestone Apr 28, 2014
@sinhrks
Member Author

sinhrks commented Apr 28, 2014

If the group key contains NaT, get_group also fails for valid groups. This appears to be caused by the behaviour specified in #1348.

# continued from above example
>>> grouped.get_group(pd.Timestamp('2013-02-01'))
ValueError: DatetimeIndex with NaT cannot be converted to object

@jreback
Contributor

jreback commented Apr 28, 2014

The group key should not contain NaT; that is an error (just as it should not contain nan).

@sinhrks
Member Author

sinhrks commented Apr 28, 2014

Understood the principle, but I think it is better for get_group to work even if the key contains NaT (because it already works when the key contains nan).
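Taken together with the previous comment, the requested behaviour is: `get_group` on a valid key succeeds even when other rows have a NaT key, while asking for NaT itself fails because that group is excluded, as with NaN. The pure-Python sketch below models this under stated assumptions: `lookup_group` is a hypothetical name, `None` stands in for NaT, and `KeyError` is assumed as the error for an excluded or unknown group. It compares each key against the requested one row by row instead of converting the whole key column, so a missing entry elsewhere cannot break the lookup.

```python
import math
from datetime import date


def is_missing(value):
    """None and float NaN stand in for pandas' NaT and NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def lookup_group(keys, values, wanted):
    """Hypothetical get_group: return the values whose key equals `wanted`.

    Missing keys are skipped row by row, so a NaT elsewhere in the key
    column does not affect lookups of valid groups.
    """
    if is_missing(wanted):
        # NaT/NaN rows are excluded from grouping, so there is no such group.
        raise KeyError(wanted)
    selected = [v for k, v in zip(keys, values) if not is_missing(k) and k == wanted]
    if not selected:
        raise KeyError(wanted)
    return selected


keys = [None, date(2013, 1, 1), None, date(2013, 2, 1),
        None, date(2013, 2, 1), None, date(2013, 1, 1)]
print(lookup_group(keys, range(8), date(2013, 2, 1)))  # → [3, 5]
```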

@jreback jreback modified the milestones: 0.14.1, 0.15.0 May 1, 2014
@jreback jreback modified the milestones: 0.15.0, 0.14.1, 0.15.1 Jul 6, 2014
@jreback jreback modified the milestones: 0.15.1, 0.15.0 Sep 17, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jreback jreback modified the milestones: 0.17.0, Next Major Release May 9, 2015
3 participants