DataFrame.groupby fails with MultiIndex containing pd.NaT #9236

stevenmanton · 2015-01-13T01:26:40Z

It seems that the groupby operation fails when the row index is a MultiIndex containing NaT values. For example, the following code fails (v0.15.2) with TypeError: 'numpy.ndarray' object is not callable:

midx = pd.MultiIndex(levels=[[pd.NaT, pd.datetime(2012,1,2), 
                     pd.datetime(2012,1,3)], ['a', 'b']],
                     labels=[[0, 1, 1, 2], [0, 0, 1, 0]], names=['date', None])
df = pd.Series(pd.np.random.rand(4), index=midx)
df.groupby(level=1)

However, it seems as though np.nan values are handled properly:

midx = pd.MultiIndex(levels=[[pd.np.nan, 10, 20], ['a', 'b']],
                     labels=[[0, 1, 1, 2], [0, 0, 1, 0]], names=['date', None])
df = pd.Series(pd.np.random.rand(4), index=midx)
df.groupby(level=1)
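
For readers on a recent pandas, here is a rough equivalent of the two snippets above. It assumes a version where the `labels` keyword has been replaced by `codes` (the deprecation warning appears later in this thread) and where `pd.datetime` and `pd.np` are no longer available, so `datetime` is imported directly. The fixed values and the names `midx_nat`, `midx_nan`, `s_nat`, and `s_nan` are illustrative and not part of the original report.

```python
from datetime import datetime

import numpy as np
import pandas as pd

codes = [[0, 1, 1, 2], [0, 0, 1, 0]]

# NaT case from the report, spelled with `codes=` instead of `labels=`.
midx_nat = pd.MultiIndex(
    levels=[[pd.NaT, datetime(2012, 1, 2), datetime(2012, 1, 3)], ["a", "b"]],
    codes=codes,
    names=["date", None],
)
s_nat = pd.Series([3.0, 2.0, 2.5, 4.0], index=midx_nat)

# np.nan case from the report, for comparison.
midx_nan = pd.MultiIndex(
    levels=[[np.nan, 10, 20], ["a", "b"]],
    codes=codes,
    names=["date", None],
)
s_nan = pd.Series([3.0, 2.0, 2.5, 4.0], index=midx_nan)

# Group on the second level; on v0.15.2 the NaT variant raised TypeError.
mean_nat = s_nat.groupby(level=1).mean()
mean_nan = s_nan.groupby(level=1).mean()
print(mean_nat)
print(mean_nan)
```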

shoyer · 2015-01-13T04:07:13Z

I can reproduce this on master.

Thanks for the report!

jreback · 2015-01-13T04:10:43Z

IIRC this is a duplicate issue, if someone would like to find the reference.

jreback · 2015-01-13T11:00:04Z

this is covered by #6996, #6992, will xref it there.

pull-requests are welcome

jreback · 2015-01-13T11:01:33Z

actually, will reopen in case it is slightly different.

stevenmanton · 2015-01-14T01:08:47Z

Thanks for looking into this! I've been banging into this all day as I've been working on some analysis. I took a look at the pandas source, but it's not clear to me where the bug is and how to go about fixing it. Nonetheless, I've found a pretty quick workaround that produces the behavior I would expect. Maybe this will help others with a similar problem or give some direction in fixing the issue. Essentially, the workaround drops the NaT value within the level.

Here's an example of the workaround that works for me:

midx = pd.MultiIndex(levels=[[pd.NaT, pd.datetime(2012,1,2), 
                     pd.datetime(2012,1,3)], ['a', 'b']],
                     labels=[[0, 1, 1, 2], [0, 0, 1, 0]], names=['date', None])
df = pd.Series(pd.np.random.rand(4), index=midx)
df.groupby(df.index.get_level_values(0)).count()
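
A minimal sketch of the same workaround on a recent pandas, under the assumption that grouping by an array of level values drops missing keys by default. The workaround above groups by level 0 (the dates); the second call applies the same idea to level 1, which is the level grouped in the original report. The index is built with `MultiIndex.from_tuples`, and the names `s`, `by_date`, and `by_letter` are illustrative.

```python
from datetime import datetime

import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        (pd.NaT, "a"),
        (datetime(2012, 1, 2), "a"),
        (datetime(2012, 1, 2), "b"),
        (datetime(2012, 1, 3), "a"),
    ],
    names=["date", None],
)
s = pd.Series([3.0, 2.0, 2.5, 4.0], index=index)

# Group by the values of a level rather than by `level=`.
# The row whose date is NaT does not form a group of its own.
by_date = s.groupby(s.index.get_level_values(0)).count()

# Same approach for the second level, matching `groupby(level=1)`.
by_letter = s.groupby(s.index.get_level_values(1)).count()

print(by_date)
print(by_letter)
```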

mroeschke · 2019-02-08T05:15:59Z

Looks to be fixed on master. I imagine this edge case could use a test.

In [8]: pd.__version__
Out[8]: '0.25.0.dev0+85.g0eddba883'

In [9]: midx = pd.MultiIndex(levels=[[pd.NaT, pd.datetime(2012,1,2),
   ...:                      pd.datetime(2012,1,3)], ['a', 'b']],
   ...:                      labels=[[0, 1, 1, 2], [0, 0, 1, 0]], names=['date', None])
   ...: df = pd.Series(pd.np.random.rand(4), index=midx)
   ...: df.groupby(level=1).mean()
/anaconda3/envs/pandas-dev/bin/ipython:3: FutureWarning: the 'labels' keyword is deprecated, use 'codes' instead
  # -*- coding: utf-8 -*-
Out[9]:
a    0.849207
b    0.877276
dtype: float64

TrigonaMinima · 2019-02-13T10:01:53Z

@mroeschke
I tried the code on pandas 0.24.0, and it ran successfully. Could you point me to the file where this test should be added?

mroeschke · 2019-02-13T17:27:23Z

pandas/tests/groupby/test_groupby.py
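
A regression test for this edge case might look roughly like the sketch below. It is only an illustration: the function name, the fixed values, and the use of the public `pandas.testing.assert_series_equal` helper are assumptions, and the test that was eventually merged may differ.

```python
from datetime import datetime

import pandas as pd
from pandas.testing import assert_series_equal


def test_groupby_multiindex_nat():
    # GH 9236: grouping on one level of a MultiIndex should work even
    # when another level contains NaT.
    values = [
        (pd.NaT, "a"),
        (datetime(2012, 1, 2), "a"),
        (datetime(2012, 1, 2), "b"),
        (datetime(2012, 1, 3), "a"),
    ]
    mi = pd.MultiIndex.from_tuples(values, names=["date", None])
    ser = pd.Series([3, 2, 2.5, 4], index=mi)

    result = ser.groupby(level=1).mean()

    # "a" averages 3, 2 and 4; "b" has the single value 2.5.
    expected = pd.Series([3.0, 2.5], index=["a", "b"])
    assert_series_equal(result, expected)


# Run directly when not using pytest.
test_groupby_multiindex_nat()
```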

shoyer added the Groupby, Missing-data, and Bug labels Jan 13, 2015

shoyer added this to the 0.16.0 milestone Jan 13, 2015

jreback closed this as completed Jan 13, 2015

jreback added the Duplicate Report label Jan 13, 2015

sinhrks mentioned this issue Jan 13, 2015

BUG: Groupby NaT Handling #6992

Closed

jreback mentioned this issue Jan 13, 2015

BUG: GroupBy.get_group raises ValueError when group key contains NaT #6996

Merged

jreback reopened this Jan 13, 2015

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

mroeschke added the good first issue and Needs Tests labels and removed the Duplicate Report label Feb 8, 2019

TrigonaMinima mentioned this issue Feb 13, 2019

#9236: test for the DataFrame.groupby with MultiIndex having pd.NaT #25310

Merged

jreback modified the milestones: Contributions Welcome, 0.25.0 Feb 16, 2019

jreback closed this as completed in #25310 Feb 19, 2019
