Inconsistent groupby-apply output shape and random values returned. #20420

r3k4mn14r · 2018-03-20T15:51:35Z

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [6, 7, 8, 9, 0],
                   "C": [1, 1, 1, 2, 2]}, index=range(5))
a = df.groupby("C").apply(lambda x: x.A)
b = df.groupby("C").apply(lambda x: x.A.sort_index())

In [9]: print(df)
   A  B  C
0  1  6  1
1  2  7  1
2  3  8  1
3  4  9  2
4  5  0  2

In [248]: print(a)
C
1  0    1
   1    2
   2    3
2  3    4
   4    5
Name: A, dtype: int64

In [249]: print(b)
A  0  1   2
C
1  1  2   3
2  4  5  33

Problem description

First, the output of .groupby().apply() seems inconsistent: in case a it returns the "correct" shape, while in case b the output is transposed.

Second, the value 33 returned in case b is not what I would expect. That number changes if I call the function multiple times.

This does not only happen when sort_index() is called, but it was the simplest example I could consistently reproduce.

Expected Output

C
1  0    1
   1    2
   2    3
2  3    4
   4    5
Name: A, dtype: int64

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-112-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: None
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0

h-vetinari · 2018-08-30T19:29:09Z

I started collecting inconsistencies related to groupby.apply in #22545 - feel free to comment there if you have some more weird cases.

mroeschke · 2020-06-28T06:15:08Z

Looks to be fixed on master. Could use a test

In [113]: df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [6, 7, 8, 9, 0],
     ...:                    "C": [1, 1, 1, 2, 2]}, index=range(5))
     ...: a = df.groupby("C").apply(lambda x: x.A)
     ...: b = df.groupby("C").apply(lambda x: x.A.sort_index())

In [114]: a
Out[114]:
C
1  0    1
   1    2
   2    3
2  3    4
   4    5
Name: A, dtype: int64

In [115]: b
Out[115]:
C
1  0    1
   1    2
   2    3
2  3    4
   4    5
Name: A, dtype: int64
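
Since a test was requested, here is a minimal sketch of what a regression test for this case might look like. The helper name `build_results` and the test name are hypothetical and not taken from the pandas test suite, and the actual test added upstream may differ. The warning filter is an assumption on my part: newer pandas releases may emit a deprecation warning when `apply` operates on the grouping column.

```python
import warnings

import pandas as pd


def build_results():
    """Re-run the two calls from the report (hypothetical helper)."""
    df = pd.DataFrame(
        {"A": [1, 2, 3, 4, 5], "B": [6, 7, 8, 9, 0], "C": [1, 1, 1, 2, 2]},
        index=range(5),
    )
    with warnings.catch_warnings():
        # Assumption: silence possible deprecation warnings about
        # apply operating on the grouping column in newer pandas.
        warnings.simplefilter("ignore")
        a = df.groupby("C").apply(lambda x: x.A)
        b = df.groupby("C").apply(lambda x: x.A.sort_index())
    return a, b


def test_groupby_apply_sort_index_shape():
    a, b = build_results()
    expected_index = [(1, 0), (1, 1), (1, 2), (2, 3), (2, 4)]
    for result in (a, b):
        # Both calls should give a Series indexed by (C, original index),
        # not a transposed DataFrame with uninitialized values.
        assert isinstance(result, pd.Series)
        assert result.name == "A"
        assert result.index.tolist() == expected_index
        assert result.tolist() == [1, 2, 3, 4, 5]
    assert a.equals(b)
```

Comparing `a` and `b` directly covers the shape inconsistency, and checking the values against the input column covers the garbage value (33) from the original report.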

jbrockmendel added the Groupby label Jul 30, 2018

WillAyd mentioned this issue Aug 30, 2018

API: groupby aggregation with apply does not drop groupby-column #22542

Closed

h-vetinari mentioned this issue Aug 30, 2018

API/DOC: clean up DataFrame.groupby.apply #22545

Open

h-vetinari mentioned this issue Nov 18, 2018

Towards "pandas 1.0" #10000

Closed

h-vetinari mentioned this issue Jan 28, 2019

RLS: 0.25.0 #24950

Closed

WillAyd mentioned this issue Sep 20, 2019

BUG: Groupby selection context not being properly reset #28541

Closed

mroeschke added the Apply label Oct 27, 2019

mroeschke added the good first issue and Needs Tests labels and removed the Apply and Groupby labels Jun 28, 2020

mroeschke mentioned this issue May 28, 2021

TST: More old issues #41697

Merged

mroeschke added this to the 1.3 milestone May 28, 2021

jreback closed this as completed in #41697 May 28, 2021
