BUG: df.agg call passes different things to a custom function depending on whether an unused kwarg is supplied or not #39169

Open
2 of 3 tasks
pjireland opened this issue Jan 14, 2021 · 5 comments · May be fixed by #57706
Labels
Apply, Bug, Groupby

Comments

@pjireland

Code Sample, a copy-pastable example

import pandas as pd
import numpy as np
import scipy.stats


def circ_mean(data, dummy_kwarg=0):
    # print(data)
    return 180/np.pi*scipy.stats.circmean(data*np.pi/180)


def numpy_mean(data, dummy_kwarg=0):
    return np.mean(data)


@pd.api.extensions.register_dataframe_accessor("my")
class CircstatsAccessor(object):
    def __init__(self, pandas_obj):
        self._obj = pandas_obj
        
    def circ_mean(self, axis=0, level=None, **kwargs):
        df = self._obj
        if axis != 0 or level is not None:
            df = df.groupby(axis=axis, level=level)
        return df.agg(circ_mean, **kwargs)
    
    def numpy_mean(self, axis=0, level=None, **kwargs):
        df = self._obj
        if axis != 0 or level is not None:
            df = df.groupby(axis=axis, level=level)
        return df.agg(numpy_mean, **kwargs)


df = pd.DataFrame(
    data={
        "col1": [10, 11, 12, 13],
        "col2": [20, 21, 22, 23],
    },
    index=[1, 2, 3, 4]
)

# Compute results with the standard `df.mean` call
# I'd like my custom mean function to do a similar thing
df.mean(level=0, axis=0)

# If I don't pass in any kwargs, `df.my.circ_mean` behaves as expected
# Results approximately match those from `df.mean`
df.my.circ_mean(level=0, axis=0)

# If I pass in a kwarg that is not ever used, `df.my.circ_mean`
# returns unusual results - the returned values in `col1` are 
# identical to those in `col2`, whereas they were
# different before
df.my.circ_mean(level=0, axis=0, dummy_kwarg=0)

# If I call `df.my.numpy_mean`, results are identical
# with or without providing the kwarg
df.my.numpy_mean(level=0, axis=0)
df.my.numpy_mean(level=0, axis=0, dummy_kwarg=0)

Problem description

As discussed in the code comments above, my circ_mean function behaves differently depending on whether a dummy (unused) keyword argument is specified. Uncommenting the print call in the circ_mean function shows that df.agg passes in different objects depending on whether or not this keyword is provided.

I would expect no difference in behavior, since this keyword has no effect. Interestingly, the difference disappears if I replace the more complicated circular mean call with a simple np.mean call inside my custom function (compare the circ_mean and numpy_mean functions).
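One way to see what agg hands to the function, without reading printed frames, is to record the type of each argument. This is only a sketch: `probe`, `seen`, `without_kwarg`, and `with_kwarg` are names made up for illustration, and which types show up depends on the installed pandas version.

```python
import pandas as pd

seen = []  # type name of every object that agg passes to probe


def probe(data, dummy_kwarg=0):
    # Hypothetical helper: remember what was passed in, then reduce to a scalar.
    seen.append(type(data).__name__)
    return data.to_numpy().sum()


df = pd.DataFrame(
    data={"col1": [10, 11, 12, 13], "col2": [20, 21, 22, 23]},
    index=[1, 2, 3, 4],
)

# Call without any extra keyword argument
df.groupby(level=0).agg(probe)
without_kwarg = sorted(set(seen))

# Call again, this time supplying the unused keyword argument
seen.clear()
df.groupby(level=0).agg(probe, dummy_kwarg=0)
with_kwarg = sorted(set(seen))

print("without kwarg:", without_kwarg)
print("with kwarg:   ", with_kwarg)
```

If the two printed lists differ, the function is receiving a Series in one case and a DataFrame in the other.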

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 3e89b4c
python : 3.7.9.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : None.None

pandas : 1.2.0
numpy : 1.19.2
pytz : 2020.5
dateutil : 2.8.1
pip : 20.3.3
setuptools : 51.0.0.post20201207
Cython : 0.29.21
pytest : 6.2.1
hypothesis : None
sphinx : 3.4.1
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : 1.1
pymysql : 0.10.1
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.3
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.2
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 0.15.1
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.2
sqlalchemy : 1.3.20
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.2
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2

@pjireland pjireland added the Bug and Needs Triage labels Jan 14, 2021
@rhshadrach
Member

Thanks for the report, can you simplify the example to only include necessary details?

@pjireland
Author

> Thanks for the report, can you simplify the example to only include necessary details?

No problem. The example was the simplest one I was able to find that still showed the behavior described, but feel free to let me know if anything is confusing about it, or you see anything that seems like it could be simplified.

@rhshadrach
Member

rhshadrach commented Jan 16, 2021

It appears to me the entire class is unnecessary. You can just call circ_mean directly to demonstrate the issue.

@simonjayhawkins
Member

> You can just call circ_mean directly to demonstrate the issue.

>>> import numpy as np
>>> import pandas as pd
>>>
>>> def func(data, **kwargs):
...     return np.sum(np.sum(data)) # np.sum twice to ensure scalar result
...
>>> df = pd.DataFrame([[1,2], [3,4]])
>>> print(df.groupby(level=0).agg(func))
   0  1
0  1  2
1  3  4
>>> print(df.groupby(level=0).agg(func, foo=42))
   0  1
0  3  3
1  7  7
>>>

@rhshadrach
Member

Thanks @simonjayhawkins. In pandas.core.groupby.generic.aggregate, we're taking two different paths depending on whether args/kwargs are being used. The path through _aggregate_frame computes the result on each group as a whole, then uses _wrap_frame_output, which in turn uses the original object's columns, so every column receives the same per-group value (the 3 3 and 7 7 rows above). On the other hand, the path through agg_list_like aggregates column-by-column.

Further investigations and PRs to fix are welcome.
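The difference between the two paths can be mimicked outside of pandas internals. The following is a minimal sketch, not the actual pandas implementation: `total`, `whole_group`, and `per_column` are hypothetical names, and the groups are built by hand with `df.loc` instead of going through `groupby`.

```python
import pandas as pd


def total(data):
    # Hypothetical reducer: works for both a DataFrame and a Series.
    return int(data.to_numpy().sum())


df = pd.DataFrame([[1, 2], [3, 4]])

whole_group = {}  # one call per group, on the whole sub-frame
per_column = {}   # one call per column within each group

for label in df.index.unique():
    group = df.loc[[label]]  # one-row sub-frame for this index label
    whole_group[int(label)] = total(group)
    per_column[int(label)] = {int(col): total(group[col]) for col in group.columns}

print(whole_group)  # {0: 3, 1: 7}
print(per_column)   # {0: {0: 1, 1: 2}, 1: {0: 3, 1: 4}}
```

Reducing each group as a whole yields a single value per group (3 and 7), which matches the repeated values in the `foo=42` output above, while reducing column by column reproduces the output of the call without kwargs.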

@rhshadrach rhshadrach added the Apply and Groupby labels and removed the Needs Triage label Jan 17, 2021
@rhshadrach rhshadrach added this to the Contributions Welcome milestone Jan 17, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@rhshadrach rhshadrach linked a pull request Mar 2, 2024 that will close this issue
6 tasks
4 participants