groupby_agg ignores exceptions #1462

Closed
dchigarev opened this issue May 19, 2020 · 1 comment · Fixed by #1703

@dchigarev
Collaborator

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 1809
  • Modin version (modin.__version__): 0.7.3+41.gd617985
  • Python version: Python 3.7.5
  • Code we can use to reproduce:
if __name__ == "__main__":
    import pandas
    import modin.pandas as pd

    data = {
        "col1": [0, 1, 2, 3],
        "col2": [4, 5, 6, 7],
    }

    md_df, pd_df = (
        pd.DataFrame(data).astype({"col1": "category"}),
        pandas.DataFrame(data).astype({"col1": "category"}),
    )
    by = [1, 2, 1, 2]

    md_grp, pd_grp = md_df.groupby(by=by), pd_df.groupby(by=by)

    print(md_grp.quantile())  # prints an empty DataFrame
    print(pd_grp.quantile())  # throws TypeError: No matching signature found

Describe the problem

Some groupby aggregation operations in pandas can raise a TypeError, as quantile does in the code above. In that case Modin returns an empty DataFrame instead, which differs from the pandas behavior.

Source code / logs

The problem is probably here: modin/backends/pandas/query_compiler.py: def groupby_agg()

...
grouped_df = df.groupby(by=by, axis=axis, **groupby_args)
try:
    result = agg_func(grouped_df, **agg_args)
except (DataError, TypeError):
    result = pandas.DataFrame(index=grouped_df.size().index)
return result
...

When agg_func raises a TypeError, we catch it and fall back to an empty DataFrame indexed by the group keys, but this approach sometimes produces behavior that differs from pandas.
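
To make the difference concrete, here is a minimal sketch of the two strategies using only the Python standard library. It does not use Modin or pandas: plain dicts stand in for grouped data, and the names aggregate_swallowing, aggregate_propagating, and mean are hypothetical helpers invented for this illustration, not part of any library.

```python
def aggregate_swallowing(agg_func, groups):
    """Mimic the quoted pattern: a failing aggregation becomes an empty result."""
    try:
        return {key: agg_func(values) for key, values in groups.items()}
    except TypeError:
        # The error is hidden here; the caller only ever sees an empty result.
        return {}


def aggregate_propagating(agg_func, groups):
    """Let the aggregation's own exception reach the caller."""
    return {key: agg_func(values) for key, values in groups.items()}


def mean(values):
    # sum() raises TypeError for strings, standing in for an unsupported dtype.
    return sum(values) / len(values)


# Hypothetical grouped data: group key -> non-numeric values.
groups = {1: ["a", "c"], 2: ["b", "d"]}

swallowed = aggregate_swallowing(mean, groups)

try:
    aggregate_propagating(mean, groups)
    propagated_error = None
except TypeError as exc:
    propagated_error = exc

print(swallowed)
print(type(propagated_error).__name__)
```

The first helper silently hands back an empty result, which is the shape of the discrepancy reported here; the second surfaces the TypeError, which is what pandas does for the reproducer.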

@dchigarev dchigarev added the bug 🦗 Something isn't working label May 19, 2020
@devin-petersohn
Collaborator

Thanks @dchigarev, this will be somewhat difficult because errors are sometimes thrown asynchronously.
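
As a rough analogy for why asynchronous errors complicate this, the sketch below uses the standard library's concurrent.futures rather than any Modin execution engine. The function failing_aggregation is a hypothetical stand-in for a remote aggregation task. Submitting the task succeeds without error, and the exception only surfaces later, when the result is requested.

```python
from concurrent.futures import ThreadPoolExecutor


def failing_aggregation():
    # Hypothetical task that fails the way the reproducer's quantile call does.
    raise TypeError("No matching signature found")


with ThreadPoolExecutor(max_workers=1) as pool:
    # submit() returns a Future immediately; no exception is raised here.
    future = pool.submit(failing_aggregation)
    submitted_without_error = True

    try:
        # The stored exception is re-raised only when the result is requested.
        future.result()
        surfaced = None
    except TypeError as exc:
        surfaced = exc

print(submitted_without_error, repr(surfaced))
```

Under this assumption, a try/except placed around the point where work is scheduled would never see the error, which is one way to read the difficulty described above.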

@devin-petersohn devin-petersohn added this to the 0.7.4 milestone May 20, 2020
devin-petersohn pushed a commit that referenced this issue Jul 10, 2020
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>