groupby() with concat() result columns can't be added #878

Closed
ecoughlan opened this issue Nov 20, 2019 · 2 comments · Fixed by #908
Labels
pandas concordance 🐼 Functionality that does not match pandas

@ecoughlan
Contributor

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04.2
  • Modin installed from (source or binary): pip
  • Modin version: 0.6.3
  • Python version: 3.7.3
  • Exact command to reproduce: Snippet below

Describe the problem

A column computed from existing columns cannot be added to a DataFrame built by concatenating groupby results.

Source code / logs

import modin.pandas as pd  # swap in `import pandas as pd` to compare
df = pd.DataFrame(data={"i": [1, 1], "a": [0, 0], "b": [0, 0]})
g = df.groupby(by="i")
cat = pd.concat([g["a"].first(), g["b"].first()], keys=["x", "y"], axis=1)
cat["z"] = cat.x + cat.y  # fails with Modin 0.6.3, works with pandas

This doesn't work because the addition returns two columns in Modin versus one in pandas; probably something is mixed up with a MultiIndex on the columns.
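For comparison, here is a minimal sketch of the behavior the report expects, run against plain pandas rather than Modin. It only inspects the column index of the concatenated result; the variable names follow the snippet above.

```python
import pandas as pd

df = pd.DataFrame(data={"i": [1, 1], "a": [0, 0], "b": [0, 0]})
g = df.groupby(by="i")

# Concatenating two Series along axis=1 with keys: in pandas the keys
# become the column labels, so the column index stays flat.
cat = pd.concat([g["a"].first(), g["b"].first()], keys=["x", "y"], axis=1)
column_levels = cat.columns.nlevels

# With flat columns, cat.x and cat.y are Series and the sum is one column.
cat["z"] = cat.x + cat.y
```

Checking `cat.columns.nlevels` on the Modin result is a quick way to confirm whether an extra column level was added.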

@devin-petersohn
Collaborator

Thanks @ecoughlan, you are right that it is setting a multi-index on the columns. When keys is set for concat, Modin seems to be adding a level to the columns instead of replacing the label names with the values. It should be a straightforward fix.

As a side note, cat = pd.concat([g["a"].first(), g["b"].first()], keys=["x", "y"], axis=1) could be rewritten as cat = g.first().set_axis(["x", "y"], axis=1, inplace=False). I realize that isn't the point of the issue, but I mention it because the concat version as written is generally less efficient.
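A rough sketch of that rewrite next to the original, using plain pandas. One assumption to flag: the `inplace` argument is omitted here because recent pandas releases no longer accept it on `set_axis`; on the versions current when this issue was filed, pass `inplace=False` as written above.

```python
import pandas as pd

df = pd.DataFrame(data={"i": [1, 1], "a": [0, 0], "b": [0, 0]})
g = df.groupby(by="i")

# Original form: two separate groupby reductions, then a concat.
via_concat = pd.concat([g["a"].first(), g["b"].first()], keys=["x", "y"], axis=1)

# Suggested form: one reduction over all columns, then relabel them.
via_set_axis = g.first().set_axis(["x", "y"], axis=1)

# Compare the two results (values, dtypes, and labels).
same = via_concat.equals(via_set_axis)
```

The second form runs a single groupby reduction instead of one per column, which is presumably where the efficiency difference comes from.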

Thanks again for posting!

@devin-petersohn devin-petersohn added the pandas concordance 🐼 Functionality that does not match pandas label Nov 20, 2019
@devin-petersohn devin-petersohn added this to the 0.6.4 milestone Dec 10, 2019
devin-petersohn added a commit to devin-petersohn/modin that referenced this issue Dec 13, 2019
* Resolves modin-project#878
* Create a `SeriesGroupBy` object that intercepts every call to the
  `SeriesGroupBy` object, applies it to the pandas object, then
  re-distributes the object if it is a `pandas.Series` or a
  `pandas.DataFrame`. This is a temporary measure until we can implement
  a `SeriesGroupBy` object with all of the methods.
* This issue originally surfaced with issues handling interactions
  between pandas and modin Series objects.
* A further pass is required to remove other cases where Modin can
  return a pandas object.
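The interception approach described in the commit message can be sketched roughly as below. This is not Modin's actual implementation: `Distributed` and `SeriesGroupByWrapper` are hypothetical names, and `Distributed` merely stands in for whatever Modin does to re-distribute a pandas result.

```python
import pandas as pd


class Distributed:
    """Hypothetical stand-in for a Modin object; it only holds the pandas result."""

    def __init__(self, pandas_obj):
        self.pandas_obj = pandas_obj


class SeriesGroupByWrapper:
    """Hypothetical wrapper that forwards every attribute to a pandas SeriesGroupBy."""

    def __init__(self, pandas_groupby):
        self._groupby = pandas_groupby

    def __getattr__(self, name):
        # Only reached for names not defined on the wrapper itself.
        attr = getattr(self._groupby, name)
        if not callable(attr):
            return self._rewrap(attr)

        def method(*args, **kwargs):
            # Run the call on the pandas object, then re-wrap the result.
            return self._rewrap(attr(*args, **kwargs))

        return method

    @staticmethod
    def _rewrap(result):
        # Re-wrap only Series/DataFrame results; pass everything else through.
        if isinstance(result, (pd.Series, pd.DataFrame)):
            return Distributed(result)
        return result


df = pd.DataFrame(data={"i": [1, 1], "a": [0, 0], "b": [0, 0]})
wrapped = SeriesGroupByWrapper(df.groupby(by="i")["a"])
first = wrapped.first()      # a Distributed holding a pandas.Series
n_groups = wrapped.ngroups   # a plain int, returned unchanged
```

The point of the design is that callers never receive a raw pandas object from the groupby, so later type checks against the wrapping type succeed.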
@devin-petersohn
Collaborator

@ecoughlan a quick update: This issue happened because the g["a"].first() object was a pandas.Series and we were internally checking for modin.pandas.Series. I fixed the case where it was creating a pandas.Series and tested that the Series behavior is still correct for modin.pandas.Series. The pandas.Series behavior in Modin was falling back to the DataFrame behavior, which does create a new level.
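The difference between the Series path and the DataFrame path can be illustrated with plain pandas. This sketch does not reproduce Modin's internals; it only shows that feeding single-column DataFrames to `concat` with `keys` adds a column level, which matches the symptom reported above.

```python
import pandas as pd

df = pd.DataFrame(data={"i": [1, 1], "a": [0, 0], "b": [0, 0]})
g = df.groupby(by="i")

# Series inputs: keys replace the column labels (flat columns "x", "y").
series_cat = pd.concat([g["a"].first(), g["b"].first()], keys=["x", "y"], axis=1)

# DataFrame inputs: keys are added as a new outer column level.
frame_cat = pd.concat([g[["a"]].first(), g[["b"]].first()], keys=["x", "y"], axis=1)

# Selecting "x" and "y" now yields DataFrames with columns "a" and "b",
# so their sum aligns on both labels and has two columns.
total = frame_cat["x"] + frame_cat["y"]
```

Assigning `total` to a single new column would then fail, because it is a two-column DataFrame rather than a Series.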

devin-petersohn added a commit that referenced this issue Dec 17, 2019
…#908)

* Create SeriesGroupBy wrapper to default to pandas and return to Modin


* Add tests

* Lint