Concatenating two pandas.Series objects results in modin.pandas.dataframe #919

Closed
fritz-morgendorfer opened this issue Dec 24, 2019 · 5 comments
Labels
Needs more information ❔ Issues that require more information from the reporter

@fritz-morgendorfer

System information

  • OS Platform and Distribution: Windows 10 WSL with Ubuntu 18.04.3 LTS
  • Modin installed from (source or binary): conda pip install modin
  • Modin version: modin==0.6.3
  • Python version: Python 3.6.6
  • Exact command to reproduce:

import pandas
import modin.pandas as pd

s1 = pandas.Series(['a', 'b']) # type(s1) pandas.core.series.Series
s2 = pandas.Series(['X', 'Y']) # type(s2) pandas.core.series.Series
s = pd.concat([s1, s2], axis=0)
type(s) # modin.pandas.dataframe.DataFrame

Describe the problem

In the original script, which uses pandas, I have a pandas.DataFrame from which I take two pandas.Series and concatenate them on axis=0. The resulting Series is then passed as data={'name': my_series} to pandas.DataFrame(...).
After substituting modin.pandas for pandas, the two columns taken from the modin.pandas.DataFrame, which are pandas.Series objects, are concatenated not into a modin.pandas.Series but into a modin.pandas.DataFrame, which then breaks the script.
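
For reference, the plain-pandas workflow described above can be sketched as follows. This is only a minimal illustration: the column name 'name' comes from the description, while the sample values and the variable names my_series and df are assumptions made for the example.

```python
import pandas

# Two Series standing in for the columns taken from the original DataFrame.
s1 = pandas.Series(['a', 'b'])
s2 = pandas.Series(['X', 'Y'])

# With plain pandas, concatenating Series along axis=0 yields a Series.
my_series = pandas.concat([s1, s2], axis=0)

# The concatenated Series is then used as a column of a new DataFrame.
df = pandas.DataFrame(data={'name': my_series})
```

The report is that the same concat call through modin.pandas returns a DataFrame instead of a Series when its inputs are pandas.Series objects.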

Source code / logs

@eavidan
Collaborator

eavidan commented Dec 24, 2019

Hi @fritz-morgendorfer, thanks for posting!
I suggest you use either modin or pandas consistently, as each supports operations on its own DataFrame and Series types

import modin.pandas as pd

s1 = pd.Series(['a', 'b'])  # type(s1) modin.pandas.series.Series
s2 = pd.Series(['X', 'Y'])  # type(s2) modin.pandas.series.Series
s = pd.concat([s1, s2], axis=0)     # type(s) modin.pandas.series.Series

In case you have pandas.Series as input, you can convert them to modin.pandas.Series and then concat them into a modin.pandas.Series

import pandas
import modin.pandas as pd

s1 = pd.Series(pandas.Series(['a', 'b']))  # wrap the pandas.Series in a modin Series
s2 = pd.Series(pandas.Series(['X', 'Y']))
s = pd.concat([s1, s2], axis=0)

Let me know if this solves the issue.
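
The conversion step suggested above could be wrapped in a small helper so that mixed inputs are normalized before concatenation. This is a rough sketch, not part of Modin or pandas: the function name concat_series and its frontend parameter are hypothetical. It is demonstrated with pandas itself as the frontend so that it runs without Modin installed.

```python
import pandas


def concat_series(frontend, objs, axis=0):
    """Concatenate objs after converting each one to frontend.Series.

    `frontend` is the module that provides Series and concat, for example
    `modin.pandas` or `pandas`. Hypothetical helper, not a library API.
    """
    converted = [
        obj if isinstance(obj, frontend.Series) else frontend.Series(obj)
        for obj in objs
    ]
    return frontend.concat(converted, axis=axis)


# pandas is used as the frontend here only to keep the sketch runnable
# without Modin; pass modin.pandas to convert the inputs to Modin Series.
s1 = pandas.Series(['a', 'b'])
s2 = pandas.Series(['X', 'Y'])
s = concat_series(pandas, [s1, s2])
```

Passing modin.pandas as the frontend would convert each pandas.Series first, which, going by the snippet above, should make concat return a modin.pandas Series rather than a DataFrame.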

@fritz-morgendorfer
Author

Thanks for the response. I was just wondering whether it is really possible to change just one line of code in my existing project to enjoy all the benefits. I wouldn't like to change anything more than the import statement. While that is obviously not yet the case for the project under consideration, the proposed solution would help if I were working on new code.
The two pandas.Series objects appear as a result of some operations on modin.pandas.dataframe that are just not yet implemented in modin, so I guess this is an issue of the current state of the project.

@eavidan
Collaborator

eavidan commented Dec 24, 2019

When working only with modin, this is not expected behavior.
Could you please share which operations on modin.pandas.dataframe resulted in pandas.Series?

@devin-petersohn
Collaborator

@fritz-morgendorfer Is there a groupby that produces these pandas.Series objects? There was a case where we defaulted to pandas but did not return to a modin.pandas.Series for groupby, but that was fixed in #908.

@devin-petersohn devin-petersohn added the Needs more information ❔ Issues that require more information from the reporter label Jan 2, 2020
@devin-petersohn
Collaborator

Closing this for now, please feel free to reopen if the issue persists!
