using reset_index on empty DataFrame converts column datatypes to object #4615

Closed
litlep-nibbyt opened this issue Jun 28, 2022 · 3 comments
Labels
bug 🦗 Something isn't working

Comments

@litlep-nibbyt

System information

  • OS X 11.6.4
  • Modin version '0.15.2'
  • Python 3.9.12

Describe the problem

When calling reset_index on an empty DataFrame, all column dtypes get converted to object.

Source code / logs

from numpy import int64, datetime64
from modin.pandas import DataFrame
import modin.pandas as pd
import numpy as np

if __name__ == "__main__":
    patients = {
        "patient_id": int64(),
        "gender": int64(),
        "dob": datetime64()
    }

    df = DataFrame(patients, index=[])
    print(f"dtypes before reset_index: {df.dtypes}")
    df = df.reset_index(drop=True)
    print(f"dtypes after reset_index: {df.dtypes}")
@mvashishtha
Collaborator

@meijiu thank you for the detailed bug report! I was able to reproduce the bug at Modin version 86d3610. If I see a quick fix, I'll assign the bug to myself and make a fix. Otherwise, I'll leave the issue unassigned.

@mvashishtha mvashishtha added the bug 🦗 Something isn't working label Jun 28, 2022
@mvashishtha
Collaborator

@meijiu I just noticed that this was a bug with an empty dataframe. Modin functions on empty dataframes usually default to pandas, and we lose dtypes when we default to pandas. See e.g. #4191 and #4060 for similar issues.

#4605 tracks a way to robustly handle empty dataframes in general in Modin. We actually have a draft PR, #4606, ready for that feature. I think that PR should fix this bug.

I will mark this issue as a duplicate of #4605. For now, I suggest using pandas to work with empty dataframes, then converting the dataframes to Modin dataframes once they're not empty.
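A minimal sketch of that workaround, assuming the same three columns as in the report: build and manipulate the empty frame in plain pandas, and only hand it to Modin once it has rows. The helper name `to_modin_if_nonempty` is hypothetical and not part of Modin; the Modin import is done lazily so the pandas part runs on its own.

```python
import numpy as np
import pandas as pd

# Declare the column dtypes explicitly and build the empty frame in pandas.
schema = {
    "patient_id": np.dtype("int64"),
    "gender": np.dtype("int64"),
    "dob": np.dtype("datetime64[ns]"),
}
empty = pd.DataFrame(
    {name: pd.Series(dtype=dtype) for name, dtype in schema.items()}
)

# In plain pandas, reset_index on the empty frame keeps the column dtypes.
empty = empty.reset_index(drop=True)
print(empty.dtypes)


def to_modin_if_nonempty(frame):
    """Hypothetical helper: convert to a Modin DataFrame only once rows exist."""
    if frame.empty:
        return frame  # stay in pandas while the frame is empty
    import modin.pandas as mpd  # lazy import; requires Modin to be installed

    return mpd.DataFrame(frame)
```

Once rows have been appended in pandas, calling `to_modin_if_nonempty(frame)` would return a Modin DataFrame built from the pandas one; while the frame is empty it is returned unchanged.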

Please reply here if you have any other concerns.

@mvashishtha
Collaborator

Duplicate of #4605

@mvashishtha mvashishtha marked this as a duplicate of #4605 Jun 28, 2022