ENH: DataFrame constructor and DataFrame.from_records should allow specifying column dtypes #46868

Closed
jlumpe opened this issue Apr 25, 2022 · 2 comments
Labels
Closing Candidate May be closeable, needs more eyeballs DataFrame DataFrame data structure Enhancement

Comments

@jlumpe

jlumpe commented Apr 25, 2022

Is your feature request related to a problem?

When creating a DataFrame from a list of row tuples using either the constructor or DataFrame.from_records(), I often run into issues where the column data type is not inferred the way I want. In my specific case I have a mix of ints and Nones and would like to have an object dtype, but pandas converts the column to float.
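A minimal sketch of the behavior described above, using made-up rows and column names (`rows`, `"A"`, `"B"` are illustrative, not from the original report). It only exercises the constructor; the exact inference rules may vary between pandas versions.

```python
import pandas as pd

# Hypothetical data: column "A" mixes ints and None.
rows = [(1, "a"), (None, "b"), (3, "c")]

df = pd.DataFrame(rows, columns=["A", "B"])

# The None is treated as a missing value and the ints are upcast,
# so "A" comes back as a float column rather than object.
print(df["A"].dtype)
print(df["A"].tolist())
```

Once the column has been inferred as float, the original ints and `None` are no longer stored as such, which is why a plain `astype(object)` afterwards does not restore them.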

Describe the solution you'd like

read_csv() has a dtype argument that allows setting individual column data types; I think these two methods should have something similar. The constructor already has a dtype argument, but it accepts only a single value, and tabular data very often does not have a uniform data type. Because the columns argument is optional, dtype should probably also accept a list/sequence of dtypes rather than only a dict keyed by column name as in read_csv().

API breaking implications

None, this would add a new optional argument to DataFrame.from_records() and expand the allowed values of the dtype argument to the constructor without changing the interpretation of currently accepted values.

Describe alternatives you've considered

df['A'] = df['A'].astype(dtype)

may work in a lot of cases, but mine requires an ugly list comprehension to undo the float conversion. Being able to manually specify the column data types seems like a no-brainer.

@jlumpe jlumpe added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 25, 2022
@mroeschke mroeschke added DataFrame DataFrame data structure and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 11, 2022
@mroeschke
Member

Thanks for the report. I think this was discussed in the past and was not pursued because astype has more thorough dtype conversion handling than the constructors. (astype also takes a dict of column label -> type, so a list comprehension is not needed.)
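One possible workaround with the existing API, sketched under assumptions: pass the single-valued `dtype=object` to the constructor so no inference happens, then use the dict form of `astype` to convert only the columns that should not stay object. The data and column names here are hypothetical, and this is not necessarily what either commenter had in mind.

```python
import pandas as pd

# Hypothetical data: "A" mixes ints and None, "C" is all ints.
rows = [(1, "a", 2), (None, "b", 4)]

# dtype=object applies to every column, so the values are stored as given.
df = pd.DataFrame(rows, columns=["A", "B", "C"], dtype=object)

# astype accepts a dict of column label -> dtype; unlisted columns are left alone.
df = df.astype({"C": "int64"})

print(df.dtypes)
print(df["A"].tolist())
```

The trade-off is that every column not listed in the dict stays object, so this fits best when only a few columns need a concrete dtype.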

@mroeschke mroeschke added the Closing Candidate May be closeable, needs more eyeballs label Aug 11, 2022
@phofl
Member

phofl commented Aug 19, 2022

Covered by #4464

@phofl phofl closed this as completed Aug 19, 2022
@phofl phofl added this to the No action milestone Aug 19, 2022