DOC: pd.DataFrame(dtype) arg cannot be list, dict, Series. And None will infer wider type than necessary. #14764

Closed
smcinerney opened this issue Nov 29, 2016 · 8 comments · Fixed by #16487
Labels
Docs Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Milestone

Comments

@smcinerney

smcinerney commented Nov 29, 2016

Code Sample, a copy-pastable example if possible

```python
# This is a DOC bug report, simply to document what pd.DataFrame(dtype) currently does.
# The behaviour is non-obvious to new users, gives unhelpful error messages, and differs from read_csv(dtype).
import numpy as np
import pandas as pd

# a) Leaving dtype=None in the constructor infers a wider type than necessary
df_cols = {'year': np.int32, 'month': np.int8}
df = pd.DataFrame(columns=list(df_cols.keys()), dtype=None, index=range(10), data=-1)
df.dtypes
# Output (Python 2.7, pandas 0.19.1):
# month    int64
# year     int64

# b) The docstring doesn't explicitly say that a list/dict/Series/array-like is not allowed
#    for dtype (and if you pass one in, the error is not very friendly). This also differs
#    from read_csv(dtype). Workaround: declare with one dtype, then fix up each column.
df = pd.DataFrame(columns=list(df_cols.keys()), dtype=np.int32, index=range(10), data=-1)
# df.dtypes now shows every column as int32
for col, coltype in df_cols.items():
    df[col] = df[col].astype(coltype)
```
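A minimal sketch of the same workaround on a current pandas, assuming Python 3.7+ (so dict order is preserved) and a pandas version where `DataFrame.astype` accepts a dict of column name to dtype. The names `raw` and `typed` are only illustrative.

```python
import numpy as np
import pandas as pd

# Per-column target dtypes, as in the report above.
df_cols = {"year": np.int32, "month": np.int8}

# With dtype=None, the scalar fill value -1 is inferred as int64 for every column.
raw = pd.DataFrame(-1, index=range(10), columns=list(df_cols))
print(raw.dtypes)

# DataFrame.astype accepts a dict of {column name: dtype}, so the
# per-column astype loop can be replaced by a single call.
typed = raw.astype(df_cols)
print(typed.dtypes)
```

The single `astype(dict)` call does the same fix-up as the loop, but keeps the target dtypes in one place.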

Problem description

The DataFrame() docstring doesn't explicitly say that a list/dict/Series/array-like is not allowed for dtype, and if you pass one in, the error is not very friendly. This also differs from read_csv(dtype), which accepts a dict mapping column names to dtypes.
Leaving dtype=None in the constructor infers a wider type than necessary (int64 in the example above).
So in general you must either set dtype to the widest type any column needs, or leave dtype=None and then fix up the dtypes after construction by casting with astype().
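The contrast with read_csv can be seen side by side in a small sketch. The CSV text and values are made up for illustration; the point is only that read_csv takes a per-column dict while the constructor takes a single dtype.

```python
import io

import pandas as pd

# Small made-up CSV with the two columns from the report.
csv_text = "year,month\n2016,11\n2016,12\n"

# read_csv takes a dict of {column name: dtype}, so each column gets its own type.
from_csv = pd.read_csv(io.StringIO(csv_text), dtype={"year": "int32", "month": "int8"})
print(from_csv.dtypes)

# The DataFrame constructor takes a single dtype applied to every column, so one
# type wide enough for all columns has to be picked (int16 covers both here).
uniform = pd.DataFrame({"year": [2016, 2016], "month": [11, 12]}, dtype="int16")
print(uniform.dtypes)
```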

Expected Output

Output of pd.show_versions()

```
python: 2.7.10.final.0
python-bits: 64
machine: x86_64
processor: i386
byteorder: little
pandas: 0.19.1
numpy: 1.11.2
scipy: 0.18.1
```
@chris-b1
Contributor

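Repro for the error case:

```python
import pandas as pd

# Raises: the constructor only accepts a single dtype, not a per-column dict
df = pd.DataFrame(columns=['year', 'month'],
                  dtype={'year': 'int32', 'month': 'int8'},
                  index=range(10), data=-1)
```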

I could see an argument for supporting this, consistent with #13375
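A sketch that captures the failure instead of crashing, so the error can be inspected. It only assumes the constructor still rejects a dict dtype; the exception class and message have varied across pandas/NumPy versions, so both `TypeError` and `ValueError` are caught.

```python
import pandas as pd

# Pass a dict as dtype, as in the repro above. Expected to raise because the
# constructor validates dtype as a single type; the exact exception depends
# on the installed pandas/NumPy versions.
try:
    pd.DataFrame(
        -1,
        index=range(10),
        columns=["year", "month"],
        dtype={"year": "int32", "month": "int8"},
    )
    error = None
except (TypeError, ValueError) as exc:
    error = exc

print(type(error).__name__, error)
```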

@smcinerney
Author

To be clear, I'm only asking for the documentation to reflect what currently happens. It's pretty confusing. We can live with fixing up the dtypes after the df is declared; it would be good to document that workaround.
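Besides casting after the fact, another approach is to build each column as its own Series with the desired dtype and assemble the frame from a dict of Series. The helper `frame_with_dtypes` is hypothetical (not a pandas API and not something proposed in this thread); this sketch assumes Python 3.7+ and a reasonably recent pandas.

```python
import numpy as np
import pandas as pd


def frame_with_dtypes(fill_value, index, dtypes):
    """Hypothetical helper (not part of pandas): build a constant-filled
    DataFrame where each column gets its own dtype at construction time."""
    # One Series per column, each created directly with its target dtype,
    # so no intermediate int64 column is produced.
    columns = {
        name: pd.Series(fill_value, index=index, dtype=dtype)
        for name, dtype in dtypes.items()
    }
    # A dict of Series keeps each Series' dtype and the dict's key order.
    return pd.DataFrame(columns)


df = frame_with_dtypes(-1, range(10), {"year": np.int32, "month": np.int8})
print(df.dtypes)
```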

@jreback
Contributor

jreback commented Nov 29, 2016

This is a (partial) duplicate of #4464 (which is the actual implementation issue). No objections to clarifying the docstring.

@jreback jreback added Docs Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Nov 29, 2016
@jreback jreback added this to the Next Major Release milestone Nov 29, 2016
@patniharshit
Contributor

I would like to do this, but I am not exactly sure what to do. Should I copy-paste the example given by @smcinerney into the docstring of the DataFrame class in pandas/core/frame.py?

@jreback
Contributor

jreback commented Dec 12, 2016

@patniharshit Well, the idea is to create a nice example / explanation in the docstring, so you can certainly start with that.

@VincentLa
Contributor

Looks like this issue got dropped. Happy to see if I can take a stab at this now. If anyone is at the PyCon sprints, happy to pair!

@VincentLa
Contributor

OK, added an example to the docstring. I'd welcome feedback about its usefulness!

jorisvandenbossche pushed a commit that referenced this issue May 31, 2017
* Adding some more documentation on dataframe with regards to dtype

* Making example for creating dataframe from np matrix easier
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.21.0, Next Major Release May 31, 2017
Kiv pushed a commit to Kiv/pandas that referenced this issue Jun 11, 2017
…as-dev#16487)
stangirala pushed a commit to stangirala/pandas that referenced this issue Jun 11, 2017
…as-dev#16487)
@techvslife

I commented on #4464 to note that this is still a problem. Maybe it will be fixed in the next release? Thanks.
