Setting types on DataFrame constructor not working, while setting them after construction does. #21445

Closed
acburigo opened this issue Jun 12, 2018 · 8 comments
Labels: Dtype Conversions (unexpected or buggy dtype conversions), Duplicate Report (duplicate issue or pull request)

Comments

acburigo commented Jun 12, 2018

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: pd.DataFrame(columns=['a', 'b'], dtype={'a': str, 'b': int})
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-738409e7d339> in <module>()
----> 1 pd.DataFrame(columns=['a', 'b'], dtype={'a': str, 'b': int})

c:\users\arthu\appdata\local\programs\python\python36\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    319             data = {}
    320         if dtype is not None:
--> 321             dtype = self._validate_dtype(dtype)
    322
    323         if isinstance(data, DataFrame):

c:\users\arthu\appdata\local\programs\python\python36\lib\site-packages\pandas\core\generic.py in _validate_dtype(self, dtype)
    150
    151         if dtype is not None:
--> 152             dtype = pandas_dtype(dtype)
    153
    154             # a compound dtype

c:\users\arthu\appdata\local\programs\python\python36\lib\site-packages\pandas\core\dtypes\common.py in pandas_dtype(dtype)
   1949
   1950     try:
-> 1951         npdtype = np.dtype(dtype)
   1952     except (TypeError, ValueError):
   1953         raise

c:\users\arthu\appdata\local\programs\python\python36\lib\site-packages\numpy\core\_internal.py in _usefields(adict, align)
     60         names = None
     61     if names is None:
---> 62         names, formats, offsets, titles = _makenames_list(adict, align)
     63     else:
     64         formats = []

c:\users\arthu\appdata\local\programs\python\python36\lib\site-packages\numpy\core\_internal.py in _makenames_list(adict, align)
     28     for fname in fnames:
     29         obj = adict[fname]
---> 30         n = len(obj)
     31         if not isinstance(obj, tuple) or n not in [2, 3]:
     32             raise ValueError("entry not a 2- or 3- tuple")

TypeError: object of type 'type' has no len()

In [3]: pd.DataFrame(columns=['a', 'b']).astype({'a': str, 'b': int})
Out[3]:
Empty DataFrame
Columns: [a, b]
Index: []

Problem description

Why does setting dtypes in the constructor not work, when setting them after construction is fine (see the code above)?
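Judging only from the traceback above, the dict appears to be handed unchanged to `np.dtype`, which reads a dict as a description of one structured dtype rather than as a column-to-dtype mapping. The sketch below reproduces that step outside pandas. It catches both `TypeError` and `ValueError` on the assumption that the exception type may differ between NumPy versions; the `'U10'` and `'i8'` formats are illustrative choices, not something taken from this report.

```python
import numpy as np

# A dict of plain Python types is not a valid NumPy field specification.
# The traceback above shows a TypeError; other NumPy versions are assumed
# to possibly raise ValueError instead, so both are caught.
try:
    np.dtype({'a': str, 'b': int})
    failed = False
except (TypeError, ValueError) as exc:
    failed = True
    print(type(exc).__name__, exc)

# A dict with 'names' and 'formats' keys is one form NumPy does accept.
# It describes a single structured dtype, not per-column DataFrame dtypes.
structured = np.dtype({'names': ['a', 'b'], 'formats': ['U10', 'i8']})
print(structured.names)
```

This would explain why `astype`, which handles the mapping column by column, succeeds where the constructor does not.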

Expected Output

See code above.

Output of pd.show_versions()

Happens with pandas 0.22.0 and 0.23.0 (not sure about other versions).

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: None
pip: 10.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.3.1
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Member

WillAyd commented Jun 12, 2018

Looks like it's just an issue with the constructor. Investigation and PRs welcome!

WillAyd added the Dtype Conversions label Jun 12, 2018
gfyoung added the Bug label Jun 13, 2018
Contributor

uds5501 commented Jun 18, 2018

@gfyoung @WillAyd @acburigo I would like to draw your attention to the DataFrame documentation. I may be wrong in this inference, and I may have missed some dictionary / tuple unpacking in the constructor, but the documentation says that only a single dtype is allowed during initialization.

pandas/pandas/core/frame.py

Lines 249 to 273 in 9e982e1

class DataFrame(NDFrame):
    """ Two-dimensional size-mutable, potentially heterogeneous tabular data
    structure with labeled axes (rows and columns). Arithmetic operations
    align on both row and column labels. Can be thought of as a dict-like
    container for Series objects. The primary pandas data structure.

    Parameters
    ----------
    data : numpy ndarray (structured or homogeneous), dict, or DataFrame
        Dict can contain Series, arrays, constants, or list-like objects

        .. versionchanged :: 0.23.0
           If data is a dict, argument order is maintained for Python 3.6
           and later.

    index : Index or array-like
        Index to use for resulting frame. Will default to RangeIndex if
        no indexing information part of input data and no index provided
    columns : Index or array-like
        Column labels to use for resulting frame. Will default to
        RangeIndex (0, 1, 2, ..., n) if no column labels are provided
    dtype : dtype, default None
        Data type to force. Only a single dtype is allowed. If None, infer
    copy : boolean, default False
        Copy data from inputs. Only affects DataFrame / 2d ndarray input

Member

WillAyd commented Jun 18, 2018

Hmm, good point @uds5501. I didn't even look at that, to be honest.

We do allow dicts for dtype in other areas so I assumed that was the case in the constructor. If you want to look at the constructor source and see what is allowed, then I think we could either:

  • Raise a more informative error message when passing anything other than a single type OR
  • Figure out whatever is going on with the above and be sure to update the docs appropriately
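The first option could be sketched roughly as follows. This is only an illustration written outside pandas: `check_single_dtype` and `make_frame` are hypothetical names, the error wording is invented, and none of this is the actual constructor code.

```python
import pandas as pd


def check_single_dtype(dtype):
    """Hypothetical helper: reject a dict dtype with a clear message."""
    if isinstance(dtype, dict):
        raise TypeError(
            "DataFrame(dtype=...) accepts only a single dtype; "
            "use DataFrame.astype with a dict to set per-column dtypes"
        )
    return dtype


def make_frame(columns, dtype=None):
    """Hypothetical wrapper that validates dtype before calling pd.DataFrame."""
    return pd.DataFrame(columns=columns, dtype=check_single_dtype(dtype))


# A single dtype passes through unchanged.
frame = make_frame(['a', 'b'], dtype=float)

# A dict now fails early with a readable message instead of a NumPy traceback.
try:
    make_frame(['a', 'b'], dtype={'a': str, 'b': int})
    message = None
except TypeError as exc:
    message = str(exc)
    print(message)
```

Checking the type before anything reaches NumPy keeps the message in pandas terms and points the user at `astype`, which already accepts a dict.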

Contributor

uds5501 commented Jun 18, 2018

@WillAyd In my opinion, raising a more informative error message would be the quicker path (though I'm not sure it is the best one). Introducing a dictionary unpacker in the constructor could then be worked on in the future?

Member

WillAyd commented Jun 18, 2018

I don't disagree, but I suppose if the documentation was wrong (i.e. it looks like the code should allow for this type of construction) then I would lean towards the latter.

It would also be entirely reasonable for one PR to have a better error message (this would be easier as you mentioned) and then have a separate issue / PR to enhance this and allow more than one dtype, if someone wants to make that contribution
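Until the constructor allows more than one dtype, the behaviour asked for in this issue can be approximated with the `astype` call from the original report. A minimal sketch, assuming a hypothetical helper named `empty_frame` (it is not part of pandas):

```python
import pandas as pd


def empty_frame(dtypes):
    """Hypothetical helper: build an empty DataFrame with one dtype per column.

    `dtypes` maps column name to dtype, as accepted by DataFrame.astype.
    """
    return pd.DataFrame(columns=list(dtypes)).astype(dtypes)


df = empty_frame({'a': str, 'b': int})
print(df.dtypes)
```

The exact dtypes printed depend on the pandas version and platform, so no expected output is shown here.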

Contributor

uds5501 commented Jun 18, 2018

@WillAyd Well, I looked through the documentation and could not find a single place indicating that this kind of multi-dtype construction is allowed. I will go through it more thoroughly again, and will file a PR adding a better error message if I still can't find anything.

Contributor

jreback commented Jun 18, 2018

There is an open issue for this already.

WillAyd added the Duplicate Report label and removed the Bug label Jun 18, 2018
Member

WillAyd commented Jun 18, 2018

I think I found it: #4464. Closing this in favor of that issue.

WillAyd closed this as completed Jun 18, 2018