Unexpected to_dict output after upgrading to 0.24.1 #25408

Open
atlasstrategic opened this issue Feb 22, 2019 · 3 comments
Labels
Bug Regression Functionality that used to work in a prior pandas version

Comments

@atlasstrategic

Code to illustrate 0.23.4 versus 0.24.1

pandas 0.23.4

import pandas as pd

columns = ["A", "B"]
pd.DataFrame(columns=columns).transpose().to_dict(orient="split")
# output
# {'index': ['A', 'B'], 'columns': [], 'data': [[], []]}

pd.DataFrame(columns=columns).transpose().values.tolist()
# output
# [[], []]

pandas 0.24.1

import pandas as pd

columns = ["A", "B"]
pd.DataFrame(columns=columns).transpose().to_dict(orient="split")
# output not consistent with 0.23.4
# {'index': ['A', 'B'], 'columns': [], 'data': []}

pd.DataFrame(columns=columns).transpose().values.tolist()
# output is consistent with 0.23.4
# [[], []]

Problem description

After upgrading to pandas 0.24.1, the output of a transpose followed by to_dict(orient="split") has changed. Before, a transposed empty DataFrame returned one empty list for each original column under 'data' (matching values.tolist()), e.g. 'data': [[], []]. Now the output is 'data': [].
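A possible workaround, sketched under the assumption that only the orient="split" layout is needed: build the dictionary directly from values.tolist(), which the report shows behaving the same in 0.23.4 and 0.24.1. The helper name to_split_dict is hypothetical and not part of pandas, and it does not reproduce any other conversions that to_dict may perform.

```python
import pandas as pd


def to_split_dict(df):
    """Hypothetical helper: build a 'split'-style dict without to_dict.

    Relies on DataFrame.values.tolist(), which yields one (possibly empty)
    list per row even when the frame has no columns.
    """
    return {
        "index": df.index.tolist(),
        "columns": df.columns.tolist(),
        "data": df.values.tolist(),
    }


columns = ["A", "B"]
empty_t = pd.DataFrame(columns=columns).transpose()
result = to_split_dict(empty_t)
print(result)
```

For the transposed empty frame this keeps one empty list per row under 'data', which is the 0.23.4 shape the report asks for.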

Expected Output

Refer to code and output for 0.23.4 above.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_ZA.UTF-8
LOCALE: en_ZA.UTF-8

pandas: 0.24.1
pytest: None
pip: 19.0.2
setuptools: 40.4.3
Cython: None
numpy: 1.14.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.5.12
xlrd: None
xlwt: None
xlsxwriter: 1.1.2
lxml.etree: None
bs4: 4.7.1
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
@jorisvandenbossche
Member

Thanks for the report!

We had a regression in to_dict in 0.24.0, which should be fixed in 0.24.1 (#24965). But apparently some more corner cases are broken as well (although this is certainly much more of a corner case, where the correct answer is less obvious, I would say, than the other ones we have fixed).

This is another consequence of the usage of itertuples:

In [13]: df = pd.DataFrame(columns=['A', 'B']).transpose()

In [14]: list(df.itertuples(name=None, index=False))
Out[14]: []

In [15]: list(df.itertuples(name=None, index=True))
Out[15]: [('A',), ('B',)]

You could say that the output [] should actually be [(), ()] (which would solve the to_dict problem as well).
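The behaviour above can be modelled in plain Python, assuming rows are produced by zipping one iterable per column (plus the index when requested). This is a simplified sketch rather than the pandas source, and the function names rows_by_zip and rows_by_length are made up for illustration.

```python
def rows_by_zip(index, columns, include_index):
    """Model of zip-based row iteration: one iterable per column."""
    arrays = []
    if include_index:
        arrays.append(index)
    arrays.extend(columns)
    # zip() with no arguments yields nothing, so the row count is lost
    # whenever there are no columns and the index is excluded.
    return list(zip(*arrays))


def rows_by_length(index, columns, include_index):
    """Alternative model: take the row count from the index instead."""
    arrays = ([index] if include_index else []) + list(columns)
    if not arrays:
        return [() for _ in range(len(index))]
    return list(zip(*arrays))


index = ["A", "B"]
columns = []  # the transposed empty frame has two rows and zero columns

print(rows_by_zip(index, columns, include_index=False))     # []
print(rows_by_zip(index, columns, include_index=True))      # [('A',), ('B',)]
print(rows_by_length(index, columns, include_index=False))  # [(), ()]
```

The second function shows the [(), ()] alternative: one empty tuple per row, so the number of rows survives even without columns.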

@jorisvandenbossche jorisvandenbossche added the Regression Functionality that used to work in a prior pandas version label Feb 22, 2019
@jorisvandenbossche jorisvandenbossche added this to the 0.24.2 milestone Feb 22, 2019
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.24.2, 0.24.3 Mar 11, 2019
@jreback jreback modified the milestones: 0.24.3, Contributions Welcome Apr 20, 2019
@MarcoGorelli
Member

take

@MarcoGorelli
Member

You could say that the output [] should actually be [(), ()]

What if name is not None? Would we expect

[Pandas(_0=()), Pandas(_0=())]

?
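One way to explore this question with the standard library alone: a namedtuple type can be declared with no fields, and an instance of it is an empty tuple with a name. The sketch below assumes the named case would mirror the unnamed one (one empty row per index entry); it does not call pandas and does not claim to show what itertuples returns.

```python
from collections import namedtuple

# A namedtuple type with zero fields, mirroring a frame with zero columns.
Pandas = namedtuple("Pandas", [])

# One empty row per index entry, as in the [(), ()] suggestion.
rows = [Pandas() for _ in ["A", "B"]]

print(rows)             # [Pandas(), Pandas()]
print(rows[0] == ())    # True: it still compares equal to an empty tuple
print(rows[0]._fields)  # ()
```

Under that assumption the named analogue of [(), ()] would be [Pandas(), Pandas()], with no _0 field at all.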

@mroeschke mroeschke added the Bug label May 11, 2020
@MarcoGorelli MarcoGorelli removed their assignment Dec 20, 2020
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
5 participants