BUG: reindex(columns=..) after get_dummies raises TypeError: values must be SparseArray #18914

Closed
ghost opened this issue Dec 23, 2017 · 4 comments · Fixed by #18924
Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves), Sparse (sparse data type)

ghost commented Dec 23, 2017

Code Sample

import pandas as pd

df = pd.DataFrame.from_items([('GDP', [1, 2]), ('Nation', ['AB', 'CD'])])
df = pd.get_dummies(df, columns=['Nation'], sparse=True)  # returns a SparseDataFrame
df.reindex(columns=['GDP'])  # raises the TypeError below

TypeError: values must be SparseArray

Problem description

I'm doing a pandas upgrade from 0.19.x to 0.21.x for my project. The above code works under 0.19.x, but not under 0.21.x.


jreback commented Dec 23, 2017

Hmm, that does look buggy.

cc @Licht-T

Selecting the column with getitem works fine here:

In [10]: df[['GDP']]
Out[10]: 
   GDP
0    1
1    2
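
As a stopgap on affected versions, the observation above suggests falling back to plain column selection when reindex raises. The sketch below is only an illustration under a few assumptions: `select_columns` is a hypothetical helper, not part of pandas, and the frame is built from a dict because `DataFrame.from_items` is not available in newer pandas releases. The two paths are not equivalent for missing labels: reindex adds them as NaN columns, while getitem raises KeyError.

```python
import pandas as pd


def select_columns(frame, columns):
    """Hypothetical helper: prefer reindex, fall back to getitem on TypeError."""
    try:
        return frame.reindex(columns=columns)
    except TypeError:
        # The failure mode reported in this issue; plain selection still works.
        return frame[list(columns)]


df = pd.DataFrame({"GDP": [1, 2], "Nation": ["AB", "CD"]})
df = pd.get_dummies(df, columns=["Nation"], sparse=True)
subset = select_columns(df, ["GDP"])
print(subset)
```

On a pandas version where the bug is fixed, the `except` branch is never taken and the helper behaves like a plain reindex.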

jreback added the Bug, Indexing, Sparse, and Difficulty Intermediate labels on Dec 23, 2017
jreback added this to the Next Major Release milestone on Dec 23, 2017

jreback commented Dec 23, 2017

@ShadowGiraffe, an investigation or a PR would be welcome.

jreback changed the title from "[BUG] reindex(columns=..) after get_dummies raises TypeError: values must be SparseArray" to "BUG: reindex(columns=..) after get_dummies raises TypeError: values must be SparseArray" on Dec 23, 2017

hexgnu commented Dec 24, 2017

I think I found the problem. Inside get_dummies, not every column is actually cast to sparse, yet the result is marked as sparse. That mismatch causes trouble later on when reindexing.

I've opened a PR with a fix.
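
One way to look at the mismatch described above is to check, column by column, which dtypes are really sparse after get_dummies. This is a minimal sketch that assumes a recent pandas release, where `pd.SparseDtype` exists and the reindex call succeeds; on 0.21.x the same reindex raises the TypeError from the report, and the dtype check would need a different API. The frame is built from a dict because `DataFrame.from_items` is not available in newer releases.

```python
import pandas as pd

df = pd.DataFrame({"GDP": [1, 2], "Nation": ["AB", "CD"]})
dummies = pd.get_dummies(df, columns=["Nation"], sparse=True)

# Map each column to whether it is actually backed by a sparse dtype.
sparse_flags = {
    col: isinstance(dtype, pd.SparseDtype)
    for col, dtype in dummies.dtypes.items()
}
print(sparse_flags)

# The call from the original report, which no longer raises on fixed versions.
result = dummies.reindex(columns=["GDP"])
print(result)
```

Only the dummy-encoded columns come back sparse here; the untouched GDP column stays dense, which is the kind of mixed frame the comment above refers to.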

jreback modified the milestones: Next Major Release, 0.23.0 on Jan 1, 2018
summerela commented

I just ran into this issue with the latest version of pandas. Please let me know if you would like me to post my code.
