get_dummies with sparse doesn't convert numeric to sparse #18686

Closed
NagabhushanS opened this issue Dec 8, 2017 · 4 comments · Fixed by #18924
Labels: Bug, Needs Info, Reshaping, Sparse

@NagabhushanS

I got the error
AttributeError: 'IntBlock' object has no attribute 'sp_index'

when converting a SparseDataFrame to a SciPy csr_matrix using the following code:

dfTotalCat = get_dummies(dfTotalCat, sparse=True)

XTotalCat = csr_matrix(dfTotalCat.to_coo())

The SparseDataFrame is obtained from get_dummies.

The full traceback follows:

Traceback (most recent call last):
  File "pandaSrc.py", line 76, in <module>
    XTotalCat = csr_matrix(dfTotalCat.to_coo())
  File "C:\Users\nagabhushan.s\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\sparse\frame.py", line 255, in to_coo
    row = s.sp_index.to_int_index().indices
  File "C:\Users\nagabhushan.s\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
  File "C:\Users\nagabhushan.s\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\sparse\series.py", line 245, in sp_index
    return self.block.sp_index
AttributeError: 'IntBlock' object has no attribute 'sp_index'

@TomAugspurger
Contributor

Could you make a reproducible example? What's dfTotalCat?

TomAugspurger added the Needs Info label Dec 8, 2017
@dfaivre

dfaivre commented Dec 22, 2017

If not all of your columns are dummy encoded, get_dummies returns some columns that are not sparse. It seems that if you call df.to_sparse() before dummy encoding, the error goes away.

@TomAugspurger -- I don't know enough to know if this is expected behavior and docs just need to be updated (took me a bit to figure it out...)

Repro code below

import pandas as pd

df = pd.DataFrame(
    {
        "A": ["a", "b", "c", "a"],
        "B": [1, 2, 3, 4]
    })
df['A'] = df['A'].astype('category')


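def _throw_no_attribute_sp_index_err():
    # Only "A" is dummy encoded; the numeric column "B" stays dense,
    # so to_coo() fails with the sp_index AttributeError.
    one_hot = pd.get_dummies(df, sparse=True)
    print(one_hot.columns)
    one_hot.to_coo()


def _no_throw():
    # Converting the whole frame to sparse first makes "B" sparse as well,
    # so to_coo() succeeds.
    one_hot = pd.get_dummies(df.to_sparse(), sparse=True)
    print(one_hot.columns)
    one_hot.to_coo()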
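
For readers on a current pandas release, where SparseDataFrame and DataFrame.to_sparse() no longer exist (both were removed in pandas 1.0), the same mixed result can be seen by inspecting the column dtypes. This is only a minimal sketch, assuming a pandas version that provides pd.SparseDtype; the name dense_cols is illustrative and does not come from the thread.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["a", "b", "c", "a"],
        "B": [1, 2, 3, 4],
    })
df["A"] = df["A"].astype("category")

# Only the categorical column is dummy encoded; "B" is passed through as-is.
one_hot = pd.get_dummies(df, sparse=True)

# Collect the columns whose dtype is not a SparseDtype (illustrative helper name).
dense_cols = [
    col for col, dtype in one_hot.dtypes.items()
    if not isinstance(dtype, pd.SparseDtype)
]

print(one_hot.dtypes)
print("not sparse:", dense_cols)
```

Any column listed in dense_cols would need to be converted before a sparse-only operation is applied to the whole frame.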

@TomAugspurger
Contributor

Thanks. I'm not sure that get_dummies(sparse=True) should convert numeric columns. I think it's best to just document this.

TomAugspurger added the Sparse label Dec 24, 2017
TomAugspurger changed the title from "Attribute Error when converting SparseDataFrame to Scipy sparse csr_matrix." to "get_dummies sparse with sparse doesn't convert numeric to sparse" Dec 24, 2017
TomAugspurger changed the title from "get_dummies sparse with sparse doesn't convert numeric to sparse" to "get_dummies with sparse doesn't convert numeric to sparse" Dec 24, 2017
@hexgnu
Contributor

hexgnu commented Dec 26, 2017

I did a little digging into this. What happens is that get_dummies somehow presents the non-sparse column as sparse even though its underlying block is not sparse, which causes cascading issues such as the sp_index error. I haven't quite figured out why yet, but my current hypothesis is that it has something to do with how concat works with sparse frames.
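
On a current pandas release the dense/sparse mismatch shows up differently, and it can be checked directly. The following is a rough sketch, assuming pd.SparseDtype and the DataFrame.sparse accessor are available; it does not reproduce the old SparseDataFrame block internals or the concat path discussed above. The names accessor_ok and all_sparse are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": pd.Categorical(["a", "b", "c", "a"]),
        "B": [1, 2, 3, 4],
    })

# Ask for integer dummies so every column can share one sparse dtype later.
one_hot = pd.get_dummies(df, sparse=True, dtype="int64")

# The .sparse accessor is only available when every column is sparse,
# so the dense "B" column makes this lookup fail.
try:
    one_hot.sparse
    accessor_ok = True
except AttributeError:
    accessor_ok = False

# Cast every column, including "B", to a sparse integer dtype with fill value 0.
all_sparse = one_hot.astype(pd.SparseDtype("int64", 0))

# With all columns sparse, the accessor works; density is the mean share of
# explicitly stored values per column.
density = all_sparse.sparse.density
print(accessor_ok, density)
```

With SciPy installed, all_sparse.sparse.to_coo() should then return a COO matrix that can be passed to csr_matrix.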

jreback added this to the 0.23.0 milestone Jan 1, 2018
jreback added the Bug and Reshaping labels and removed the Docs label Jan 1, 2018