get_dummies with sparse doesn't convert numeric to sparse #18686
Comments
Could you make a reproducible example? What's |
If not all of your columns are dummy encoded, then it will return some columns that are not sparse. Seems like if you @TomAugspurger -- I don't know enough to know if this is expected behavior and docs just need to be updated (took me a bit to figure it out...) Repro code below
Thanks. I'm not sure that |
I did a little digging into this. What happens is that get_dummies somehow marks the non-sparse column as sparse even though its underlying block is not sparse, which causes cascading issues like the sp_index error. I haven't figured out exactly what is going on yet, but my current hypothesis is that it has something to do with how concat works with sparse frames.
I got the error
AttributeError: 'IntBlock' object has no attribute 'sp_index'
when converting a SparseDataFrame to Scipy csr_matrix using the following code:
dfTotalCat = get_dummies(dfTotalCat, sparse=True)
XTotalCat = csr_matrix(dfTotalCat.to_coo())
The SparseDataFrame is obtained from get_dummies.
Following is the exact error trace:
Traceback (most recent call last):
  File "pandaSrc.py", line 76, in <module>
    XTotalCat = csr_matrix(dfTotalCat.to_coo())
  File "C:\Users\nagabhushan.s\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\sparse\frame.py", line 255, in to_coo
    row = s.sp_index.to_int_index().indices
  File "C:\Users\nagabhushan.s\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
  File "C:\Users\nagabhushan.s\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\sparse\series.py", line 245, in sp_index
    return self.block.sp_index
AttributeError: 'IntBlock' object has no attribute 'sp_index'
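For anyone trying to check whether they are hitting the same situation, here is a minimal sketch of how one might inspect which columns get_dummies actually returned as sparse, and convert the rest. It is not the code from the report: the frame and column names are made up, and it assumes a pandas version that exposes sparse columns through pd.SparseDtype rather than the SparseDataFrame shown in the traceback.

```python
import pandas as pd

# Hypothetical data: one categorical column and one numeric column.
df = pd.DataFrame({"color": ["red", "blue", "red"], "count": [0, 3, 0]})

dummies = pd.get_dummies(df, sparse=True)

# Record, per column, whether the dtype is sparse. In the situation described
# in this issue, only the dummy-encoded columns are sparse; "count" is dense.
is_sparse = {
    col: isinstance(dtype, pd.SparseDtype)
    for col, dtype in dummies.dtypes.items()
}

# Convert the remaining dense columns so that every column is sparse.
# A fill value of 0 is an assumption that suits this example data.
all_sparse = dummies.copy()
for col, sparse in is_sparse.items():
    if not sparse:
        all_sparse[col] = all_sparse[col].astype(
            pd.SparseDtype(all_sparse[col].dtype, 0)
        )
```

Once every column is sparse, all_sparse.sparse.to_coo() should be usable in these pandas versions (it needs scipy installed), and its result can be passed to csr_matrix as in the report above.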