to_sparse bug with fill_value specified #1375

changhiskhan · 2012-06-01T14:10:13Z

Originally raised on the pydata mailing list:

In [50]: DataFrame({'x': [1., 1.]}).to_sparse(fill_value=0).x.mean()
Out[50]: 0.5

In [51]: DataFrame({'x': [1., 1.]}).to_sparse().x.mean()
Out[51]: 1.0

grsr · 2012-06-01T14:55:16Z

I had a quick look at the Python code implementing this. In both SparseArray.mean and SparseArray.sum, the nsparse variable appears to count the number of non-sparse entries rather than the number of sparse entries, which would cause the incorrect values these methods return. I think setting nsparse = self.sp_index.length - self.sp_index.npoints in both methods should fix the issue, but I don't understand the code well enough to be sure this is correct.
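The arithmetic behind this diagnosis can be checked with a small standalone sketch. It does not use pandas and is not the actual SparseArray code: the function `sparse_mean` and its `buggy` flag are hypothetical, and NaN handling is ignored. It assumes the mean is computed as the sum of the stored values plus `fill_value` times the count of filled entries, divided by the total count.

```python
def sparse_mean(sp_values, length, fill_value, buggy=False):
    """Mean of an array that stores only the values differing from fill_value.

    sp_values: the explicitly stored (non-fill) values
    length: total logical length of the array
    buggy: if True, count the stored entries as the filled ones,
           which is the suspected mistake described above
    """
    npoints = len(sp_values)
    if buggy:
        # Suspected bug: nsparse counts the stored entries.
        nsparse = npoints
    else:
        # Proposed fix: nsparse counts the entries equal to fill_value.
        nsparse = length - npoints
    total = sum(sp_values) + fill_value * nsparse
    return total / (npoints + nsparse)


# Same data as the report: two values, neither equal to fill_value=0.
dense = [1.0, 1.0]
fill_value = 0.0
stored = [v for v in dense if v != fill_value]

buggy_mean = sparse_mean(stored, len(dense), fill_value, buggy=True)
fixed_mean = sparse_mean(stored, len(dense), fill_value)
print(buggy_mean)  # 0.5
print(fixed_mean)  # 1.0
```

With no filled entries the buggy count doubles the denominator, which reproduces the 0.5 from the report; the proposed count gives 1.0.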

changhiskhan · 2012-06-01T16:03:49Z

@grsr that's pretty much what I did. Thanks for the input.
If you're looking for ways to get involved without digging too deep into the codebase, we'll soon start adding a "Community" label to issues that we think are more discrete and require less staring at pandas internals.

ghost assigned changhiskhan Jun 1, 2012

changhiskhan pushed a commit that referenced this issue Jun 1, 2012

BUG: sparse reduction bug #1375

4e6a055

changhiskhan closed this as completed Jun 1, 2012
