BUG: Fix wrong SparseBlock initialization #17386

Licht-T · 2017-08-31T00:33:43Z

closes AttributeError: 'IntBlock'/'FloatBlock'/etc. object has no attribute 'sp_index' #17198
tests added / passed

Passes the same tests that the current master branch passes.
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

codecov · 2017-08-31T02:20:42Z

Codecov Report

Merging #17386 into master will decrease coverage by 0.04%.
The diff coverage is 71.42%.

@@            Coverage Diff             @@
##           master   #17386      +/-   ##
==========================================
- Coverage   91.03%   90.99%   -0.05%     
==========================================
  Files         163      163              
  Lines       49580    49586       +6     
==========================================
- Hits        45137    45119      -18     
- Misses       4443     4467      +24

Flag	Coverage Δ
#multiple	`88.77% <71.42%> (-0.03%)`	⬇️
#single	`40.25% <42.85%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/sparse/frame.py	`94.7% <100%> (ø)`	⬆️
pandas/core/internals.py	`94.2% <66.66%> (-0.1%)`	⬇️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/plotting/_converter.py	`63.23% <0%> (-1.82%)`	⬇️
pandas/core/frame.py	`97.72% <0%> (-0.1%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 64c8a8d...7928aaa.

codecov · 2017-08-31T02:20:54Z

Codecov Report

Merging #17386 into master will decrease coverage by 0.02%.
The diff coverage is 91.78%.

@@            Coverage Diff             @@
##           master   #17386      +/-   ##
==========================================
- Coverage   91.43%    91.4%   -0.03%     
==========================================
  Files         163      163              
  Lines       50091    50148      +57     
==========================================
+ Hits        45800    45838      +38     
- Misses       4291     4310      +19

Flag	Coverage Δ
#multiple	`89.21% <91.78%> (-0.01%)`	⬇️
#single	`40.34% <27.39%> (-0.08%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/sparse/frame.py	`94.82% <100%> (+0.04%)`	⬆️
pandas/core/sparse/array.py	`92.05% <100%> (-0.24%)`	⬇️
pandas/core/internals.py	`94.32% <90.9%> (-0.22%)`	⬇️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.8% <0%> (-0.1%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 96a5274...6b36d55.

jreback · 2017-08-31T10:31:47Z

needs tests

jreback

w/o tests I don't even know what you are trying to do. Further we don't use if/then like this at all, instead dispatch per-block. but pls add tests first.

Licht-T · 2017-09-02T07:21:18Z

@jreback Thanks for your comment. I've just added tests.

jreback

first need to get the tests correct

jreback · 2017-09-09T17:09:35Z

pandas/core/internals.py

@@ -527,7 +535,7 @@ def f(m, v, i):

        return self.split_and_operate(None, f, False)

-    def astype(self, dtype, copy=False, errors='raise', values=None, **kwargs):
+    def astype(self, dtype, copy=True, errors='raise', values=None, **kwargs):


pls don't change things like this, which completely changes semantics
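To illustrate the concern, here is a minimal sketch of why flipping a `copy` default is a semantic change rather than a cosmetic one. `ToyBlock` is a hypothetical stand-in written for this example; it is not the pandas `Block` class, and the real `astype` is considerably more involved.

```python
class ToyBlock:
    """Toy stand-in for a block; hypothetical, not the pandas implementation."""

    def __init__(self, values, dtype):
        self.values = list(values)
        self.dtype = dtype

    def astype(self, dtype, copy=False):
        # With copy=False and an unchanged dtype, return the same object.
        if dtype is self.dtype and not copy:
            return self
        # Otherwise build a new block with converted values.
        return ToyBlock([dtype(v) for v in self.values], dtype)


block = ToyBlock([1, 2, 3], int)

shared = block.astype(int)                  # default copy=False: same object
independent = block.astype(int, copy=True)  # explicit copy: new object

shared.values[0] = 99        # also visible through `block`
independent.values[1] = -1   # does not touch `block`
```

Every caller that relies on the default would silently move from the first behaviour to the second if the default changed, which is presumably why the reviewer asks for the signature to stay as it is.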

jreback · 2017-09-09T17:10:23Z

pandas/core/internals.py

@@ -442,12 +443,19 @@ def make_a_block(nv, ref_loc):
                    nv = _block_shape(nv, ndim=self.ndim)
                except (AttributeError, NotImplementedError):
                    pass
-                block = self.make_block(values=nv,
-                                        placement=ref_loc, fastpath=True)
+


I don't like this pattern at all, nv should be a sparse array at this point; the inference should just work. is that not the case?

jreback · 2017-09-09T17:11:28Z

pandas/core/internals.py

@@ -1562,6 +1581,10 @@ def _nanpercentile(values, q, axis, **kw):
        result = self._try_coerce_result(result)
        if is_scalar(result):
            return ax, self.make_block_scalar(result)
+


again this should be a sparse array, if its not then you need to implement this on the SparseBlock itself (and you can certainly call the super method). I don't like type checking inside a block.

jreback · 2017-09-09T17:11:39Z

pandas/core/internals.py

@@ -2653,7 +2676,7 @@ def sp_index(self):
    def kind(self):
        return self.values.kind

-    def _astype(self, dtype, copy=False, raise_on_error=True, values=None,
+    def _astype(self, dtype, copy=True, raise_on_error=True, values=None,


do not do this

jreback · 2017-09-09T17:12:08Z

pandas/core/sparse/frame.py

@@ -321,8 +321,8 @@ def _apply_columns(self, func):
            data=new_data, index=self.index, columns=self.columns,
            default_fill_value=self.default_fill_value).__finalize__(self)

-    def astype(self, dtype):
-        return self._apply_columns(lambda x: x.astype(dtype))
+    def astype(self, dtype, copy=True, errors='raise'):


does this match the super signature?

jreback · 2017-09-09T17:12:24Z

pandas/tests/sparse/test_frame.py

@@ -1385,3 +1385,62 @@ def test_numpy_func_call(self):
                 'std', 'min', 'max']
        for func in funcs:
            getattr(np, func)(self.frame)
+
+    def test_where(self):
+        data = [[1, 2], [3, 4]]


add the issue number as a comment

jreback · 2017-09-09T17:12:52Z

pandas/tests/sparse/test_frame.py

+        sparse_df = SparseDataFrame(data)
+        result = sparse_df.where(sparse_df >= 2)
+
+        dense_df = DataFrame(data)


so these return dense frames????

jreback · 2017-09-09T17:13:13Z

pandas/tests/sparse/test_frame.py

+        q = 0.1
+
+        sparse_df = SparseDataFrame(data)
+        result = sparse_df.quantile(q)


these are very odd tests, why are you returning dense frames?

jreback · 2017-09-09T17:13:22Z

pandas/tests/sparse/test_series.py

+        result = sparse.where(sparse >= 2)
+
+        dense = Series(data)
+        expected = dense.where(dense >= 2)


same questions as above

Licht-T · 2017-09-15T00:24:59Z

@jreback Thanks for your review! While fixing the tests, I found that the initial solution was not enough, so I've also modified the solution.

jreback

you are changing lots of things. can you do this step by step rather than all at once. you are touching some pretty gnarly code.

jreback · 2017-09-15T01:36:03Z

pandas/core/internals.py

@@ -978,7 +984,7 @@ def f(m, v, i):

        return [self.make_block(new_values, fastpath=True)]

-    def coerce_to_target_dtype(self, other):
+    def coerce_to_target_dtype(self, other, copy=False):


hmm, I dont think this actually helps, why did you add it?

SparseBlock.astype(dtype, copy=False) performs a reinterpret cast, so I override coerce_to_target_dtype and set copy=True in the SparseBlock class.
203a8f9#diff-e705e723b2d6e7c0e2a0443f80916abfR2639
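For readers unfamiliar with the term, the difference between a reinterpret cast and a value conversion can be sketched with the standard library alone. This is only an illustration of the general idea; it makes no claim about what `SparseBlock.astype` does internally, and both function names are made up for the example.

```python
import struct


def reinterpret_int64_as_float64(n):
    """Reuse the 8 bytes of an int64 as a float64 without converting the value."""
    return struct.unpack("<d", struct.pack("<q", n))[0]


def convert_int64_to_float64(n):
    """Convert the numeric value, producing a new bit pattern."""
    return float(n)


reinterpreted = reinterpret_int64_as_float64(1)  # a tiny subnormal float, not 1.0
converted = convert_int64_to_float64(1)          # the value 1.0
```

A reinterpret cast keeps the bytes and changes their meaning, so the integer 1 turns into a number very close to zero, whereas a conversion keeps the meaning and changes the bytes.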

jreback · 2017-09-15T01:37:13Z

pandas/core/internals.py

+                                          **kwargs)
+
+        dtype = self.values.sp_values.dtype
+


this is pretty complex, pls simplify

I added this code because coerce_to_target_dtype does not work well in SparseBlock. It only checks the type information of the SparseArray, and the same procedure is also implemented in IntBlock, etc. I cannot figure out how to simplify it. Any suggestions for simplifying?

I am not against this theoretically, but the implementation is fragile here. there are functions for validation already in the sparse classes.

Licht-T · 2017-09-15T14:12:32Z

@jreback Okay. I'll split the commit.

Licht-T · 2017-09-16T14:17:58Z

@jreback The big commit is now split. If you have any questions, feel free to ask.

Licht-T · 2017-09-16T16:18:48Z

Rebased to change commit logs

jreback

again you are doing too much in this PR. I require much simpler PR's that are built on top of each other. IOW pls break this apart. You can add tests for everything in a single PR (the first) and simply xfail tests that don't work. That will give you a basis to build on, and simplify understandability of the reviews / PR / code. Adding paths via if/thens is not a good way forward here.
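The "tests first, marked as expected failures" workflow suggested here can be sketched as follows. pandas uses pytest, where the marker is `pytest.mark.xfail`; to keep the example self-contained it uses the standard-library analogue `unittest.expectedFailure` instead. `sparse_where` is a hypothetical, deliberately unfinished function, not a pandas API.

```python
import io
import unittest


def sparse_where(values, cond):
    """Hypothetical function under test; deliberately unfinished."""
    raise NotImplementedError("sparse where is not implemented yet")


class TestSparseWhere(unittest.TestCase):
    @unittest.expectedFailure
    def test_where_keeps_matching_values(self):
        # Documents the intended behaviour; expected to fail until the fix lands.
        result = sparse_where([1, 2, 3], [False, True, True])
        self.assertEqual(result, [None, 2, 3])


def run_suite():
    """Run the toy suite quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSparseWhere)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)
```

The suite stays green while the behaviour is missing, and each follow-up PR can remove one marker together with the change that makes that test pass.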

jreback · 2017-09-17T14:48:25Z

pandas/core/internals.py

-        return isinstance(element, dtype)
+        else:
+            element_dtype = infer_dtype_from(element, pandas_dtype=True)[0]
+            return isinstance(element, dtype) or dtype == element_dtype


jreback · 2017-09-17T14:49:32Z

pandas/core/internals.py

@@ -995,7 +1001,7 @@ def coerce_to_target_dtype(self, other):

        if self.is_bool or is_object_dtype(dtype) or is_bool_dtype(dtype):
            # we don't upcast to bool
-            return self.astype(object)
+            return self.astype(object, copy=copy)


I guess this is ok, though numpy ignores the copy= flag when dtype is object so no point in passing it if dtype is object

jreback · 2017-09-17T14:49:42Z

pandas/core/internals.py


            raise AssertionError("possible recursion in "
                                 "coerce_to_target_dtype: {} {}".format(
                                     self, other))

        try:
-            return self.astype(dtype)
+            return self.astype(dtype, copy=copy)


here i guess is ok

jreback · 2017-09-17T14:50:01Z

pandas/core/internals.py

@@ -1382,6 +1388,11 @@ def where(self, other, cond, align=True, raise_on_error=True,
        if hasattr(other, 'reindex_axis'):
            other = other.values

+        if is_scalar(other) or is_list_like(other):


jreback · 2017-09-17T14:50:56Z

pandas/core/internals.py

@@ -1394,6 +1405,9 @@ def where(self, other, cond, align=True, raise_on_error=True,
        if not hasattr(cond, 'shape'):
            raise ValueError("where must have a condition that is ndarray "
                             "like")
+        else:


huh? if you find yourself adding an if/then pretty much anywhere then you are doing it wrong. better to add the method to Sparse and call super; sometimes the super method may need a bit of refactor to make it more general.
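One way to read this advice is as a small refactor: the shared logic stays in the base class, and the type-specific step becomes a hook that the sparse subclass overrides. The sketch below uses toy classes whose names mirror the discussion but which are assumptions made for illustration, not the pandas internals; `_wrap_result` in particular is a made-up hook name.

```python
class Block:
    """Toy dense block; hypothetical, not the pandas class."""

    def __init__(self, values):
        self.values = list(values)

    def _wrap_result(self, result):
        # Hook for subclasses; the base class keeps results dense.
        return Block(result)

    def where(self, cond, other):
        # Shared logic lives here once, with no isinstance checks.
        result = [v if c else other for v, c in zip(self.values, cond)]
        return self._wrap_result(result)


class SparseBlock(Block):
    """Toy sparse block that remembers a fill value."""

    def __init__(self, values, fill_value=0):
        super().__init__(values)
        self.fill_value = fill_value

    def _wrap_result(self, result):
        # Sparse-specific behaviour is confined to the subclass.
        return SparseBlock(result, fill_value=self.fill_value)


dense_out = Block([1, 2, 3]).where([False, True, True], 0)
sparse_out = SparseBlock([1, 2, 3], fill_value=0).where([False, True, True], 0)
```

Both calls run the same `where` body, yet each returns its own block type, so no `if`/`else` on the block type is needed inside the shared method.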

jreback · 2017-09-17T14:51:19Z

pandas/core/internals.py

@@ -1440,7 +1454,12 @@ def func(cond, values, other):
            if try_cast:
                result = self._try_cast_result(result)

-            return self.make_block(result)
+            if isinstance(result, np.ndarray):


again this is just completely confusing to do and makes the code way more complex. find a better way

jreback · 2017-09-17T14:51:36Z

pandas/core/internals.py

@@ -1713,6 +1733,7 @@ class FloatBlock(FloatOrComplexBlock):
    is_float = True
    _downcast_dtype = 'int64'

+    @classmethod


this is such a huge change, what is the purpose?

jreback · 2017-09-17T14:52:44Z

pandas/core/internals.py

+                                          **kwargs)
+
+        dtype = self.values.sp_values.dtype
+


I am not against this theoretically, but the implementation is fragile here. there are functions for validation already in the sparse classes.

jreback · 2017-09-17T14:53:01Z

pandas/core/internals.py

+            self._can_hold_na = False
+
+    def _can_hold_element(self, element):
+        """ require the same dtype as ourselves """


again this is so complex and adds so much technical debt.

jreback · 2017-09-17T14:53:57Z

pandas/core/internals.py

@@ -2769,9 +2858,15 @@ def sparse_reindex(self, new_index):
        return self.make_block_same_class(values, sparse_index=new_index,
                                          placement=self.mgr_locs)

+    def _try_coerce_result(self, result):


so this is the right idea, though you should generally have a function that intercepts an ndarray and creates a SparseArray; it should be called for most sparse methods.
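A rough sketch of that idea, using plain lists in place of ndarrays: a single conversion function sits between every computation and its return value, so dense intermediate results are turned back into a sparse container in one place. `ToySparseArray`, `ToySparseBlock` and `ensure_sparse` are hypothetical names for this example; only `_try_coerce_result`, `sp_index`, `sp_values` and `fill_value` echo names that appear in the PR.

```python
class ToySparseArray:
    """Toy sparse container: stores only entries that differ from fill_value."""

    def __init__(self, dense, fill_value=0):
        self.fill_value = fill_value
        self.length = len(dense)
        self.sp_index = [i for i, v in enumerate(dense) if v != fill_value]
        self.sp_values = [dense[i] for i in self.sp_index]

    def to_dense(self):
        dense = [self.fill_value] * self.length
        for i, v in zip(self.sp_index, self.sp_values):
            dense[i] = v
        return dense


def ensure_sparse(result, fill_value=0):
    """Single choke point: pass sparse results through, wrap dense ones."""
    if isinstance(result, ToySparseArray):
        return result
    return ToySparseArray(list(result), fill_value=fill_value)


class ToySparseBlock:
    def __init__(self, values, fill_value=0):
        self.values = ensure_sparse(values, fill_value=fill_value)

    def _try_coerce_result(self, result):
        # Every method funnels its result through the same helper.
        return ensure_sparse(result, fill_value=self.values.fill_value)

    def where(self, cond, other):
        dense = [v if c else other for v, c in zip(self.values.to_dense(), cond)]
        return self._try_coerce_result(dense)


out = ToySparseBlock([0, 5, 0, 7]).where([True, True, True, False], 0)
```

The only type check lives inside `ensure_sparse`, so individual methods never need to ask what kind of result they produced.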

jreback · 2017-11-10T20:17:16Z

@Licht-T I believe you have all of the xfail tests in. pls rebase and update.

Licht-T · 2017-11-12T06:09:01Z

@jreback Yeah! Nothing stands in my way!

BUG: Fix wrong SparseBlock initialization in quantile method
BUG: Fix make_sparse mask generation not to cast when dtype is object
BUG: Add SparseArray.all method
BUG: Add copy parameter to prevent reinterpret cast of sparse
Revert and fix astype parameters
BUG: Create SparseBlock.__init__ to set type information of SparseArray
BUG: Override SparseBlock._can_hold_element
Revert changes in Block.where
BUG: Override SparseBlock.make_block with fill_value argument
BUG: Set fill_value and ndim parameter in make_block when generating SparseBlock from result
BUG: Override SparseBlock._try_coerce_result to make result flatten and sparse
BUG: Change from _can_hold_na to _can_hold_element for supporting non NA fill value
BUG: Fix 1D check statement
    SparseDataFrame.where passes (1, n)-shape SparseBlock, but actual values is n-length SparseArray
BUG: Adjust cond shape to SparseBlock
    SparseDataFrame.where passes (1, n)-shape SparseBlock and condition block to Block.where, but it compares n-length SparseArray held by the SparseBlock and (1, n)-shape condition block.
BUG: Override SparseDataFrame.where method to set _default_fill_value

Licht-T · 2017-11-12T09:35:26Z

Now rebased.

jreback · 2017-11-12T15:27:16Z

@Licht-T can you see if you can simplify this, there seem to be lots of if/then cases. I think you might be able to define / override some sparse methods in internals to avoid this.

jreback · 2017-12-28T12:35:35Z

@Licht-T can you update?

jreback · 2018-07-07T14:49:19Z

closing as stale, though we'd definitely take a fixed up / rebased version updated for comments

gfyoung added 2/3 Compat Sparse Sparse Data Type Bug and removed 2/3 Compat labels Aug 31, 2017

jreback requested changes Sep 1, 2017

Licht-T force-pushed the fix-wrong-sparseblock-initialization branch from ea077b3 to 785bbcf Compare September 2, 2017 04:30

jreback requested changes Sep 9, 2017

jreback requested changes Sep 15, 2017

Licht-T force-pushed the fix-wrong-sparseblock-initialization branch from 2d94fba to 1c0613b Compare September 16, 2017 14:06

Licht-T force-pushed the fix-wrong-sparseblock-initialization branch from 1c0613b to d0bf226 Compare September 16, 2017 16:14

jreback requested changes Sep 17, 2017

This was referenced Sep 18, 2017

TST: Add tests for sparse quantile/where #17568

Merged

BUG: Add SparseArray.all #17570

Merged

BUG: Fix make_sparse mask generation #17574

Merged

Licht-T added 3 commits November 12, 2017 18:27

BUG: Fix wrong argument in Sparse.where

1cbb4a8

TST: Remove xfail/skip marks from Sparse.where tests

6b36d55

Licht-T force-pushed the fix-wrong-sparseblock-initialization branch from d0bf226 to 6b36d55 Compare November 12, 2017 09:34

jreback closed this Jul 7, 2018
