CLN GH23123 Move SparseArray to arrays #23147

JustinZhengBC · 2018-10-14T11:29:51Z

closes Move SparseArray implementation to pandas/core/arrays #23123
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Importing SparseArray from the arrays folder was causing circular imports through pandas/core/arrays/__init__.py, so I had to empty that file and fix every other file that imported from pandas.core.arrays.

jorisvandenbossche · 2018-10-14T12:13:46Z

This change in imports of other arrays everywhere should not be needed. Can you detail where the circular import problem was coming from?

datapythonista · 2018-10-14T16:54:27Z

@JustinZhengBC you've got some print() calls in your PR that I guess were for your own debugging. Can you remove them?

jreback · 2018-10-14T17:02:16Z

pandas/core/arrays/__init__.py

-                   ExtensionScalarOpsMixin)
-from .categorical import Categorical  # noqa
-from .datetimes import DatetimeArrayMixin  # noqa
-from .interval import IntervalArray  # noqa


this is a nice pattern, pls restore this
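A minimal sketch of the re-export pattern being asked for, built as a throwaway package in a temporary directory. The package name arrays_demo and its two submodules are hypothetical stand-ins, not pandas code. The point is that the package-level __init__.py re-exports each class, so callers can import it from the package instead of from the individual submodule.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical layout mirroring pandas/core/arrays/__init__.py:
# each class lives in its own submodule and __init__.py re-exports it.
INIT_SRC = textwrap.dedent("""
    from .categorical import Categorical  # noqa
    from .interval import IntervalArray  # noqa
""")


def import_via_package():
    """Return (package-level name is the submodule class, IntervalArray exported)."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "arrays_demo"
        pkg.mkdir()
        (pkg / "categorical.py").write_text("class Categorical:\n    pass\n")
        (pkg / "interval.py").write_text("class IntervalArray:\n    pass\n")
        (pkg / "__init__.py").write_text(INIT_SRC)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            arrays = importlib.import_module("arrays_demo")
            submodule = importlib.import_module("arrays_demo.categorical")
            # Both import paths yield the very same class object.
            return (arrays.Categorical is submodule.Categorical,
                    hasattr(arrays, "IntervalArray"))
        finally:
            # Undo the temporary sys.path and module-cache changes.
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m.startswith("arrays_demo")]:
                del sys.modules[name]
```

The trade-off, which the rest of this thread runs into, is that importing any submodule first executes __init__.py, so every re-exported submodule is loaded eagerly.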

JustinZhengBC · 2018-10-14T18:00:18Z

The main import loop is in pandas/core/dtypes/common.py, which imports SparseDtype from pandas.core.arrays.sparse.dtype for two isinstance checks. Whenever SparseArray is imported, pandas/core/arrays/__init__.py now gets executed. Many of the modules imported in that __init__.py in turn import common.py, which imports from the sparse package again and re-enters __init__.py in the process.

I will try putting the offending import statements from common.py into the functions where they are used so that they are not automatically called on import, and see if that gets rid of the loop.
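A toy reproduction of the cycle described above and of the deferred-import workaround. The package demo_pkg and its modules arrays and common are hypothetical stand-ins for pandas.core.arrays and pandas/core/dtypes/common.py; this is a sketch of the general technique, not the actual pandas change. With a module-level import, importing arrays first fails because common asks for SparseDtype before arrays has defined it; moving the import into the function body defers the lookup until both modules are fully initialised.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# "arrays" imports "common" at module level, then defines SparseDtype.
ARRAYS_SRC = textwrap.dedent("""
    from demo_pkg import common  # noqa

    class SparseDtype:
        pass

    def make_dtype():
        return SparseDtype()
""")

# Eager variant: the module-level import runs while "arrays" is half-built.
COMMON_EAGER = textwrap.dedent("""
    from demo_pkg.arrays import SparseDtype

    def is_sparse(obj):
        return isinstance(obj, SparseDtype)
""")

# Deferred variant: the import runs only when is_sparse() is called.
COMMON_DEFERRED = textwrap.dedent("""
    def is_sparse(obj):
        from demo_pkg.arrays import SparseDtype  # deferred import
        return isinstance(obj, SparseDtype)
""")


def run_demo(deferred):
    """Import arrays first, then call common.is_sparse on a SparseDtype."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "arrays.py").write_text(ARRAYS_SRC)
        (pkg / "common.py").write_text(COMMON_DEFERRED if deferred else COMMON_EAGER)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # Raises ImportError in the eager variant (circular import).
            arrays = importlib.import_module("demo_pkg.arrays")
            common = importlib.import_module("demo_pkg.common")
            return common.is_sparse(arrays.make_dtype())
        finally:
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m.startswith("demo_pkg")]:
                del sys.modules[name]
```

Deferring the import costs a dictionary lookup in sys.modules on each call, which is usually negligible next to the work the function does.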

JustinZhengBC · 2018-10-14T21:55:17Z

Almost all of the tests seem to be failing in the same place. The trace is below; the offending line is from pandas.core.arrays.sparse.array import _maybe_to_sparse, and the tests report that they can't find pandas.core.arrays.sparse.array. The odd thing is that the tests do not fail locally, and I can even execute that exact line without any errors, so I'm a bit stuck. Am I doing something wrong, or does moving a folder (and thus changing the import paths) make the tests fail because of how they're set up?

Traceback (most recent call last):
  File "/home/travis/miniconda3/envs/pandas/lib/python2.7/site-packages/_pytest/config/__init__.py", line 419, in _importconftest
    mod = conftestpath.pyimport()
  File "/home/travis/miniconda3/envs/pandas/lib/python2.7/site-packages/py/_path/local.py", line 668, in pyimport
    __import__(modname)
  File "/home/travis/build/pandas-dev/pandas/pandas/__init__.py", line 42, in <module>
    from pandas.core.api import *
  File "/home/travis/build/pandas-dev/pandas/pandas/core/api.py", line 10, in <module>
    from pandas.core.groupby import Grouper
  File "/home/travis/build/pandas-dev/pandas/pandas/core/groupby/__init__.py", line 1, in <module>
    from pandas.core.groupby.groupby import GroupBy  # flake8: noqa
  File "/home/travis/build/pandas-dev/pandas/pandas/core/groupby/groupby.py", line 39, in <module>
    from pandas.core.generic import NDFrame
  File "/home/travis/build/pandas-dev/pandas/pandas/core/generic.py", line 43, in <module>
    from pandas.core.internals import BlockManager
  File "/home/travis/build/pandas-dev/pandas/pandas/core/internals/__init__.py", line 10, in <module>
    from .managers import (  # noqa:F401
  File "/home/travis/build/pandas-dev/pandas/pandas/core/internals/managers.py", line 32, in <module>
    from pandas.core.arrays.sparse.array import _maybe_to_sparse
ImportError: No module named sparse.array
ERROR: could not load /home/travis/build/pandas-dev/pandas/pandas/conftest.py

datapythonista · 2018-10-15T01:57:36Z

Did you try to simply import pandas? I think there is a wrong import in an __init__.py, or something similar.

JustinZhengBC · 2018-10-15T03:22:14Z

Yes, import pandas works locally. I can even run the exact same lines in the .sh file that Travis fails on.

EDIT: I've noticed that all the test jobs running on Python 2.7 fail on that line, but the ones running on 3.x don't. I'll look into this further.

EDIT2: Seems like somehow the __init__.py got lost in the move. Hopefully the tests work now.
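A likely explanation for why only the 2.7 jobs failed: since Python 3.3 (PEP 420), a directory without __init__.py can still be imported as an implicit namespace package, so the missing file goes unnoticed on 3.x, whereas Python 2.7 requires __init__.py. The sketch below shows the Python 3 behaviour using a hypothetical package name, sparse_demo; it does not reproduce the 2.7 failure itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_without_init():
    """Import a submodule from a directory that has no __init__.py.

    Returns (value read from the submodule, whether the parent was loaded
    as a namespace package, i.e. without any __init__.py file).
    """
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "sparse_demo"  # hypothetical package name
        pkg.mkdir()
        # Deliberately no __init__.py here, like the file lost in the move.
        (pkg / "array.py").write_text("ANSWER = 42\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            submodule = importlib.import_module("sparse_demo.array")
            parent = sys.modules["sparse_demo"]
            # A namespace package has no file of its own.
            is_namespace = getattr(parent, "__file__", None) is None
            return submodule.ANSWER, is_namespace
        finally:
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m.startswith("sparse_demo")]:
                del sys.modules[name]
```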

codecov · 2018-10-15T04:20:48Z

Codecov Report

Merging #23147 into master will increase coverage by 0.01%.
The diff coverage is 92.68%.

@@            Coverage Diff             @@
##           master   #23147      +/-   ##
==========================================
+ Coverage   92.13%   92.14%   +0.01%     
==========================================
  Files         170      170              
  Lines       51073    51017      -56     
==========================================
- Hits        47056    47011      -45     
+ Misses       4017     4006      -11

Flag	Coverage Δ
#multiple	`90.57% <92.68%> (+0.01%)`	⬆️
#single	`42.3% <68.29%> (+0.02%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/arrays/period.py	`95.56% <ø> (+1.27%)`	⬆️
pandas/compat/pickle_compat.py	`75.6% <ø> (ø)`	⬆️
pandas/core/util/hashing.py	`98.4% <ø> (ø)`	⬆️
pandas/core/arrays/sparse/scipy_sparse.py	`97.05% <ø> (ø)`
pandas/core/arrays/sparse/dtype.py	`100% <ø> (ø)`
pandas/core/arrays/sparse/frame.py	`94.86% <100%> (ø)`
pandas/io/packers.py	`88.04% <100%> (ø)`	⬆️
pandas/core/arrays/integer.py	`94.9% <100%> (ø)`	⬆️
pandas/core/reshape/reshape.py	`99.55% <100%> (ø)`	⬆️
pandas/core/arrays/sparse/array.py	`91.55% <100%> (ø)`
... and 24 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 85dc171...120bc5e.

jorisvandenbossche · 2018-10-15T06:35:54Z

@JustinZhengBC It seems you have removed the sparse tests, without adding them back in the tests/arrays folder.

Something more general: should we keep SparseSeries / SparseDataFrame where they were, in pandas.core.sparse? They are not really arrays, and we might deprecate them anyhow, so moving them may not be worth it.

JustinZhengBC · 2018-10-15T07:35:29Z

@jorisvandenbossche thanks for the catch. All the tests should be there now. The "files changed" tab still says I deleted pandas/core/sparse/api.py even though it also says I created another file with the same name in the right location. Other than that, it shouldn't say I deleted any other files now.

jreback

yeah, I would only move the actual sparse array itself here; pls leave SparseSeries/SparseDataFrame exactly where they were (and their tests).

jreback · 2018-10-15T11:48:37Z

pandas/compat/numpy/function.py

@@ -19,7 +19,8 @@
 """

 from numpy import ndarray
-from pandas.util._validators import (validate_args, validate_kwargs,
+from pandas.util._validators import (validate_args,


is there a reason you are changing unrelated things? (even formatting)

TomAugspurger · 2018-10-15T13:55:12Z

Test failure is unrelated to the changes here. Apparently, the SparseArray.unique implementation is flaky. Taking a look now.

TomAugspurger · 2018-10-15T13:57:09Z

In [4]: pd.SparseArray([0, 0]).unique()
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-4-c1befb06646c> in <module>
----> 1 pd.SparseArray([0, 0]).unique()

~/sandbox/pandas/pandas/core/sparse/array.py in unique(self)
    570     def unique(self):
    571         uniques = list(pd.unique(self.sp_values))
--> 572         fill_loc = self._first_fill_value_loc()
    573         if fill_loc >= 0:
    574             uniques.insert(fill_loc, self.fill_value)

~/sandbox/pandas/pandas/core/sparse/array.py in _first_fill_value_loc(self)
    562
    563         indices = self.sp_index.to_int_index().indices
--> 564         if indices[0] > 0:
    565             return 0
    566

IndexError: index 0 is out of bounds for axis 0 with size 0

TomAugspurger · 2018-10-15T13:59:48Z

And a fix

diff --git a/pandas/core/sparse/array.py b/pandas/core/sparse/array.py
index 15b5118db..17e2bd188 100644
--- a/pandas/core/sparse/array.py
+++ b/pandas/core/sparse/array.py
@@ -561,7 +561,7 @@ class SparseArray(PandasObject, ExtensionArray, ExtensionOpsMixin):
             return -1
 
         indices = self.sp_index.to_int_index().indices
-        if indices[0] > 0:
+        if len(indices) == 0 or indices[0] > 0:
             return 0
 
         diff = indices[1:] - indices[:-1]
diff --git a/pandas/tests/sparse/test_array.py b/pandas/tests/sparse/test_array.py
index 0257d9962..5b1afdc7a 100644
--- a/pandas/tests/sparse/test_array.py
+++ b/pandas/tests/sparse/test_array.py
@@ -1065,6 +1065,13 @@ def test_unique_na_fill(arr, fill_value):
     tm.assert_numpy_array_equal(a, b)
 
 
+def test_unique_all_sparse():
+    arr = SparseArray([0, 0])
+    result = arr.unique()
+    expected = SparseArray([0])
+    tm.assert_sp_array_equal(result, expected)
+
+
 def test_map():
     arr = SparseArray([0, 1, 2])
     expected = SparseArray([10, 11, 12], fill_value=10)

@jreback @jorisvandenbossche any objections to including that fix here? It seems simple enough to not need another PR, which will create a merge conflict here. Or we can merge this and do that in a separate PR.
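A pure-Python sketch of the guard added in the fix above. first_fill_value_loc here is a hypothetical standalone function, not the SparseArray method: it takes the array length and the sorted positions of the stored (non-fill) values and returns the position of the first fill value, or -1 if there is none. For SparseArray([0, 0]) with fill value 0 nothing is stored, so the position list is empty and, without the len(indices) == 0 check, indices[0] raises the IndexError shown in the traceback.

```python
def first_fill_value_loc(length, indices):
    """Position of the first fill value in a sparse array (hypothetical helper).

    length  -- number of elements in the dense array
    indices -- sorted positions of the explicitly stored (non-fill) values
    """
    # Empty array, or every position is stored: there is no fill value.
    if length == 0 or len(indices) == length:
        return -1
    # The guard from the fix: nothing stored means position 0 is a fill
    # value; so does a first stored value that is not at position 0.
    if len(indices) == 0 or indices[0] > 0:
        return 0
    # Otherwise the first gap between consecutive stored positions
    # marks the first fill value.
    for prev, cur in zip(indices, indices[1:]):
        if cur - prev > 1:
            return prev + 1
    # No gap inside: the fill values all come after the last stored value.
    return indices[-1] + 1
```

For example, first_fill_value_loc(5, [0, 1, 3]) finds the gap between positions 1 and 3 and returns 2.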

TomAugspurger · 2018-10-15T18:03:13Z

Found another random failure. Going to split these two out to a new PR.

TomAugspurger · 2018-10-15T18:25:58Z

@JustinZhengBC I'm going to push some changes to your branch.

pep8speaks · 2018-10-15T19:05:45Z

Hello @JustinZhengBC! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/api/extensions/__init__.py !
There are no PEP8 issues in the file pandas/compat/numpy/function.py !
There are no PEP8 issues in the file pandas/compat/pickle_compat.py !
There are no PEP8 issues in the file pandas/core/arrays/__init__.py !
There are no PEP8 issues in the file pandas/core/arrays/datetimes.py !
There are no PEP8 issues in the file pandas/core/arrays/integer.py !
There are no PEP8 issues in the file pandas/core/arrays/interval.py !
There are no PEP8 issues in the file pandas/core/arrays/period.py !
There are no PEP8 issues in the file pandas/core/arrays/sparse.py !
There are no PEP8 issues in the file pandas/core/dtypes/common.py !
There are no PEP8 issues in the file pandas/core/dtypes/concat.py !
There are no PEP8 issues in the file pandas/core/frame.py !
There are no PEP8 issues in the file pandas/core/groupby/generic.py !
There are no PEP8 issues in the file pandas/core/internals/managers.py !
There are no PEP8 issues in the file pandas/core/ops.py !
There are no PEP8 issues in the file pandas/core/reshape/reshape.py !
There are no PEP8 issues in the file pandas/core/series.py !
There are no PEP8 issues in the file pandas/core/sparse/api.py !
There are no PEP8 issues in the file pandas/core/sparse/frame.py !
There are no PEP8 issues in the file pandas/core/sparse/series.py !
There are no PEP8 issues in the file pandas/core/util/hashing.py !
There are no PEP8 issues in the file pandas/io/packers.py !
There are no PEP8 issues in the file pandas/io/pytables.py !
There are no PEP8 issues in the file pandas/tests/arrays/sparse/test_array.py !
There are no PEP8 issues in the file pandas/tests/arrays/sparse/test_libsparse.py !
There are no PEP8 issues in the file pandas/tests/arrays/test_datetimelike.py !
There are no PEP8 issues in the file pandas/tests/extension/test_sparse.py !
There are no PEP8 issues in the file pandas/tests/indexing/test_indexing.py !
There are no PEP8 issues in the file pandas/tests/series/test_combine_concat.py !
There are no PEP8 issues in the file pandas/tests/series/test_subclass.py !
There are no PEP8 issues in the file pandas/tests/sparse/series/test_series.py !
There are no PEP8 issues in the file pandas/tests/test_base.py !

TomAugspurger · 2018-10-15T19:06:56Z

Updates:

Moved pandas/core/arrays/sparse/array.py to pandas/core/arrays/sparse.py
Moved SparseDtype implementation into pandas/core/arrays/sparse.py
Moved sparse/{series,frame,api,scipy_sparse}.py back to pandas/core/sparse (so no change from master)

TomAugspurger · 2018-10-15T19:08:10Z

pandas/core/arrays/datetimes.py

@@ -523,7 +523,7 @@ def _add_delta(self, delta):
        The result's name is set outside of _add_delta by the calling
        method (__add__ or __sub__)
        """
-        from pandas.core.arrays.timedeltas import TimedeltaArrayMixin
+        from pandas.core.arrays import TimedeltaArrayMixin


Were these changes here before mine? I may revert them.

TomAugspurger · 2018-10-15T19:17:26Z

I reverted some extraneous changes in fe53b50. @JustinZhengBC if you're interested, you can re-revert that commit later and make a separate PR with those stylistic changes. I notice now that it also contains a formatting change in https://github.com/pandas-dev/pandas/pull/23147/files#diff-6cdf34ac065f3e1e350f94c70e76898bL6, sorry. You'll need to fix that manually if you re-revert it.

TomAugspurger · 2018-10-15T19:30:19Z

pandas/api/extensions/__init__.py

@@ -3,8 +3,8 @@
                                  register_index_accessor,
                                  register_series_accessor)
 from pandas.core.algorithms import take  # noqa
-from pandas.core.arrays.base import (ExtensionArray,    # noqa
-                                     ExtensionScalarOpsMixin)
+from pandas.core.arrays import (ExtensionArray,    # noqa


Ah, should have reverted this too :/ Oh well.

TomAugspurger · 2018-10-15T19:31:08Z

@jreback @jorisvandenbossche @datapythonista could I get a quick sanity check on this? I'd like to merge on green since I'll have some followup PRs that will be blocked by this.

jreback · 2018-10-15T20:21:52Z

lgtm at a glance

jorisvandenbossche · 2018-10-15T20:44:48Z

Looks good to me as well based on a quick view

TomAugspurger · 2018-10-16T11:23:19Z

OK to ignore the numpydev failure, or will we be re-running PRs after #23178 is in?

TomAugspurger · 2018-10-16T13:36:58Z

Merging so we can move forward. Will keep an eye on master though.

TomAugspurger · 2018-10-16T13:37:24Z

Thanks @JustinZhengBC!

CLN-23123 Move SparseArray to arrays

233a284

datapythonista added the Refactor (Internal refactoring of code) and Sparse (Sparse Data Type) labels Oct 14, 2018

jreback requested changes Oct 14, 2018


JustinZhengBC added 4 commits October 14, 2018 11:01

CLN-23123 Remove print statements

96cd3d5

CLN-23123 Restore pandas.core.arrays.__init__.py

c9bfeae

CLN-23123 Fix test_api.py

f63ac12

CLN-23123 Clean up imports and fix linting

36285c1

Add __init__.py to SparseArray folder

ab60797

JustinZhengBC added 3 commits October 15, 2018 00:06

CLN-23123 Add __init__.py to sparse tests

e8808e0

CLN-23123 Add missing sparse tests

54ed5bc

CLN-23123 Modify re-added tests to use correct imports

b261f85

jreback requested changes Oct 15, 2018


move frame, series back

1d0b50b

TomAugspurger reviewed Oct 15, 2018


Revert extraneous changes

fe53b50

TomAugspurger reviewed Oct 15, 2018


fixup

120bc5e

TomAugspurger merged commit 913f71f into pandas-dev:master Oct 16, 2018

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018

CLN: Move SparseArray to arrays (pandas-dev#23147)

8b64d56
