CLN GH23123 Move SparseArray to arrays #23147
This change in imports of other arrays everywhere should not be needed. Can you detail where the circular import problem was coming from?
@JustinZhengBC you've got some
pandas/core/arrays/__init__.py
                   ExtensionScalarOpsMixin)
from .categorical import Categorical  # noqa
from .datetimes import DatetimeArrayMixin  # noqa
from .interval import IntervalArray  # noqa
this is a nice pattern, pls restore this
The main import loop problem is in … I will try putting the offending import statements from …
Almost all of the tests seem to be failing in the same place. The trace is below, and the offending line is
Did you try to simply
Yes. EDIT: I've noticed that all the tests that run on Python 2.7 fail at said line, but the tests that run on 3.x don't. I'll look into this further. EDIT2: Seems like somehow the
Codecov Report
@@ Coverage Diff @@
## master #23147 +/- ##
==========================================
+ Coverage 92.13% 92.14% +0.01%
==========================================
Files 170 170
Lines 51073 51017 -56
==========================================
- Hits 47056 47011 -45
+ Misses 4017 4006 -11
@JustinZhengBC It seems you have removed the sparse tests, without adding them back in the … Something general: should we keep the SparseSeries / SparseDataFrame where they were in …
@jorisvandenbossche thanks for the catch. All the tests should be there now. The "files changed" tab still says I deleted
yeah I would only move the actual sparse array itself here, pls leave the SparseSeries/DataFrame exactly where they were (and their tests).
pandas/compat/numpy/function.py
@@ -19,7 +19,8 @@
 """

 from numpy import ndarray
-from pandas.util._validators import (validate_args, validate_kwargs,
+from pandas.util._validators import (validate_args,
is there a reason you are changing unrelated things? (even formatting)
Test failure is unrelated to the changes here. Apparently, the SparseArray.unique implementation is flaky. Taking a look now.
In [4]: pd.SparseArray([0, 0]).unique()
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-4-c1befb06646c> in <module>
----> 1 pd.SparseArray([0, 0]).unique()
~/sandbox/pandas/pandas/core/sparse/array.py in unique(self)
570 def unique(self):
571 uniques = list(pd.unique(self.sp_values))
--> 572 fill_loc = self._first_fill_value_loc()
573 if fill_loc >= 0:
574 uniques.insert(fill_loc, self.fill_value)
~/sandbox/pandas/pandas/core/sparse/array.py in _first_fill_value_loc(self)
562
563 indices = self.sp_index.to_int_index().indices
--> 564 if indices[0] > 0:
565 return 0
566
IndexError: index 0 is out of bounds for axis 0 with size 0
And a fix:

diff --git a/pandas/core/sparse/array.py b/pandas/core/sparse/array.py
index 15b5118db..17e2bd188 100644
--- a/pandas/core/sparse/array.py
+++ b/pandas/core/sparse/array.py
@@ -561,7 +561,7 @@ class SparseArray(PandasObject, ExtensionArray, ExtensionOpsMixin):
             return -1
 
         indices = self.sp_index.to_int_index().indices
-        if indices[0] > 0:
+        if len(indices) == 0 or indices[0] > 0:
             return 0
 
         diff = indices[1:] - indices[:-1]
diff --git a/pandas/tests/sparse/test_array.py b/pandas/tests/sparse/test_array.py
index 0257d9962..5b1afdc7a 100644
--- a/pandas/tests/sparse/test_array.py
+++ b/pandas/tests/sparse/test_array.py
@@ -1065,6 +1065,13 @@ def test_unique_na_fill(arr, fill_value):
     tm.assert_numpy_array_equal(a, b)
 
 
+def test_unique_all_sparse():
+    arr = SparseArray([0, 0])
+    result = arr.unique()
+    expected = SparseArray([0])
+    tm.assert_sp_array_equal(result, expected)
+
+
 def test_map():
     arr = SparseArray([0, 1, 2])
     expected = SparseArray([10, 11, 12], fill_value=10)

@jreback @jorisvandenbossche any objections to including that fix here? It seems simple enough to not need another PR, which will create a merge conflict here. Or we can merge this and do that in a separate PR.
Found another random failure. Going to split these two out to a new PR.
@JustinZhengBC I'm going to push some changes to your branch.
Hello @JustinZhengBC! Thanks for updating the PR.
Updates:
pandas/core/arrays/datetimes.py
@@ -523,7 +523,7 @@ def _add_delta(self, delta):
         The result's name is set outside of _add_delta by the calling
         method (__add__ or __sub__)
         """
-        from pandas.core.arrays.timedeltas import TimedeltaArrayMixin
+        from pandas.core.arrays import TimedeltaArrayMixin
Were these changes here before mine? I may revert them.
I reverted some extraneous changes in fe53b50. @JustinZhengBC if you're interested, you can re-revert that commit later and make a separate PR with those stylistic changes. I notice now that it has a formatting change too in https://github.com/pandas-dev/pandas/pull/23147/files#diff-6cdf34ac065f3e1e350f94c70e76898bL6, sorry. You'll need to manually fix that if you re-revert it.
@@ -3,8 +3,8 @@
                                   register_index_accessor,
                                   register_series_accessor)
 from pandas.core.algorithms import take  # noqa
-from pandas.core.arrays.base import (ExtensionArray,  # noqa
-                                     ExtensionScalarOpsMixin)
+from pandas.core.arrays import (ExtensionArray,  # noqa
Ah, should have reverted this too :/ Oh well.
@jreback @jorisvandenbossche @datapythonista could I get a quick sanity check on this? I'd like to merge on green since I'll have some followup PRs that will be blocked by this.
lgtm at a glance
Looks good to me as well based on a quick view
OK to ignore the numpydev failure, or will we be re-running PRs after #23178 is in?
Merging so we can move forward. Will keep an eye on master though.
Thanks @JustinZhengBC!
git diff upstream/master -u -- "*.py" | flake8 --diff
Importing SparseArray from the arrays folder was causing circular import paths with the __init__.py, so I had to empty it and fix every other file that imported from pandas.core.arrays.
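The circular-import problem described here, and the two workarounds that appear in this thread (importing straight from a defining submodule such as pandas.core.arrays.timedeltas, and deferring an import into a method body as in _add_delta), can be reproduced without pandas. The sketch below (Python 3) writes a throwaway package to a temporary directory. The package name `toyarrays`, its modules, and the `try_import` helper are all hypothetical, chosen only to mimic the re-export pattern of pandas/core/arrays/__init__.py; this is not the actual pandas import graph.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package layout, loosely mirroring pandas/core/arrays:
# __init__.py re-exports classes from its submodules.
INIT = (
    "from .sparse import Sparse  # noqa\n"
    "from .timedeltas import Timedelta  # noqa\n"
)
TIMEDELTAS = "class Timedelta:\n    pass\n"

# 1. Module-level import through the package. When sparse.py runs,
#    toyarrays/__init__.py has not reached its Timedelta line yet.
SPARSE_VIA_PACKAGE = (
    "from toyarrays import Timedelta\n"
    "class Sparse:\n"
    "    pass\n"
)
# 2. Import straight from the defining submodule, which loads
#    toyarrays.timedeltas without needing the package namespace.
SPARSE_VIA_SUBMODULE = (
    "from toyarrays.timedeltas import Timedelta\n"
    "class Sparse:\n"
    "    pass\n"
)
# 3. Defer the package-level import to call time, after __init__ is done.
SPARSE_DEFERRED = (
    "class Sparse:\n"
    "    def timedelta_type(self):\n"
    "        from toyarrays import Timedelta\n"
    "        return Timedelta\n"
)


def try_import(sparse_source):
    """Build the package in a temporary directory and import it.
    Returns None on success, or the ImportError message on failure."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "toyarrays"
        pkg.mkdir()
        (pkg / "__init__.py").write_text(INIT)
        (pkg / "timedeltas.py").write_text(TIMEDELTAS)
        (pkg / "sparse.py").write_text(sparse_source)
        sys.path.insert(0, root)
        try:
            importlib.import_module("toyarrays")
            return None
        except ImportError as exc:
            return str(exc)
        finally:
            sys.path.remove(root)
            # Drop cached modules so each call starts from a clean slate.
            for name in list(sys.modules):
                if name == "toyarrays" or name.startswith("toyarrays."):
                    del sys.modules[name]
            importlib.invalidate_caches()
```

Calling `try_import(SPARSE_VIA_PACKAGE)` returns an ImportError message naming `Timedelta`, while the other two variants import cleanly. The exact wording of that message differs between Python versions.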