Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: Fixes first_valid_index when DataFrame or Series has duplicate row index (GH21441) #21497

Merged
merged 21 commits into from
Jun 20, 2018
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
6151181
Initial commit
KalyanGokhale Jun 15, 2018
003f801
Removed failing test from closed PR
KalyanGokhale Jun 16, 2018
952758a
Updated logic
KalyanGokhale Jun 17, 2018
1f4beb0
Changed logic from AND to OR for chk_notna
KalyanGokhale Jun 17, 2018
675201d
Series reverse logic for how == last, edits to whatsnew
KalyanGokhale Jun 17, 2018
0640279
Reverting how==last logic to original, restoring deleted test
KalyanGokhale Jun 18, 2018
177a3f4
Fixed the if / for for chk_notna, added test cases for NA values in d…
KalyanGokhale Jun 19, 2018
e94aad5
Initial commit
KalyanGokhale Jun 15, 2018
ff58ffd
Removed failing test from closed PR
KalyanGokhale Jun 16, 2018
d326b0a
Updated logic
KalyanGokhale Jun 17, 2018
b53bb11
Changed logic from AND to OR for chk_notna
KalyanGokhale Jun 17, 2018
0cb3405
Series reverse logic for how == last, edits to whatsnew
KalyanGokhale Jun 17, 2018
11edb51
Reverting how==last logic to original, restoring deleted test
KalyanGokhale Jun 18, 2018
ed410e1
Fixed the if / for for chk_notna, added test cases for NA values in d…
KalyanGokhale Jun 19, 2018
05e8a99
Rebased and updated whatsnew
KalyanGokhale Jun 19, 2018
01a9f7e
Moved tests to test_timeseries
KalyanGokhale Jun 19, 2018
cbcb089
Rebased and whatsnew
KalyanGokhale Jun 19, 2018
111efb0
Removed tests from test_generic
KalyanGokhale Jun 19, 2018
608c09e
Updated test parameter name
KalyanGokhale Jun 19, 2018
d8fface
Minor update to whatsnew to force TravisCI build
KalyanGokhale Jun 19, 2018
751046d
Mirrored logic for how == first and last
KalyanGokhale Jun 20, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.23.2.txt
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ and bug fixes. We recommend that all users upgrade to this version.
Fixed Regressions
~~~~~~~~~~~~~~~~~

-
- Bug in :meth:`first_valid_index` raised for a row index with duplicate values (:issue:`21441`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have left it as :meth:`first_valid_index` as this issue affects both DataFrame and Series

I don't think the :meth: will correctly link to anything as written since there's no global pd.first_valid_index. You can write something like "Bug in both :meth:`Series.first_valid_index` and :meth:`DataFrame.first_valid_index` ..." if you want to be explicit that both are affected, which would link to both Series and DataFrame separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - Done

-

.. _whatsnew_0232.performance:
Expand Down
15 changes: 6 additions & 9 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -8970,17 +8970,14 @@ def _find_valid_index(self, how):

if how == 'first':
# First valid value case
i = is_valid.idxmax()
if not is_valid[i]:
return None
return i

idxpos = is_valid.values[::].argmin()
elif how == 'last':
# Last valid value case
i = is_valid.values[::-1].argmax()
if not is_valid.iat[len(self) - i - 1]:
return None
return self.index[len(self) - i - 1]
idxpos = len(self) - 1 - is_valid.values[::-1].argmax()

if not is_valid.iat[idxpos]:
return None
return self.index[idxpos]

@Appender(_shared_docs['valid_index'] % {'position': 'first',
'klass': 'NDFrame'})
Expand Down
10 changes: 10 additions & 0 deletions pandas/tests/generic/test_generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -612,6 +612,16 @@ def test_pct_change(self, periods, fill_method, limit, exp):
else:
tm.assert_series_equal(res, Series(exp))

@pytest.mark.parametrize("data,index,expected_first,expected_last", [
({'A': [1, 2, 3]}, [1, 1, 2], 1, 2),
({'A': [1, 2, 3]}, [1, 2, 2], 1, 2),
({'A': [1, 2, 3, 4]}, ['d', 'd', 'd', 'd'], 'd', 'd')])
def test_valid_index(self, data, index, expected_first, expected_last):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we not already have some tests for this? pls put near the others. does this duplicate existing tests at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we not already have some tests for this? pls put near the others.

Thanks - The only test involving first_valid_index and last_valid_index is in ./pandas/tests/frame/test_timeseries.py - and does not specifically check for duplicate
first or last index values. Would you suggest I move this test there?

# GH 21441
df = DataFrame(data, index=index)
assert expected_first == df.first_valid_index()
assert expected_last == df.last_valid_index()


class TestNDFrame(object):
# tests that don't fit elsewhere
Expand Down
7 changes: 0 additions & 7 deletions pandas/tests/test_resample.py
Original file line number Diff line number Diff line change
Expand Up @@ -649,13 +649,6 @@ def test_asfreq_fill_value(self):
expected = frame.reindex(new_index, fill_value=4.0)
assert_frame_equal(result, expected)

def test_resample_interpolate(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this test as it was failing - investigated and seems that this test is from a closed PR #12974 opened for issue #12925
Not sure if this is the right call....
probably there are other tests failing which I haven't investigated yet - my sense is those might be related to this one - will check the TravisCI and other checks for it again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored this test now - this test along with others were failing due to error in interpolation, which is fixed now

# # 12925
df = self.create_series().to_frame('value')
assert_frame_equal(
df.resample('1T').asfreq().interpolate(),
df.resample('1T').interpolate())

def test_raises_on_non_datetimelike_index(self):
# this is a non datetimelike index
xp = DataFrame()
Expand Down