Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: Fixes first_valid_index when DataFrame or Series has duplicate row index (GH21441) #21497

Merged
merged 21 commits into from
Jun 20, 2018
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
6151181
Initial commit
KalyanGokhale Jun 15, 2018
003f801
Removed failing test from closed PR
KalyanGokhale Jun 16, 2018
952758a
Updated logic
KalyanGokhale Jun 17, 2018
1f4beb0
Changed logic from AND to OR for chk_notna
KalyanGokhale Jun 17, 2018
675201d
Series reverse logic for how == last, edits to whatsnew
KalyanGokhale Jun 17, 2018
0640279
Reverting how==last logic to original, restoring deleted test
KalyanGokhale Jun 18, 2018
177a3f4
Fixed the if / for for chk_notna, added test cases for NA values in d…
KalyanGokhale Jun 19, 2018
e94aad5
Initial commit
KalyanGokhale Jun 15, 2018
ff58ffd
Removed failing test from closed PR
KalyanGokhale Jun 16, 2018
d326b0a
Updated logic
KalyanGokhale Jun 17, 2018
b53bb11
Changed logic from AND to OR for chk_notna
KalyanGokhale Jun 17, 2018
0cb3405
Series reverse logic for how == last, edits to whatsnew
KalyanGokhale Jun 17, 2018
11edb51
Reverting how==last logic to original, restoring deleted test
KalyanGokhale Jun 18, 2018
ed410e1
Fixed the if / for for chk_notna, added test cases for NA values in d…
KalyanGokhale Jun 19, 2018
05e8a99
Rebased and updated whatsnew
KalyanGokhale Jun 19, 2018
01a9f7e
Moved tests to test_timeseries
KalyanGokhale Jun 19, 2018
cbcb089
Rebased and whatsnew
KalyanGokhale Jun 19, 2018
111efb0
Removed tests from test_generic
KalyanGokhale Jun 19, 2018
608c09e
Updated test parameter name
KalyanGokhale Jun 19, 2018
d8fface
Minor update to whatsnew to force TravisCI build
KalyanGokhale Jun 19, 2018
751046d
Mirrored logic for how == first and last
KalyanGokhale Jun 20, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.23.2.txt
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ and bug fixes. We recommend that all users upgrade to this version.
Fixed Regressions
~~~~~~~~~~~~~~~~~

-
- Bug in both :meth:`DataFrame.first_valid_index` and :meth:`Series.first_valid_index` raised for a row index with duplicate values (:issue:`21441`)
-

.. _whatsnew_0232.performance:
Expand Down
27 changes: 15 additions & 12 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -8969,18 +8969,21 @@ def _find_valid_index(self, how):
is_valid = is_valid.any(1) # reduce axis 1

if how == 'first':
# First valid value case
i = is_valid.idxmax()
if not is_valid[i]:
return None
return i

elif how == 'last':
# Last valid value case
i = is_valid.values[::-1].argmax()
if not is_valid.iat[len(self) - i - 1]:
return None
return self.index[len(self) - i - 1]
idx = is_valid.idxmax()
if isinstance(is_valid[idx], ABCSeries):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are you trying to do here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - this block is supposed to check that for multiple counts of same index, at least one is not NA.

However, while testing this with following data, the expected output is not being returned

x = pd.DataFrame({'b': [1,np.NaN,3]}, index=[1,1,2])

Expected 1, returned None

I'll rework this patch and commit again - Thanks again for the question prompt, it was fallacy of assumption on my part (had not checked explicitly for NaN value among the multiple index)

Copy link
Contributor Author

@KalyanGokhale KalyanGokhale Jun 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the loop was incorrect leading to an error, not sure what I was thinking earlier :) - fixed now and committing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed - rebased and committed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still not clear on the logic here, why can't this be a mirror of the 'last' logic?

for chk_notna in is_valid[idx]:
chk_notna = False or chk_notna
else:
chk_notna = is_valid[idx]

if how == 'last':
idxpos = len(self) - 1 - is_valid.values[::-1].argmax()
chk_notna = is_valid.iat[idxpos]
idx = self.index[idxpos]

if not chk_notna:
return None
return idx

@Appender(_shared_docs['valid_index'] % {'position': 'first',
'klass': 'NDFrame'})
Expand Down
10 changes: 10 additions & 0 deletions pandas/tests/generic/test_generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -612,6 +612,16 @@ def test_pct_change(self, periods, fill_method, limit, exp):
else:
tm.assert_series_equal(res, Series(exp))

@pytest.mark.parametrize("data,index,expected_first,expected_last", [
({'A': [1, 2, 3]}, [1, 1, 2], 1, 2),
({'A': [1, 2, 3]}, [1, 2, 2], 1, 2),
({'A': [1, 2, 3, 4]}, ['d', 'd', 'd', 'd'], 'd', 'd')])
def test_valid_index(self, data, index, expected_first, expected_last):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we not already have some tests for this? pls put near the others. does this duplicate existing tests at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we not already have some tests for this? pls put near the others.

Thanks - The only test involving first_valid_index and last_valid_index is in ./pandas/tests/frame/test_timeseries.py - and does not specifically check for duplicate
first or last index values. Would you suggest I move this test there?

# GH 21441
df = DataFrame(data, index=index)
assert expected_first == df.first_valid_index()
assert expected_last == df.last_valid_index()


class TestNDFrame(object):
# tests that don't fit elsewhere
Expand Down