Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: Fixes first_valid_index when DataFrame or Series has duplicate row index (GH21441) #21497

Merged
merged 21 commits into from
Jun 20, 2018
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
6151181
Initial commit
KalyanGokhale Jun 15, 2018
003f801
Removed failing test from closed PR
KalyanGokhale Jun 16, 2018
952758a
Updated logic
KalyanGokhale Jun 17, 2018
1f4beb0
Changed logic from AND to OR for chk_notna
KalyanGokhale Jun 17, 2018
675201d
Series reverse logic for how == last, edits to whatsnew
KalyanGokhale Jun 17, 2018
0640279
Reverting how==last logic to original, restoring deleted test
KalyanGokhale Jun 18, 2018
177a3f4
Fixed the if / for for chk_notna, added test cases for NA values in d…
KalyanGokhale Jun 19, 2018
e94aad5
Initial commit
KalyanGokhale Jun 15, 2018
ff58ffd
Removed failing test from closed PR
KalyanGokhale Jun 16, 2018
d326b0a
Updated logic
KalyanGokhale Jun 17, 2018
b53bb11
Changed logic from AND to OR for chk_notna
KalyanGokhale Jun 17, 2018
0cb3405
Series reverse logic for how == last, edits to whatsnew
KalyanGokhale Jun 17, 2018
11edb51
Reverting how==last logic to original, restoring deleted test
KalyanGokhale Jun 18, 2018
ed410e1
Fixed the if / for for chk_notna, added test cases for NA values in d…
KalyanGokhale Jun 19, 2018
05e8a99
Rebased and updated whatsnew
KalyanGokhale Jun 19, 2018
01a9f7e
Moved tests to test_timeseries
KalyanGokhale Jun 19, 2018
cbcb089
Rebased and whatsnew
KalyanGokhale Jun 19, 2018
111efb0
Removed tests from test_generic
KalyanGokhale Jun 19, 2018
608c09e
Updated test parameter name
KalyanGokhale Jun 19, 2018
d8fface
Minor update to whatsnew to force TravisCI build
KalyanGokhale Jun 19, 2018
751046d
Mirrored logic for how == first and last
KalyanGokhale Jun 20, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion doc/source/whatsnew/v0.23.2.txt
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,12 @@ and bug fixes. We recommend that all users upgrade to this version.
Fixed Regressions
~~~~~~~~~~~~~~~~~

-
-
-

**Other Fixes**

- Bug in :meth:`first_valid_index` that raised for row index with duplicate values (:issue:`21441`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be :meth:`DataFrame.first_valid_index`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need a separate sub-section here, just list the issue

'raised for a row index'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - updated
Have left it as :meth:first_valid_index as this issue affects both DataFrame and Series (though the example and title of the original issue points only to DataFrame)


.. _whatsnew_0232.performance:

Expand Down
14 changes: 7 additions & 7 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -8970,17 +8970,17 @@ def _find_valid_index(self, how):

if how == 'first':
# First valid value case
i = is_valid.idxmax()
if not is_valid[i]:
return None
return i
i = is_valid.values[::].argmin()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just call this idxpos, no need for i any longer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - done

idxpos = i

elif how == 'last':
# Last valid value case
i = is_valid.values[::-1].argmax()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this idxpos

if not is_valid.iat[len(self) - i - 1]:
return None
return self.index[len(self) - i - 1]
idxpos = len(self) - i - 1

if not is_valid.iat[idxpos]:
return None
return self.index[idxpos]

@Appender(_shared_docs['valid_index'] % {'position': 'first',
'klass': 'NDFrame'})
Expand Down
10 changes: 10 additions & 0 deletions pandas/tests/generic/test_generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -612,6 +612,16 @@ def test_pct_change(self, periods, fill_method, limit, exp):
else:
tm.assert_series_equal(res, Series(exp))

@pytest.mark.parametrize("DF,idx,first_idx,last_idx", [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some renaming suggestions for readability:

  • DF --> data (match DataFrame constructor)
  • idx --> index (match DataFrame constructor)
  • first_idx --> expected_first (follow standard expected/result unit test setup)
  • last_idx --> expected_last (follow standard expected/result unit test setup)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - done

({'A': [1, 2, 3]}, [1, 1, 2], 1, 2),
({'A': [1, 2, 3]}, [1, 2, 2], 1, 2),
({'A': [1, 2, 3, 4]}, ['d', 'd', 'd', 'd'], 'd', 'd')])
def test_valid_index(self, DF, idx, first_idx, last_idx):
# GH 21441
df1 = pd.DataFrame(DF, index=idx)
Copy link
Member

@jschendel jschendel Jun 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just call this df; there's no ambiguity since there's only one frame in the test. Also DataFrame is imported, so the pd. isn't needed.

assert first_idx == df1.first_valid_index()
assert last_idx == df1.last_valid_index()


class TestNDFrame(object):
# tests that don't fit elsewhere
Expand Down