Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: Fixes first_valid_index when DataFrame or Series has duplicate row index (GH21441) #21497

Merged
merged 21 commits into from
Jun 20, 2018
Merged
Show file tree
Hide file tree
Changes from 20 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
6151181
Initial commit
KalyanGokhale Jun 15, 2018
003f801
Removed failing test from closed PR
KalyanGokhale Jun 16, 2018
952758a
Updated logic
KalyanGokhale Jun 17, 2018
1f4beb0
Changed logic from AND to OR for chk_notna
KalyanGokhale Jun 17, 2018
675201d
Series reverse logic for how == last, edits to whatsnew
KalyanGokhale Jun 17, 2018
0640279
Reverting how==last logic to original, restoring deleted test
KalyanGokhale Jun 18, 2018
177a3f4
Fixed the if / for for chk_notna, added test cases for NA values in d…
KalyanGokhale Jun 19, 2018
e94aad5
Initial commit
KalyanGokhale Jun 15, 2018
ff58ffd
Removed failing test from closed PR
KalyanGokhale Jun 16, 2018
d326b0a
Updated logic
KalyanGokhale Jun 17, 2018
b53bb11
Changed logic from AND to OR for chk_notna
KalyanGokhale Jun 17, 2018
0cb3405
Series reverse logic for how == last, edits to whatsnew
KalyanGokhale Jun 17, 2018
11edb51
Reverting how==last logic to original, restoring deleted test
KalyanGokhale Jun 18, 2018
ed410e1
Fixed the if / for for chk_notna, added test cases for NA values in d…
KalyanGokhale Jun 19, 2018
05e8a99
Rebased and updated whatsnew
KalyanGokhale Jun 19, 2018
01a9f7e
Moved tests to test_timeseries
KalyanGokhale Jun 19, 2018
cbcb089
Rebased and whatsnew
KalyanGokhale Jun 19, 2018
111efb0
Removed tests from test_generic
KalyanGokhale Jun 19, 2018
608c09e
Updated test parameter name
KalyanGokhale Jun 19, 2018
d8fface
Minor update to whatsnew to force TravisCI build
KalyanGokhale Jun 19, 2018
751046d
Mirrored logic for how == first and last
KalyanGokhale Jun 20, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v0.23.2.txt
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,8 @@ Fixed Regressions
~~~~~~~~~~~~~~~~~

- Fixed regression in :meth:`to_csv` when handling file-like object incorrectly (:issue:`21471`)
-
- Bug in both :meth:`DataFrame.first_valid_index` and :meth:`Series.first_valid_index` raised for a row index having duplicate values (:issue:`21441`)
-

.. _whatsnew_0232.performance:

Expand Down
28 changes: 16 additions & 12 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -8969,18 +8969,22 @@ def _find_valid_index(self, how):
is_valid = is_valid.any(1) # reduce axis 1

if how == 'first':
# First valid value case
i = is_valid.idxmax()
if not is_valid[i]:
return None
return i

elif how == 'last':
# Last valid value case
i = is_valid.values[::-1].argmax()
if not is_valid.iat[len(self) - i - 1]:
return None
return self.index[len(self) - i - 1]
idx = is_valid.idxmax()
if isinstance(is_valid[idx], ABCSeries):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are you trying to do here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - this block is supposed to check that for multiple counts of same index, at least one is not NA.

However, while testing this with following data, the expected output is not being returned

x = pd.DataFrame({'b': [1,np.NaN,3]}, index=[1,1,2])

Expected 1, returned None

I'll rework this patch and commit again - Thanks again for the question prompt, it was fallacy of assumption on my part (had not checked explicitly for NaN value among the multiple index)

Copy link
Contributor Author

@KalyanGokhale KalyanGokhale Jun 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the loop was incorrect leading to an error, not sure what I was thinking earlier :) - fixed now and committing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed - rebased and committed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still not clear on the logic here, why can't this be a mirror of the 'last' logic?

chk_notna = False
for idxinstance in is_valid[idx]:
chk_notna = chk_notna or idxinstance
else:
chk_notna = is_valid[idx]

if how == 'last':
idxpos = len(self) - 1 - is_valid.values[::-1].argmax()
chk_notna = is_valid.iat[idxpos]
idx = self.index[idxpos]

if not chk_notna:
return None
return idx

@Appender(_shared_docs['valid_index'] % {'position': 'first',
'klass': 'NDFrame'})
Expand Down
15 changes: 14 additions & 1 deletion pandas/tests/frame/test_timeseries.py
Original file line number Diff line number Diff line change
Expand Up @@ -506,7 +506,15 @@ def test_asfreq_fillvalue(self):
actual_series = ts.asfreq(freq='1S', fill_value=9.0)
assert_series_equal(expected_series, actual_series)

def test_first_last_valid(self):
@pytest.mark.parametrize("data,idx,expected_first,expected_last", [
({'A': [1, 2, 3]}, [1, 1, 2], 1, 2),
({'A': [1, 2, 3]}, [1, 2, 2], 1, 2),
({'A': [1, 2, 3, 4]}, ['d', 'd', 'd', 'd'], 'd', 'd'),
({'A': [1, np.nan, 3]}, [1, 1, 2], 1, 2),
({'A': [np.nan, np.nan, 3]}, [1, 1, 2], 2, 2),
({'A': [1, np.nan, 3]}, [1, 2, 2], 1, 2)])
def test_first_last_valid(self, data, idx,
expected_first, expected_last):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved tests here from pandas/tests/generic/test_generic.py - all related tests for first_valid_index and last_valid_index are co-located

N = len(self.frame.index)
mat = randn(N)
mat[:5] = nan
Expand Down Expand Up @@ -539,6 +547,11 @@ def test_first_last_valid(self):
assert frame.first_valid_index().freq == frame.index.freq
assert frame.last_valid_index().freq == frame.index.freq

# GH 21441
df = DataFrame(data, index=idx)
assert expected_first == df.first_valid_index()
assert expected_last == df.last_valid_index()

def test_first_subset(self):
ts = tm.makeTimeDataFrame(freq='12h')
result = ts.first('10d')
Expand Down