REGR: Fixes first_valid_index when DataFrame or Series has duplicate row index (GH21441) #21497

KalyanGokhale · 2018-06-15T12:15:45Z

- closes #21441 (first_valid_index fails when dataframe has non-unique row index)
- tests added / passed
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
- whatsnew entry

jschendel · 2018-06-15T15:10:56Z

pandas/tests/generic/test_generic.py

@@ -612,6 +612,16 @@ def test_pct_change(self, periods, fill_method, limit, exp):
        else:
            tm.assert_series_equal(res, Series(exp))

+    @pytest.mark.parametrize("DF,idx,first_idx,last_idx", [


Some renaming suggestions for readability:

- DF --> data (match DataFrame constructor)
- idx --> index (match DataFrame constructor)
- first_idx --> expected_first (follow standard expected/result unit test setup)
- last_idx --> expected_last (follow standard expected/result unit test setup)

Thanks - done

jschendel · 2018-06-15T15:12:31Z

pandas/tests/generic/test_generic.py

+        ({'A': [1, 2, 3, 4]}, ['d', 'd', 'd', 'd'], 'd', 'd')])
+    def test_valid_index(self, DF, idx, first_idx, last_idx):
+        # GH 21441
+        df1 = pd.DataFrame(DF, index=idx)


You can just call this df; there's no ambiguity since there's only one frame in the test. Also DataFrame is imported, so the pd. isn't needed.

jschendel · 2018-06-15T15:13:24Z

doc/source/whatsnew/v0.23.2.txt

+
+**Other Fixes**
+
+- Bug in :meth:`first_valid_index` that raised for row index with duplicate values (:issue:`21441`)


I think this should be :meth:`DataFrame.first_valid_index`

Thanks - updated
I have left it as :meth:`first_valid_index` because this issue affects both DataFrame and Series (though the example and title of the original issue point only to DataFrame)

jreback · 2018-06-15T16:11:04Z

doc/source/whatsnew/v0.23.2.txt

+
+**Other Fixes**
+
+- Bug in :meth:`first_valid_index` that raised for row index with duplicate values (:issue:`21441`)


you don't need a separate sub-section here, just list the issue

'raised for a row index'

jreback · 2018-06-15T16:14:33Z

pandas/core/generic.py

-            if not is_valid[i]:
-                return None
-            return i
+            i = is_valid.values[::].argmin()


just call this idxpos, no need for i any longer

Thanks - done

jreback · 2018-06-15T16:14:48Z

pandas/core/generic.py

-                return None
-            return i
+            i = is_valid.values[::].argmin()
+            idxpos = i

        elif how == 'last':
            # Last valid value case
            i = is_valid.values[::-1].argmax()


make this idxpos

KalyanGokhale · 2018-06-16T06:10:46Z

pandas/tests/test_resample.py

@@ -649,13 +649,6 @@ def test_asfreq_fill_value(self):
        expected = frame.reindex(new_index, fill_value=4.0)
        assert_frame_equal(result, expected)

-    def test_resample_interpolate(self):


Removed this test as it was failing. I investigated, and it seems this test comes from the closed PR #12974, which was opened for issue #12925.
Not sure if this is the right call.
There are probably other failing tests that I haven't investigated yet; my sense is that they are related to this one. I will check the TravisCI run and the other checks again.

Restored this test now. This test, along with others, was failing due to an error in interpolation, which is fixed now.

jschendel · 2018-06-17T16:26:15Z

doc/source/whatsnew/v0.23.2.txt

@@ -16,7 +16,7 @@ and bug fixes. We recommend that all users upgrade to this version.
 Fixed Regressions
 ~~~~~~~~~~~~~~~~~

-
+- Bug in :meth:`first_valid_index` raised for a row index with duplicate values (:issue:`21441`)


> Have left it as :meth:`first_valid_index` as this issue affects both DataFrame and Series

I don't think the :meth: will correctly link to anything as written since there's no global pd.first_valid_index. You can write something like "Bug in both :meth:`Series.first_valid_index` and :meth:`DataFrame.first_valid_index` ..." if you want to be explicit that both are affected, which would link to both Series and DataFrame separately.

Thanks - Done
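The point about there being no global `pd.first_valid_index` can be checked directly. This is a minimal sketch, assuming a reasonably recent pandas install; the helper name `has_method` is hypothetical and exists only for this illustration.

```python
import pandas as pd


def has_method(obj, name="first_valid_index"):
    """Return True if `obj` exposes a callable attribute called `name`."""
    # `has_method` is a hypothetical helper, not part of pandas.
    return callable(getattr(obj, name, None))


# The method is defined on the Series and DataFrame classes ...
on_series = has_method(pd.Series)
on_frame = has_method(pd.DataFrame)

# ... but there is no top-level pandas function with that name,
# so a bare :meth:`first_valid_index` role has nothing to link to.
on_module = has_method(pd)
```

This is why the whatsnew entry names `Series.first_valid_index` and `DataFrame.first_valid_index` separately.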

codecov · 2018-06-18T02:52:12Z

Codecov Report

Merging #21497 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21497      +/-   ##
==========================================
- Coverage   91.92%   91.91%   -0.01%     
==========================================
  Files         153      153              
  Lines       49570    49574       +4     
==========================================
+ Hits        45566    45568       +2     
- Misses       4004     4006       +2
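The percentages in the table above can be reproduced from the hit and line counts. The sketch below assumes coverage is simply hits divided by lines, and that the displayed figures are truncated to two decimals rather than rounded; the truncation is an inference from the numbers, not documented Codecov behaviour. The function names are hypothetical.

```python
import math


def coverage_percent(hits, lines):
    """Coverage as a percentage of covered lines (assumed formula)."""
    return 100.0 * hits / lines


def truncate(value, places=2):
    """Cut a number off after `places` decimals without rounding up."""
    factor = 10 ** places
    return math.floor(value * factor) / factor


# Counts taken from the Coverage Diff table
master = coverage_percent(45566, 49570)
patched = coverage_percent(45568, 49574)

# The raw change is well under 0.01 percentage points, matching "<.01%"
delta = patched - master
```

Under these assumptions the -0.01% in the table is the difference of the two truncated values, while the true change is only a few thousandths of a percentage point.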

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.31% <100%> (-0.01%)` | ⬇️ |
| #single | `41.8% <0%> (-0.01%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/generic.py | `96.13% <100%> (ø)` | ⬆️ |
| pandas/util/testing.py | `85.75% <0%> (-0.21%)` | ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c2da06c...751046d.

jreback · 2018-06-18T22:18:39Z

pandas/core/generic.py

-                return None
-            return self.index[len(self) - i - 1]
+            idx = is_valid.idxmax()
+            if isinstance(is_valid[idx], ABCSeries):


what are you trying to do here?

Thanks - this block is supposed to check that, when the same index label occurs multiple times, at least one of those rows is not NA.

However, when testing this with the following data, the expected output is not returned:

x = pd.DataFrame({'b': [1, np.nan, 3]}, index=[1, 1, 2])

Expected 1, returned None

I'll rework this patch and commit again. Thanks again for the question; it was a faulty assumption on my part (I had not explicitly checked for NaN values among the duplicated index labels).

The loop was incorrect, leading to an error; not sure what I was thinking earlier :) - fixed now and committing

Fixed - rebased and committed

still not clear on the logic here, why can't this be a mirror of the 'last' logic?
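The `isinstance(..., ABCSeries)` check in the hunk above exists because a label lookup on a duplicated index label returns several rows rather than one value. A small sketch of that behaviour, using the frame from this thread; it uses `.loc` and `.iat` to make the label/position distinction explicit, which is a choice made for this illustration rather than what the patch does.

```python
import numpy as np
import pandas as pd

# Frame from the discussion: label 1 appears twice, once with a NaN
x = pd.DataFrame({"b": [1, np.nan, 3]}, index=[1, 1, 2])

# Row-wise validity mask: True where at least one column is not NA
is_valid = x.notna().any(axis=1)

# Label-based lookup of a duplicated label yields a Series of two values
by_label = is_valid.loc[1]

# Position-based lookup always yields a single boolean
by_position = is_valid.iat[0]
```

Because `by_label` holds both `True` and `False`, using it directly in an `if` is ambiguous, whereas a positional lookup is not affected by duplicate labels.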

jreback · 2018-06-18T22:19:21Z

pandas/tests/generic/test_generic.py

+        ({'A': [1, 2, 3]}, [1, 1, 2], 1, 2),
+        ({'A': [1, 2, 3]}, [1, 2, 2], 1, 2),
+        ({'A': [1, 2, 3, 4]}, ['d', 'd', 'd', 'd'], 'd', 'd')])
+    def test_valid_index(self, data, index, expected_first, expected_last):


do we not already have some tests for this? pls put near the others. does this duplicate existing tests at all?

Thanks - The only test involving first_valid_index and last_valid_index is in ./pandas/tests/frame/test_timeseries.py, and it does not specifically check for duplicate first or last index values. Would you suggest I move this test there?

KalyanGokhale · 2018-06-19T02:43:55Z

will rebase and resolve conflict with whatsnew file and push later today

KalyanGokhale · 2018-06-19T13:55:17Z

pandas/tests/frame/test_timeseries.py

+        ({'A': [np.nan, np.nan, 3]}, [1, 1, 2], 2, 2),
+        ({'A': [1, np.nan, 3]}, [1, 2, 2], 1, 2)])
+    def test_first_last_valid(self, data, index,
+                              expected_first, expected_last):


moved tests here from pandas/tests/generic/test_generic.py - all related tests for first_valid_index and last_valid_index are co-located
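For reference, the parametrized cases quoted in this review can be exercised outside the test suite as well. This is a sketch, assuming a pandas version that includes this fix (0.23.2 or later); the assertions in the body are an assumption about what the test checks, since the hunks above show only the parameters and the signature. `CASES` and `check_first_last_valid` are hypothetical names, and the real test uses `pytest.mark.parametrize` instead of a loop.

```python
import numpy as np
import pandas as pd

# (data, index, expected_first, expected_last), copied from the hunks above
CASES = [
    ({"A": [1, 2, 3]}, [1, 1, 2], 1, 2),
    ({"A": [1, 2, 3]}, [1, 2, 2], 1, 2),
    ({"A": [1, 2, 3, 4]}, ["d", "d", "d", "d"], "d", "d"),
    ({"A": [np.nan, np.nan, 3]}, [1, 1, 2], 2, 2),
    ({"A": [1, np.nan, 3]}, [1, 2, 2], 1, 2),
]


def check_first_last_valid(data, index, expected_first, expected_last):
    """Build a frame with a possibly duplicated index and check both ends."""
    df = pd.DataFrame(data, index=index)
    assert df.first_valid_index() == expected_first
    assert df.last_valid_index() == expected_last


# Plain loop standing in for pytest's parametrization
for case in CASES:
    check_first_last_valid(*case)
```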

jreback · 2018-06-19T20:47:37Z

pandas/core/generic.py

-                return None
-            return self.index[len(self) - i - 1]
+            idx = is_valid.idxmax()
+            if isinstance(is_valid[idx], ABCSeries):


still not clear on the logic here, why can't this be a mirror of the 'last' logic?
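One way to read the request to mirror the 'last' logic is to locate both the first and the last valid row by position, so that duplicate labels never take part in the lookup. The following is a sketch under that assumption, not the code merged in this PR; the function name `find_valid_index` is hypothetical.

```python
import numpy as np
import pandas as pd


def find_valid_index(obj, how):
    """Return the label of the first or last row of `obj` that is not all-NA.

    Hypothetical helper for illustration; `how` is 'first' or 'last'.
    """
    if len(obj) == 0:
        return None

    is_valid = obj.notna()
    if obj.ndim == 2:
        # For a DataFrame, a row is valid if any column is not NA
        is_valid = is_valid.any(axis=1)
    mask = is_valid.to_numpy()

    if how == "first":
        # argmax returns the position of the first True
        idxpos = mask.argmax()
    elif how == "last":
        # Same idea on the reversed mask, converted back to a forward position
        idxpos = len(obj) - 1 - mask[::-1].argmax()
    else:
        raise ValueError("how must be 'first' or 'last'")

    # argmax also returns 0 when nothing is True, so confirm the hit
    if not mask[idxpos]:
        return None
    return obj.index[idxpos]


# Frame from the earlier discussion, with a duplicated label
x = pd.DataFrame({"b": [1, np.nan, 3]}, index=[1, 1, 2])
first_label = find_valid_index(x, "first")
last_label = find_valid_index(x, "last")
```

Both branches differ only in the direction of the scan, and the final label is read with a positional index, which stays well defined when labels repeat.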

jreback · 2018-06-20T10:33:11Z

thanks @KalyanGokhale

- 6151181 Initial commit

KalyanGokhale changed the title ~~REGR: Fixes first_valid_index when dataframe has duplicate row index (GH21441)~~ REGR/BUG: Fixes first_valid_index when dataframe has duplicate row index (GH21441) Jun 15, 2018

jschendel reviewed Jun 15, 2018

jschendel added the Indexing and Regression labels Jun 15, 2018

jreback requested changes Jun 15, 2018

- 003f801 Removed failing test from closed PR

KalyanGokhale commented Jun 16, 2018

KalyanGokhale added 2 commits June 17, 2018 21:21

- 952758a Updated logic
- 1f4beb0 Changed logic from AND to OR for chk_notna

jschendel reviewed Jun 17, 2018

- 675201d Series reverse logic for how == last, edits to whatsnew

KalyanGokhale changed the title ~~REGR/BUG: Fixes first_valid_index when dataframe has duplicate row index (GH21441)~~ REGR: Fixes first_valid_index when DataFrame or Series has duplicate row index (GH21441) Jun 17, 2018

- 0640279 Reverting how==last logic to original, restoring deleted test

jreback requested changes Jun 18, 2018

- 177a3f4 Fixed the if / for for chk_notna, added test cases for NA values in duplicate index GH21441

KalyanGokhale added 10 commits June 19, 2018 18:46

- e94aad5 Initial commit
- ff58ffd Removed failing test from closed PR
- d326b0a Updated logic
- b53bb11 Changed logic from AND to OR for chk_notna
- 0cb3405 Series reverse logic for how == last, edits to whatsnew
- 11edb51 Reverting how==last logic to original, restoring deleted test
- ed410e1 Fixed the if / for for chk_notna, added test cases for NA values in duplicate index GH21441
- 05e8a99 Rebased and updated whatsnew
- 01a9f7e Moved tests to test_timeseries
- cbcb089 Rebased and whatsnew
- 111efb0 Removed tests from test_generic

KalyanGokhale commented Jun 19, 2018

KalyanGokhale added 2 commits June 19, 2018 20:50

- 608c09e Updated test parameter name
- d8fface Minor update to whatsnew to force TravisCI build

jreback requested changes Jun 19, 2018

- 751046d Mirrored logic for how == first and last

jreback added this to the 0.23.2 milestone Jun 20, 2018

jreback approved these changes Jun 20, 2018

jreback merged commit ec20207 into pandas-dev:master Jun 20, 2018

KalyanGokhale deleted the GH21441 branch June 20, 2018 10:56

jorisvandenbossche added Needs Backport and removed Needs Backport labels Jun 29, 2018

jorisvandenbossche pushed a commit that referenced this pull request Jun 29, 2018

- 421d847 REGR: Fixes first_valid_index when DataFrame or Series has duplicate row index (GH21441) (#21497) (cherry picked from commit ec20207)

jorisvandenbossche pushed a commit that referenced this pull request Jul 2, 2018

- d44fddb REGR: Fixes first_valid_index when DataFrame or Series has duplicate row index (GH21441) (#21497) (cherry picked from commit ec20207)

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

- 85541cf REGR: Fixes first_valid_index when DataFrame or Series has duplicate row index (GH21441) (pandas-dev#21497)
