REF: IntervalArray comparisons #37124

jbrockmendel · 2020-10-14T23:02:47Z

jreback · 2020-10-16T01:12:21Z

pandas/core/arrays/interval.py

-                result[i] = True
-
+        try:
+            result = np.zeros(len(self), dtype=bool)


you should do the np.zeros outside of the try/except
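A minimal sketch of what this suggestion amounts to, assuming a generic element-wise comparison loop rather than the actual `_cmp_method` body. The function name `elementwise_compare` and the all-False fallback are hypothetical; the point is only that the allocation cannot raise the `TypeError` being guarded against, so it sits before the `try`.

```python
import operator

import numpy as np


def elementwise_compare(values, other, op):
    # Allocate first: np.zeros is not the call that can raise the TypeError
    # we want to catch, so it stays outside the try block.
    result = np.zeros(len(values), dtype=bool)
    try:
        for i, obj in enumerate(other):
            result[i] = op(values[i], obj)
    except TypeError:
        # Hypothetical fallback for this sketch only: treat the whole
        # comparison as False when any pair is not comparable.
        result[:] = False
    return result


same = elementwise_compare([1, 2, 3], [1, 5, 3], operator.eq)
mixed = elementwise_compare([1, 2], [2, "a"], operator.lt)
```

Keeping the `try` body as small as possible also makes it clearer which statement the `except` clause is meant to cover.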

jreback · 2020-10-16T01:12:56Z

pandas/core/arrays/_mixins.py

@@ -139,7 +138,6 @@ def repeat(self: _T, repeats, axis=None) -> _T:
        --------
        numpy.ndarray.repeat
        """
-        nv.validate_repeat(tuple(), dict(axis=axis))


why is this no longer needed? or unrelated?

can you comment here @jbrockmendel

In IntervalArray.repeat we call self._combined.repeat(repeats, 0), which when _combined is DTA/TDA gets here with axis=0

Since there is still discussion about whether _combined is meant to stay (#37047), can you leave out this clean-up from this PR?

(unless the clean-up is not tied to having _combined, but from your comment it seems so)

pandas/core/arrays/interval.py

jorisvandenbossche

@jbrockmendel can you provide some context for this PR?
Is it one that tries to show the benefit of having a 2D array (as you mentioned you could in the meeting)? Otherwise it seems to "build further upon" the changes of #37047, which I thought we would wait with until the discussion about it is resolved.

jbrockmendel · 2020-10-16T14:49:58Z

@jorisvandenbossche this is unrelated to the backing-data. In the next commit I'll change the implementation to use left/right so as to be agnostic between the new/old backing formats.

jreback

comment otherwise lgtm.

jreback · 2020-10-20T23:06:25Z

pandas/core/arrays/_mixins.py

@@ -139,7 +138,6 @@ def repeat(self: _T, repeats, axis=None) -> _T:
        --------
        numpy.ndarray.repeat
        """
-        nv.validate_repeat(tuple(), dict(axis=axis))


can you comment here @jbrockmendel

jreback · 2020-10-20T23:20:18Z

ok i am fine with this. @jorisvandenbossche any comments.

jorisvandenbossche

Generally looks good, few comments

jorisvandenbossche · 2020-10-21T08:10:24Z

pandas/core/arrays/interval.py

-            return (self._left == other.left) & (self._right == other.right)
+                return invalid_comparison(self, other, op)
+            if isinstance(other, Interval):
+                other = type(self)._from_sequence([other])


Can't we use other.left / other.right scalars here? (and then the broadcasting for array vs scalar will work fine, and we don't have to deal with len-1 arrays?)
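To illustrate the broadcasting point, here is a small sketch using plain NumPy arrays as stand-ins for the left and right endpoints of an interval array. The variable names and the sample interval are made up for the example; it does not reproduce the pandas implementation and ignores details such as matching `closed`.

```python
import numpy as np

# Stand-ins for the endpoints of three intervals: (0, 1], (1, 2], (2, 3]
left = np.array([0, 1, 2])
right = np.array([1, 2, 3])

# Endpoints of a single scalar interval (1, 2], kept as plain scalars
other_left, other_right = 1, 2

# Array-vs-scalar comparison broadcasts on its own, so there is no need
# to wrap the scalar interval in a length-1 array first.
equal = (left == other_left) & (right == other_right)
```

Only the middle interval matches on both endpoints, so `equal` is True in that position alone.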

jorisvandenbossche · 2020-10-21T08:34:48Z

pandas/core/arrays/interval.py

+            for i, obj in enumerate(other):
+                result[i] = op(self[i], obj)
+        except TypeError:
+            # pd.NA


Do you have an example (or test) that runs into this?

Ah, I see the removed special case below in the tests. Now, this is certainly in a messy state, but I am not sure the conversion to object dtype is needed here. Currently, we still return False (and not NA as for nullable dtypes) on comparisons with NA:

In [90]: arr = pd.interval_range(0,3).array
In [91]: arr == pd.NA
Out[91]: array([False, False, False])
In [92]: arr[0] == pd.NA
Out[92]: False

jbrockmendel · 2020-10-26T21:16:43Z

@jorisvandenbossche I think I've addressed your comments; can you double-check when convenient?

jorisvandenbossche · 2020-10-26T22:01:49Z

Thanks for the ping, will take a look tomorrow!

jorisvandenbossche

Thanks for the update!

jorisvandenbossche · 2020-10-27T10:38:05Z

pandas/core/arrays/_mixins.py

@@ -139,7 +138,6 @@ def repeat(self: _T, repeats, axis=None) -> _T:
        --------
        numpy.ndarray.repeat
        """
-        nv.validate_repeat(tuple(), dict(axis=axis))


Since there is still discussion about whether _combined is meant to stay (#37047), can you leave out this clean-up from this PR?

(unless the clean-up is not tied to having _combined, but from your comment it seems so)

jorisvandenbossche · 2020-10-27T10:48:21Z

pandas/core/arrays/interval.py

+                result[i] = op(self[i], obj)
+            except TypeError:
+                if obj is NA:
+                    # github.com/pandas-dev/pandas/pull/37124#discussion_r509095092


Can you add a comment with an actual explanation instead of (only) the link?

jorisvandenbossche · 2020-10-28T07:55:19Z

pandas/core/arrays/interval.py

@@ -583,6 +583,8 @@ def _cmp_method(self, other, op):
                result[i] = op(self[i], obj)
            except TypeError:
                if obj is NA:
+                    # comparison returns NA, which we (for now?) treat like


It's actually only for comparison of np.nan with pd.NA, comparison with Intervals is already (for now) returning False
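The case described here can be sketched without pandas. The `_NAType` class below is a hypothetical stand-in for `pd.NA` (comparisons return the sentinel itself, and its truth value is ambiguous), and a plain list with an explicit `bool()` call stands in for assignment into a boolean ndarray. This is an illustration of the pattern in the diff, not the merged code.

```python
import operator


class _NAType:
    """Hypothetical stand-in for pd.NA, for illustration only."""

    def __eq__(self, other):
        return self  # comparisons propagate NA

    def __ne__(self, other):
        return self

    __hash__ = object.__hash__  # defining __eq__ would otherwise unset it

    def __bool__(self):
        raise TypeError("boolean value of NA is ambiguous")


NA = _NAType()


def compare_objects(values, other, op):
    result = [False] * len(values)
    for i, obj in enumerate(other):
        try:
            # bool() mimics storing the outcome in a boolean array
            result[i] = bool(op(values[i], obj))
        except TypeError:
            if obj is NA:
                # op(nan, NA) returned NA, which cannot be stored as a
                # bool; treat it like False (for now)
                result[i] = False
            else:
                raise
    return result


out = compare_objects([float("nan"), 1.0], [NA, 1.0], operator.eq)
```

Any other `TypeError` is re-raised, so only the NA case is silently mapped to False.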

Can you suggest an edit so I can just hit "commit" on it?

updated per request + green

jorisvandenbossche

Sorry for the slow follow-up, thanks for the update!


jbrockmendel added 3 commits October 14, 2020 14:47

ENH: IntervalArray comparisons

a3b7c45

Merge branch 'master' of https://github.com/pandas-dev/pandas into ref-ops-4

60197d2

CLN: get IntervalIndex comparisons from IntervalArray

cf20846

jreback requested changes Oct 16, 2020

jreback added the "Interval" label (Interval data type) Oct 16, 2020

jorisvandenbossche requested changes Oct 16, 2020

jbrockmendel closed this Oct 16, 2020

jbrockmendel reopened this Oct 16, 2020

jbrockmendel added 2 commits October 20, 2020 09:12

Merge branch 'master' of https://github.com/pandas-dev/pandas into ref-ops-4

d3c1041

update per requests

fcfe47d

jreback requested changes Oct 20, 2020

jreback added this to the 1.2 milestone Oct 20, 2020

jreback added the "Refactor" label (Internal refactoring of code) Oct 20, 2020

jreback approved these changes Oct 20, 2020

jorisvandenbossche requested changes Oct 21, 2020

jbrockmendel added 4 commits October 23, 2020 16:37

Merge branch 'master' of https://github.com/pandas-dev/pandas into ref-ops-4

3dfc008

Avoid having to tile

fa6cecd

Merge branch 'master' of https://github.com/pandas-dev/pandas into ref-ops-4

0d32acd

handle NA per suggestion

ff640ea

jorisvandenbossche reviewed Oct 27, 2020

jbrockmendel added 2 commits October 27, 2020 11:06

Merge branch 'master' of https://github.com/pandas-dev/pandas into ref-ops-4

f501b4e

comment

247ce90

jorisvandenbossche reviewed Oct 28, 2020

jbrockmendel added 3 commits October 31, 2020 14:04

Merge branch 'master' of https://github.com/pandas-dev/pandas into ref-ops-4

84b0409

Merge branch 'master' of https://github.com/pandas-dev/pandas into ref-ops-4

1ea125d

Merge branch 'master' of https://github.com/pandas-dev/pandas into ref-ops-4

6e43726

update comment

badb99d

jorisvandenbossche approved these changes Nov 3, 2020

jorisvandenbossche merged commit 337bf20 into pandas-dev:master Nov 3, 2020

jbrockmendel deleted the ref-ops-4 branch November 3, 2020 15:11
