Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: right merge not preserve row order (#27453) #27762

Closed

Conversation

bongolegend
Copy link
Contributor

@bongolegend
Copy link
Contributor Author

bongolegend commented Aug 5, 2019

Not sure how to interpret these errors. Azure is reporting a failure on test_drop_duplicates_categorical_non_bool, but that is marked to xfail in the test code. Any advice appreciated. @TomAugspurger

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a whatsnew for v1.0.0

pandas/tests/reshape/merge/test_merge.py Outdated Show resolved Hide resolved
pandas/tests/reshape/merge/test_merge.py Outdated Show resolved Hide resolved
assert_frame_equal(expected, result)


def test_left_merge_preserves_row_order():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not already accounted for in other tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None that I could find.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having these as separate tests can you just parametrize on how = left and right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also outer and inner should match left ordering

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - yeah I can do this. Just started a new job so my schedule got crammed all of a sudden.

@@ -1276,6 +1276,11 @@ def _get_join_indexers(left_keys, right_keys, sort=False, how="inner", **kwargs)
indexers into the left_keys, right_keys

"""
_how = how
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this is rather difficult to grok - could you alternately just change _right_outer_join within the same module to accept a sort keyword? I think that would make things easier

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the sorting actually happens in _factorize_keys a few lines down, not downstream in _right_outer_merge. So the left and right keys need to be swapped before that call. _right_outer_join just implements _left_outer_join with left and right swapped; no sorting happens there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what I'm saying is that sorting already happens, but it happens before a distinction is made between left and right joins. So this PR doesn't move where the sorting happens, it just moves where the distinction between left and right joins is made.

@WillAyd WillAyd added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Aug 7, 2019
@WillAyd WillAyd added this to the 1.0 milestone Aug 7, 2019
@TomAugspurger
Copy link
Contributor

Azure is reporting a failure on test_drop_duplicates_categorical_non_bool, but that is marked to xfail in the test code. Any advice appreciated.

That xfail references #7996 if it's any help.

The 3.7 CI failures should be fixed if you merge master & repush.

@bongolegend
Copy link
Contributor Author

@WillAyd I cleaned up the sloppy logic here. Should be much better now.

@bongolegend
Copy link
Contributor Author

The failing tests are unrelated to this PR.

=================================== FAILURES ===================================
_ TestCategoricalSeriesAnalytics.test_drop_duplicates_categorical_non_bool[True-datetime64[D]] _
[gw1] linux -- Python 3.7.3 /home/travis/miniconda3/envs/pandas-dev/bin/python
[XPASS(strict)] GH#7996
_ TestCategoricalSeriesAnalytics.test_drop_duplicates_categorical_non_bool[False-datetime64[D]] _
[gw1] linux -- Python 3.7.3 /home/travis/miniconda3/envs/pandas-dev/bin/python
[XPASS(strict)] GH#7996
_ TestCategoricalSeriesAnalytics.test_drop_duplicates_categorical_non_bool[None-datetime64[D]] _
[gw1] linux -- Python 3.7.3 /home/travis/miniconda3/envs/pandas-dev/bin/python
[XPASS(strict)] GH#7996

source

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed response as I've been on vacation but this looks to be moving in the right direction!


if how == "right":
rkey, lkey, count = fkeys(rkey, lkey)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a more appropriate variable name to use here for the left hand sides? General approach looks good but I think variable reuse can be confusing for future debugging. OK with more appropriate names even if it increases diff

pandas/tests/reshape/merge/test_merge.py Show resolved Hide resolved
assert_frame_equal(expected, result)


def test_left_merge_preserves_row_order():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having these as separate tests can you just parametrize on how = left and right?

doc/source/whatsnew/v1.0.0.rst Outdated Show resolved Hide resolved
pandas/core/reshape/merge.py Show resolved Hide resolved
pandas/tests/reshape/merge/test_merge.py Show resolved Hide resolved
assert_frame_equal(expected, result)


def test_left_merge_preserves_row_order():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do this

assert_frame_equal(expected, result)


def test_left_merge_preserves_row_order():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also outer and inner should match left ordering

@bongolegend
Copy link
Contributor Author

I addressed all the issues that were brought up. Ready for another round of review.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implementation and tests lgtm just need some updates on whatsnew

doc/source/whatsnew/v1.0.0.rst Outdated Show resolved Hide resolved
doc/source/whatsnew/v1.0.0.rst Outdated Show resolved Hide resolved
@WillAyd
Copy link
Member

WillAyd commented Oct 11, 2019

Can you resolve merge conflict? Also should only need one whatsnew entry

@bongolegend
Copy link
Contributor Author

Can you resolve merge conflict? Also should only need one whatsnew entry

AS per @jreback 's request, I have a whatsnew entry to both the breaking API changes and the Reshaping sections.

@pep8speaks
Copy link

pep8speaks commented Oct 13, 2019

Hello @ncernek! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-23 03:21:01 UTC

@WillAyd
Copy link
Member

WillAyd commented Oct 14, 2019

AS per @jreback 's request, I have a whatsnew entry to both the breaking API changes and the Reshaping sections.

I think the request there was to have this show up in Backwards Incompatible API Changes with its own section, not to duplicate it. You can find examples of how to do that in prior whatsnew notes, like here:

https://github.com/pandas-dev/pandas/blob/master/doc/source/whatsnew/v0.25.0.rst#backwards-incompatible-api-changes

@WillAyd
Copy link
Member

WillAyd commented Oct 14, 2019

Also looks like a few lint issues need to be addressed

@bongolegend
Copy link
Contributor Author

Does anyone understand what the error is here? This should be good to merge.

@WillAyd
Copy link
Member

WillAyd commented Oct 25, 2019

This should be good to merge.

Still need to address the comment around the whatsnew. Please take a look at #27762 (comment)

The existing CI failure will probably go away on next push - we've been having intermittent timeouts the past few days that I think should be over now

@bongolegend
Copy link
Contributor Author

Still need to address the comment around the whatsnew. Please take a look at #27762 (comment)

I already took care of this is in this commit.

@TomAugspurger
Copy link
Contributor

@ncernek some of the CI failures look related. e.g. https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=20073

@WillAyd
Copy link
Member

WillAyd commented Dec 9, 2019

@ncernek any chance you can check CI failures and resolve conflicts?

@bongolegend
Copy link
Contributor Author

bongolegend commented Dec 9, 2019 via email

@TomAugspurger
Copy link
Contributor

@ncernek able to update? We're planning to release the 1.0 rc this week.

@bongolegend
Copy link
Contributor Author

Hmm I'm getting an error locally when running tests. Any ideas?

╰─$ pytest pandas/tests/reshape/merge/test_merge.py::test_merge_preserves_row_order                                      1 ↵
ImportError while loading conftest '/Users/nico/code/pandas-ncernek/pandas/conftest.py'.
pandas/__init__.py:56: in <module>
    from pandas.core.api import (
pandas/core/api.py:4: in <module>
    from pandas._libs.missing import NA
E   ImportError: cannot import name 'NA' from 'pandas._libs.missing' (/Users/nico/code/pandas-ncernek/pandas/_libs/missing.cpython-37m-darwin.so)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty good, can you merge master, ping on green.

@@ -274,6 +274,7 @@ New repr for :class:`~pandas.arrays.IntervalArray`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- :class:`pandas.arrays.IntervalArray` adopts a new ``__repr__`` in accordance with other array classes (:issue:`25022`)
- :class:`pandas.core.arrays.IntervalArray` adopts a new ``__repr__`` in accordance with other array classes (:issue:`25022`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a rebasing dupe

@@ -292,6 +293,32 @@ New repr for :class:`~pandas.arrays.IntervalArray`

pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 3)])

- :meth:`DataFrame.merge` now preserves right frame's row order when executing a right merge (:issue:`27453`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this a sub-section (need a line under the title), and put a shorter title, move what you have to the first sentence.


.. ipython:: python

left_df = pd.DataFrame({"colors": ["blue", "red"]}, index=pd.Index([0, 1]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show these as well, call then just left and rigth


*pandas 0.25.x*

.. ipython:: python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be a code-block


.. ipython:: python

left_df.merge(right_df, left_index=True, right_index=True, how="right")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let this execute (don't put the results in)

@@ -949,6 +976,7 @@ Reshaping
- Bug in :func:`melt` where supplying mixed strings and numeric values for ``id_vars`` or ``value_vars`` would incorrectly raise a ``ValueError`` (:issue:`29718`)
- Dtypes are now preserved when transposing a ``DataFrame`` where each column is the same extension dtype (:issue:`30091`)
- Bug in :func:`merge_asof` merging on a tz-aware ``left_index`` and ``right_on`` a tz-aware column (:issue:`29864`)
- :meth:`DataFrame.merge` now preserves right frame's row order when executing a right merge (:issue:`27453`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't put this here, you already have a sub-section

@jreback jreback removed this from the 1.0 milestone Jan 5, 2020
@MarcoGorelli
Copy link
Member

Hmm I'm getting an error locally when running tests. Any ideas?

╰─$ pytest pandas/tests/reshape/merge/test_merge.py::test_merge_preserves_row_order                                      1 ↵
ImportError while loading conftest '/Users/nico/code/pandas-ncernek/pandas/conftest.py'.
pandas/__init__.py:56: in <module>
    from pandas.core.api import (
pandas/core/api.py:4: in <module>
    from pandas._libs.missing import NA
E   ImportError: cannot import name 'NA' from 'pandas._libs.missing' (/Users/nico/code/pandas-ncernek/pandas/_libs/missing.cpython-37m-darwin.so)

Can you try rebuilding the C extensions?

@bongolegend
Copy link
Contributor Author

I'm not working on this PR anymore. This dev experience is painful. I would have appreciated commits from others to help me get across the finish line a long time ago, esp. when the only issues remaining for the past few months were concerning documentation.

Separately, uncertain as to why there are hundreds of test failures. These aren't caused by my contribution that I can tell. That doesn't leave me excited to resolve anything anymore.

Anyone else is free to take ownership of this PR.

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Jan 23, 2020

I'm not working on this PR anymore.

That's OK, thanks for letting us know :)

Separately, uncertain as to why there are hundreds of test failures. These aren't caused by my contribution that I can tell. That doesn't leave me excited to resolve anything anymore.

For a start, it looks like you're calling assert_frame_equal(expected, result) (should be tm.assert_frame_equal(expected, result)), which is causing a NameError

Anyone else is free to take ownership of this PR.

Sure, will do

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Right merge not preserve order
6 participants