Additional DOC and BUG fix related to merging with mix of columns and… #20475

Merged: 9 commits, Dec 4, 2018
45 changes: 38 additions & 7 deletions doc/source/merging.rst
@@ -1125,17 +1125,42 @@ This is equivalent but less verbose and more memory efficient / faster than this
Joining with two multi-indexes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is not implemented via ``join`` at-the-moment, however it can be done using
the following code.
This is supported in a limited way, provided that the index for the right
argument is completely used in the join, and is a subset of the indices in
the left argument, as in this example:

.. ipython:: python

index = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
('K1', 'X2')],
names=['key', 'X'])
leftindex = pd.MultiIndex.from_product([list('abc'), list('xy'), [1, 2]],
names=['abc', 'xy', 'num'])
left = pd.DataFrame({'v1': range(12)}, index=leftindex)
left

rightindex = pd.MultiIndex.from_product([list('abc'), list('xy')],
names=['abc', 'xy'])
right = pd.DataFrame({'v2': [100*i for i in range(1, 7)]}, index=rightindex)
right

left.join(right, on=['abc', 'xy'], how='inner')

If that condition is not satisfied, a join with two multi-indexes can be
done using the following code.

.. ipython:: python

leftindex = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
('K1', 'X2')],
names=['key', 'X'])
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']},
index=index)
index=leftindex)

rightindex = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
('K2', 'Y2'), ('K2', 'Y3')],
names=['key', 'Y'])
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=rightindex)

result = pd.merge(left.reset_index(), right.reset_index(),
on=['key'], how='inner').set_index(['key','X','Y'])
@@ -1153,7 +1178,7 @@ the following code.
Merging on a combination of columns and index levels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. versionadded:: 0.22
.. versionadded:: 0.23

Strings passed as the ``on``, ``left_on``, and ``right_on`` parameters
may refer to either column names or index level names. This enables merging
@@ -1191,6 +1216,12 @@ resetting indexes.
When DataFrames are merged on a string that matches an index level in both
frames, the index level is preserved as an index level in the resulting
DataFrame.

.. note::
When DataFrames are merged using only some of the levels of a ``MultiIndex``,
the extra levels will be dropped from the resulting merge. To preserve
those levels, use ``reset_index`` on those level names to move them to
columns before doing the merge.
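
A minimal sketch of the behaviour this note describes, reusing the ``left`` and ``right`` frames from the multi-index join example earlier in this diff. The names `dropped`, `kept`, and `restored` are illustrative only, and the result assumes a pandas version that supports merging on index level names as documented in this section.

```python
import pandas as pd

leftindex = pd.MultiIndex.from_product([list('abc'), list('xy'), [1, 2]],
                                       names=['abc', 'xy', 'num'])
left = pd.DataFrame({'v1': range(12)}, index=leftindex)

rightindex = pd.MultiIndex.from_product([list('abc'), list('xy')],
                                        names=['abc', 'xy'])
right = pd.DataFrame({'v2': [100 * i for i in range(1, 7)]},
                     index=rightindex)

# Merge on only two of the three left index levels. 'num' is not a merge
# key, so it is not carried into the result.
dropped = pd.merge(left, right, on=['abc', 'xy'])

# Move 'num' to a column first so its values survive the merge.
kept = pd.merge(left.reset_index('num'), right, on=['abc', 'xy'])

# Optionally put 'num' back into the index afterwards.
restored = kept.set_index('num', append=True)
```

Comparing `dropped.index.names` with `restored.index.names` shows whether the ``num`` level made it through.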

.. note::

1 change: 1 addition & 0 deletions pandas/core/reshape/merge.py
@@ -738,6 +738,7 @@ def _maybe_add_join_keys(self, result, left_indexer, right_indexer):
result[name] = key_col
elif result._is_level_reference(name):
if isinstance(result.index, MultiIndex):
key_col.name = name
idx_list = [result.index.get_level_values(level_name)
if level_name != name else key_col
for level_name in result.index.names]
21 changes: 20 additions & 1 deletion pandas/tests/test_join.py
@@ -1,7 +1,7 @@
# -*- coding: utf-8 -*-

import numpy as np
from pandas import Index, DataFrame, Categorical, merge
from pandas import Index, DataFrame, Categorical, merge, MultiIndex

from pandas._libs import join as _join
import pandas.util.testing as tm
@@ -233,3 +233,22 @@ def test_merge_join_categorical_multiindex():
result = a.join(b, on=['Cat1', 'Int1'])
expected = expected.drop(['Cat', 'Int'], axis=1)
assert_frame_equal(expected, result)


def test_join_multi_to_multi():
leftindex = MultiIndex.from_product([list('abc'), list('xy'), [1, 2]],

Review comment (Contributor): Can you add the issue number here? Ideally, use the `join_type` fixture so that this test runs for all join types.

names=['abc', 'xy', 'num'])
left = DataFrame({'v1': range(12)}, index=leftindex)

rightindex = MultiIndex.from_product([list('abc'), list('xy')],
names=['abc', 'xy'])
right = DataFrame({'v2': [100 * i for i in range(1, 7)]},
index=rightindex)

result = left.join(right, on=['abc', 'xy'], how='inner')
expected = (left.reset_index()
.merge(right.reset_index(),
on=['abc', 'xy'], how='inner')
.set_index(['abc', 'xy', 'num'])
)
assert_frame_equal(expected, result)
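
One way the reviewer's suggestion could be tried outside the pandas test suite is sketched below: the same comparison is repeated for several join types with a plain loop. The helper name `check_join_matches_merge` and the explicit list of join types are assumptions standing in for the `join_type` fixture, whose definition is not shown in this diff.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def check_join_matches_merge(how):
    """Compare a multi-index join with the reset_index/merge equivalent."""
    leftindex = pd.MultiIndex.from_product(
        [list('abc'), list('xy'), [1, 2]], names=['abc', 'xy', 'num'])
    left = pd.DataFrame({'v1': range(12)}, index=leftindex)

    rightindex = pd.MultiIndex.from_product(
        [list('abc'), list('xy')], names=['abc', 'xy'])
    right = pd.DataFrame({'v2': [100 * i for i in range(1, 7)]},
                         index=rightindex)

    result = left.join(right, on=['abc', 'xy'], how=how)
    expected = (left.reset_index()
                .merge(right.reset_index(), on=['abc', 'xy'], how=how)
                .set_index(['abc', 'xy', 'num']))
    assert_frame_equal(expected, result)
    return result


# Hypothetical stand-in for the join_type fixture: loop over join types.
results = {how: check_join_matches_merge(how)
           for how in ['inner', 'left', 'right', 'outer']}
```

Because every key in ``right`` also appears in ``left`` for this data, each join type is expected to return all 12 rows; a dataset with unmatched keys would exercise the differences between join types more thoroughly.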