DOC: merge docs
jreback committed Mar 10, 2017
1 parent a4b2ee6 commit 3671dad
Showing 2 changed files with 76 additions and 0 deletions.
3 changes: 3 additions & 0 deletions doc/source/categorical.rst
@@ -646,6 +646,9 @@ In this case the categories are not the same and so an error is raised:
The same applies to ``df.append(df_different)``.

See also the section on :ref:`merge dtypes<merging.dtypes>` for notes about preserving merge dtypes and performance.


.. _categorical.union:

Unioning
73 changes: 73 additions & 0 deletions doc/source/merging.rst
@@ -746,6 +746,79 @@ The ``indicator`` argument will also accept string arguments, in which case the
   pd.merge(df1, df2, on='col1', how='outer', indicator='indicator_column')

.. _merging.dtypes:

Merge Dtypes
~~~~~~~~~~~~

.. versionadded:: 0.19.0

Merging will preserve the dtype of the join keys.

.. ipython:: python

   left = pd.DataFrame({'key': [1], 'v1': [10]})
   left
   right = pd.DataFrame({'key': [1, 2], 'v1': [20, 30]})
   right

With no ``on`` argument, both common columns (``key`` and ``v1``) are used as join keys, and their dtypes are preserved:

.. ipython:: python

   pd.merge(left, right, how='outer')
   pd.merge(left, right, how='outer').dtypes

Of course, if the merge introduces missing values, the affected column
will be upcast (here the integer ``v1_x`` becomes ``float64``).

.. ipython:: python

   pd.merge(left, right, how='outer', on='key')
   pd.merge(left, right, how='outer', on='key').dtypes

.. versionadded:: 0.20.0

Merging will preserve ``category`` dtypes of the mergands.

The left frame.

.. ipython:: python

   X = pd.Series(np.random.choice(['foo', 'bar'], size=(10,)))
   X = X.astype('category', categories=['foo', 'bar'])
   left = pd.DataFrame({'X': X,
                        'Y': np.random.choice(['one', 'two', 'three'], size=(10,))})
   left
   left.dtypes

The right frame.

.. ipython:: python

   right = pd.DataFrame({'X': pd.Series(['foo', 'bar']).astype('category', categories=['foo', 'bar']),
                         'Z': [1, 2]})
   right
   right.dtypes

The merged result:

.. ipython:: python

   result = pd.merge(left, right, how='outer')
   result
   result.dtypes

.. note::

   The category dtypes must be *exactly* the same, meaning the same categories and the same ``ordered`` attribute.
   Otherwise the result will coerce to ``object`` dtype.
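
The requirement that the category dtypes match exactly can be illustrated with a minimal sketch. The frames and the names ``same``, ``wider``, ``right_same`` and ``right_diff`` are hypothetical, and the sketch assumes a pandas version in which ``pd.CategoricalDtype`` is available (it is newer than the ``astype('category', categories=...)`` spelling used above).

```python
import pandas as pd

# Two categorical dtypes that differ only in their set of categories.
# (Names and data here are illustrative assumptions, not from the docs.)
same = pd.CategoricalDtype(categories=['foo', 'bar'])
wider = pd.CategoricalDtype(categories=['foo', 'bar', 'baz'])

left = pd.DataFrame({'X': pd.Series(['foo', 'bar', 'foo'], dtype=same),
                     'Y': [1, 2, 3]})
right_same = pd.DataFrame({'X': pd.Series(['foo', 'bar'], dtype=same),
                           'Z': [10, 20]})
right_diff = pd.DataFrame({'X': pd.Series(['foo', 'bar'], dtype=wider),
                           'Z': [10, 20]})

# Identical category dtypes on both sides: the key stays categorical.
matched = pd.merge(left, right_same, on='X', how='outer')

# Different categories: the key is coerced and is no longer categorical.
mismatched = pd.merge(left, right_diff, on='X', how='outer')

print(matched['X'].dtype)
print(mismatched['X'].dtype)
```

If the key must stay categorical, one option is to cast both frames to a shared dtype before merging.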

.. note::

   Merging on identical ``category`` dtypes can be considerably faster than merging on ``object`` dtype.
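
A rough way to check the performance claim on your own data is to time both variants. The sketch below is an assumption-laden illustration, not a benchmark: the frame sizes, key names and repeat count are arbitrary, and actual timings depend on the pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 100,000 rows joined against 1,000 unique string keys.
rng = np.random.RandomState(0)
keys = ['key_%d' % i for i in range(1000)]
left_obj = pd.DataFrame({'X': rng.choice(keys, size=100000),
                         'Y': rng.randn(100000)})
right_obj = pd.DataFrame({'X': keys, 'Z': np.arange(1000)})

# The same frames with the key column cast to one shared categorical dtype.
cat_dtype = pd.CategoricalDtype(categories=keys)
left_cat = left_obj.assign(X=left_obj['X'].astype(cat_dtype))
right_cat = right_obj.assign(X=right_obj['X'].astype(cat_dtype))

# Time five merges of each variant.
t_obj = timeit.timeit(lambda: pd.merge(left_obj, right_obj, on='X'), number=5)
t_cat = timeit.timeit(lambda: pd.merge(left_cat, right_cat, on='X'), number=5)

print('string keys:   %.4f s' % t_obj)
print('category keys: %.4f s' % t_cat)
```

Note that the cost of the ``astype`` conversion itself is not included in the timing, so the comparison is only meaningful when the keys are already stored as categoricals.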

.. _merging.join.index:

Joining on index