BUG/API: .merge() and .join() on category dtype columns will now preserve category dtype

closes #10409

Author: Jeff Reback <jeff@reback.net>

Closes #15321 from jreback/merge_cat and squashes the following commits:

3671dad [Jeff Reback] DOC: merge docs
a4b2ee6 [Jeff Reback] BUG/API: .merge() and .join() on category dtype columns will now preserve the category dtype when possible
Showing 10 changed files with 364 additions and 71 deletions.
@@ -746,6 +746,79 @@ The ``indicator`` argument will also accept string arguments, in which case the
   pd.merge(df1, df2, on='col1', how='outer', indicator='indicator_column')

.. _merging.dtypes:

Merge Dtypes
~~~~~~~~~~~~

.. versionadded:: 0.19.0

Merging will preserve the dtype of the join keys.

.. ipython:: python

   left = pd.DataFrame({'key': [1], 'v1': [10]})
   left
   right = pd.DataFrame({'key': [1, 2], 'v1': [20, 30]})
   right

We are able to preserve the join keys

.. ipython:: python

   pd.merge(left, right, how='outer')
   pd.merge(left, right, how='outer').dtypes

Of course if you have missing values that are introduced, then the
resulting dtype will be upcast.

.. ipython:: python

   pd.merge(left, right, how='outer', on='key')
   pd.merge(left, right, how='outer', on='key').dtypes
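
The two merges above differ only in which columns act as join keys, and that is what decides whether any dtype is upcast. The following is a minimal standalone sketch, not part of the commit, that reuses the same ``left`` and ``right`` frames; the names ``merged_all`` and ``merged_key`` are chosen here purely for illustration.

```python
import pandas as pd

left = pd.DataFrame({"key": [1], "v1": [10]})
right = pd.DataFrame({"key": [1, 2], "v1": [20, 30]})

# Without `on`, merge joins on every shared column (here: key and v1).
# Each output row comes whole from one side, so no missing values appear
# and the integer dtypes survive.
merged_all = pd.merge(left, right, how="outer")
print(merged_all.dtypes)

# With on="key", v1 is split into v1_x / v1_y. key == 2 has no match in
# `left`, so v1_x receives a missing value and is upcast to float.
merged_key = pd.merge(left, right, how="outer", on="key")
print(merged_key.dtypes)
```

Only the column that actually receives a missing value is upcast; the join key itself and the fully populated ``v1_y`` keep their integer dtype.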

.. versionadded:: 0.20.0

Merging will preserve ``category`` dtypes of the mergands.

The left frame.

.. ipython:: python

   X = pd.Series(np.random.choice(['foo', 'bar'], size=(10,)))
   X = X.astype('category', categories=['foo', 'bar'])
   left = DataFrame({'X': X,
                     'Y': np.random.choice(['one', 'two', 'three'], size=(10,))})
   left
   left.dtypes

The right frame.

.. ipython:: python

   right = DataFrame({'X': Series(['foo', 'bar']).astype('category', categories=['foo', 'bar']),
                      'Z': [1, 2]})
   right
   right.dtypes

The merged result

.. ipython:: python

   result = pd.merge(left, right, how='outer')
   result
   result.dtypes
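
For readers on newer pandas releases, where the ``astype('category', categories=...)`` spelling used in the diff is no longer accepted, the same check can be sketched with ``CategoricalDtype``. This is an illustrative rewrite under that assumption, not the code from the commit; ``cat_type`` and the fixed random seed are introduced here only for the example.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

# One shared dtype object guarantees both frames use identical categories.
cat_type = CategoricalDtype(categories=["foo", "bar"])
rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability

left = pd.DataFrame({
    "X": pd.Series(rng.choice(["foo", "bar"], size=10)).astype(cat_type),
    "Y": rng.choice(["one", "two", "three"], size=10),
})
right = pd.DataFrame({
    "X": pd.Series(["foo", "bar"]).astype(cat_type),
    "Z": [1, 2],
})

# X is the only shared column, so it is the join key.
result = pd.merge(left, right, how="outer")
print(result.dtypes)
```

Because both ``X`` columns carry the same categorical dtype, the merged ``X`` column is still categorical with the same categories.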

.. note::

   The category dtypes must be *exactly* the same, meaning the same categories and the ordered attribute.
   Otherwise the result will coerce to ``object`` dtype.
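
The fallback described in this note can be seen with two frames whose key columns are both categorical but list different categories. The sketch below is an illustration written for this purpose, not taken from the commit; the extra category ``'baz'`` and the frame names are hypothetical. It only checks that the merged key is no longer categorical, since the exact fallback dtype may depend on the pandas version in use.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

left = pd.DataFrame({
    "X": pd.Categorical(["foo", "bar"], categories=["foo", "bar"]),
    "Y": [1, 2],
})
right = pd.DataFrame({
    # Same values, but the dtype declares an additional category.
    "X": pd.Categorical(["foo", "bar"], categories=["foo", "bar", "baz"]),
    "Z": [3, 4],
})

mismatched = pd.merge(left, right, how="outer", on="X")

# The category dtypes differ, so the key column is not kept as categorical.
print(mismatched["X"].dtype)
print(isinstance(mismatched["X"].dtype, CategoricalDtype))
```

Casting both key columns to one shared ``CategoricalDtype`` before merging avoids this fallback.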

.. note::

   Merging on ``category`` dtypes that are the same can be quite performant compared to ``object`` dtype merging.

.. _merging.join.index:

Joining on index
0.20 ?