BUG: DataFrame.merge(suffixes=) does not respect None #24819

charlesdong1991 · 2019-01-17T20:28:06Z

closes BUG: DataFrame.merge(suffixes=) does not respect None #24782
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pep8speaks · 2019-01-17T20:28:19Z

Hello @charlesdong1991! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on February 04, 2019 at 17:16 Hours UTC

codecov · 2019-01-17T21:04:04Z

Codecov Report

Merging #24819 into master will decrease coverage by 49.47%.
The diff coverage is 0%.

@@             Coverage Diff             @@
##           master   #24819       +/-   ##
===========================================
- Coverage   92.38%    42.9%   -49.48%     
===========================================
  Files         166      166               
  Lines       52379    52385        +6     
===========================================
- Hits        48392    22478    -25914     
- Misses       3987    29907    +25920

Flag	Coverage Δ
#multiple	`?`
#single	`42.9% <0%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/reshape/merge.py	`9.43% <0%> (-84.84%)`	⬇️
pandas/io/formats/latex.py	`0% <0%> (-100%)`	⬇️
pandas/core/categorical.py	`0% <0%> (-100%)`	⬇️
pandas/io/sas/sas_constants.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/plotting.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/converter.py	`0% <0%> (-100%)`	⬇️
pandas/io/formats/html.py	`0% <0%> (-99.35%)`	⬇️
pandas/core/groupby/categorical.py	`0% <0%> (-95.46%)`	⬇️
pandas/io/sas/sas7bdat.py	`0% <0%> (-91.17%)`	⬇️
pandas/io/sas/sas_xport.py	`0% <0%> (-90.15%)`	⬇️
... and 124 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 512830b...cbb36b2. Read the comment docs.

codecov · 2019-01-17T21:04:05Z

Codecov Report

Merging #24819 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24819      +/-   ##
==========================================
- Coverage   92.37%   92.37%   -0.01%     
==========================================
  Files         166      166              
  Lines       52420    52406      -14     
==========================================
- Hits        48423    48409      -14     
  Misses       3997     3997

Flag	Coverage Δ
#multiple	`90.79% <100%> (-0.01%)`	⬇️
#single	`42.87% <0%> (-0.02%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/reshape/merge.py	`94.48% <ø> (ø)`	⬆️
pandas/core/internals/managers.py	`96.06% <100%> (-0.01%)`	⬇️
pandas/core/common.py	`98.37% <0%> (-0.04%)`	⬇️
pandas/core/frame.py	`96.81% <0%> (-0.02%)`	⬇️
pandas/io/formats/html.py	`99.34% <0%> (-0.01%)`	⬇️
pandas/core/accessor.py	`98.79% <0%> (ø)`	⬆️
pandas/core/ops.py	`94.28% <0%> (ø)`	⬆️
pandas/core/internals/blocks.py	`94.17% <0%> (ø)`	⬆️
pandas/core/sparse/scipy_sparse.py	`100% <0%> (ø)`	⬆️
pandas/core/generic.py	`96.63% <0%> (ø)`	⬆️
... and 6 more

Powered by Codecov. Last update bb43726...71729b2. Read the comment docs.

jschendel

Thanks! Can you add a whatsnew note?

pandas/core/reshape/merge.py

pandas/tests/reshape/merge/test_merge.py

jreback

There is a bug here, where None in a tuple is coerced to a string. However, I am a bit iffy about automagically coercing None to an empty string. Yes, it's convenient, but if you have non-string labels then it is unexpected.

jreback · 2019-01-18T12:33:39Z

pandas/tests/reshape/merge/test_merge.py

+    b = pd.DataFrame({col2: [4, 5, 6]})
+
+    df = a.merge(b, left_index=True, right_index=True, suffixes=suffixes)
+    assert df.columns.tolist() == expected_cols


fully construct the result frame here

thanks, will change

jreback · 2019-01-18T12:40:41Z

pandas/core/reshape/merge.py

@@ -488,7 +488,13 @@ def __init__(self, left, right, how='inner', on=None,
        self.right_on = com.maybe_make_list(right_on)

        self.copy = copy
+


does this not break if suffixes=None is passed?

it will indeed break, but do we expect to get None for suffixes? the default is ("_x", "_y"). I could understand that people may want to use suffixes like: (None, "_y") if they want to keep the name of one of original columns... but use case for suffixes=None seems rare...

jreback · 2019-01-18T12:41:29Z

doc/source/whatsnew/v0.24.0.rst

@@ -1210,6 +1210,7 @@ update the ``ExtensionDtype._metadata`` tuple to match the signature of your
 - :meth:`Series.unstack` and :meth:`DataFrame.unstack` no longer convert extension arrays to object-dtype ndarrays. Each column in the output ``DataFrame`` will now have the same dtype as the input (:issue:`23077`).
 - Bug when grouping :meth:`Dataframe.groupby()` and aggregating on ``ExtensionArray`` it was not returning the actual ``ExtensionArray`` dtype (:issue:`23227`).
 - Bug in :func:`pandas.merge` when merging on an extension array-backed column (:issue:`23020`).
+- Bug in :func:`pandas.merge` when setting None in suffixes (:issue: `24782`).


be more explicit on what is changing here

thanks, I will change this later, once I have collected more feedback from other committers. @jreback

jreback · 2019-01-18T12:42:42Z

@TomAugspurger @jorisvandenbossche

TomAugspurger

The docstring needs to be updated. Currently it's

suffixes : tuple of (str, str), default ('_x', '_y')
    Suffix to apply to overlapping column names in the left and right
    side, respectively. To raise an exception on overlapping columns use
    (False, False).

That'll need to be something like tuple of (str or None, str or None).

Even that type isn't quite right... Since (False, False) is apparently valid.

TomAugspurger · 2019-01-18T16:58:28Z

pandas/tests/reshape/merge/test_merge.py

+
+@pytest.mark.parametrize("col1, col2, suffixes, expected_cols", [
+    (0, 0, ("", "_dup"), ["0", "0_dup"]),
+    (0, 0, (None, "_dup"), ["0", "0_dup"]),


I don't think this gets the expected output of @simonjayhawkins in #24782, does it?

IIUC, @simonjayhawkins expected for (None, '_dup') to not cast the original input to a string. So the expected columns would be [0, "0_dup"]. Is that correct?

Even that type isn't quite right... Since (False, False) is apparently valid.

Yeah, it looks like pretty much anything is actually valid right now, e.g.

In [8]: df.merge(df, left_index=True, right_index=True, suffixes=(np.sum, pd.DataFrame))
Out[8]:
0<function sum at 0x000002550602C840> 0<class 'pandas.core.frame.DataFrame'>
0 1 1

In [9]: df.merge(df, left_index=True, right_index=True, suffixes=(list('abc'), pd.Series(list('xyz'))))
Out[9]:
0['a', 'b', 'c'] 00 x\n1 y\n2 z\ndtype: object
0 1 1 1 1

Looks to be caused by items_overlap_with_suffix just hitting whatever is passed with a format:

pandas/pandas/core/internals/managers.py

Lines 1959 to 1985 in 08f92c4

def items_overlap_with_suffix(left, lsuffix, right, rsuffix):
    """
    If two indices overlap, add suffixes to overlapping entries.

    If corresponding suffix is empty, the entry is simply converted to string.

    """
    to_rename = left.intersection(right)
    if len(to_rename) == 0:
        return left, right
    else:
        if not lsuffix and not rsuffix:
            raise ValueError('columns overlap but no suffix specified: '
                             '{rename}'.format(rename=to_rename))

        def lrenamer(x):
            if x in to_rename:
                return '{x}{lsuffix}'.format(x=x, lsuffix=lsuffix)
            return x

        def rrenamer(x):
            if x in to_rename:
                return '{x}{rsuffix}'.format(x=x, rsuffix=rsuffix)
            return x

        return (_transform_index(left, lrenamer),
                _transform_index(right, rrenamer))
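
To make the mechanism concrete, here is a minimal pure-Python sketch. It is not pandas code: `format_label` is a hypothetical helper that only mirrors the `'{x}{lsuffix}'.format(...)` call in the renamers above. It shows why any object is accepted as a suffix, and why both `None` and `''` turn an integer label into a string.

```python
def format_label(label, suffix):
    # Hypothetical helper mirroring the formatting step in the renamers above.
    # str.format() formats every argument, so any object is accepted as a suffix.
    return '{x}{suffix}'.format(x=label, suffix=suffix)


# None is rendered as the literal text "None".
print(repr(format_label(0, None)))    # '0None'
# An empty suffix still converts the integer label to a string.
print(repr(format_label(0, '')))      # '0'
# Arbitrary objects are formatted as well, for example a list.
print(repr(format_label(0, ['a'])))   # "0['a']"
```

This matches the odd column names in the `In [8]` / `In [9]` output: the suffix object is simply stringified and appended.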

IIUC, @simonjayhawkins expected for (None, '_dup') to not cast the original input to a string.

Yeah, I overlooked this earlier but agree; it makes sense to me that None should not cast to string and instead retain the original type of the column label in question.

thanks! @TomAugspurger @jschendel I overlooked it... I changed the code a bit and added several more test cases.
And @jschendel, I also feel we should add some restrictions to the suffix type; it looks weird that any type can be given as suffixes... would it be worth opening another PR to address it?

simonjayhawkins · 2019-01-20T08:35:40Z

pandas/tests/reshape/merge/test_merge.py

+    expected = pd.DataFrame([[1, 4], [2, 5], [3, 6]],
+                            columns=expected_cols)
+
+    result = a.merge(b, left_index=True, right_index=True, suffixes=suffixes)


many tests in test_merge.py repeat using both the left.merge(right, **kwargs) and the pd.merge(left, right, **kwargs) syntax.

i'm not sure how important that is, so unless one of the maintainers comments otherwise this lgtm

thanks for your feedback, @simonjayhawkins ! if one of maintainers brings this up and wants it to be standardised in this PR, i will then fix it! otherwise, as you said, I will open another PR to tackle it!

simonjayhawkins · 2019-01-20T08:48:44Z

does this not break if suffixes=None is passed?

i guess that since duplicate labels are allowed in an index then suffixes=None should be a valid option. future PR though.

TomAugspurger · 2019-01-21T21:22:29Z

pandas/core/reshape/merge.py

@@ -161,7 +161,8 @@ def merge_ordered(left, right, on=None,
        Interpolation method for data
    suffixes : 2-length sequence (tuple, list, ...)
        Suffix to apply to overlapping column names in the left and right
-        side, respectively
+        side, respectively. Except for tuple of (str, str), it also allows


I don't understand this. Is suffixes=[None, None] valid?

Would the following be correct?

Length-2 sequence. Each element should be a str or None.

* str : the string is appended to the overlapping column label.
* None : the column label is left as-is.

Ah, I see now that it's invalid, because we won't return a frame with duplicate labels. But presumably, a [None, 'a'] is valid (could you add a test for that?). So perhaps in the docstring note that at least one of the values must not be None.

thanks Tom for your comment! @TomAugspurger
you are right, a (None, None) will raise an error which was in the code already. And a (None, 'a') will be valid, and None refers to the column label is left as it is... I think i already had this (None, 'a') in the pytest file. Shall I add another one?

I didn't see a test where suffixes was a list like [None, 'a'], just tuples.

ahh, sorry, i misunderstood... my bad...
Thx for your review! tests added! And also docstring is updated! @TomAugspurger

Ah, I see now that it's invalid, because we won't return a frame with duplicate labels

So perhaps in the docstring note that at least one of the values must not be None.

@TomAugspurger Not following you there... we do return frames with duplicate labels if 1) we don't pass "suffixes", or 2) we pass "suffixes" such that they result in duplicated labels. So I don't see anything offensive in having suffixes=(None, None) result in duplicated labels.

EDIT: had missed @simonjayhawkins 's comment below, stating basically the same thing.

TomAugspurger · 2019-01-21T21:56:42Z

pandas/core/reshape/merge.py

@@ -161,8 +161,9 @@ def merge_ordered(left, right, on=None,
        Interpolation method for data
    suffixes : 2-length sequence (tuple, list, ...)
        Suffix to apply to overlapping column names in the left and right
-        side, respectively. Except for tuple of (str, str), it also allows
-        tuple of (None, str) or (str, None)
+        side, respectively. And each element should be either a str or None,


Sorry, this is still a bit confusing, and I don't think listing each combination really helps. How about.

suffixes : Sequence
    A length-2 sequence where each element is a optionally a string
    indicating the suffix to add to overlapping column names in
    `left` and `right` respectively. Pass a value of `None` instead
    of a string to indicate that the column names from `left` or
    `right` should be left as-is, with no suffix. At least one of the
    values must not be None.

Thanks, Tom! and sorry for my bad docstring! and yours does look much clearer! I just removed a redundant 'a' ^^ @TomAugspurger
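
The rules in the proposed docstring can be sketched as a small validator. This is only an illustration under stated assumptions: `validate_suffixes` is a hypothetical function, and nothing in this thread indicates that pandas enforces these checks (the discussion above notes that almost any object is currently accepted).

```python
def validate_suffixes(suffixes):
    """Hypothetical check for the rules in the proposed docstring."""
    # Rule 1: a length-2 sequence (tuple, list, ...).
    if len(suffixes) != 2:
        raise ValueError("suffixes must be a length-2 sequence")
    # Rule 2: each element is a str or None.
    for s in suffixes:
        if s is not None and not isinstance(s, str):
            raise TypeError("each suffix must be a str or None")
    # Rule 3: at least one of the values must not be None.
    if all(s is None for s in suffixes):
        raise ValueError("at least one suffix must not be None")
    return tuple(suffixes)


print(validate_suffixes([None, "a"]))   # (None, 'a')
```

Under this stricter reading, `(False, False)` would be rejected with a `TypeError`, which is one reason the type question was left for a follow-up discussion.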

simonjayhawkins · 2019-01-21T22:13:04Z

pandas/core/reshape/merge.py

-    suffixes : 2-length sequence (tuple, list, ...)
-        Suffix to apply to overlapping column names in the left and right
-        side, respectively
+    suffixes : Sequence


the docstring will also need the default values included

oh, yeah, added! thx for the carefulness!! @simonjayhawkins

simonjayhawkins · 2019-01-21T22:31:28Z

pandas/tests/reshape/merge/test_merge.py

+    ("a", "a", ("_x", None), ["a_x", "a"]),
+    ("a", "b", ("_x", None), ["a", "b"]),
+    ("a", "a", [None, "_x"], ["a", "a_x"]),
+    (0, 0, ["_a", None], ["0_a", 0])


for added assurance can you also add a couple of regression tests here

("a", "a", None, ["a_x", "a_y"]), (0, 0, None, ["0_x", "0_y"])

should be sufficient.

you mean ("a", "a", ("_x", "_y"), ["a_x", "a_y"]) probably? suffix cannot accept None for now. @simonjayhawkins

thanks! just added a small new test function to cover it!

does this not break if suffixes=None is passed?

i guess that since duplicate labels are allowed in an index then suffixes=None should be a valid option. future PR though.

suffixes=None currently raises TypeError: 'NoneType' object is not iterable.

Is suffixes=[None, None] valid?

suffixes=[None, None] currently raises ValueError: columns overlap but no suffix specified:

so i guess suffixes=None should be equivalent to suffixes=[None, None]. future PR though.
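
The `TypeError` for `suffixes=None` is what Python raises whenever `None` is unpacked into two names. A minimal sketch, assuming the merge code unpacks `suffixes` into a left and a right value at some point (`split_suffixes` is a hypothetical stand-in, not a pandas function):

```python
def split_suffixes(suffixes):
    # Tuple unpacking requires an iterable with exactly two items.
    lsuffix, rsuffix = suffixes
    return lsuffix, rsuffix


print(split_suffixes([None, "_y"]))   # (None, '_y')

try:
    split_suffixes(None)
except TypeError as exc:
    # The exact message depends on the Python version.
    print("TypeError:", exc)
```

Treating `None` as `(None, None)` would therefore need an explicit branch before the unpacking step.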

pandas/core/internals/managers.py

simonjayhawkins · 2019-01-22T00:35:27Z

pandas/tests/reshape/merge/test_merge.py

+
+@pytest.mark.parametrize("suffixes", [
+    (None, None),
+    ('', None),


with the changes you've made the left label would be cast to a string and the right label unchanged. so technically this shouldn't fail.

you are right, they are indeed different types, however, before applying the renamer function, there is a suffix check, which means None and '' cannot be assigned at the same time, so ('', None) will raise an error.

if not lsuffix and not rsuffix:
    raise ValueError('columns overlap but no suffix specified: '
                     '{rename}'.format(rename=to_rename))

Do you want it to be changed? @simonjayhawkins

the docstring reads At least one of the values must not be None. and ('',None) satisfies that.
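
The disagreement comes from the guard testing truthiness rather than `is None`. A small sketch of the same condition in isolation (`has_no_suffix` is a hypothetical wrapper name) shows that `''`, `None`, and `False` are indistinguishable to it:

```python
def has_no_suffix(lsuffix, rsuffix):
    # Same condition as the guard quoted above; wrapper name is hypothetical.
    return not lsuffix and not rsuffix


for pair in [(None, None), ('', None), (False, False), (None, '_y')]:
    print(pair, has_no_suffix(*pair))
# (None, None) True
# ('', None) True
# (False, False) True
# (None, '_y') False
```

So `('', None)` satisfies the docstring wording but still trips the guard, because an empty string is falsy.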

Do you want it to be changed?

what would be the extent of the changes to make this work?

added already!
(None, '') is now allowed for numeric column labels; if the column label is a string, (None, '') or (None, None) will raise an error

simonjayhawkins · 2019-01-22T14:30:59Z

pandas/core/reshape/merge.py

@@ -165,7 +165,8 @@ def merge_ordered(left, right, on=None,
        `left` and `right` respectively. Pass a value of `None` instead
        of a string to indicate that the column name from `left` or
        `right` should be left as-is, with no suffix. At least one of the
-        values must not be None.
+        values must not be None. A combination of `''` and `None` is will
+        raise error for columns which type is string


my opinion is that adding A combination of `''` and `None` is will raise error for columns which type is string to the docstring is probably unnecessary.

it makes me wonder whether the (None,None) combination should be addressed within this PR after all. if the 'columns overlap but no suffix specified' check is removed, what breaks?

what do you mean (None, None) combination, i am a bit confused? we don't expect column names to be overlapped, do we?
I think now the suffixes takes in length-2 sequence correctly given different types of suffix, the only thing now is suffixes=None will raise an error, since now it requires a length-2 sequence, the code should slightly be changed in _MergeOperation class. But looks like it's out of scope of this PR although it's easy to fix...
I would advise to create another PR to address this issue, and i can work on this new PR to tackle it. and thanks for your review, look forward to hearing your opinion! @simonjayhawkins

what do you mean (None, None) combination

i mean the ability to return a DataFrame with duplicate column labels if so desired. i'm not sure why this isn't allowed. maybe it breaks things or maybe that check is not necessary.

if it was allowed would it make for a simpler api description in the docstring?

we could leave it for maintainer's discussion, if they want it, I will open a follow-up PR to address it.
Thanks for your feedback, and I just shortened the docstring! @simonjayhawkins

charlesdong1991 · 2019-01-27T11:35:08Z

@jreback @TomAugspurger @jschendel @simonjayhawkins I just moved whatsnew from 0.24.0 to 0.24.1 because the new version of Pandas had been released. Please take a look at this PR to see what else I should change. Thanks!

jreback · 2019-02-01T21:00:15Z

@charlesdong1991 this would be for 0.25.0 anyhow, pls merge master

charlesdong1991 · 2019-02-03T10:54:14Z

After discussing with @jorisvandenbossche , and as he suggested, this PR will only solve the case when at most one element is None, like (None, str) or (str, None) will remain column as-is when suffix is None. And for cases like (None, None) or None or False, (False, False), I will open a follow-up issue and PR to discuss and fix.

jorisvandenbossche · 2019-02-03T10:57:45Z

To be clear, the reason to leave out the (None, None) for now is that it already does something else: it does the same as (False, False) (raising an error if there will be duplicates). This is not documented or tested, so we could maybe just change it (or otherwise first deprecate it), but let's leave that for the follow-up issue for discussion.

charlesdong1991 · 2019-02-03T12:44:51Z

I just did another change based on what we discussed before i left the sprint @jorisvandenbossche , and @toobaz please also take a look to see if we could merge and close this, and then, I will open another issue and PR addressing the potential issue and change for suffixes.

jorisvandenbossche

Looks good! Some small comments.
Can you also update the docstring of pd.merge ?

jorisvandenbossche · 2019-02-03T15:32:43Z

pandas/tests/reshape/merge/test_merge.py

+    a = pd.DataFrame({col1: [1, 2, 3]})
+    b = pd.DataFrame({col2: [3, 4, 5]})
+
+    msg = "columns overlap but no suffix specified"


Can you add here a comment like # TODO reconsider current behaviour of raising, see #.... (with a link to the issue).
Just in case later on when we want to change this, we won't think like "oh no, we cannot change it, because it is tested behaviour")

thanks, added!

jorisvandenbossche · 2019-02-03T15:33:17Z

pandas/core/reshape/merge.py

-    suffixes : 2-length sequence (tuple, list, ...)
-        Suffix to apply to overlapping column names in the left and right
-        side, respectively
+    suffixes : Sequence or None, default is ("_x", "_y")


The or None can be removed I think (we removed that change from this PR)

jreback

lgtm. minor comments, ping on green.

jreback · 2019-02-04T13:23:45Z

doc/source/whatsnew/v0.25.0.rst

@@ -181,6 +181,7 @@ Groupby/Resample/Rolling
 Reshaping
 ^^^^^^^^^

+- Bug in :func:`pandas.merge` doesn't work correctly if None is in suffixes (:issue: `24782`).


can you be a bit more clear on what the previous symptom was, instead of 'doesn't work correctly'

double backticks on None

no space after the colon (:issue:`24782`)

jreback · 2019-02-04T13:24:31Z

pandas/core/internals/managers.py

-                return '{x}{lsuffix}'.format(x=x, lsuffix=lsuffix)
-            return x
+        def renamer(x, suffix):
+            """Rename the left and right indices.


can you make a proper doc-string (Parameters / Returns)
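
For reference, a numpydoc-style docstring with Parameters and Returns sections could look roughly like the sketch below. This is not the code merged in the PR: passing `to_rename` as an explicit argument is an assumption made here so the function is self-contained, and the wording of the docstring is illustrative.

```python
def renamer(x, suffix, to_rename):
    """
    Rename a label if it overlaps and a suffix is given.

    Parameters
    ----------
    x : hashable
        Original column label.
    suffix : str or None
        Suffix to append; None leaves the label unchanged.
    to_rename : collection
        Labels present on both sides (an assumption of this sketch).

    Returns
    -------
    hashable
        The suffixed label as a str, or ``x`` unchanged.
    """
    if x in to_rename and suffix is not None:
        return '{x}{suffix}'.format(x=x, suffix=suffix)
    return x


print(repr(renamer(0, None, {0})))      # 0
print(repr(renamer(0, '_dup', {0})))    # '0_dup'
```

With `suffix is not None` as the condition, an integer label such as `0` keeps its type when the suffix is `None`, which is the behaviour requested in #24782.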

jreback · 2019-02-04T13:25:59Z

pandas/tests/reshape/merge/test_merge.py

+    (0, 0, [None, None]),
+    (0, 0, (None, ""))
+])
+def test_merge_error(col1, col2, suffixes):


test_merge_suffix_error

jreback · 2019-02-04T13:26:24Z

pandas/core/reshape/merge.py

+        `left` and `right` respectively. Pass a value of `None` instead
+        of a string to indicate that the column name from `left` or
+        `right` should be left as-is, with no suffix. At least one of the
+        values must not be None.


can you add a versionchanged 0.25.0 here

jreback · 2019-02-04T13:27:07Z

suffixes=None is still invalid right (e.g. passing an actual None value)? this has a test?

charlesdong1991 · 2019-02-04T15:33:33Z

thanks @jreback for your review... all changed based on review, pls let me know if you have other questions.

jreback · 2019-02-06T03:51:09Z

thanks @charlesdong1991

jorisvandenbossche · 2019-02-06T07:23:17Z

@charlesdong1991 do you want open a new issue for changing the (None, None) case and allowing None and False ?

charlesdong1991 · 2019-02-06T09:07:21Z

@jorisvandenbossche yes! I will open a new issue addressing None related case later today or tomorrow!!


simonjayhawkins · 2019-02-09T00:11:48Z

@charlesdong1991 : the Panel.join tests have now been removed, xref #25191. so there should no longer be issues with the code shared between Panel.join and DataFrame.merge with regard to the suffixes=(False, False) behaviour that was causing us problems at europandas.

charlesdong1991 · 2019-02-09T16:05:24Z

@simonjayhawkins thanks a lot!! I also noticed it when merging my branch to master!!

@jorisvandenbossche sorry that recently i was quite busy so didn't have time to work on new PR, i just created a new PR to address None/False behaviour.

* DOC/BLD: fix --no-api option (pandas-dev#25209) * DOC: modify typos in Contributing section (pandas-dev#25365) * Remove spurious MultiIndex creation in `_set_axis_name` (pandas-dev#25371) * Resovles pandas-dev#25370 * Introduced by pandas-dev#22969 * pandas-dev#23049: test for Fatal Stack Overflow stemming From Misuse of astype('category') (pandas-dev#25366) * 9236: test for the DataFrame.groupby with MultiIndex having pd.NaT (pandas-dev#25310) * [BUG] exception handling of MultiIndex.__contains__ too narrow (pandas-dev#25268) * 14873: test for groupby.agg coercing booleans (pandas-dev#25327) * BUG/ENH: Timestamp.strptime (pandas-dev#25124) * BUG: constructor Timestamp.strptime() does not support %z. * Add doc string to NaT and Timestamp * updated the error message * Updated whatsnew entry. * Interval dtype fix (pandas-dev#25338) * [CLN] Excel Module Cleanups (pandas-dev#25275) Closes pandas-devgh-25153 Authored-By: tdamsma <tdamsma@gmail.com> * ENH: indexing and __getitem__ of dataframe and series accept zerodim integer np.array as int (pandas-dev#24924) * REGR: fix TimedeltaIndex sum and datetime subtraction with NaT (pandas-dev#25282, pandas-dev#25317) (pandas-dev#25329) * edited whatsnew typo (pandas-dev#25381) * fix typo of see also in DataFrame stat funcs (pandas-dev#25388) * API: more consistent error message for MultiIndex.from_arrays (pandas-dev#25189) * CLN: (re-)enable infer_dtype to catch complex (pandas-dev#25382) * DOC: Edited docstring of Interval (pandas-dev#25410) The docstring contained a repeated segment, which I removed. 
* Mark test_pct_max_many_rows as high memory (pandas-dev#25400) Fixes issue pandas-dev#25384 * Correct a typo of version number for interpolate() (pandas-dev#25418) * DEP: add pytest-mock to environment.yml (pandas-dev#25417) * BUG: Fix type coercion in read_json orient='table' (pandas-dev#21345) (pandas-dev#25219) * ERR: doc update for ParsingError (pandas-dev#25414) Closes pandas-devgh-22881 * ENH: Add in sort keyword to DatetimeIndex.union (pandas-dev#25110) * DOC: Rewriting of ParserError doc + minor spacing (pandas-dev#25421) Follow-up to pandas-devgh-25414. * API/ERR: allow iterators in df.set_index & improve errors (pandas-dev#24984) * BUG: Indexing with UTC offset string no longer ignored (pandas-dev#25263) * PERF/REF: improve performance of Series.searchsorted, PandasArray.searchsorted, collect functionality (pandas-dev#22034) * TST: remove never-used singleton fixtures (pandas-dev#24885) * BUG: fixed merging with empty frame containing an Int64 column (pandas-dev#25183) (pandas-dev#25289) * DOC: fixed geo accessor example in extending.rst (pandas-dev#25420) I realised "lon" and "lat" had just been switched with "longitude" and "latitude" in the following code block. So I used those names here as well. 
* TST: numpy RuntimeWarning with Series.round() (pandas-dev#25432) * CI: add __init__.py to isort skip list (pandas-dev#25455) * DOC: CategoricalIndex doc string (pandas-dev#24852) * DataFrame.drop Raises KeyError definition (pandas-dev#25474) * BUG: Keep column level name in resample nunique (pandas-dev#25469) Closes pandas-devgh-23222 xref pandas-devgh-23645 * ERR: Correct error message in to_datetime (pandas-dev#25467) * ERR: Correct error message in to_datetime Closes pandas-devgh-23830 xref pandas-devgh-23969 * Fix minor typo (pandas-dev#25458) Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com> * CI: Set pytest minversion to 4.0.2 (pandas-dev#25402) * CI: Set pytest minversion to 4.0.2 * STY: use pytest.raises context manager (indexes) (pandas-dev#25447) * STY: use pytest.raises context manager (tests/test_*) (pandas-dev#25452) * STY: use pytest.raises context manager (tests/test_*) * fix ci failures * skip py2 ci failure * Fix minor error in dynamic load function (pandas-dev#25256) * Cythonized GroupBy Quantile (pandas-dev#20405) * BUG: Fix regression on DataFrame.replace for regex (pandas-dev#25266) * BUG: Fix regression on DataFrame.replace for regex The commit ensures that the replacement for regex is not confined to the beginning of the string but spans all the characters within. The behaviour is then consistent with versions prior to 0.24.0. One test has been added to account for character replacement when the character is not at the beginning of the string. 
* Correct contribution guide docbuild instruction (pandas-dev#25479) * TST/REF: Add pytest idiom to test_frequencies.py (pandas-dev#25430) * BUG: Fix index type casting in read_json with orient='table' and float index (pandas-dev#25433) (pandas-dev#25434) * BUG: Groupby.agg with reduction function with tz aware data (pandas-dev#25308) * BUG: Groupby.agg cannot reduce with tz aware data * Handle output always as UTC * Add whatsnew * isort and add another fixed groupby.first/last issue * bring condition at a higher level * Add try for _try_cast * Add comments * Don't pass the utc_dtype explicitly * Remove unused import * Use string dtype instead * DOC: Fix docstring for read_sql_table (pandas-dev#25465) * ENH: Add Series.str.casefold (pandas-dev#25419) * Fix PR10 error and Clean up docstrings from functions related to RT05 errors (pandas-dev#25132) * Fix unreliable test (pandas-dev#25496) * DOC: Clarifying doc/make.py --single parameter (pandas-dev#25482) * fix MacPython / pandas-wheels ci failures (pandas-dev#25505) * DOC: Reword Series.interpolate docstring for clarity (pandas-dev#25491) * Changed insertion order to sys.path (pandas-dev#25486) * TST: xfail non-writeable pytables tests with numpy 1.16x (pandas-dev#25517) * STY: use pytest.raises context manager (arithmetic, arrays, computati… (pandas-dev#25504) * BUG: Fix RecursionError during IntervalTree construction (pandas-dev#25498) * STY: use pytest.raises context manager (plotting, reductions, scalar...) (pandas-dev#25483) * STY: use pytest.raises context manager (plotting, reductions, scalar...) * revert removed testing in test_timedelta.py * remove TODO from test_frame.py * skip py2 ci failure * BUG: Fix potential segfault after pd.Categorical(pd.Series(...), categories=...) (pandas-dev#25368) * Make DataFrame.to_html output full content (pandas-dev#24841) * BUG-16807-1 SparseFrame fills with default_fill_value if data is None (pandas-dev#24842) Closes pandas-devgh-16807. 
* DOC: Add conda uninstall pandas to contributing guide (pandas-dev#25490) * fix pandas-dev#25487 add modify documentation * fix segfault when running with cython coverage enabled, xref cython#2879 (pandas-dev#25529) * TST: inline empty_frame = DataFrame({}) fixture (pandas-dev#24886) * DOC: Polishing typos out of doc/source/user_guide/indexing.rst (pandas-dev#25528) * STY: use pytest.raises context manager (frame) (pandas-dev#25516) * DOC: Fix pandas-dev#24268 by updating description for keep in Series.nlargest (pandas-dev#25358) * DOC: Fix pandas-dev#24268 by updating description for keep * fix MacPython / pandas-wheels ci failures (pandas-dev#25537) * TST/CLN: Remove more Panel tests (pandas-dev#25550) * BUG: caught typeError in series.at (pandas-dev#25506) (pandas-dev#25533) * ENH: Add errors parameter to DataFrame.rename (pandas-dev#25535) * ENH: GH13473 Add errors parameter to DataFrame.rename * TST: Skip IntervalTree construction overflow test on 32bit (pandas-dev#25558) * DOC: Small fixes to 0.24.2 whatsnew (pandas-dev#25559) * minor typo error (pandas-dev#25574) * BUG: in error message raised when invalid axis parameter (pandas-dev#25553) * BLD: Fixed pip install with no numpy (pandas-dev#25568) * Document the behavior of `axis=None` with `style.background_gradient` (pandas-dev#25551) * fix minor typos in dsintro.rst (pandas-dev#25579) * BUG: Handle readonly arrays in period_array (pandas-dev#25556) * BUG: Handle readonly arrays in period_array Closes pandas-dev#25403 * DOC: Fix typo in tz_localize (pandas-dev#25598) * BUG: secondary y axis could not be set to log scale (pandas-dev#25545) (pandas-dev#25586) * TST: add test for groupby on list of empty list (pandas-dev#25589) * TYPING: Small fixes to make stubgen happy (pandas-dev#25576) * CLN: Parmeterize test cases (pandas-dev#25355)

jschendel reviewed Jan 17, 2019

pandas/core/reshape/merge.py (outdated)

pandas/tests/reshape/merge/test_merge.py

pandas/tests/reshape/merge/test_merge.py (outdated)

jschendel added the Bug and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Jan 17, 2019

jreback requested changes Jan 18, 2019

TomAugspurger reviewed Jan 18, 2019

simonjayhawkins reviewed Jan 20, 2019

TomAugspurger reviewed Jan 21, 2019

simonjayhawkins reviewed Jan 21, 2019

jschendel reviewed Jan 21, 2019

pandas/core/internals/managers.py (outdated)

jschendel reviewed Jan 21, 2019

pandas/core/internals/managers.py (outdated)

simonjayhawkins reviewed Jan 22, 2019

charlesdong1991 force-pushed the bug_suffix_none branch from d44cdf0 to ed91045 on January 27, 2019 11:26

one in all

82c52a4

charlesdong1991 force-pushed the bug_suffix_none branch from 583c722 to 82c52a4 on February 2, 2019 16:15

charlesdong1991 added 5 commits February 2, 2019 16:16

change whatsnew to 0.25

dd605e0

add back other comment

af7f9ad

double check test

4d5e1a9

changes based on discussion

3f65bf1

changes based on discussion with joris

ce7e4b8

slight change

90ca9cd

jorisvandenbossche reviewed Feb 3, 2019

slight changes

e995a04

jreback requested changes Feb 4, 2019

changes based on jeff review

9c3dfbd

charlesdong1991 added 2 commits February 4, 2019 17:34

fix test error

441e9a5

ci fail, try again

71729b2

jreback added this to the 0.25.0 milestone Feb 6, 2019

jreback approved these changes Feb 6, 2019

jreback merged commit 09633b8 into pandas-dev:master Feb 6, 2019

charlesdong1991 mentioned this pull request Feb 9, 2019

ENH: accept None behaviour for suffixes in DataFrame.merge #25242

Closed

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

BUG: DataFrame.merge(suffixes=) does not respect None (pandas-dev#24819)

4953e66

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

BUG: DataFrame.merge(suffixes=) does not respect None (pandas-dev#24819)

8dea933

simonjayhawkins mentioned this pull request May 16, 2022

Merge nonstring columns #46879

Closed

4 tasks

@@ -488,7 +488,13 @@ def __init__(self, left, right, how='inner', on=None,
        self.right_on = com.maybe_make_list(right_on)

        self.copy = copy

def items_overlap_with_suffix(left, lsuffix, right, rsuffix):
    """
    If two indices overlap, add suffixes to overlapping entries.

    If corresponding suffix is empty, the entry is simply converted to string.

    """
    to_rename = left.intersection(right)
    if len(to_rename) == 0:
        return left, right
    else:
        if not lsuffix and not rsuffix:
            raise ValueError('columns overlap but no suffix specified: '
                             '{rename}'.format(rename=to_rename))

        def lrenamer(x):
            if x in to_rename:
                return '{x}{lsuffix}'.format(x=x, lsuffix=lsuffix)
            return x

        def rrenamer(x):
            if x in to_rename:
                return '{x}{rsuffix}'.format(x=x, rsuffix=rsuffix)
            return x

        return (_transform_index(left, lrenamer),
                _transform_index(right, rrenamer))
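The renamers quoted above format the suffix straight into the column name, so a suffix of None ends up as the literal text "None". The following is a minimal sketch of a None-respecting variant, assuming plain Python lists in place of pandas Index objects. The name `rename_overlapping` is hypothetical, and the patch that was actually merged may be structured differently.

```python
def rename_overlapping(left, right, lsuffix, rsuffix):
    """Hypothetical stand-in for items_overlap_with_suffix on plain lists."""
    to_rename = set(left) & set(right)
    if not to_rename:
        return list(left), list(right)
    if not lsuffix and not rsuffix:
        raise ValueError(
            "columns overlap but no suffix specified: "
            "{}".format(sorted(to_rename, key=str))
        )

    def renamer(x, suffix):
        # Only overlapping names are touched, and only when a suffix is
        # given; a None suffix leaves the name as it is.
        if x in to_rename and suffix is not None:
            return "{}{}".format(x, suffix)
        return x

    return (
        [renamer(x, lsuffix) for x in left],
        [renamer(x, rsuffix) for x in right],
    )


# Formatting None directly, as the quoted renamers do:
naive = "{x}{lsuffix}".format(x="a", lsuffix=None)  # -> "aNone"

# With the explicit None check, only the right-hand name is renamed:
fixed = rename_overlapping(["a", "b"], ["a", "c"], None, "_y")
# -> (["a", "b"], ["a_y", "c"])
```

Checking `suffix is not None` rather than truthiness keeps an empty-string suffix on the formatting path, where it is appended as nothing.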

charlesdong1991 commented Jan 17, 2019

pep8speaks commented Jan 17, 2019 (edited)

codecov bot commented Jan 17, 2019

codecov bot commented Jan 17, 2019 (edited)

jschendel left a comment

jreback left a comment

jreback commented Jan 18, 2019

TomAugspurger left a comment

simonjayhawkins commented Jan 20, 2019

charlesdong1991 Jan 21, 2019 (edited)

toobaz Feb 2, 2019 (edited)

charlesdong1991 Jan 22, 2019 (edited)

charlesdong1991 commented Jan 27, 2019 (edited)

jreback commented Feb 1, 2019

charlesdong1991 commented Feb 3, 2019 (edited)

jorisvandenbossche commented Feb 3, 2019

charlesdong1991 commented Feb 3, 2019 (edited)

jorisvandenbossche left a comment

jreback left a comment

jreback commented Feb 4, 2019 (edited)

charlesdong1991 commented Feb 4, 2019

jreback commented Feb 6, 2019

jorisvandenbossche commented Feb 6, 2019

charlesdong1991 commented Feb 6, 2019

simonjayhawkins commented Feb 9, 2019

charlesdong1991 commented Feb 9, 2019