Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Deprecate regex=True default in Series.str.replace #36695

Merged
merged 45 commits into from
Oct 10, 2020
Merged

API: Deprecate regex=True default in Series.str.replace #36695

merged 45 commits into from
Oct 10, 2020

Conversation

dsaxton
Copy link
Member

@dsaxton dsaxton commented Sep 28, 2020

From some old discussion it looks like there was interest in changing the default value of regex from True to False within Series.str.replace. I think this makes sense since it would align this method with others within pandas along with the standard library, and would also make fixing #24804 slightly less disruptive. I'm assuming this would warrant a deprecation note in the next whatsnew?

I think this part of the docs will need to be updated: https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html#splitting-and-replacing-strings

#24809

@dsaxton dsaxton added replace replace method API - Consistency Internal Consistency of API/Behavior labels Sep 28, 2020
@jorisvandenbossche
Copy link
Member

There are many tests that are using this (and are now raising a warning) that would need updating.

While I agree that a default of regex=False would probably be better ideally, warning about this in all cases also seems quite annoying ..
Would it somehow be possible to detect in advance if the pattern is regex-like or rather a plain string (in which case replacing it with regex=False would not give a different result). Probably not that straightforward given the complexity of regexes .. (but maybe some simple cases, like only characters, could be detected)

@dsaxton
Copy link
Member Author

dsaxton commented Sep 28, 2020

There are many tests that are using this (and are now raising a warning) that would need updating.

While I agree that a default of regex=False would probably be better ideally, warning about this in all cases also seems quite annoying ..
Would it somehow be possible to detect in advance if the pattern is regex-like or rather a plain string (in which case replacing it with regex=False would not give a different result). Probably not that straightforward given the complexity of regexes .. (but maybe some simple cases, like only characters, could be detected)

Yeah this is a good point, I think looking for regex special characters and only then warning could make the most sense

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a whatsnew note?

pandas/core/strings.py Outdated Show resolved Hide resolved
doc/source/user_guide/text.rst Outdated Show resolved Hide resolved
@dsaxton
Copy link
Member Author

dsaxton commented Sep 29, 2020

Can you also add a whatsnew note?

Would this be an update for 1.1.4 or 1.2?

@dsaxton
Copy link
Member Author

dsaxton commented Oct 7, 2020

yes, you are excluding them from the warning now though; can we provide a more specific warning message for single char regexes?

Added an additional warning for single character regexes:

[ins] In [3]: s.str.replace(".", "")
<ipython-input-3-ee1a585145f8>:1: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.
  s.str.replace(".", "")
Out[3]: 
0    a
1    b
dtype: object

[ins] In [4]: s.str.replace("^.", "")
<ipython-input-4-3940e91d28a6>:1: FutureWarning: The default value of regex will change from True to False in a future version.
  s.str.replace("^.", "")
Out[4]: 
0    
1    
dtype: object

@dsaxton
Copy link
Member Author

dsaxton commented Oct 7, 2020

looks good. doc comment & pls add a deprecation note in the whatsnew (also add to the deprecation master issue) #30228

Updated the OP in that issue

Some caution must be taken to keep regular expressions in mind! For example, the
following code will cause trouble because of the regular expression meaning of
``$``:
Some caution must be taken when dealing with regular expressions! The current behavior
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do this

Some caution must be taken to keep regular expressions in mind! For example, the
following code will cause trouble because of the regular expression meaning of
``$``:
Some caution must be taken when dealing with regular expressions! The current behavior
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add a versionchanged tag

doc/source/user_guide/text.rst Outdated Show resolved Hide resolved

def test_str_replace_regex_default_raises_warning(self):
# https://github.com/pandas-dev/pandas/pull/24809
s = pd.Series(["a", "b", "c"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you check the messages on this warning

if isinstance(pat, str) and any(c in pat for c in ".+*|^$?[](){}\\"):
# warn only in cases where regex behavior would differ from literal
msg = (
"The default value of regex will change from True to False "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an you make sure that we are testing both of these warnings (as i see you only added a single test)

Some caution must be taken to keep regular expressions in mind! For example, the
following code will cause trouble because of the regular expression meaning of
``$``:
Some caution must be taken when dealing with regular expressions! The current behavior
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no i mean a ..note:: a sphinx-note (which puts a highlite box around this)

Some caution must be taken to keep regular expressions in mind! For example, the
following code will cause trouble because of the regular expression meaning of
``$``:
Some caution must be taken when dealing with regular expressions! The current behavior
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could also do a ..warning:: which is a red-box and more prominent, but either way want to call out this

doc/source/user_guide/text.rst Show resolved Hide resolved
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice


# This does what you'd naively expect:
dollars.str.replace("$", "")
.. warning::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great

@jreback
Copy link
Contributor

jreback commented Oct 10, 2020

merge on most builds green (some are giving issues now).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API - Consistency Internal Consistency of API/Behavior replace replace method
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants