API: change default behaviour of str.match from deprecated extract to match (GH5224) #15257

jorisvandenbossche · 2017-01-29T21:48:52Z

I just stumbled on this, and it seems we didn't have it on our deprecations to-do list (#6581).

This PR changes the default behaviour of str.match from extracting groups to just a match (True/False). The previous default behaviour has been deprecated since 0.13.0 (#5224).
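To make the change concrete, here is a minimal sketch that uses only the standard-library `re` module as a stand-in for the pandas string methods. The helpers `match_bool` and `extract_groups` are hypothetical names that roughly mimic the two behaviours described above; they are not the pandas implementation.

```python
import re


def match_bool(values, pat):
    """Mimic the new default: one True/False per element, groups ignored."""
    regex = re.compile(pat)
    return [regex.match(v) is not None for v in values]


def extract_groups(values, pat):
    """Mimic the old default: the captured groups, or None when there is no match."""
    regex = re.compile(pat)
    out = []
    for v in values:
        m = regex.match(v)
        out.append(m.groups() if m else None)
    return out


values = ["a1", "b2", "zz"]
pat = r"([ab])(\d)"
print(match_bool(values, pat))      # [True, True, False]
print(extract_groups(values, pat))  # [('a', '1'), ('b', '2'), None]
```

With the same pattern, the first helper answers "does it match?" and the second answers "what did the groups capture?", which is the split between str.match and str.extract that this PR enforces.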

jreback · 2017-01-29T21:51:19Z

pandas/core/strings.py

@@ -444,60 +442,33 @@ def str_match(arr, pat, case=True, flags=0, na=np.nan, as_indexer=False):
    flags : int, default 0 (no flags)
        re module flags, e.g. re.IGNORECASE
    na : default NaN, fill value for missing values.
-    as_indexer : False, by default, gives deprecated behavior better achieved
-        using str_extract. True return boolean indexer.
+    as_indexer : ignored


i would just take this out

alternatively accept kwargs and raise if the kw is passed (to be helpful); but that is more complicated

Yes, I was not sure what to do with the keyword:

remove -> but this will give a lot of errors, as up to now you had to specify the keyword to get the right behaviour (if you had groups in the regex)

raise warning -> changed default to None, and when user specifies it, raise a warning to say it is ignored

just ignore -> but then there is potential confusion on why it does nothing + people will keep specifying it although not needed

For now I chose the 'raise warning' option. The 'just ignore' option would have the least impact, but is also less informative.
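A minimal sketch of the 'raise warning' option, assuming a `None` sentinel as the new default. `match_with_warning` is a hypothetical stand-in built on `re`, not the actual `str_match`; the warning text is the one from the diff in this PR, and `UserWarning` reflects the category used at this stage of the review.

```python
import re
import warnings


def match_with_warning(values, pat, as_indexer=None):
    """Hypothetical stand-in: always returns booleans, warns if as_indexer is given."""
    if as_indexer is not None:
        # None means "not specified"; any explicit value triggers the warning.
        warnings.warn(
            "'as_indexer' keyword was specified but will be ignored;"
            " match now returns a boolean indexer by default.",
            UserWarning,
            stacklevel=2,
        )
    regex = re.compile(pat)
    return [regex.match(v) is not None for v in values]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    quiet = match_with_warning(["a1", "zz"], r"a")                  # no warning
    noisy = match_with_warning(["a1", "zz"], r"a", as_indexer=True)  # one warning

print(quiet, noisy, len(caught))  # [True, False] [True, False] 1
```

The sentinel is what lets the function tell "keyword omitted" apart from an explicit `as_indexer=False`, which a default of `False` could not do.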

Actually, maybe I should give a more specific message for as_indexer=False (the previous default behaviour) in case people specified it explicitly, because in those cases there is actually a breaking change in behaviour.

it has been deprecated for quite some time
i think taking it out is fine (and as i said i can still capture kwargs so you have a nice message), but it's not listed in tab completion (nor doc string) that way
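The kwargs-capturing alternative could look roughly like the sketch below. `match_strict` is a hypothetical name, and the choice of `TypeError` and the message wording are assumptions made for illustration; the thread does not specify either.

```python
import re


def match_strict(values, pat, **kwargs):
    """Hypothetical stand-in: as_indexer is no longer in the signature."""
    if "as_indexer" in kwargs:
        # Helpful, targeted message for the removed keyword (wording assumed).
        raise TypeError(
            "the 'as_indexer' keyword has been removed;"
            " match always returns a boolean indexer"
        )
    if kwargs:
        # Any other unknown keyword is rejected as usual.
        raise TypeError("unexpected keyword arguments: %s" % sorted(kwargs))
    regex = re.compile(pat)
    return [regex.match(v) is not None for v in values]


print(match_strict(["a1", "zz"], r"a"))  # [True, False]
try:
    match_strict(["a1"], r"a", as_indexer=True)
except TypeError as exc:
    print("rejected:", exc)
```

Because the keyword only lives in `**kwargs`, it no longer shows up in the signature or in tab completion, while callers who still pass it get an explicit explanation instead of a generic error.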

codecov-io · 2017-01-30T10:51:53Z

Codecov Report

Merging #15257 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15257      +/-   ##
==========================================
- Coverage   91.02%   90.99%   -0.03%     
==========================================
  Files         143      143              
  Lines       49403    49396       -7     
==========================================
- Hits        44967    44950      -17     
- Misses       4436     4446      +10

Impacted Files            Coverage Δ
pandas/core/strings.py    98.48% <100%> (-0.02%) ⬇️
pandas/io/gbq.py          25% <0%> (-58.34%) ⬇️
pandas/core/common.py     90.96% <0%> (-0.34%) ⬇️
pandas/core/frame.py      97.86% <0%> (-0.1%) ⬇️



jorisvandenbossche · 2017-03-22T13:39:17Z

Rebased this (and removed the "as_indexer : ignored" from the docstring).
@jreback I can't remember if there were other things left to do, but this seems good to go to me.

jreback · 2017-03-22T13:49:09Z

pandas/core/strings.py

@@ -464,11 +464,9 @@ def rep(x, r):
        return result


-def str_match(arr, pat, case=True, flags=0, na=np.nan, as_indexer=False):
+def str_match(arr, pat, case=True, flags=0, na=np.nan, as_indexer=None):


shouldn't you take this arg out?

No, this keyword needs to stay, because it was how people could specify the 'new' behaviour before (although we said we would change this in 0.14, we never did).
So all people still using match are probably specifying this keyword, AFAIU.

See the removed warning from the documentation in the diff for some context.

In principle we could make it a FutureWarning instead of UserWarning, so we can remove it later on.

ok, this should have been changed a long time ago. no reason to keep a dead API around.

and change to FutureWarning. can remove in next major version.

no reason to keep a dead API around.

To be clear, this is no dead API. Although it is ignored after this PR, everybody using this function uses that keyword.
So I certainly won't raise (FutureWarning is fine, probably even better than UserWarning anyway)

well it's going to be removed. So should for sure use FutureWarning. UserWarning is pretty useless as a warning IMHO. (not that FutureWarning is much better but at least signals that we are going to remove it).
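From the caller's side, one practical effect of the category is that it can be filtered on. The sketch below, using only the standard `warnings` module, shows a caller escalating `FutureWarning` to an error (for instance in a test suite) so that leftover `as_indexer` usage is found before the keyword is removed. `legacy_call` is a hypothetical function with a shortened, assumed message.

```python
import warnings


def legacy_call(as_indexer=None):
    """Hypothetical function that still accepts, but ignores, as_indexer."""
    if as_indexer is not None:
        warnings.warn(
            "'as_indexer' is ignored and will be removed in a future version",
            FutureWarning,
            stacklevel=2,
        )
    return True


with warnings.catch_warnings():
    # Turn only FutureWarning into an exception inside this block.
    warnings.simplefilter("error", FutureWarning)
    try:
        legacy_call(as_indexer=True)
        escalated = False
    except FutureWarning:
        escalated = True

print(escalated)  # True
```

Outside the `catch_warnings` block the previous filters are restored, so the same call would go back to emitting an ordinary warning.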

jreback · 2017-03-22T13:49:31Z

pandas/core/strings.py

-    if as_indexer and regex.groups > 0:
-        warnings.warn("This pattern has match groups. To actually get the"
-                      " groups, use str.extract.", UserWarning, stacklevel=3)
+    if (as_indexer is False) and (regex.groups > 0):


why aren't you taking this out?
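For context, the condition in the diff singles out the one case where behaviour silently changes: an explicit `as_indexer=False` together with a pattern that has capture groups, which is the case the commit "raise error in case of regex with groups and as_indexer=False" addresses. A minimal sketch of such a check follows; `check_as_indexer` is a hypothetical helper, and both `ValueError` and the message wording are assumptions.

```python
import re


def check_as_indexer(pat, as_indexer=None):
    """Hypothetical helper: reject the one combination that used to extract groups."""
    regex = re.compile(pat)
    # regex.groups is the number of capturing groups in the compiled pattern.
    if (as_indexer is False) and (regex.groups > 0):
        raise ValueError(
            "as_indexer=False with a pattern containing groups is no longer"
            " supported; use str.extract to get the groups"
        )
    return regex


print(check_as_indexer(r"a\d", as_indexer=False).groups)  # 0, so no error
try:
    check_as_indexer(r"(a)(\d)", as_indexer=False)
except ValueError as exc:
    print("rejected:", exc)
```

Without capture groups, the old and new defaults describe the same matches, so only the grouped case needs a hard error rather than a warning.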

jreback · 2017-03-22T14:07:50Z

pandas/core/strings.py

+        # Previously, this keyword was used for changing the default but
+        # deprecated behaviour. This keyword is now no longer needed.
+        warnings.warn("'as_indexer' keyword was specified but will be ignored;"
+                      " match now returns a boolean indexer by default.",


should for sure be a FutureWarning. to be honest I would just raise. really no reason to continue supporting this. but if you want to keep it for 1 more cycle, that's ok too.

jreback · 2017-03-22T18:51:12Z

thanks!

… match (GH5224)

This PR changes the default behaviour of `str.match` from extracting groups to just a match (True/False). The previous default behaviour was deprecated since 0.13.0 (pandas-dev#5224)

Author: Joris Van den Bossche <jorisvandenbossche@gmail.com>

Closes pandas-dev#15257 from jorisvandenbossche/str-match and squashes the following commits:

0ab36b6 [Joris Van den Bossche] Raise FutureWarning instead of UserWarning for as_indexer
a2bae51 [Joris Van den Bossche] raise error in case of regex with groups and as_indexer=False
87446c3 [Joris Van den Bossche] fix test
0788de2 [Joris Van den Bossche] API: change default behaviour of str.match from deprecated extract to match (GH5224)

jorisvandenbossche added Deprecate Functionality to remove in pandas Strings String extension data type and string data labels Jan 29, 2017

jorisvandenbossche added this to the 0.20.0 milestone Jan 29, 2017

jreback requested changes Jan 29, 2017


jorisvandenbossche added 3 commits March 22, 2017 14:32

API: change default behaviour of str.match from deprecated extract to match (GH5224)

0788de2

fix test

87446c3

raise error in case of regex with groups and as_indexer=False

a2bae51

jorisvandenbossche force-pushed the str-match branch from 821f128 to a2bae51 Compare March 22, 2017 13:38

jreback requested changes Mar 22, 2017


jreback reviewed Mar 22, 2017


Raise FutureWarning instead of UserWarning for as_indexer

0ab36b6

jsexauer mentioned this pull request Mar 22, 2017

DEPR: Clean up list of deprecations from prior versions #6581

Closed

1 task

jreback approved these changes Mar 22, 2017


jreback closed this in 94720d9 Mar 22, 2017

jreback mentioned this pull request Oct 27, 2018

DEPR: deprecations log for removed issues #13777

Closed
