re.IGNORECASE does not match literal "_" (underscore) #56156
Regular expressions written to match literal underscores ("_", ASCII …) miss some matches when re.IGNORECASE is passed. The following session log shows the problem:

Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> subject = "[Conclave-Mendoi]_ef_-_a_tale_of_memories_00-12_H264"
>>> print subject.encode("base64") # In case my environment encoding is to blame
W0NvbmNsYXZlLU1lbmRvaV1fZWZfLV9hX3RhbGVfb2ZfbWVtb3JpZXNfMDAtMTJfSDI2NA==
>>> re.sub("_", "X", subject) # No flags, does what I expect
'[Conclave-Mendoi]XefX-XaXtaleXofXmemoriesX00-12XH264'
>>>
>>> re.sub("_", "X", subject, re.IGNORECASE) # Misses some matches
'[Conclave-Mendoi]XefX-_a_tale_of_memories_00-12_H264'
>>>
>>> re.sub("_", "X", subject, re.IGNORECASE | re.LOCALE) # Misses fewer matches
'[Conclave-Mendoi]XefX-XaXtaleXofXmemories_00-12_H264'
>>>
>>> re.sub("_", "X", subject, re.IGNORECASE | re.LOCALE | re.UNICODE) # Works OK
'[Conclave-Mendoi]XefX-XaXtaleXofXmemoriesX00-12XH264'
>>>
>>> re.sub("_", "X", subject, re.IGNORECASE | re.UNICODE) # Works OK
'[Conclave-Mendoi]XefX-XaXtaleXofXmemoriesX00-12XH264'
>>>
>>> type(subject) # Don't think this is a unicode string
<type 'str'>
>>>

Since my …
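help(re.sub) gives the signature sub(pattern, repl, string, count=0), and re.IGNORECASE has the value 2. Therefore re.sub("_", "X", subject, re.IGNORECASE) passes re.IGNORECASE as count, telling re.sub to replace at most 2 occurrences of "_". The same applies to the other calls: re.IGNORECASE | re.LOCALE is 6, which is why exactly six underscores were replaced, while the combinations including re.UNICODE (34 and 38) exceed the 8 underscores in the subject, so every occurrence was replaced.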
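A minimal Python 3 sketch (the session above is Python 2.6) that reproduces the miscount explicitly and shows the intended calls. It assumes Python 2.7+/3.1+, where re.sub accepts a `flags` keyword argument. Python 2.6 has no such argument, so the compiled-pattern form is the option that also works there.

```python
import re

subject = "[Conclave-Mendoi]_ef_-_a_tale_of_memories_00-12_H264"

# The flag constants are integers underneath, so re.IGNORECASE passed as the
# fourth positional argument of re.sub ends up as count.
print(int(re.IGNORECASE))  # 2

# Reproduce the reported output explicitly: at most 2 replacements.
limited = re.sub("_", "X", subject, count=int(re.IGNORECASE))
print(limited)  # [Conclave-Mendoi]XefX-_a_tale_of_memories_00-12_H264

# Intended call: pass the flag by keyword so count keeps its default (0 = all).
fixed = re.sub("_", "X", subject, flags=re.IGNORECASE)
print(fixed)  # [Conclave-Mendoi]XefX-XaXtaleXofXmemoriesX00-12XH264

# Alternative that avoids the positional trap entirely: compile with the flag.
via_compile = re.compile("_", re.IGNORECASE).sub("X", subject)
print(via_compile == fixed)  # True
```

Passing the flag by keyword, or compiling it into the pattern, keeps count and flags from being confused regardless of argument position.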
Closing as invalid.
I don't know how much code that might break. It might not be that much; I can't remember when I last used re.sub without the default count.
Oh, that's embarrassing. :-) Could a type check be used to alert the user to their mistake? I suppose that would require re.IGNORECASE (et al.) to be of some new type (presumably subclassed from int). (Thanks for the quick response, and sorry to waste your time.)
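The suggested type check can be sketched in Python 3, where (since 3.6) the flag constants are members of re.RegexFlag, an enum.IntFlag that is still a subclass of int. The wrapper name `checked_sub` is hypothetical, not part of the re module; it only illustrates how a flag passed in the count position could be detected.

```python
import re


def checked_sub(pattern, repl, string, count=0, flags=0):
    """Hypothetical wrapper around re.sub that rejects a flag passed as count."""
    # re.RegexFlag members are ints, so re.sub itself accepts them as count;
    # checking for the flag type distinguishes them from an ordinary integer.
    if isinstance(count, re.RegexFlag):
        raise TypeError(
            f"count={count!r} looks like a regex flag; did you mean flags={count!r}?"
        )
    return re.sub(pattern, repl, string, count=count, flags=flags)


subject = "[Conclave-Mendoi]_ef_-_a_tale_of_memories_00-12_H264"

try:
    checked_sub("_", "X", subject, re.IGNORECASE)  # the mistake from this issue
except TypeError as exc:
    print(exc)

print(checked_sub("_", "X", subject, flags=re.IGNORECASE))
```

A keyword-only signature (`count` and `flags` after a bare `*`) would reject any positional count, not just flags, at the cost of breaking existing callers that pass count positionally.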
See also bpo-11957.