-
Notifications
You must be signed in to change notification settings - Fork 502
Escape user-provided text passed to regex #571
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works beautifully
837c2f7
to
422a785
Compare
Took the liberty to fix the pycodestyle errors shown in travis. It was just line length. That whole matching section gives me shivers, though. Every if seems written in a different style and I don't get why we have to |
We still have test failures with these patches. Going to take a look. |
|
diff --git a/src/documents/models.py b/src/documents/models.py
index 7d12f91..f368f31 100644
--- a/src/documents/models.py
+++ b/src/documents/models.py
@@ -98,15 +98,14 @@ class MatchingModel(models.Model):
if self.matching_algorithm == self.MATCH_ALL:
for word in self._split_match():
search_result = re.search(
- r"\b{}\b".format(re.escape(word)), text, **search_kwargs)
+ r"\b{}\b".format(word), text, **search_kwargs)
if not search_result:
return False
return True
if self.matching_algorithm == self.MATCH_ANY:
for word in self._split_match():
- if re.search(r"\b{}\b".format(re.escape(word)), text,
- **search_kwargs):
+ if re.search(r"\b{}\b".format(word), text, **search_kwargs):
return True
return False
@@ -142,7 +141,7 @@ class MatchingModel(models.Model):
findterms = re.compile(r'"([^"]+)"|(\S+)').findall
normspace = re.compile(r"\s+").sub
return [
- normspace(" ", (t[0] or t[1]).strip()).replace(" ", r"\s+")
+ re.escape(normspace(" ", (t[0] or t[1]).strip())).replace(r"\ ", r"\s+")
for t in findterms(self.match)
]
This makes it work but is pretty much unreadable. |
I'd prefer something like this. Opinions? findterms = re.compile(r'"([^"]+)"|(\S+)').findall
normspace = re.compile(r"\s+").sub
# find all terms and replace multiple spaces with a single one
terms = [normspace(" ", (t[0] or t[1]).strip())
for t in findterms(self.match)]
# escape each term so we don't have regexes where we don't want them.
# This escapes spaces, too.
terms = [re.escape(t).replace(r"\ ", "\s+")
for t in terms]
return terms |
3e6d177
to
989af90
Compare
Rather than using the user/document-provided values directly, we instead escape them to use them verbatim. This fixes issue #568.
989af90
to
359e236
Compare
@MasterofJOKers sorry I didn't react after your review, thank you very much for analyzing this. I have applied pretty much exactly your solution to the problem, it seems like the simplest fix while still being correct! 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really liked the comments in my version though ;)
Rather than using the user/document-provided values directly, we instead escape them to use them verbatim.
This fixes issue 568.
I only checked the
models.py
file for occurrences of unescaped regex-"injections". There might be more across the project.