Skip to content
This repository has been archived by the owner on Feb 19, 2021. It is now read-only.

Escape user-provided text passed to regex #571

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

pitkley
Copy link
Member

@pitkley pitkley commented Oct 6, 2019

Rather than using the user/document-provided values directly, we instead escape them to use them verbatim.

This fixes issue 568.


I only checked the models.py file for occurrences of unescaped regex-"injections". There might be more across the project.

kj21
kj21 previously approved these changes Oct 6, 2019
Copy link

@kj21 kj21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works beautifully

sbrunner
sbrunner previously approved these changes Oct 7, 2019
@MasterofJOKers
Copy link
Contributor

Took the liberty to fix the pycodestyle errors shown in travis. It was just line length.

That whole matching section gives me shivers, though. Every if seems written in a different style and I don't get why we have to re.compile() the match before passing it to re.search().

MasterofJOKers
MasterofJOKers previously approved these changes Nov 2, 2019
@MasterofJOKers
Copy link
Contributor

We still have test failures with these patches. Going to take a look.

@MasterofJOKers
Copy link
Contributor

_split_match() returns spaces replaced with \s+. Also re.escape() escapes spaces, so the whole _split_match() magic is a little hard to fix.

>>> print(re.escape('alpha charlie gamma'))
alpha\ charlie\ gamma

@MasterofJOKers
Copy link
Contributor

diff --git a/src/documents/models.py b/src/documents/models.py
index 7d12f91..f368f31 100644
--- a/src/documents/models.py
+++ b/src/documents/models.py
@@ -98,15 +98,14 @@ class MatchingModel(models.Model):
         if self.matching_algorithm == self.MATCH_ALL:
             for word in self._split_match():
                 search_result = re.search(
-                    r"\b{}\b".format(re.escape(word)), text, **search_kwargs)
+                    r"\b{}\b".format(word), text, **search_kwargs)
                 if not search_result:
                     return False
             return True
 
         if self.matching_algorithm == self.MATCH_ANY:
             for word in self._split_match():
-                if re.search(r"\b{}\b".format(re.escape(word)), text,
-                             **search_kwargs):
+                if re.search(r"\b{}\b".format(word), text, **search_kwargs):
                     return True
             return False
 
@@ -142,7 +141,7 @@ class MatchingModel(models.Model):
         findterms = re.compile(r'"([^"]+)"|(\S+)').findall
         normspace = re.compile(r"\s+").sub
         return [
-            normspace(" ", (t[0] or t[1]).strip()).replace(" ", r"\s+")
+            re.escape(normspace(" ", (t[0] or t[1]).strip())).replace(r"\ ", r"\s+")
             for t in findterms(self.match)
         ]

This makes it work but is pretty much unreadable.

@MasterofJOKers
Copy link
Contributor

I'd prefer something like this. Opinions?

        findterms = re.compile(r'"([^"]+)"|(\S+)').findall
        normspace = re.compile(r"\s+").sub
        # find all terms and replace multiple spaces with a single one
        terms = [normspace(" ", (t[0] or t[1]).strip())
                    for t in findterms(self.match)]
        # escape each term so we don't have regexes where we don't want them.
        # This escapes spaces, too.
        terms = [re.escape(t).replace(r"\ ", "\s+")
                    for t in terms]
        return terms

@pitkley pitkley force-pushed the pitkley-patch-1 branch 2 times, most recently from 3e6d177 to 989af90 Compare February 23, 2020 15:51
Rather than using the user/document-provided values directly, we instead
escape them to use them verbatim.

This fixes issue #568.
@pitkley
Copy link
Member Author

pitkley commented Feb 23, 2020

@MasterofJOKers sorry I didn't react after your review, thank you very much for analyzing this. I have applied pretty much exactly your solution to the problem, it seems like the simplest fix while still being correct! 👍

@pitkley pitkley requested review from MasterofJOKers and a team February 23, 2020 15:59
Copy link
Contributor

@MasterofJOKers MasterofJOKers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really liked the comments in my version though ;)

@pitkley pitkley requested a review from a team February 23, 2020 18:11
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants