Make regex and glob pattern matching stricter #130

bburky · 2020-04-16T00:25:01Z

I was trying to use glob patterns and ran into one bug and one quirk that made them impossible to use correctly:

Regex metacharacters (such as .) were not escaped in globs (which use regexes for parsing internally):

!*.example.com matched evilexample.com
!e.a.p.e.com matched example.com
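To make the two matches above concrete, here is a minimal sketch of the old translation. It assumes the glob is turned into a regex using only the two `replace` calls that the diff removes from src/utils.js; `naiveGlobToRegex` is a hypothetical name, and the leading `!` glob marker is left out.

```javascript
// Hypothetical stand-in for the old behaviour: only * and ? are translated,
// so every "." in the glob stays a regex wildcard.
function naiveGlobToRegex(glob) {
  return new RegExp(glob.replace(/\*/g, '.*').replace(/\?/g, '.?'));
}

// The "." before "example" matches the "l" of "evil".
console.log(naiveGlobToRegex('*.example.com').test('evilexample.com')); // true

// Each "." matches one arbitrary letter.
console.log(naiveGlobToRegex('e.a.p.e.com').test('example.com')); // true
```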

Patterns do not match the entire domain, even with "match domain only" enabled:

@example.com matched evil.example.com.evil.com (user fixable with regex patterns: @^example.com$ behaves as expected, but I don't want this to be necessary)
!example.com matched evil.example.com.evil.com (no anchors in glob patterns, not fixable by the user)

I believe it is more intuitive for patterns to always match the entire string when "match domain only" is enabled.

The first commit fixes the regex metacharacter bug, and the second adds anchors when matching patterns while "match domain only" is enabled. The final commit updates the tests (I'm not 100% sure the logic is correct with the matchDomainOnly/should/should not testing).

The "match domain only" change is a breaking change. You can drop that commit if you want, but please document the current behavior. The glob example in the README matches many more domains than just the specified domain.

A weaker change might be to add anchors only when matching glob patterns, not regexes. But note that this is a breaking change too: users might've been using example.com to match subdomains or www.example.com.

If you want the patterns to behave like prefixes it's not sufficient to only add a right $ anchor, because example\.com$ still matches evilexample.com.
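The anchoring cases discussed in this description can be checked directly. The sketch below uses plain JavaScript regex literals; the `(^|\.)` pattern at the end is only one possible way to allow subdomains without allowing arbitrary prefixes, and it is an illustration rather than something this PR implements.

```javascript
const evilHost = 'evil.example.com.evil.com';

// Unanchored: the pattern may match anywhere inside the host.
console.log(/example.com/.test(evilHost)); // true

// Fully anchored: only the exact host matches.
console.log(/^example.com$/.test(evilHost)); // false

// Right anchor only: any host that merely ends in "example.com" still matches.
console.log(/example\.com$/.test('evilexample.com')); // true

// Assumed alternative: require start-of-string or a literal dot before the domain.
const domainOrSubdomain = /(^|\.)example\.com$/;
console.log(domainOrSubdomain.test('example.com'));     // true
console.log(domainOrSubdomain.test('www.example.com')); // true
console.log(domainOrSubdomain.test('evilexample.com')); // false
console.log(domainOrSubdomain.test(evilHost));          // false
```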

I'm not sure what the best solution here is.

ghost · 2020-04-18T14:58:32Z

Hi,

Thanks for the contribution. I'll review it when I have time. Two comments I'd like to make beforehand are about

> Patterns do not match the entire domain, even with "match domain only" enabled:

I think you're misunderstanding the feature. Normally, a pattern matches against the whole URL, e.g. happy will match https://sad.com/happy. With "Match domain only", sad.com will be used as the target. Therefore @example.com matching evil.example.com.evil.com is expected.
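A small sketch of the distinction described here, assuming the only thing "Match domain only" changes is the string the pattern is tested against. `matchTarget` is a hypothetical helper, not a function from the codebase.

```javascript
// Hypothetical helper: choose the string that a pattern is tested against.
function matchTarget(url, matchDomainOnly) {
  return matchDomainOnly ? new URL(url).hostname : url;
}

const url = 'https://sad.com/happy';

// Whole URL as target: "happy" is found in the path.
console.log(/happy/.test(matchTarget(url, false))); // true

// Domain only: the target is "sad.com", so "happy" is not found.
console.log(/happy/.test(matchTarget(url, true))); // false

// The target is still searched for a substring match, so this also matches.
console.log(/example.com/.test(matchTarget('https://evil.example.com.evil.com/', true))); // true
```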

> @^example.com$ behaves as expected, but I don't want this to be necessary

That's how regexes work. Without anchors, they match anywhere in the string.

ghost · 2020-04-18T16:35:13Z

src/utils.js

-    const regex = host.substr(1);
+    let regex = host.substr(1);
+    if (matchDomainOnly) {
+      // This might generate double ^^ characters, but that works anyway
+      regex = "^" + regex + "$";
+    }


As mentioned in the comments earlier, this is an unnecessary change based on the assumption that regexes should match from the beginning of the string.
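For reference, a sketch of what the wrapping in this hunk does at runtime. `anchorNaively` mirrors the string concatenation from the diff; `anchorGrouped` is an assumed alternative that is not part of the PR, shown because plain concatenation does not anchor both sides of a top-level alternation.

```javascript
// Mirrors the diff: wrap the user's regex source in ^ and $.
function anchorNaively(source) {
  return new RegExp('^' + source + '$');
}

// Assumed alternative (not in the PR): group the source before anchoring.
function anchorGrouped(source) {
  return new RegExp('^(?:' + source + ')$');
}

// Doubled anchors are valid and behave like single ones.
console.log(anchorNaively('^example.com$').test('example.com')); // true
console.log(anchorNaively('^example.com$').test('evil.example.com.evil.com')); // false

// "^foo.com|bar.com$" parses as (^foo.com) OR (bar.com$).
console.log(anchorNaively('foo.com|bar.com').test('foo.com.evil.com')); // true
console.log(anchorGrouped('foo.com|bar.com').test('foo.com.evil.com')); // false
```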

ghost · 2020-04-18T16:35:47Z

src/utils.js

-        .replace(/\*/g, '.*')
-        .replace(/\?/g, '.?'))
-        .test(toMatch);
+    // Because the string is regex escaped, you must match \* to instead of *


There's either a word too many here or one missing.
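The idea behind the code comment under review can be sketched as follows: escape every regex metacharacter first, then translate the now-escaped `\*` and `\?` back into wildcards. This is a minimal sketch, not the PR's actual code; `escapeRegExp` and `globToRegex` are hypothetical names, and the `anchored` flag stands in for "match domain only".

```javascript
// Escape all regex metacharacters so they match literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// After escaping, "*" has become "\*" and "?" has become "\?",
// so those escaped forms are what must be replaced.
function globToRegex(glob, anchored) {
  const body = escapeRegExp(glob)
    .replace(/\\\*/g, '.*')  // "\*" -> any run of characters
    .replace(/\\\?/g, '.?'); // "\?" -> at most one character, as in the removed lines
  return new RegExp(anchored ? `^${body}$` : body);
}

console.log(globToRegex('*.example.com', true).test('evilexample.com')); // false
console.log(globToRegex('*.example.com', true).test('www.example.com')); // true
console.log(globToRegex('e.a.p.e.com', true).test('example.com'));       // false
console.log(globToRegex('example.com', false).test('evil.example.com.evil.com')); // true
console.log(globToRegex('example.com', true).test('evil.example.com.evil.com'));  // false
```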

ghost · 2020-04-18T16:40:35Z

src/utils.js

+    let regex = host.substr(1);
+    if (matchDomainOnly) {
+      // This might generate double ^^ characters, but that works anyway
+      regex = "^" + regex + "$";


This is a linting error. Please run npm run lint.

ghost · 2020-04-18T17:02:15Z

src/__tests__/utils.spec.js

            let prefix = matchDomainOnly ? 'should not' : 'should';
-            let description = `${prefix} match url with pattern only in path`;
+            let description = `${pattern} ${prefix} match ${evilUrl}`;
            it(description, () => {
              expect(
                  utils.matchesSavedMap(
-                      'https://google.com/?q=duckduckgo',
+                      evilUrl,
                      matchDomainOnly, {
-                        host: simplePattern,
+                        host: pattern,
                      })
              ).toBe(!matchDomainOnly);


This section's purpose is to make sure that matchDomainOnly=true really uses only the domain for matching, and not a URL path that contains the domain. When matchDomainOnly=false, the URL should match because the URL path contains the domain.

It looks like you repurposed the section to use evilUrl which makes sense when you put the domain in the URL's path. However, it becomes a lot less clear in the other cases.

I'd recommend explicitly adding an it for your test and reworking this part to add the domain of expectedUrl to the URL path.

bburky · 2020-04-19T01:36:05Z

The main problem is that it is impossible to match the entire domain with a glob. Is that something you want to enable?

I agree it's not required to make regexes behave this way, but I was trying to find any way to use globs like I expected to.

ghost · 2020-04-19T08:27:47Z

> The main problem is that it is impossible to match the entire domain with a glob. Is that something you want to enable?

You might need to explain how you got to that conclusion 🤔

*, *google* and *.com all match the entire google domains (and more). And just to make sure we're talking about the same thing: <protocol>://<domain>/<rest>

https://developer.mozilla.org/en-US/docs/Web/API/URL/hostname

const domain = url.hostname
url.hostname = domain

This is what you mean when you say "domain", right?

> I agree it's not required to make regexes behave this way, but I was trying to find any way to use globs like I expected to.

Yes, I'm not disagreeing with your solution for globs.

benyaminl · 2022-03-04T11:40:05Z

Any update on this? It only needs to be merged. One year has already passed, yet no update :/

mikedlr · 2022-11-16T11:30:37Z

@benyaminl I think the maintainer has been busy. I opened an issue to discuss support. Are you able to help?

benyaminl · 2022-11-16T15:29:13Z

@mikedlr I have already moved to Tab Groups, as it provides all the functionality I need, more than Containerise does, so I will pass this time.

bburky added 3 commits April 15, 2020 18:53

Escape metacharacters when building regexes from glob patterns

9527989

Always match the entire domain with patterns when match domain only is enabled

27378d0

Add tests for matching domain substrings and wildcard subdomains for regex and glob patterns

0152d79

bburky changed the title ~~Pattern~~ Make regex and glob pattern matching stricter Apr 16, 2020

ghost suggested changes Apr 18, 2020
