Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make regex and glob pattern matching stricter #130

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

bburky
Copy link

@bburky bburky commented Apr 16, 2020

I was trying to use glob patterns and ran into one bug and one quirk that made them impossible to use correctly:

Regex metacharacters (such as .) were not escaped in globs (which use regexes for parsing internally):

  • !*.example.com matched evilexample.com
  • !e.a.p.e.com matched example.com

Patterns do not match the entire domain, even with "match domain only" enabled:

  • @example.com matched evil.example.com.evil.com (user fixable with regex patterns: @^example.com$ behaves as expected, but I don't want this to be necessary)
  • !example.com matched evil.example.com.evil.com (no anchors in glob patterns, not fixable by the user)

I believe it is more intuitive for patterns to always match the entire string when "match domain only" is enabled.

The first commit fixes the regex metacharacter bug and the second adds anchors when matching patterns with "match domain only" is enabled. The final commit updates the tests (not 100% if the logic is correct with the matchDomainOnly/should/should not testing).

The "match domain only" change is a breaking change. You can drop that commit if you want, but please document the current behavior. The glob example in the README matches many more domains more than just the specified domain.

A weaker change might be to add anchors only when matching glob patterns, not regexes. But note that this is a breaking change too: users might've been using example.com to match subdomains or www.example.com.

If you want the patterns to behave like prefixes it's not sufficient to only add a right $ anchor, because example\.com$ still matches evilexample.com

I'm not sure what the best solution here is.

@bburky bburky changed the title Pattern Make regex and glob pattern matching stricter Apr 16, 2020
@ghost
Copy link

ghost commented Apr 18, 2020

Hi,

Thanks for the contribution. I'll review it when I have time. Two comments I'd like to make beforehand are about

Patterns do not match the entire domain, even with "match domain only" enabled:

I think you're misunderstanding the feature. Normally, a pattern matches the whole URL e.g happy will match https://sad.com/happy. With "Match domain only", sad.com will be used as the target. Therefore @example.com matching evil.example.com.evil.com is expected.

@^example.com$ behaves as expected, but I don't want this to be necessary

That's how regexes work. Without the anchors, they match the whole string.

Comment on lines -86 to +90
const regex = host.substr(1);
let regex = host.substr(1);
if (matchDomainOnly) {
// This might generate double ^^ characters, but that works anyway
regex = "^" + regex + "$";
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the comments earlier, this is an unnecessary change based on the assumption that regexes should match from the beginning of the string.

.replace(/\*/g, '.*')
.replace(/\?/g, '.?'))
.test(toMatch);
// Because the string is regex escaped, you must match \* to instead of *
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's either a word too many here or one missing.

let regex = host.substr(1);
if (matchDomainOnly) {
// This might generate double ^^ characters, but that works anyway
regex = "^" + regex + "$";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a linting error. Please run npm lint.

Comment on lines 100 to 109
let prefix = matchDomainOnly ? 'should not' : 'should';
let description = `${prefix} match url with pattern only in path`;
let description = `${pattern} ${prefix} match ${evilUrl}`;
it(description, () => {
expect(
utils.matchesSavedMap(
'https://google.com/?q=duckduckgo',
evilUrl,
matchDomainOnly, {
host: simplePattern,
host: pattern,
})
).toBe(!matchDomainOnly);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section's purpose is to make sure that matchDomain=true really only uses the domain for matching and not the URL path contain the domain. When matchDomain=false the url should match because the URL path contains the domain.

It looks like you repurposed the section to use evilUrl which makes sense when you put the domain in the URL's path. However, it becomes a lot less clear in the other cases.

I'd recommend explicitly adding an it for your test and reworking this part to add the domain of expectedUrl to the URL path.

@bburky
Copy link
Author

bburky commented Apr 19, 2020

The main problem is that it is impossible to match the entire domain with a glob. Is that something you want to enable?

I agree it's not required to make regexes behave this way, but I was trying to find any way to use globs like I expected to.

@ghost
Copy link

ghost commented Apr 19, 2020

The main problem is that it is impossible to match the entire domain with a glob. Is that something you want to enable?

You might need to explain how you got to that conclusion 🤔

*, *google* and *.com all match the entire google domains (and more). And just to make sure we're talking about the same thing: <protocol>://<domain>/<rest>

https://developer.mozilla.org/en-US/docs/Web/API/URL/hostname

const domain = url.hostname
url.hostname = domain

This is what you mean when you say "domain", right?

I agree it's not required to make regexes behave this way, but I was trying to find any way to use globs like I expected to.

Yes, I'm not disagreeing with your solution for globs.

@benyaminl
Copy link

Any update to this? Only need to be merged, one year already past.. yet no update :/

@mikedlr
Copy link

mikedlr commented Nov 16, 2022

@benyaminl think that the maintainer has been busy. I opened an issue to discuss support. Are you able to help?

@benyaminl
Copy link

@mikedlr I already move to Tab Groups as it provide all function I need more than containerise, so I will pas this time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants