feat: globs & regexps for SitemapRequestList #2631
!this.urlExcludePatternObjects.some((patternObject) => {
    const { regexp, glob } = patternObject;
    return (regexp && url.match(regexp)) || (glob && minimatch(url, glob, { nocase: true }));
}) &&
(this.urlPatternObjects.length === 0 ||
    this.urlPatternObjects.some((patternObject) => {
        const { regexp, glob } = patternObject;
        return (regexp && url.match(regexp)) || (glob && minimatch(url, glob, { nocase: true }));
    }))
Bruh, extract those massive boolean expressions into variables please.
I first thought of a function that would return these arrow functions, closing over the URL, so we could just write this.urlPatternObjects.some(matchUrl(url)), which is imo beautifully readable. Both arrow functions are currently identical. I just didn't want to define this matchUrl function in the middle of isUrlMatchingPatterns (as it's gonna be called many many many times), and making it a private method in SitemapRequestList didn't really feel right either.
Idk, what do you think?
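For illustration, here is a minimal sketch of what the matchUrl idea could look like as a module-level helper, so it is neither redefined inside isUrlMatchingPatterns nor a private method. The UrlPatternObject shape is inferred from the destructuring in the diff, and the GlobTest parameter is an assumption made only to keep the sketch dependency-free; in the PR that role is played by minimatch(url, glob, { nocase: true }). The standalone isUrlMatchingPatterns signature is hypothetical as well.

```typescript
// Shape of a pattern entry, inferred from the destructuring in the diff.
interface UrlPatternObject {
    regexp?: RegExp;
    glob?: string;
}

// Hypothetical injection point for the glob test (an assumption of this sketch).
// In the PR, minimatch(url, glob, { nocase: true }) plays this role.
type GlobTest = (url: string, glob: string) => boolean;

// Returns a predicate bound to one URL, suitable for Array.prototype.some.
function matchUrl(url: string, globTest: GlobTest) {
    return ({ regexp, glob }: UrlPatternObject): boolean =>
        (regexp !== undefined && url.match(regexp) !== null) ||
        (glob !== undefined && globTest(url, glob));
}

// Standalone variant of the method under review: exclude patterns win,
// and an empty include list lets every non-excluded URL through.
function isUrlMatchingPatterns(
    url: string,
    include: UrlPatternObject[],
    exclude: UrlPatternObject[],
    globTest: GlobTest,
): boolean {
    const matches = matchUrl(url, globTest);
    if (exclude.some(matches)) return false;
    return include.length === 0 || include.some(matches);
}
```

The predicate is created once per URL and reused for both lists, which is the DRY part of the idea.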
LGTM, except for that one comment
private isUrlMatchingPatterns(url: string): boolean {
    return (
        !this.urlExcludePatternObjects.some((patternObject) => {
            const { regexp, glob } = patternObject;
            return (regexp && url.match(regexp)) || (glob && minimatch(url, glob, { nocase: true }));
        }) &&
        (this.urlPatternObjects.length === 0 ||
            this.urlPatternObjects.some((patternObject) => {
                const { regexp, glob } = patternObject;
                return (regexp && url.match(regexp)) || (glob && minimatch(url, glob, { nocase: true }));
            }))
    );
}
I was thinking about something like this:
private isUrlMatchingPatterns(url: string): boolean {
    const matchesSomeExcludePattern = this.urlExcludePatternObjects.some((patternObject) => {
        const { regexp, glob } = patternObject;
        return (regexp && url.match(regexp)) || (glob && minimatch(url, glob, { nocase: true }));
    });
    const matchesSomeIncludePattern = this.urlPatternObjects.some((patternObject) => {
        const { regexp, glob } = patternObject;
        return (regexp && url.match(regexp)) || (glob && minimatch(url, glob, { nocase: true }));
    });
    return (
        !matchesSomeExcludePattern &&
        (this.urlPatternObjects.length === 0 || matchesSomeIncludePattern)
    );
}
Oh, right, I guess I went a bit overboard... but I'd argue that my solution makes it ever so slightly DRYer.
Tomayto, tomahto. I think both solutions solve the same readability issue in the condition, albeit each a bit differently.
Yeah, what you did is cool. Feel free to merge.
Adds globs, regexps and exclude options to SitemapRequestList to make it list only URLs that match those patterns.
While this looked like a single-responsibility violation to me at first, these changes are imo quite justifiable, as the possible use cases (wanting to crawl only a part of a website while utilizing the official sitemap) seem to be very sane.
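To make the intended behavior concrete, here is a rough, dependency-free sketch of the filtering semantics on a plain list of URLs. It is not the SitemapRequestList implementation: the UrlFilterOptions shape and the filterSitemapUrls function are hypothetical, the example URLs are made up, and globs are left out so the sketch runs without minimatch.

```typescript
// Hypothetical, simplified option shape: regexps only, to avoid a glob dependency.
interface UrlFilterOptions {
    regexps?: RegExp[];
    exclude?: RegExp[];
}

// Keeps URLs that match no exclude pattern and, when include patterns are
// given, at least one of them.
function filterSitemapUrls(
    urls: string[],
    { regexps = [], exclude = [] }: UrlFilterOptions,
): string[] {
    return urls.filter((url) => {
        const excluded = exclude.some((re) => url.match(re) !== null);
        const included = regexps.length === 0 || regexps.some((re) => url.match(re) !== null);
        return !excluded && included;
    });
}

// Made-up URLs standing in for the entries of a parsed sitemap.
const sitemapUrls = [
    'https://example.com/docs/intro',
    'https://example.com/docs/archive/old',
    'https://example.com/blog/hello',
];

// Crawl only the docs, but skip the archive.
const kept = filterSitemapUrls(sitemapUrls, {
    regexps: [/\/docs\//],
    exclude: [/\/archive\//],
});
console.log(kept);
```

With no patterns at all, every URL passes, which matches the urlPatternObjects.length === 0 branch in the diff.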