Add caret negation (`[^...]`) in addition to exclamation mark #141
Closes #116.
This feature has some interesting history. Let's start with the source of everything. Here's POSIX:
(emphasis mine)
The `glob(7)` man page supports this:
This seems to have had the effect that many implementations have opted to support both `[!...]` and `[^...]`. This leaves us in a situation where this crate is correct, but sometimes leads to unexpected behaviour for users expecting `[^...]` to work. This happened in uutils and could become an issue for other projects as well, such as `sudo-rs` (see trifectatechfoundation/sudo-rs#834). Fixing this issue with a workaround is quite tricky (see e.g. uutils/coreutils#5584).
Now for a fun round of "What do the others do?":
Python: only `[!...]` is negation
prints
libc: both work
sudo's custom fnmatch: both work
See this relevant line in the code: https://github.com/sudo-project/sudo/blob/b6175b78ad1c4c9535cad48cb76addf53352a28f/lib/util/fnmatch.c#L174
npm `glob` package: both work
Documented here: https://www.npmjs.com/package/glob#glob-primer
Ruby: both work (but `^` is emphasized in the docs)
prints
Java: only `[!...]` is negation
prints
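To make the two behaviours in the list above concrete, here is a minimal, self-contained sketch of bracket-class matching. It is not this crate's implementation: the `Negation` enum and `class_matches` function are hypothetical names introduced only for illustration, and the sketch assumes that a leading `^` is treated as an ordinary character when caret negation is off. Escapes and named classes are left out.

```rust
/// Which leading characters negate a bracket class.
/// (Hypothetical type, used only in this sketch.)
#[derive(Clone, Copy, PartialEq, Debug)]
enum Negation {
    /// Only `[!...]` negates; a leading `^` is a literal character.
    BangOnly,
    /// Both `[!...]` and `[^...]` negate.
    BangOrCaret,
}

/// Returns whether `c` matches the bracket class whose body (the text
/// between `[` and `]`) is `body`. Simple ranges such as `a-z` are
/// supported; escapes and named classes are not.
fn class_matches(body: &str, c: char, mode: Negation) -> bool {
    let chars: Vec<char> = body.chars().collect();

    // Decide whether the first character is a negation marker.
    let negated = match chars.first().copied() {
        Some('!') => true,
        Some('^') => mode == Negation::BangOrCaret,
        _ => false,
    };
    let items: &[char] = if negated { &chars[1..] } else { &chars[..] };

    // Walk the class body, handling `x-y` ranges and single characters.
    let mut found = false;
    let mut i = 0;
    while i < items.len() {
        if i + 2 < items.len() && items[i + 1] == '-' {
            if items[i] <= c && c <= items[i + 2] {
                found = true;
            }
            i += 3;
        } else {
            if items[i] == c {
                found = true;
            }
            i += 1;
        }
    }

    // A negated class matches exactly when the character was not found.
    found != negated
}
```

Under `Negation::BangOnly`, the body `^b` matches only the characters `^` and `b`, so a user who wrote `[^b]` expecting "anything but `b`" gets the opposite of what they intended. Under `Negation::BangOrCaret`, `^b` and `!b` behave identically.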
Downsides and alternatives
The major downside of this is that it is a breaking change and people might be relying on the current behaviour, but that is difficult to tell. It might also be that people have unknowingly written incorrect patterns using `^`, because that's what they expect from regex. Maybe we can estimate that with a GitHub code search.
So, there is a case to be made that this should belong in a fork. The reason I believe this change belongs here is that it helps all the projects that re-implement existing C code in Rust, and it removes an edge case.
Another option is to make the parsing of patterns in this crate configurable, where the default behaviour stays the same. This could potentially also be extended to allow for escape sequences and other features. Again, though, that comes with downsides, mostly because all combinations of options would have to be tested and maintained. Given that this crate is a cornerstone in the Rust ecosystem, that combinatorial explosion might lead to (security) issues in various projects.
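As a rough illustration of the configurable alternative and its testing cost, here is a sketch assuming two boolean options. `ParseOptions`, its fields, and `all_combinations` are hypothetical names and do not exist in this crate.

```rust
/// Hypothetical pattern-parsing options (not part of the crate's API).
#[derive(Clone, Copy, Debug, PartialEq)]
struct ParseOptions {
    /// Treat `[^...]` as negation in addition to `[!...]`.
    caret_negation: bool,
    /// Allow backslash escape sequences in patterns.
    backslash_escape: bool,
}

impl Default for ParseOptions {
    /// The default keeps every extension off, so existing callers
    /// would see no change in behaviour.
    fn default() -> Self {
        ParseOptions {
            caret_negation: false,
            backslash_escape: false,
        }
    }
}

/// Enumerates every combination of the options, i.e. the set of
/// configurations a test suite would have to cover.
fn all_combinations() -> Vec<ParseOptions> {
    let mut out = Vec::new();
    for caret_negation in [false, true] {
        for backslash_escape in [false, true] {
            out.push(ParseOptions {
                caret_negation,
                backslash_escape,
            });
        }
    }
    out
}
```

With two flags there are already four configurations to test, and each additional boolean option doubles that number.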
Relevant GitHub Code Searches
I did a couple of searches for this pattern on GitHub to estimate the impact of this change. All searches are with `lang:rust`:
`/glob\(\"[^"]*\[\^/`: no relevant results, all hits are with custom glob implementations
`/Pattern::new\(\"[^"]*\[\^/`: no relevant results, mostly matches non-glob types of patterns.