Filtering on the presence or absence of captures #606
Thanks for this very thorough write-up! I kind of feel like the semantics of this are too complex, which will probably lead to a feature that almost nobody uses. By that, I don't mean that the flags …

With that said, I'd be willing to adopt a feature like this because I do agree that it could be useful, but I'd have to strongly insist on the following:
Now that PCRE is (optionally) supported, can either of you think of a use case for this that isn't handled by lookahead and lookbehind? I think this would be strictly more powerful than negative lookbehind, since lookbehind can't contain variable-length patterns, but that's the only advantage I can see. (Granted, that's an advantage I think I would occasionally find useful.)
I think it could be possible to define a simpler UX than needing to resort to lookaround. With that said, it's a good point, and I was never a big fan of adding this feature anyway. So I'm going to close this.
Lookaround assertions still have the issues mentioned above. For anyone looking for a clean solution to this with the PCRE engine, the backtracking-control verbs[1][2][3] are your friends:

Input

Command

```
$ rg --pcre2 '(?:"Tarzan")(*SKIP)(*FAIL)|\bTarzan\b' test.txt
```

Output

Or, to exclude lines which contain `"Tarzan"`:

Command

```
$ rg --pcre2 '(?:.*?"Tarzan".*)(*SKIP)(*FAIL)|\bTarzan\b' test.txt
$ rg --pcre2 '(?:.*?"Tarzan".*)(*COMMIT)(*FAIL)|\bTarzan\b' test.txt
```

Output
Footnotes
@chocolateboy Wow, I had never heard of those before. Thanks for sharing.
TL;DR
Select all lines which match `\bTarzan\b` but not `"Tarzan"`:

AKA
Suppose I want to select all lines which contain the unquoted word `Tarzan`, i.e. `\bTarzan\b` but not `"Tarzan"`, e.g. the first 4 lines of:

test.txt
It can be done with a pipeline e.g.:
But that particular example rejects lines which contain both, which is not what we want in this case. The same would be true if ripgrep added e.g. an `-E` (`--no-regexp`) option to complement `-e`/`--regexp`:

It can be done in one pass with PCRE-flavored greps such as GNU grep and ack, with varying degrees of difficulty/unreadability, by using negative lookahead/lookbehind assertions, e.g.:
That's already pretty gnarly for a single exclusion, and it quickly becomes impractical/incomprehensible for multiple exclusions. It also matches lines which don't contain `Tarzan` and, again, excludes lines which contain both patterns.

In programming languages, there's a common pattern for performing exclusions in a simple, readable way without multiple passes:
e.g.:
JavaScript
ES.next[1]
Ruby
etc.
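A minimal JavaScript sketch of this idiom, assuming the `Tarzan` example above: the left alternative consumes the unwanted quoted form, and only the wanted form is captured, so a line is selected when at least one of its matches has the capture defined. The function names and the sample lines are hypothetical (the contents of test.txt are not shown); the second variant uses an ES2018 named group instead of an index.

```javascript
// "Match what you don't want, capture what you do want."
// Left alternative: quoted "Tarzan" (matched but not captured).
// Right alternative: unquoted whole-word Tarzan, captured as group 1.
const UNQUOTED = /"Tarzan"|(\bTarzan\b)/g;

// Keep a line if at least one match on it has group 1 defined.
function hasUnquotedTarzan(line) {
  for (const m of line.matchAll(UNQUOTED)) {
    if (m[1] !== undefined) return true;
  }
  return false;
}

// Same idea with a named capture group (ES2018).
const UNQUOTED_NAMED = /"Tarzan"|(?<word>\bTarzan\b)/g;

function hasUnquotedTarzanNamed(line) {
  return [...line.matchAll(UNQUOTED_NAMED)].some(m => m.groups.word !== undefined);
}

// Hypothetical sample lines.
const lines = [
  "Me Tarzan, you Jane",
  '"Tarzan" is a name',
  'Tarzan said "Tarzan"',
  "no match here",
];

console.log(lines.filter(hasUnquotedTarzan));
console.log(lines.filter(hasUnquotedTarzanNamed));
```

Both filters keep the first and third sample lines. A line containing both forms is kept, because one of its matches defines the capture, which is exactly the case the pipeline and lookaround approaches above get wrong.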
This isn't available in any greps I'm aware of, but since the machinery is already there to capture and reference subexpressions by index and name, it seems like a small step to use them in predicates to reproduce the flexibility and simplicity of this pattern on the command line e.g.:
Output
Notes
1) I assume that the predicate can be inverted e.g.:
AKA
There aren't many single-letter options left. The last remaining pairs are `-d`/`-D`, `-y`/`-Y` and `-z`/`-Z`. The latter are commonly used to denote null/zero values, so they could be used instead, with the meaning of `-d` and `-D` inverted, e.g.:

2) I assume that indices increment across multiple patterns, and that multiple `-d` and `-D` options can be combined, e.g.:

3) I also assume that numbered and named captures can be mixed, e.g.:
4) The full version of the matching command would currently be:
Hopefully some of that boilerplate can be removed e.g. via #389 or #593.
5) For clarity, `"Tarzan"` vs `Tarzan` is omitted from the examples. Handling it only slightly complicates the regex:
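To make the proposed predicates concrete, here is a JavaScript simulation of hypothetical `-d`/`-D` filtering. It is not a ripgrep feature, and the rules are assumptions chosen to follow notes 1-3: patterns are joined into one alternation so capture indices keep counting across them, numbered and named captures can be mixed, a "defined" reference must be defined in some match on the line, and a "not defined" reference must be undefined in every match. The names `compile`, `isDefined` and `selectLines` and the sample input are illustrative only.

```javascript
// HYPOTHETICAL: simulates the proposed -d/-D capture predicates.
// Assumed semantics (not from ripgrep):
//   defined:    every listed capture is defined in at least one match on the line
//   notDefined: no match on the line defines the listed capture

// Join patterns; the (?:...) wrapper keeps each pattern's own top-level `|`
// from mixing with its neighbours. Group numbers follow the order of opening
// parentheses in the joined source, so they increment across patterns.
function compile(patterns) {
  return new RegExp(patterns.map(p => `(?:${p})`).join("|"), "g");
}

// A capture counts as defined if it participated in the match.
// `ref` is either a group number or a group name.
function isDefined(match, ref) {
  const value = typeof ref === "number" ? match[ref] : match.groups?.[ref];
  return value !== undefined;
}

function selectLines(lines, patterns, { defined = [], notDefined = [] } = {}) {
  const re = compile(patterns);
  return lines.filter(line => {
    const matches = [...line.matchAll(re)];
    if (matches.length === 0) return false; // assumption: a line must match at least once
    return (
      defined.every(ref => matches.some(m => isDefined(m, ref))) &&
      notDefined.every(ref => matches.every(m => !isDefined(m, ref)))
    );
  });
}

// Hypothetical sample input.
const sample = [
  "Me Tarzan, you Jane",
  '"Tarzan" and "Jane"',
  'Tarzan said "Jane"',
  "no match here",
];

// Group 1 is unquoted Tarzan; the named group `jane` (also group 2) is unquoted Jane.
const patterns = ['"Tarzan"|(\\bTarzan\\b)', '"Jane"|(?<jane>\\bJane\\b)'];

// Roughly: rg -e PAT1 -e PAT2 -d 1 -d jane   (hypothetical flags)
console.log(selectLines(sample, patterns, { defined: [1, "jane"] }));
// Roughly: rg -e PAT1 -e PAT2 -d 1 -D jane   (hypothetical flags)
console.log(selectLines(sample, patterns, { defined: [1], notDefined: ["jane"] }));
```

Under these assumptions, the first query keeps only `Me Tarzan, you Jane`, and the second keeps only `Tarzan said "Jane"`. Other reasonable readings of `-D` (for example, "some match leaves the capture undefined") would select different lines, which is part of why the semantics need pinning down.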