-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New style entropy exclusions #229
Conversation
match = match and rule.path_pattern.match(path) is not None | ||
elif rule.re_match_type == "search": | ||
if rule.pattern: | ||
match = rule.pattern.search(line) is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we're mixing two terms. Match type should be search vs match. However match_type seems to be changing the match scope from string to line. Should we provide match_scope (entropy vs line)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I agree the naming is crap here. But this was my third iteration, and this split logic will only exist until v3.0, so I gave up on better naming. I'd be open to suggestions if you think it really is that bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone uses search
, this change will search by line. Will this introduce a breaking change for 3.0? As in, will 3.0 no longer use search
to search by line? If so and it's easy I'd recommend we add something like match-scope
to avoid a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this. It seems like it would be easy enough to use caller-supplied input to drive .re_match_type
, and default to the behavior you have now. The benefit of doing it now rather than waiting for 3.0 is that users who actually pay attention to the deprecation warning could rewrite their configs in a way that would preserve behavior while making it explicit what they intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, at the risk of making you tear out your hair, it might be better to allow (making this up on the fly) scope: {line|string}
and transfer this directly to .re_match_type
(or .re_scope
if you prefer), where "match" becomes "string" and "search" becomes "line". Then we are aligned more with the intent vs the actual implementation, and it's ever so slightly easier to remember why we're doing this. ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be handled in a follow-up easily enough, if it's desired badly enough. As-is, with the existing deprecation warning, users will be able to update their configs for the upcoming behavioral change.
.. code-block:: toml | ||
|
||
[tool.tartufo] | ||
exclude-entropy-patterns = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love this!
docs/source/features.rst
Outdated
|
||
[tool.tartufo] | ||
exclude-entropy-patterns = [ | ||
{files = 'docs/.*\.md$', match = '^[a-zA-Z0-9]$', reason = 'exclude all git SHAs in the docs'}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be using the term pattern where it's used in other places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good callout! I can make that change. That also reminds me that I forgot to add to the documentation, that's the only field that's absolutely required!
6a3c2ee
to
8764ba7
Compare
match = match and rule.path_pattern.match(path) is not None | ||
elif rule.re_match_type == "search": | ||
if rule.pattern: | ||
match = rule.pattern.search(line) is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone uses search
, this change will search by line. Will this introduce a breaking change for 3.0? As in, will 3.0 no longer use search
to search by line? If so and it's easy I'd recommend we add something like match-scope
to avoid a breaking change.
docs/source/features.rst
Outdated
.. code-block:: toml | ||
|
||
[[tool.tartufo.exclude-entropy-patterns]] | ||
files = 'docs/.*\.md$' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be file-pattern
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or path-pattern
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm yeah I think maybe file-pattern
would be good, since we're using it for matching file names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although now that I look at the code, it's already referred to as path_pattern
, so that seems to be the natural winner. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a few gratuitous suggestions, but the current code isn't bad. I'd lobby for the tweaks if the time is available, but if we're trying to get a quick fix out the door, go for it.
name=rule_name, | ||
pattern=re.compile(rule_definition["pattern"]), | ||
path_pattern=re.compile(path_pattern) if path_pattern else None, | ||
re_match_type="match", | ||
) | ||
except AttributeError: | ||
rule = Rule( | ||
name=rule_name, pattern=re.compile(rule_definition), path_pattern=None | ||
name=rule_name, | ||
pattern=re.compile(rule_definition), | ||
path_pattern=None, | ||
re_match_type="match", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your life would become much simpler if you used re.compile("")
instead of None
for the case that path_pattern
is not specified. The empty regex matches immediately, which is what you're looking for, and the exact result of the match test is never used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this part is largely pre-existing code/behavior, I will make note of this for when we go back over this in preparation for 3.0
@@ -255,20 +260,47 @@ def compile_rule(pattern: str) -> Rule: | |||
path, pattern = pattern.split("::", 1) | |||
except ValueError: # Raised when the split separator is not found | |||
path = ".*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. I would use ""
instead of ".*"
, as it (also) matches every target every time, but doesn't require scanning the target string.
tartufo/config.py
Outdated
Rule( | ||
name=pattern.get("reason", None), # type: ignore[union-attr] | ||
pattern=re.compile(pattern["pattern"]), # type: ignore[index] | ||
path_pattern=re.compile(pattern.get("files", ".*")), # type: ignore[union-attr] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
path_pattern=re.compile(pattern.get("files", ".*")), # type: ignore[union-attr] | |
path_pattern=re.compile(pattern.get("files", "")), # type: ignore[union-attr] |
For the same reasons as above
if rule.pattern: | ||
match = rule.pattern.match(string) is not None | ||
if rule.path_pattern: | ||
match = match and rule.path_pattern.match(path) is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because rule.pattern
is mandatory and rule.path_pattern
is always a re
(if you took the advice above), I think this can be simplified:
if rule.pattern: | |
match = rule.pattern.match(string) is not None | |
if rule.path_pattern: | |
match = match and rule.path_pattern.match(path) is not None | |
match = rule.pattern.match(string) is not None and rule.path_pattern.match(path) is not None |
or presumably even match = bool(rule.pattern.match(string) and rule.path_pattern.match(path))
match = match and rule.path_pattern.match(path) is not None | ||
elif rule.re_match_type == "search": | ||
if rule.pattern: | ||
match = rule.pattern.search(line) is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this. It seems like it would be easy enough to use caller-supplied input to drive .re_match_type
, and default to the behavior you have now. The benefit of doing it now rather than waiting for 3.0 is that users who actually pay attention to the deprecation warning could rewrite their configs in a way that would preserve behavior while making it explicit what they intended.
match = match and rule.path_pattern.match(path) is not None | ||
elif rule.re_match_type == "search": | ||
if rule.pattern: | ||
match = rule.pattern.search(line) is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, at the risk of making you tear out your hair, it might be better to allow (making this up on the fly) scope: {line|string}
and transfer this directly to .re_match_type
(or .re_scope
if you prefer), where "match" becomes "string" and "search" becomes "line". Then we are aligned more with the intent vs the actual implementation, and it's ever so slightly easier to remember why we're doing this. ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🦅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still LGTM ;)
To help us get this pull request reviewed and merged quickly, please be sure to include the following items:
PR Type
What kind of change does this PR introduce?
Backward Compatibility
Is this change backward compatible with the most recently released version? Does it introduce changes which might change the user experience in any way? Does it alter the API in any way?
Issue Linking
What's new?
--exclude-entropy-patterns
to behave as it did before allow exclude-entropy-patterns to match lines containing partial matches #222 by default--exclude-entropy-patterns
behind an updated config specification styleDeprecationWarning
s