Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New style entropy exclusions #229

Merged
merged 9 commits into from
Oct 19, 2021
Merged

Conversation

tarkatronic
Copy link
Contributor

@tarkatronic tarkatronic commented Oct 14, 2021

To help us get this pull request reviewed and merged quickly, please be sure to include the following items:

  • Tests (if applicable)
  • Documentation (if applicable)
  • Changelog entry
  • A full explanation here in the PR description of the work done

PR Type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Backward Compatibility

Is this change backward compatible with the most recently released version? Does it introduce changes which might change the user experience in any way? Does it alter the API in any way?

  • Yes (backward compatible)
  • No (breaking changes)

Issue Linking

What's new?

match = match and rule.path_pattern.match(path) is not None
elif rule.re_match_type == "search":
if rule.pattern:
match = rule.pattern.search(line) is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we're mixing two terms. Match type should be search vs match. However match_type seems to be changing the match scope from string to line. Should we provide match_scope (entropy vs line)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I agree the naming is crap here. But this was my third iteration, and this split logic will only exist until v3.0, so I gave up on better naming. I'd be open to suggestions if you think it really is that bad.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone uses search, this change will search by line. Will this introduce a breaking change for 3.0? As in, will 3.0 no longer use search to search by line? If so and it's easy I'd recommend we add something like match-scope to avoid a breaking change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to this. It seems like it would be easy enough to use caller-supplied input to drive .re_match_type, and default to the behavior you have now. The benefit of doing it now rather than waiting for 3.0 is that users who actually pay attention to the deprecation warning could rewrite their configs in a way that would preserve behavior while making it explicit what they intended.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, at the risk of making you tear out your hair, it might be better to allow (making this up on the fly) scope: {line|string} and transfer this directly to .re_match_type (or .re_scope if you prefer), where "match" becomes "string" and "search" becomes "line". Then we are aligned more with the intent vs the actual implementation, and it's ever so slightly easier to remember why we're doing this. ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be handled in a follow-up easily enough, if it's desired badly enough. As-is, with the existing deprecation warning, users will be able to update their configs for the upcoming behavioral change.

.. code-block:: toml

[tool.tartufo]
exclude-entropy-patterns = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this!


[tool.tartufo]
exclude-entropy-patterns = [
{files = 'docs/.*\.md$', match = '^[a-zA-Z0-9]$', reason = 'exclude all git SHAs in the docs'},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be using the term pattern where it's used in other places?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good callout! I can make that change. That also reminds me that I forgot to add to the documentation, that's the only field that's absolutely required!

@tarkatronic tarkatronic force-pushed the new-style-entropy-exclusions branch from 6a3c2ee to 8764ba7 Compare October 14, 2021 17:05
match = match and rule.path_pattern.match(path) is not None
elif rule.re_match_type == "search":
if rule.pattern:
match = rule.pattern.search(line) is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone uses search, this change will search by line. Will this introduce a breaking change for 3.0? As in, will 3.0 no longer use search to search by line? If so and it's easy I'd recommend we add something like match-scope to avoid a breaking change.

.. code-block:: toml

[[tool.tartufo.exclude-entropy-patterns]]
files = 'docs/.*\.md$'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be file-pattern?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or path-pattern?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm yeah I think maybe file-pattern would be good, since we're using it for matching file names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although now that I look at the code, it's already referred to as path_pattern, so that seems to be the natural winner. 😄

Copy link
Contributor

@rbailey-godaddy rbailey-godaddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a few gratuitous suggestions, but the current code isn't bad. I'd lobby for the tweaks if the time is available, but if we're trying to get a quick fix out the door, go for it.

Comment on lines 221 to +231
name=rule_name,
pattern=re.compile(rule_definition["pattern"]),
path_pattern=re.compile(path_pattern) if path_pattern else None,
re_match_type="match",
)
except AttributeError:
rule = Rule(
name=rule_name, pattern=re.compile(rule_definition), path_pattern=None
name=rule_name,
pattern=re.compile(rule_definition),
path_pattern=None,
re_match_type="match",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your life would become much simpler if you used re.compile("") instead of None for the case that path_pattern is not specified. The empty regex matches immediately, which is what you're looking for, and the exact result of the match test is never used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this part is largely pre-existing code/behavior, I will make note of this for when we go back over this in preparation for 3.0

@@ -255,20 +260,47 @@ def compile_rule(pattern: str) -> Rule:
path, pattern = pattern.split("::", 1)
except ValueError: # Raised when the split separator is not found
path = ".*"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. I would use "" instead of ".*", as it (also) matches every target every time, but doesn't require scanning the target string.

Rule(
name=pattern.get("reason", None), # type: ignore[union-attr]
pattern=re.compile(pattern["pattern"]), # type: ignore[index]
path_pattern=re.compile(pattern.get("files", ".*")), # type: ignore[union-attr]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
path_pattern=re.compile(pattern.get("files", ".*")), # type: ignore[union-attr]
path_pattern=re.compile(pattern.get("files", "")), # type: ignore[union-attr]

For the same reasons as above

Comment on lines +300 to +303
if rule.pattern:
match = rule.pattern.match(string) is not None
if rule.path_pattern:
match = match and rule.path_pattern.match(path) is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because rule.pattern is mandatory and rule.path_pattern is always a re (if you took the advice above), I think this can be simplified:

Suggested change
if rule.pattern:
match = rule.pattern.match(string) is not None
if rule.path_pattern:
match = match and rule.path_pattern.match(path) is not None
match = rule.pattern.match(string) is not None and rule.path_pattern.match(path) is not None

or presumably even match = bool(rule.pattern.match(string) and rule.path_pattern.match(path))

match = match and rule.path_pattern.match(path) is not None
elif rule.re_match_type == "search":
if rule.pattern:
match = rule.pattern.search(line) is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to this. It seems like it would be easy enough to use caller-supplied input to drive .re_match_type, and default to the behavior you have now. The benefit of doing it now rather than waiting for 3.0 is that users who actually pay attention to the deprecation warning could rewrite their configs in a way that would preserve behavior while making it explicit what they intended.

match = match and rule.path_pattern.match(path) is not None
elif rule.re_match_type == "search":
if rule.pattern:
match = rule.pattern.search(line) is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, at the risk of making you tear out your hair, it might be better to allow (making this up on the fly) scope: {line|string} and transfer this directly to .re_match_type (or .re_scope if you prefer), where "match" becomes "string" and "search" becomes "line". Then we are aligned more with the intent vs the actual implementation, and it's ever so slightly easier to remember why we're doing this. ;-)

Copy link
Contributor

@smimani-godaddy smimani-godaddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🦅

Copy link
Contributor

@mayuriesha mayuriesha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@rbailey-godaddy rbailey-godaddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM ;)

@tarkatronic tarkatronic merged commit 2731713 into main Oct 19, 2021
@tarkatronic tarkatronic deleted the new-style-entropy-exclusions branch October 19, 2021 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants