Regex-based Exclusions #87

Closed
dclayton-godaddy opened this issue Sep 18, 2020 · 3 comments · Fixed by #192
Labels
enhancement New feature or request Hacktoberfest

Comments

@dclayton-godaddy
Contributor

Feature Request

Is your feature request related to a problem? Please describe.

To avoid noise in a scan, we'd like to provide exclusions using regular expressions on the contents of files.

Describe the solution you'd like

An exclusions file that can contain the regular expression exclusion rules.

For example, to ignore a git hash in a url:

'([ =''/"]+)[a-z0-9]{40}([/''" ])?$'
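As a rough illustration of how such a rule might be applied, the sketch below runs the pattern above against individual lines using Python's `re` module. The helper name `is_excluded` is hypothetical and not part of tartufo; the pattern is copied as written, on the assumption that the doubled single quotes are quoting residue (inside a character class they simply repeat the same character).

```python
import re

# Pattern copied verbatim from the request above. The doubled single quotes
# only repeat a character inside the class, so they do not change what matches.
GIT_HASH_AT_END = re.compile(r"""([ =''/"]+)[a-z0-9]{40}([/''" ])?$""")


def is_excluded(line: str) -> bool:
    """Hypothetical helper: True if the line ends in something that looks like a git hash."""
    return GIT_HASH_AT_END.search(line) is not None


# A URL that ends with the hash (plus an optional closing quote) is excluded.
ends_with_hash = 'url = "https://github.com/godaddy/tartufo/commit/9e1fc5b577bf56cca1dd020e66bfec21ff1b96d4"'

# The pattern is anchored with `$`, so a hash followed by more path is not covered.
hash_mid_path = "https://github.com/godaddy/tartufo/blob/9e1fc5b577bf56cca1dd020e66bfec21ff1b96d4/README.md"

print(is_excluded(ends_with_hash))
print(is_excluded(hash_mid_path))
```

Because of the trailing `$`, this particular pattern only covers hashes at the end of a line; a hash in the middle of a longer path would need a different rule.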

Describe alternatives you've considered

Teachability, Documentation, Adoption, Migration Strategy

@dclayton-godaddy dclayton-godaddy added the enhancement New feature or request label Sep 18, 2020
@tarkatronic
Contributor

This is proposing a 3rd type of exclusion for scans. We currently have:

  1. Exclude entire files via the -x / --exclude-paths option
  2. Exclude specific matches by "signature", which is a hash generated from the string matched and the filename.

The new exclusion type would use a set of regular expressions, run against all strings found during a scan, to mark matching strings as false positives.

For example, if you have GitHub URLs in your code, especially links to specific revisions or to gists, those will often get flagged as matches. So with this feature, you could add an exclusion pattern to explicitly allow links from github.com.

One particular challenge of this feature is that it may need to include context around the matches. Let's look at a real-world example:

  • Your code has a comment in it with the following link: https://github.com/godaddy/tartufo/blob/9e1fc5b577bf56cca1dd020e66bfec21ff1b96d4/README.md.
  • tartufo will mark the hash in that URL, 9e1fc5b577bf56cca1dd020e66bfec21ff1b96d4, as a high entropy match and potential issue.
  • The regex-exclusion will need to look at the entire URL, not just the matched hash, to determine if this can be excluded as an acceptable string.
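The context requirement in the example above can be sketched as follows. This is only an illustration of the idea, not tartufo's implementation: the function name `excluded_by_context`, the `GITHUB_URL` pattern, and the rule "the regex match must fully cover the flagged string" are all assumptions made for the sketch.

```python
import re
from typing import Iterable


def excluded_by_context(matched: str, line: str, patterns: Iterable[re.Pattern]) -> bool:
    """Hypothetical check: exclude `matched` only if some pattern, applied to
    the whole line, produces a match that fully contains the flagged string."""
    start = line.find(matched)
    if start < 0:
        return False  # the flagged string is not on this line at all
    end = start + len(matched)
    for pattern in patterns:
        for m in pattern.finditer(line):
            if m.start() <= start and end <= m.end():
                return True
    return False


# Assumed exclusion rule: allow anything that is part of a github.com URL.
GITHUB_URL = re.compile(r"https://github\.com/\S+")

flagged = "9e1fc5b577bf56cca1dd020e66bfec21ff1b96d4"
comment_line = f"# See https://github.com/godaddy/tartufo/blob/{flagged}/README.md."
secret_line = f"token = {flagged}"

print(excluded_by_context(flagged, comment_line, [GITHUB_URL]))
print(excluded_by_context(flagged, secret_line, [GITHUB_URL]))
```

Running the pattern against the flagged hash alone would never match, since the hash carries no trace of the URL around it; that is why the surrounding line has to be passed in.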

@jolinger-godaddy

jolinger-godaddy commented Oct 29, 2020

+1, this will be an ongoing challenge with many of our repositories, as they frequently include updates with URLs containing various hash strings.

@mxhenry-godaddy mxhenry-godaddy added this to the Version 3.1 milestone Nov 11, 2020
@tarkatronic tarkatronic linked a pull request Jun 15, 2021 that will close this issue
@tarkatronic
Contributor

Implemented via #192
