Add --entropy-sensitivity option for controlling entropy checks (#272)
* Add --sensitivity option

Fixes #265

Provide more comprehensible alternative for tuning entropy checking.
This is applied consistently across all target character sets, and stated
in a way that is slightly easier to understand ("higher means more
likely to flag a given string").

The older `--b64-entropy-score` and `--hex-entropy-score` options are
marked as deprecated but retained for backwards compatibility (and they
override `--sensitivity` if used together with it).

* Change option name

Use `--entropy-sensitivity` instead of `--sensitivity`

* Wordsmithing

* Wordsmithing again

* Rebase to eliminate merge conflicts

* linter fixups

* Documentation fixups

...in response to review comments

* Expose magic numbers as properties

...and use them instead of private members

* Fix managed merge handling

* Remove entropy scoring members

Do everything from scratch instead of storing explicitly. This is a PITA
because you can't combine `@property` and `@lru_cache()` and skipping
the caching would be a killer.

* Review feedback tuneups

* Consolidate common code for entropy limit back into a single method,
  and rework properties related to it so they are cleaner.
* Invert sensitivity scale; adjust math and doc to match. It's still
  weird but aligns more closely to the underlying entropy metric.

* Fix change log

Co-authored-by: Scott Bailey <scott.bailey@godaddy.com>
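
As a rough illustration of the math this commit describes, the sketch below scales a 0-100 sensitivity value by the maximum bits per character of each encoding (4 for hexadecimal, 6 for base64) and lets a deprecated explicit score take precedence. The function name `entropy_limit` and the `legacy_score` parameter are hypothetical and are not part of tartufo; only the default of 75 and the two bit rates come from the change itself.

```python
from typing import Optional

# Maximum information per character, as stated in the commit's comments.
HEX_BITS_PER_CHAR = 4.0  # one hexadecimal digit encodes 4 bits
B64_BITS_PER_CHAR = 6.0  # 4 base64 characters encode 24 bits, so 6 bits each


def entropy_limit(
    max_bitrate: float,
    sensitivity: int = 75,
    legacy_score: Optional[float] = None,
) -> float:
    """Return the entropy threshold for one character set (hypothetical helper).

    A truthy legacy score (standing in for --hex-entropy-score or
    --b64-entropy-score) overrides the sensitivity-based value, mirroring
    the truthiness check used in the diff.
    """
    if legacy_score:
        return legacy_score
    return sensitivity / 100.0 * max_bitrate


hex_default = entropy_limit(HEX_BITS_PER_CHAR)
b64_default = entropy_limit(B64_BITS_PER_CHAR)
b64_lower = entropy_limit(B64_BITS_PER_CHAR, sensitivity=50)
hex_override = entropy_limit(HEX_BITS_PER_CHAR, sensitivity=50, legacy_score=3.5)
```

Under these assumptions the default sensitivity of 75 reproduces the old per-encoding defaults (3.0 for hexadecimal, 4.5 for base64), and a lower sensitivity lowers the threshold, so more strings are flagged.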
rbailey-godaddy and rscottbailey authored Nov 15, 2021
1 parent 1b0f778 commit a0eba89
Showing 8 changed files with 159 additions and 96 deletions.
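
The commit message says the entropy limits are now computed on demand and that skipping caching would be costly. A minimal sketch of the cached-property pattern follows. It uses the standard library's `functools.cached_property` (Python 3.8+) as a stand-in for the third-party `cached_property` package that the commit adds, and the class name `DemoScanner` and its `computations` counter are hypothetical.

```python
from functools import cached_property


class DemoScanner:
    """Hypothetical stand-in for a scanner that derives a limit from options."""

    def __init__(self, sensitivity: int) -> None:
        self.sensitivity = sensitivity
        self.computations = 0  # counts how often the limit is actually computed

    @cached_property
    def hex_entropy_limit(self) -> float:
        # Runs only on first access; the result is then stored on the instance.
        self.computations += 1
        return self.sensitivity / 100.0 * 4.0


scanner = DemoScanner(75)
first = scanner.hex_entropy_limit
second = scanner.hex_entropy_limit  # served from the per-instance cache
```

Because the value is cached per instance after the first access, later changes to `sensitivity` would not be reflected, which is acceptable when options are fixed for the lifetime of a scan.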
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -1,9 +1,15 @@
vx.y.z - TBD
------------

Features:

* [#270](https://github.com/godaddy/tartufo/issues/270) - When no refs/branches
are found locally, tartufo will now scan the repo HEAD as a single commit,
effectively scanning the entire codebase at once.
* [#265](https://github.com/godaddy/tartufo/issues/265) - Adds new `--entropy-sensitivity`
option which provides a friendlier way to adjust entropy detection sensitivity.
This replaces `--b64-entropy-score` and `--hex-entropy-score`, which now are
marked as deprecated.

v3.0.0-alpha.1 - 11 November 2021
---------------------------------
62 changes: 36 additions & 26 deletions README.md
@@ -54,9 +54,6 @@ Options:

--entropy / --no-entropy Enable entropy checks. [default: True]
--regex / --no-regex Enable high signal regexes checks.
[default: False]
--scan-filenames / --no-scan-filenames
Check the names of files being scanned as well as their contents.
[default: True]

-ip, --include-path-patterns TEXT
@@ -77,6 +74,14 @@ Options:
excluded unless effectively excluded via the
--include-path-patterns option.

-of, --output-format [json|compact|text]
Specify the format in which the output needs
to be generated `--output-format
json/compact/text`. Either `json`, `compact`
or `text` can be specified. If not provided
(default) the output will be generated in
`text` format.

-xe, --exclude-entropy-patterns TEXT
Specify a regular expression which matches
entropy strings to exclude from the scan.
@@ -100,12 +105,6 @@ Options:
with keeping the results of individual runs
of tartufo separated.

-of, --output-format TEXT Specify the format in which the output needs
to be generated `--output-format json/compact/text`.
Either `json`, `compact` or `text` can be specified.
If not provided (default) the output will be generated
in `text` format.

--git-rules-repo TEXT A file path, or git URL, pointing to a git
repository containing regex rules to be used
for scanning. By default, all .json files
@@ -134,31 +133,42 @@ Options:
Enable or disable timestamps in logging
messages. [default: True]
-b64, --b64-entropy-score FLOAT
Modify the base64 entropy score. If you
specify a value greater than the default,
tartufo lists higher entropy base64 strings
--entropy-sensitivity INTEGER RANGE
Modify entropy detection sensitivity. This
is expressed on a scale of 0 to 100,
where 0 means "totally nonrandom" and 100
means "totally random". Decreasing the
scanner's sensitivity increases the
likelihood that a given string will be
identified as suspicious. [default: 75]

-b64, --b64-entropy-score TEXT [DEPRECATED] Use `--entropy-sensitivity`.
Modify the base64 entropy score. If a value
greater than the default (4.5 in a range of
0.0-6.0) is specified, tartufo lists higher
entropy base64 strings (longer or more
randomized strings). A lower value lists
lower entropy base64 strings (shorter or
less randomized strings).

-hex, --hex-entropy-score TEXT [DEPRECATED] Use `--entropy-sensitivity`.
Modify the hexadecimal entropy score. If a
value greater than the default (3.0 in a
range of 0.0-4.0) is specified, tartufo
lists higher entropy hexadecimal strings
(longer or more randomized strings). A lower
value lists lower entropy base64 strings
(shorter or less randomized strings).
[default: 4.5]
-hex, --hex-entropy-score FLOAT
Modify the hexadecimal entropy score. If you
specify a value greater than the default,
tartufo lists higher entropy hexadecimal
strings (longer or more randomized strings).
A lower value lists lower entropy
hexadecimal strings (shorter or less
randomized strings). [default: 3.0]
value lists lower entropy hexadecimal
strings (shorter or less randomized
strings).

-V, --version Show the version and exit.
-h, --help Show this message and exit.

Commands:
scan-folder Scan a folder.
scan-remote-repo Automatically clone and scan a remote git repository.
pre-commit Scan staged changes in a pre-commit hook.
scan-local-repo Scan a repository already cloned to your local system.
scan-remote-repo Automatically clone and scan a remote git repository.

```
4 changes: 2 additions & 2 deletions poetry.lock


1 change: 1 addition & 0 deletions pyproject.toml
@@ -49,6 +49,7 @@ sphinx-click = {version = "^2.5.0", optional = true}
sphinx-rtd-theme = {version = "^0.5.0", optional = true}
sphinxcontrib-spelling = {version = "^5.4.0", optional = true}
tomlkit = "^0.7.2"
cached-property = "^1.5.2"

[tool.poetry.dev-dependencies]
black = {version = "21.5b2", allow-prereleases = true, markers = "platform_python_implementation == 'CPython'"}
32 changes: 20 additions & 12 deletions tartufo/cli.py
@@ -200,25 +200,33 @@ def get_command(self, ctx: click.Context, cmd_name: str) -> Optional[click.Comma
show_default=True,
help="Enable or disable timestamps in logging messages.",
)
@click.option(
    "--entropy-sensitivity",
    type=click.IntRange(0, 100),
    default=75,
    show_default=True,
    help="""Modify entropy detection sensitivity. This is expressed on a scale
    of 0 to 100, where 0 means "totally nonrandom" and 100 means "totally random".
    Decreasing the scanner's sensitivity increases the likelihood that a given
    string will be identified as suspicious.""",
)
@click.option(
"-b64",
"--b64-entropy-score",
default=4.5,
show_default=True,
help="Modify the base64 entropy score. If a value greater than the default is "
"specified, tartufo lists higher entropy base64 strings (longer or more randomized "
"strings). A lower value lists lower entropy base64 strings (shorter or less "
"randomized strings).",
help="""[DEPRECATED] Use `--entropy-sensitivity`. Modify the base64 entropy score. If
a value greater than the default (4.5 in a range of 0.0-6.0) is specified,
tartufo lists higher entropy base64 strings (longer or more randomized strings).
A lower value lists lower entropy base64 strings (shorter or less randomized
strings).""",
)
@click.option(
"-hex",
"--hex-entropy-score",
default=3.0,
show_default=True,
help="Modify the hexadecimal entropy score. If a value greater than the default is "
"specified, tartufo lists higher entropy hexadecimal strings (longer or more randomized "
"strings). A lower value lists lower entropy hexadecimal strings (shorter or less "
"randomized strings).",
help="""[DEPRECATED] Use `--entropy-sensitivity`. Modify the hexadecimal entropy score.
If a value greater than the default (3.0 in a range of 0.0-4.0) is specified,
tartufo lists higher entropy hexadecimal strings (longer or more randomized
strings). A lower value lists lower entropy hexadecimal strings (shorter or less
randomized strings).""",
)
# The first positional argument here would be a hard-coded version, hence the `None`
@click.version_option(None, "-V", "--version")
64 changes: 56 additions & 8 deletions tartufo/scanner.py
@@ -19,10 +19,11 @@
Set,
Tuple,
)
import warnings

from cached_property import cached_property
import click
import git

import pygit2

from tartufo import config, types, util
@@ -146,6 +147,56 @@ def __init__(self, options: types.GlobalOptions) -> None:
self.global_options = options
self.logger = logging.getLogger(__name__)

    def compute_scaled_entropy_limit(self, maximum_bitrate: float) -> float:
        """Determine low entropy cutoff for specified bitrate
        :param maximum_bitrate: How many bits does each character represent?
        :returns: Entropy detection threshold scaled to the input bitrate
        """

        if self.global_options.entropy_sensitivity is None:
            sensitivity = 75
        else:
            sensitivity = self.global_options.entropy_sensitivity
        return float(sensitivity) / 100.0 * maximum_bitrate

    @cached_property
    def hex_entropy_limit(self) -> float:
        """Returns low entropy limit for suspicious hexadecimal encodings"""

        # For backwards compatibility, allow the caller to manipulate this score
        # directly (but complain about it).
        if self.global_options.hex_entropy_score:
            warnings.warn(
                "--hex-entropy-score is deprecated. Use --entropy-sensitivity instead.",
                DeprecationWarning,
            )
            return self.global_options.hex_entropy_score

        # Each hexadecimal digit represents a 4-bit number, so we want to scale
        # the base score by this amount to account for the efficiency of the
        # string representation we're examining.
        return self.compute_scaled_entropy_limit(4.0)

    @cached_property
    def b64_entropy_limit(self) -> float:
        """Returns low entropy limit for suspicious base64 encodings"""

        # For backwards compatibility, allow the caller to manipulate this score
        # directly (but complain about it).
        if self.global_options.b64_entropy_score:
            warnings.warn(
                "--b64-entropy-score is deprecated. Use --entropy-sensitivity instead.",
                DeprecationWarning,
            )
            return self.global_options.b64_entropy_score

        # Each 4-character base64 group represents 3 8-bit bytes, i.e. an effective
        # bit rate of 24/4 = 6 bits per character. We want to scale the base score
        # by this amount to account for the efficiency of the string representation
        # we're examining.
        return self.compute_scaled_entropy_limit(6.0)

@property
def completed(self) -> bool:
"""Return True if scan has completed
@@ -398,22 +449,19 @@ def scan(self) -> Generator[Issue, None, None]:
if self.global_options.entropy:
for issue in self.scan_entropy(
chunk,
self.global_options.b64_entropy_score,
self.global_options.hex_entropy_score,
):
self._issues.append(issue)
yield issue
self._completed = True
self.logger.info("Found %d issues.", len(self._issues))

def scan_entropy(
self, chunk: types.Chunk, b64_entropy_score: float, hex_entropy_score: float
self,
chunk: types.Chunk,
) -> Generator[Issue, None, None]:
"""Scan a chunk of data for apparent high entropy.
:param chunk: The chunk of data to be scanned
:param b64_entropy_score: Base64 entropy score
:param hex_entropy_score: Hexadecimal entropy score
"""

for line in chunk.contents.split("\n"):
@@ -423,12 +471,12 @@ def scan_entropy(

for string in b64_strings:
yield from self.evaluate_entropy_string(
chunk, line, string, BASE64_CHARS, b64_entropy_score
chunk, line, string, BASE64_CHARS, self.b64_entropy_limit
)

for string in hex_strings:
yield from self.evaluate_entropy_string(
chunk, line, string, HEX_CHARS, hex_entropy_score
chunk, line, string, HEX_CHARS, self.hex_entropy_limit
)

def evaluate_entropy_string(
2 changes: 2 additions & 0 deletions tartufo/types.py
@@ -26,6 +26,7 @@ class GlobalOptions:
"output_format",
"b64_entropy_score",
"hex_entropy_score",
"entropy_sensitivity",
)
rules: Tuple[TextIO, ...]
default_regexes: bool
@@ -46,6 +47,7 @@ class GlobalOptions:
output_format: Optional[str]
b64_entropy_score: float
hex_entropy_score: float
entropy_sensitivity: int


@dataclass