Add --entropy-sensitivity option for controlling entropy checks #272

Merged
merged 12 commits into from
Nov 15, 2021
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -1,9 +1,15 @@
vx.y.z - TBD
------------

Features:

* [#270](https://github.com/godaddy/tartufo/issues/270) - When no refs/branches
are found locally, tartufo will now scan the repo HEAD as a single commit,
effectively scanning the entire codebase at once.
* [#265](https://github.com/godaddy/tartufo/issues/265) - Adds new `--entropy-sensitivity`
option which provides a friendlier way to adjust entropy detection sensitivity.
This replaces `--b64-entropy-score` and `--hex-entropy-score`, which are now
marked as deprecated.

v3.0.0-alpha.1 - 11 November 2021
---------------------------------
62 changes: 36 additions & 26 deletions README.md
@@ -54,9 +54,6 @@ Options:

--entropy / --no-entropy Enable entropy checks. [default: True]
--regex / --no-regex Enable high signal regexes checks.
[default: False]
--scan-filenames / --no-scan-filenames
Check the names of files being scanned as well as their contents.
[default: True]

-ip, --include-path-patterns TEXT
@@ -77,6 +74,14 @@ Options:
excluded unless effectively excluded via the
--include-path-patterns option.

-of, --output-format [json|compact|text]
Specify the format in which the output needs
to be generated `--output-format
json/compact/text`. Either `json`, `compact`
or `text` can be specified. If not provided
(default) the output will be generated in
`text` format.

-xe, --exclude-entropy-patterns TEXT
Specify a regular expression which matches
entropy strings to exclude from the scan.
@@ -100,12 +105,6 @@ Options:
with keeping the results of individual runs
of tartufo separated.

-of, --output-format TEXT Specify the format in which the output needs
to be generated `--output-format json/compact/text`.
Either `json`, `compact` or `text` can be specified.
If not provided (default) the output will be generated
in `text` format.

--git-rules-repo TEXT A file path, or git URL, pointing to a git
repository containing regex rules to be used
for scanning. By default, all .json files
@@ -134,31 +133,42 @@ Options:
Enable or disable timestamps in logging
messages. [default: True]

-b64, --b64-entropy-score FLOAT
Modify the base64 entropy score. If you
specify a value greater than the default,
tartufo lists higher entropy base64 strings
--entropy-sensitivity INTEGER RANGE
Modify entropy detection sensitivity. This
is expressed on a scale of 0 to 100,
where 0 means "totally nonrandom" and 100
means "totally random". Decreasing the
scanner's sensitivity increases the
likelihood that a given string will be
identified as suspicious. [default: 75]

-b64, --b64-entropy-score TEXT [DEPRECATED] Use `--entropy-sensitivity`.
Modify the base64 entropy score. If a value
greater than the default (4.5 in a range of
0.0-6.0) is specified, tartufo lists higher
entropy base64 strings (longer or more
randomized strings). A lower value lists
lower entropy base64 strings (shorter or
less randomized strings).

-hex, --hex-entropy-score TEXT [DEPRECATED] Use `--entropy-sensitivity`.
Modify the hexadecimal entropy score. If a
value greater than the default (3.0 in a
range of 0.0-4.0) is specified, tartufo
lists higher entropy hexadecimal strings
(longer or more randomized strings). A lower
value lists lower entropy base64 strings
(shorter or less randomized strings).
[default: 4.5]

-hex, --hex-entropy-score FLOAT
Modify the hexadecimal entropy score. If you
specify a value greater than the default,
tartufo lists higher entropy hexadecimal
strings (longer or more randomized strings).
A lower value lists lower entropy
hexadecimal strings (shorter or less
randomized strings). [default: 3.0]
value lists lower entropy hexadecimal
strings (shorter or less randomized
strings).

-V, --version Show the version and exit.
-h, --help Show this message and exit.

Commands:
scan-folder Scan a folder.
scan-remote-repo Automatically clone and scan a remote git repository.
pre-commit Scan staged changes in a pre-commit hook.
scan-local-repo Scan a repository already cloned to your local system.
scan-remote-repo Automatically clone and scan a remote git repository.

```

4 changes: 2 additions & 2 deletions poetry.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions pyproject.toml
@@ -49,6 +49,7 @@ sphinx-click = {version = "^2.5.0", optional = true}
sphinx-rtd-theme = {version = "^0.5.0", optional = true}
sphinxcontrib-spelling = {version = "^5.4.0", optional = true}
tomlkit = "^0.7.2"
cached-property = "^1.5.2"

[tool.poetry.dev-dependencies]
black = {version = "21.5b2", allow-prereleases = true, markers = "platform_python_implementation == 'CPython'"}
32 changes: 20 additions & 12 deletions tartufo/cli.py
@@ -200,25 +200,33 @@ def get_command(self, ctx: click.Context, cmd_name: str) -> Optional[click.Comma
show_default=True,
help="Enable or disable timestamps in logging messages.",
)
@click.option(
    "--entropy-sensitivity",
    type=click.IntRange(0, 100),
    default=75,
    show_default=True,
    help="""Modify entropy detection sensitivity. This is expressed on a scale
    of 0 to 100, where 0 means "totally nonrandom" and 100 means "totally random".
    Decreasing the scanner's sensitivity increases the likelihood that a given
    string will be identified as suspicious.""",
)
@click.option(
"-b64",
"--b64-entropy-score",
default=4.5,
show_default=True,
help="Modify the base64 entropy score. If a value greater than the default is "
"specified, tartufo lists higher entropy base64 strings (longer or more randomized "
"strings). A lower value lists lower entropy base64 strings (shorter or less "
"randomized strings).",
help="""[DEPRECATED] Use `--entropy-sensitivity`. Modify the base64 entropy score. If
a value greater than the default (4.5 in a range of 0.0-6.0) is specified,
tartufo lists higher entropy base64 strings (longer or more randomized strings).
A lower value lists lower entropy base64 strings (shorter or less randomized
strings).""",
)
@click.option(
"-hex",
"--hex-entropy-score",
default=3.0,
show_default=True,
help="Modify the hexadecimal entropy score. If a value greater than the default is "
"specified, tartufo lists higher entropy hexadecimal strings (longer or more randomized "
"strings). A lower value lists lower entropy hexadecimal strings (shorter or less "
"randomized strings).",
help="""[DEPRECATED] Use `--entropy-sensitivity`. Modify the hexadecimal entropy score.
If a value greater than the default (3.0 in a range of 0.0-4.0) is specified,
tartufo lists higher entropy hexadecimal strings (longer or more randomized
strings). A lower value lists lower entropy hexadecimal strings (shorter or less
randomized strings).""",
)
# The first positional argument here would be a hard-coded version, hence the `None`
@click.version_option(None, "-V", "--version")
64 changes: 56 additions & 8 deletions tartufo/scanner.py
@@ -19,10 +19,11 @@
Set,
Tuple,
)
import warnings

from cached_property import cached_property
import click
import git

import pygit2

from tartufo import config, types, util
@@ -146,6 +147,56 @@ def __init__(self, options: types.GlobalOptions) -> None:
self.global_options = options
self.logger = logging.getLogger(__name__)

    def compute_scaled_entropy_limit(self, maximum_bitrate: float) -> float:
        """Determine low entropy cutoff for specified bitrate

        :param maximum_bitrate: How many bits does each character represent?
        :returns: Entropy detection threshold scaled to the input bitrate
        """

        if self.global_options.entropy_sensitivity is None:
            sensitivity = 75
        else:
            sensitivity = self.global_options.entropy_sensitivity
        return float(sensitivity) / 100.0 * maximum_bitrate

    @cached_property
    def hex_entropy_limit(self) -> float:
        """Returns low entropy limit for suspicious hexadecimal encodings"""

        # For backwards compatibility, allow the caller to manipulate this score
        # directly (but complain about it).
        if self.global_options.hex_entropy_score:
            warnings.warn(
                "--hex-entropy-score is deprecated. Use --entropy-sensitivity instead.",
                DeprecationWarning,
            )
            return self.global_options.hex_entropy_score

        # Each hexadecimal digit represents a 4-bit number, so we want to scale
        # the base score by this amount to account for the efficiency of the
        # string representation we're examining.
        return self.compute_scaled_entropy_limit(4.0)

    @cached_property
    def b64_entropy_limit(self) -> float:
        """Returns low entropy limit for suspicious base64 encodings"""

        # For backwards compatibility, allow the caller to manipulate this score
        # directly (but complain about it).
        if self.global_options.b64_entropy_score:
            warnings.warn(
                "--b64-entropy-score is deprecated. Use --entropy-sensitivity instead.",
                DeprecationWarning,
            )
            return self.global_options.b64_entropy_score

        # Each 4-character base64 group represents 3 8-bit bytes, i.e. an effective
        # bit rate of 24/4 = 6 bits per character. We want to scale the base score
        # by this amount to account for the efficiency of the string representation
        # we're examining.
        return self.compute_scaled_entropy_limit(6.0)

@property
def completed(self) -> bool:
"""Return True if scan has completed
@@ -398,22 +449,19 @@ def scan(self) -> Generator[Issue, None, None]:
if self.global_options.entropy:
for issue in self.scan_entropy(
chunk,
self.global_options.b64_entropy_score,
self.global_options.hex_entropy_score,
):
self._issues.append(issue)
yield issue
self._completed = True
self.logger.info("Found %d issues.", len(self._issues))

def scan_entropy(
self, chunk: types.Chunk, b64_entropy_score: float, hex_entropy_score: float
self,
chunk: types.Chunk,
) -> Generator[Issue, None, None]:
"""Scan a chunk of data for apparent high entropy.

:param chunk: The chunk of data to be scanned
:param b64_entropy_score: Base64 entropy score
:param hex_entropy_score: Hexadecimal entropy score
"""

for line in chunk.contents.split("\n"):
@@ -423,12 +471,12 @@ def scan_entropy(

for string in b64_strings:
yield from self.evaluate_entropy_string(
chunk, line, string, BASE64_CHARS, b64_entropy_score
chunk, line, string, BASE64_CHARS, self.b64_entropy_limit
)

for string in hex_strings:
yield from self.evaluate_entropy_string(
chunk, line, string, HEX_CHARS, hex_entropy_score
chunk, line, string, HEX_CHARS, self.hex_entropy_limit
)

def evaluate_entropy_string(
2 changes: 2 additions & 0 deletions tartufo/types.py
@@ -26,6 +26,7 @@ class GlobalOptions:
"output_format",
"b64_entropy_score",
"hex_entropy_score",
"entropy_sensitivity",
)
rules: Tuple[TextIO, ...]
default_regexes: bool
@@ -46,6 +47,7 @@ class GlobalOptions:
output_format: Optional[str]
b64_entropy_score: float
hex_entropy_score: float
entropy_sensitivity: int


@dataclass
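
For anyone checking the scaling by hand, here is a minimal standalone sketch of the arithmetic used by `compute_scaled_entropy_limit`. It is not the code from this PR: the function name `scaled_entropy_limit`, the two constants, and the range check are assumptions made for illustration. Only the formula and the 4-bit and 6-bit rates are taken from the diff. With the default sensitivity of 75, the result matches the previous defaults of `--hex-entropy-score` (3.0) and `--b64-entropy-score` (4.5).

```python
# Hypothetical standalone helper; mirrors the formula in the diff but is not tartufo's API.
HEX_BITS_PER_CHAR = 4.0  # one hexadecimal digit encodes 4 bits
B64_BITS_PER_CHAR = 6.0  # 4 base64 characters encode 3 bytes: 24 / 4 = 6 bits


def scaled_entropy_limit(sensitivity: int, maximum_bitrate: float) -> float:
    """Scale a 0-100 sensitivity to an entropy threshold in bits per character."""
    # The range check is an assumption here; in the PR, click.IntRange(0, 100)
    # enforces the same bounds at the command line.
    if not 0 <= sensitivity <= 100:
        raise ValueError("sensitivity must be between 0 and 100")
    return float(sensitivity) / 100.0 * maximum_bitrate


hex_limit = scaled_entropy_limit(75, HEX_BITS_PER_CHAR)
b64_limit = scaled_entropy_limit(75, B64_BITS_PER_CHAR)
print(hex_limit, b64_limit)  # prints "3.0 4.5"
```

A lower sensitivity value gives a lower threshold, so more strings exceed it and are reported, which is consistent with the help text for `--entropy-sensitivity`.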