Introduction of gitleaks as a pre-commit git hook to prevent secrets from getting committed once activated #1674

td-usds · 2021-07-20T20:18:11Z

Introduction

This PR introduces gitleaks as a pre-commit hook to git (to be set up by each individual developer) combined with scripts to both set up your hooks and run that hook in a stand-alone fashion.
It also makes gitleaks run on a schedule over (a subset of) the repository's history (see below for more info and explanation).

The purpose of this work is to lower the probability of a developer inadvertently committing sensitive values by adding a commit-time check that prevents the commit from being created. Since our repository is public, GitHub automatically scans it for committed secrets; if it finds any, it reaches out to the issuer, and it is up to the issuer to decide which action to take, if any (this could be 'revoke this secret', e.g. based on a check of "is this an actual live secret or something that just matches our pattern?"). However, this check happens after the fact, at a point in time when it is already too late (the secret is committed and has been pushed up for everyone to see).
Thus we introduce a mechanism to prevent the commit in the first place using a git pre-commit hook. This introduces one check that is done pre-commit and scans all files that are marked in git as "staged" for secrets (to keep it fast and snappy - see "known limitations" below as well) and will prevent the commit from being created if the tool finds HITs for any of the defined patterns.

The hooks are also installed upon an invocation of prime-router/cleanslate.sh to make sure you get into a known good state. Git hooks are also explained in the getting-started.md document.

Most of the work is stored in the .environment/ directory at the root of the repository; the rationale being that this helps you set up your environment. I'm not married to that name...

NOTE: this needs a security review and/or consideration because those who install the hook, "trust" that the called code is something they want to run. There is an attack scenario where an ill-wisher changes the called code in the runner (to do something bad) which will now get run by every developer. We need to consider this carefully and keep a close eye on changes to these hook runners!

The DevOps team has discussed this potential issue and feels that we have other mitigations in place that secure us against this particular vector. The DevOps team has also gathered some feedback from team members who are satisfied with the mitigations we have in place.

Some things to look at (all inside `.environment/`)

./githooks.sh: a helper script to install/enable or remove the git hook configuration. This script needs to be run by every committer to the repository, in every clone. Run ./githooks.sh without arguments to learn more about its options. This script is set up so that if we want to take advantage of more git hooks in the future (e.g. pre-push, pre-merge-commit, etc.), we can easily have this script do that for us.
pre-commit.hook.sh: this is the file that gets 'installed' as .git/hooks/pre-commit. It is a delegation script that just hands control over to the real implementation of the pre-commit hook, which is stored in .environment/pre-commit.runner.sh. This file exists so that once the hook is installed, we can add more actions to the pre-commit hook by modifying the runner file without the need to reinstall the hook itself: you just commit "more" into the runner file and that gets picked up by your previously installed hook. This file is just a dumb pass-through. Changes to this file require re-running githooks.sh to be reflected in your clones.
pre-commit.runner.sh: this file is called by the pre-commit hook (which, in turn, is installed at .git/hooks/pre-commit from the pre-commit.hook.sh file) and actually runs the different checks we want to enforce before committing. In its current shape, this file runs gitleaks solely on your uncommitted changes. It does this by invoking all files specified in the ${CHECKS_TO_RUN} variable, which must be marked as +x (executable). Adding additional checks in the future is as simple as creating a new shell script and adding it to the ${CHECKS_TO_RUN} variable.
./gitleaks/ directory: this directory contains the code that invokes the gitleaks tool on your uncommitted changes. There is the run-gitleaks.sh script, which invokes gitleaks as a container and bind mounts your repository into it, and there is gitleaks-config.toml, which contains the rules that gitleaks scans for. These rules are things that we will have to maintain to make sure we have the necessary coverage of sensitive values. On completion, the tool exits with either 0 (success: no leaks) or non-zero (failure: leaks), which then gets rolled back up into the git commit invocation, which in turn only proceeds if the pre-commit hook reports back 0 as return code.
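The runner behavior described above (every check must be executable, and the first failing check blocks the commit) can be sketched roughly as follows. This is a minimal illustration and not the contents of pre-commit.runner.sh: the function name run_checks is hypothetical, and passing the checks as arguments (rather than reading ${CHECKS_TO_RUN} directly) is an assumption made to keep the sketch self-contained.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a pre-commit runner loop; the real
# .environment/pre-commit.runner.sh may be structured differently.

# run_checks CHECK...
#   Runs every check script in order. Returns non-zero as soon as a
#   check is not executable or exits with a non-zero status, which is
#   what makes "git commit" abort when called from a pre-commit hook.
run_checks() {
    for check in "$@"; do
        if [ ! -x "${check}" ]; then
            echo "ERROR: ${check} is not marked as executable (+x)" >&2
            return 1
        fi
        if ! "${check}"; then
            echo "ERROR: ${check} reported a failure" >&2
            return 1
        fi
    done
    return 0
}
```

A runner built this way would end with something like `run_checks ${CHECKS_TO_RUN} || exit 1`, assuming the variable holds a whitespace-separated list of script paths; the pass-through hook then only needs to exec the runner and propagate its exit code.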

Note that the .environment/pre-commit.runner.sh and .environment/gitleaks/run-gitleaks.sh scripts can both be invoked manually should you want to do that (e.g. to test that your modifications to the config file have the intended effect).

The gitleaks tool is set up to report out into both of these files at the same time (which are entered into .gitignore):

${REPO_ROOT}/gitleaks.log: its human readable output
${REPO_ROOT}/gitleaks.report.json: a json file containing either null (the literal "null") or json content that lets you hunt down the violations.
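Since the report file contains either the literal null or JSON content, a script can tell the two cases apart without knowing the report's schema. The helper below is a hedged sketch: report_has_leaks is a hypothetical name, and it deliberately makes no assumption about the fields inside the report.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of run-gitleaks.sh.

# report_has_leaks FILE
#   Succeeds (0) when FILE exists and holds something other than the
#   literal "null"; fails (1) when FILE is missing, empty, or "null".
report_has_leaks() {
    report_file="$1"
    [ -s "${report_file}" ] || return 1
    # Drop all whitespace so that "null" followed by a newline still
    # compares equal to the bare literal.
    content="$(tr -d '[:space:]' < "${report_file}")"
    [ -n "${content}" ] && [ "${content}" != "null" ]
}
```

For example, `report_has_leaks "${REPO_ROOT}/gitleaks.report.json" && echo "violations found, see gitleaks.log"`.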

Some stuff outside of `.environment/`

This introduces a new GitHub action that runs on a schedule, defined in .github/workflows/run_gitleaks.yaml: this action runs over the history of the repository SINCE (and including) the specified commit. In other words, it runs on the interval [<since_commit_hash>, HEAD]. The reason we specify a boundary is that we have had some commits that contain 'leaks'. Those have since been mitigated (and can be found again by disabling the allowlists in the gitleaks-config.toml file), and we don't want them to show up as false positives.
In other words, we consider the specified commit as an LKG (Last Known Good) and work forward from there, assuming that there are no violations moving forward.
In the future, if we do have a violation, this will trip and we will have to fix it, which means that after fixing it, we'll also have to update the commit hash specified in the run_gitleaks.yaml file. This is intentional because it gives us a clear audit trail of all the points in time that we designated as an LKG.
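To make the interval [<since_commit_hash>, HEAD] concrete, the sketch below lists the commits it covers using plain git. The function name commits_in_scan_window is hypothetical, and this only illustrates the range; it is an assumption that gitleaks walks exactly the same set of commits, particularly in non-linear history.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the scan window; not taken from the
# workflow or from run-gitleaks.sh.

# commits_in_scan_window SINCE
#   Prints the LKG commit itself, followed by every commit reachable
#   from HEAD but not from SINCE, oldest first.
commits_in_scan_window() {
    since_commit="$1"
    # Prints the full hash of SINCE, or fails if it is not a commit.
    git rev-parse --verify --quiet "${since_commit}^{commit}" || return 1
    git rev-list --reverse "${since_commit}..HEAD"
}
```

Running `commits_in_scan_window <since_commit_hash> | wc -l` before and after moving the LKG shows how much history the scheduled scan has to cover.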

Known limitations

Hooks are client side: every developer must install these hooks into every clone to get protection; however, once installed, additionally enabled checks for those installed hooks will not require intervention (cf. pre-commit.runner.sh invoked from the pre-commit hook).
Hooks can be circumvented: I can still commit sensitive values if I really want to by specifying --no-verify to my git commit invocation or by uninstalling the hook (but this leaves a trail, obviously).
Files with both staged and unstaged changes may result in 'unintuitive' (but explainable) behavior:
- you have a file foo with content that trips a rule; you stage this file
- you run the pre-commit hook, which successfully trips and fails
- you fix the file to no longer contain the sensitive value, but you do not stage the file
- you re-run gitleaks
- gitleaks succeeds
- you could now commit the staged version of the file, which still contains the sensitive value, and gitleaks will not help you here
- explanation: gitleaks looks at the file listing of the files you have staged and then scans those files in your work tree. It does NOT scan the file as-staged. This is a limitation of the tool as it currently stands
Your best way to get feedback is on the command line; if your IDE/Development Environment does not play nice with this, then you may get a sub-prime UX.
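The staged-versus-work-tree limitation above can be checked by hand, because git can print the staged version of a file with `git show :<path>`. The helper below is a sketch under that assumption; staged_content_matches is a hypothetical name, the pattern is whatever you are worried about, and it is not a replacement for the gitleaks rules.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the hook scripts in this PR.

# staged_content_matches PATTERN FILE
#   Succeeds when the version of FILE stored in the index (what
#   "git commit" would record) matches the extended regex PATTERN.
#   FILE is a path relative to the repository root.
staged_content_matches() {
    pattern="$1"
    file="$2"
    git show ":${file}" 2>/dev/null | grep -Eq -e "${pattern}"
}
```

In the scenario above, `grep -E 'pattern' foo` finds nothing in the fixed work-tree file while `staged_content_matches 'pattern' foo` still succeeds, which is the signal to run `git add foo` again before committing.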

Outstanding work

Define the right regex'es that we need in the gitleaks configuration file
- ~~SendGrid~~
- ~~Twilio~~
- ~~Okta~~
- Azure (help needed with regex)
- ~~IPv(4|6) addresses~~
~~Run gitleaks on a schedule as a GitHub Action~~

ronaldheft-gov

Concept is looking good. Ran into some issues running on my Mac.

td-usds · 2021-07-28T20:08:55Z

.github/workflows/run_gitleaks.yaml

+      - uses: actions/checkout@v2
+        with:
+          # Fetch the full history
+          fetch-depth: 0


See https://github.com/actions/checkout/blob/main/README.md:

# Number of commits to fetch. 0 indicates all history for all branches and tags.
# Default: 1
fetch-depth: ''
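A history scan only means something if the checkout actually contains the history. As a hedged sketch, a scan script could guard against a shallow clone like this; has_full_history is a hypothetical name and no such guard is claimed to exist in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical guard; not part of the workflow or run-gitleaks.sh.

# has_full_history
#   Succeeds when the current repository is not a shallow clone, i.e.
#   when a scan "since" an older commit can actually reach that commit.
has_full_history() {
    [ "$(git rev-parse --is-shallow-repository 2>/dev/null)" = "false" ]
}
```

With the default fetch-depth of 1 the checkout is shallow and `has_full_history` fails; with `fetch-depth: 0` it succeeds.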

sonarcloud · 2021-07-29T14:05:21Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
0.0% Duplication

ronaldheft-gov · 2021-07-29T16:06:03Z

.environment/gitleaks/gitleaks-config.toml

+        regexes = [
+            '(?i)@cdc.local',
+            '(?i)@email.com',
+            '(?i)@organization.tld',
+            '(?i)a@cdc.gov',
+            '(?i)adhelpdsk@cdc.gov',
+            '(?i)data@cdc.gov',
+            '(?i)e.ripley@weylandyutani.com',
+            '(?i)jbrush@avantecenters.com',
+            '(?i)jj@phd.gov',
+            '(?i)joe.jones@az.pima.gov',
+            '(?i)local@test.com',
+            '(?i)noreply@cdc.gov',
+            '(?i)prime@cdc.gov',
+            '(?i)qom6@cdc.gov',
+            '(?i)qtv1@cdc.gov',
+            '(?i)qva8@cdc.gov',
+            '(?i)reportstream@cdc.gov',
+            '(?i)support@simplereport.gov',
+            '(?i)usds@cdc.gov',
+            '(?i)usds@omb.eop.gov',
+        ]


Looked up where these emails are. Interestingly, some are in docs. Not sure if we should have a client's email in a doc, but that's for a separate PR.

ronaldheft-gov · 2021-07-29T16:07:13Z

.environment/gitleaks/gitleaks-config.toml

@@ -0,0 +1,281 @@
+title = "PRIME ReportStream Gitleaks Configuration"


Config looks good.

Thomas D added 11 commits July 20, 2021 13:46

Adding gitleaks log to .gitignore

3f9eb2c

gitleaks: provides code to set up a pre-commit hook which uses gitlea…

a028b94

…ks to scan your staged changes for secrets

Updates to the git hook installation script

98c4d81

Fixing pre-commit hook

c9ec825

Adding indirection for running of pre-commit hooks for future flexibi…

364b344

…lity

Formatting

390b556

Minor comment update in the pre-commit hook

b2d442e

gitleaks: enabling the pre-commit runner to run as a standalone invoc…

e774ac1

…ation

gitleaks: better output of what will _not_ happen

347ea9b

Updating gitleaks configuration

85ecbd0

gitleaks: using the gitleaks configuration as per a relative path to …

e5d38d8

…the repository

td-usds self-assigned this Jul 20, 2021

Thomas D added 3 commits July 20, 2021 16:51

gitleaks: adding comments to the githooks installation script functions

9c8914d

gitleaks: pre-commit runner has better error out handling and now inc…

9492bb5

…ludes a check that the file you are running is indeed marked as +x

gitleaks: use a well known stable version of gitleaks that won't chan…

75eb5c3

…ge underneath us, instead of using 'latest'

td-usds requested review from ronaldheft-gov and MauriceReeves-usds July 20, 2021 21:24

Merge branch 'master' into td/gitleaks

28ece41

td-usds added the DevOps label Jul 21, 2021

td-usds changed the title ~~Introduction of gitleaks as a pre-commit git hook to prevent secrets from getting committed once activated~~ #1645 Introduction of gitleaks as a pre-commit git hook to prevent secrets from getting committed once activated Jul 21, 2021

td-usds changed the title ~~#1645 Introduction of gitleaks as a pre-commit git hook to prevent secrets from getting committed once activated~~ Introduction of gitleaks as a pre-commit git hook to prevent secrets from getting committed once activated Jul 21, 2021

ronaldheft-gov suggested changes Jul 26, 2021

.environment/githooks.sh

.environment/gitleaks/gitleaks-config.toml

.environment/githooks.sh

Thomas D added 7 commits July 26, 2021 12:00

gitleaks: we cannot use an associative array on all platforms; instea…

977b89e

…d, we use 2 regular arrays that we map to one another using indices

gitleaks: use _GHDST_HOOK_COUNT in remove_hooks

a731ef8

gitleaks: Adding SendGrid API Key Pattern rule

5122f33

gitleaks: ordering the rules by description field

85625e4

gitleaks: githooks script exits with 1 if the src and dst counts are …

0f2603c

…not the same

gitleaks: fixing global allow-list in gitleaks-config.toml

af8fb3c

gitleaks: removing rules without actionable data

e4ed838

td-usds added the Robustness label Jul 28, 2021

Thomas D added 9 commits July 28, 2021 12:32

run-gitleaks: fix usage

f3a7f88

gitleaks: run-gitleaks can now scan _since_ a particular commit too

39f89d0

gitleaks: adding error/warning/note (info) helpers (note that 'info' …

6407d4e

…is a binary in linux, so do not clash, I'm using 'note')

gitleaks: allow-listing an e-mail address

12fbf9d

gitleaks: run_gitleaks.yaml specifying which LKG to use

f22b7d3

gitleaks: make sure we get the full history in run_gitleaks.yml githu…

d175643

…b action

gitleaks: adding okta Rule with allowlist

baff524

gitleaks: updating LKG

df0d1f1

gitleaks: narrowing allowlist for okta rule

db2f16f

td-usds linked an issue Jul 28, 2021 that may be closed by this pull request

Secret scanner for GitHub on commit #1645

Closed

td-usds commented Jul 28, 2021

Thomas D added 4 commits July 29, 2021 09:42

gitleaks: invoke githooks.sh install from cleanslate.sh

a394e1a

Also making cleanslate callable from anywhere

gitleaks: Adding information about git hooks to the getting-started.m…

c546355

…d file

gitleaks: reworking getting-started section on gitleaks

72255b3

gitleaks: adding info about git hook installation into each clone

8f4dbfe

td-usds mentioned this pull request Jul 29, 2021

Create a git pre-commit hook that invokes ktlint #1745

Closed

MauriceReeves-usds approved these changes Jul 29, 2021

ronaldheft-gov approved these changes Jul 29, 2021

td-usds merged commit fbf8cc7 into master Jul 29, 2021

td-usds deleted the td/gitleaks branch July 29, 2021 16:19

td-usds linked an issue Jul 29, 2021 that may be closed by this pull request

Automatically scan our code base for security vulnerabilities #1396

Closed

td-usds mentioned this pull request Jul 29, 2021

Automatically scan our code base for security vulnerabilities #1396

Closed

td-usds removed a link to an issue Jul 29, 2021

Automatically scan our code base for security vulnerabilities #1396

Closed

JosiahSiegel added a commit that referenced this pull request Jan 6, 2022

Merge remote-tracking branch 'origin/master' into josiahsiegel/#1674-…

1fa2aaa

…docker-content-trust

JosiahSiegel added a commit that referenced this pull request Jan 11, 2022

Merge remote-tracking branch 'origin/josiahsiegel/#1674-docker-conten…

787a97d

…t-trust' into josiahsiegel/#3775-deploy-frontend-efficiency

JosiahSiegel added a commit that referenced this pull request Jan 12, 2022

Merge branch 'master' into josiahsiegel/#1674-docker-content-trust

5049563

JosiahSiegel added a commit that referenced this pull request Jan 12, 2022

Josiahsiegel/#1674 docker content trust (#3735)

ab00fc9

* closes #1647 sign docker image * run dct steps if key vault set * do not sign image if env.USE_DCT != true
