Introduction of gitleaks as a pre-commit git hook to prevent secrets from getting committed once activated #1674

Merged
td-usds merged 71 commits into master from td/gitleaks
Jul 29, 2021

@td-usds td-usds commented Jul 20, 2021

See also #1645

Introduction

This PR introduces gitleaks as a pre-commit hook to git (to be set up by each individual developer) combined with scripts to both set up your hooks and run that hook in a stand-alone fashion.
It also makes gitleaks run on a schedule over (a subset of) the repository's history (see below for more info and explanation).

The purpose of this work is to lower the probability of a developer inadvertently committing sensitive values by adding a commit-time check that prevents the commit from being created. Since our repository is public, GitHub automatically scans it for committed secrets; if it finds any, it reaches out to the issuer, and it is up to the issuer to decide which action to take, if any (for example 'revoke this secret', based on a check of "is this an actual live secret or something that just matches our pattern?"). However, this check happens after the fact, at a point when it is already too late: the secret has been committed and pushed up for everyone to see.
Thus we introduce a mechanism to prevent the commit in the first place using a git pre-commit hook. This introduces one check that is done pre-commit and scans all files that are marked in git as "staged" for secrets (to keep it fast and snappy - see "known limitations" below as well) and will prevent the commit from being created if the tool finds HITs for any of the defined patterns.
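
To make "files that are marked in git as staged" concrete, the listing can be reproduced with plain git. This is only a sketch of the idea, not the code in this PR; the helper name staged_files is hypothetical.

```shell
# staged_files: print the paths that are staged for the next commit.
# --cached compares the index against HEAD; --diff-filter=ACMR keeps added,
# copied, modified and renamed paths and skips deletions.
# (Hypothetical helper for illustration; not part of this PR.)
staged_files() {
    git diff --cached --name-only --diff-filter=ACMR
}
```

A pre-commit hook that exits with a non-zero status after inspecting these paths causes git to abort the commit.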

The hooks are also installed upon an invocation of prime-router/cleanslate.sh to make sure you get into a known good state. Git hooks are also explained in the getting-started.md document.

Most of the work is stored in the .environment/ directory at the root of the repository; the rationale being that this helps you set up your environment. I'm not married to that name...

NOTE: this needs a security review and/or consideration because those who install the hook, "trust" that the called code is something they want to run. There is an attack scenario where an ill-wisher changes the called code in the runner (to do something bad) which will now get run by every developer. We need to consider this carefully and keep a close eye on changes to these hook runners!

The DevOps team has discussed this potential issue and feels that we have other mitigations in place that secure us against this particular vector. The DevOps team has also gathered some feedback from team members who are satisfied with the mitigations we have in place.

Some things to look at (all inside .environment/)

  1. ./githooks.sh: a helper script to install/enable or remove the git hook configuration. This script will need to be run by each and every committer to the repository in each and every clone. Run ./githooks.sh without arguments to learn more about its options. This script is set up so that if we want to take advantage of more git hooks in the future (e.g. pre-push, pre-merge-commit, etc.), we can easily have this script do that for us.

  2. pre-commit.hook.sh: this is the file that gets 'installed' as .git/hooks/pre-commit. It is a delegation script that just hands control over to the real implementation of the pre-commit hook, which in turn is stored in .environment/pre-commit.runner.sh. The reason for the existence of this file is so that once the hook is installed, we can add more actions to the pre-commit hook by modifying the runner file without the need to reinstall the hook itself: you just commit "more" into the runner file and that gets picked up by your previously installed hook. This file is just a dumb pass-through. Changes to this file will require re-running githooks.sh to be reflected in your clones.

  3. pre-commit.runner.sh: this file is called by the pre-commit hook (which in turn is installed at .git/hooks/pre-commit from the pre-commit.hook.sh file) and actually runs the different checks we want to enforce before committing. In its current shape, this file runs gitleaks solely on your uncommitted changes. It does this by invoking all files specified in the ${CHECKS_TO_RUN} variable, which must be marked as +x (executable). Adding additional checks in the future is as simple as creating a new shell script and adding it to the ${CHECKS_TO_RUN} variable.

  4. ./gitleaks/ directory: This directory contains the code that invokes the Gitleaks tool onto your uncommitted changes. There is the run-gitleaks.sh script which invokes gitleaks as a container and bind mounts your repository into it; and there's gitleaks-config.toml which contains the rules that gitleaks is scanning for. These rules are things that we will have to maintain to make sure we have the necessary coverage of sensitive values. On completion, the tool either reports 0 (success: no leaks) or non-zero (failure: leaks), which then gets rolled back up into the git commit invocation, which in turn only proceeds if the pre-commit hook reports back 0 as return code.
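
As a rough illustration of the runner pattern described in items 2 to 4, a runner could invoke each check in turn and pass the first non-zero exit code back to git. This is a minimal sketch and not the actual contents of pre-commit.runner.sh; the function name run_checks and the way the check paths are passed in are assumptions.

```shell
# Hypothetical sketch of a pre-commit runner; not the real pre-commit.runner.sh.

# run_checks: invoke each given check script in order.
# Returns 0 if every check passes, otherwise the exit code of the first
# check that fails. Scripts that are not executable count as a failure.
run_checks() {
    local check rc
    for check in "$@"; do
        if [ ! -x "${check}" ]; then
            echo "check is not executable: ${check}" >&2
            return 1
        fi
        rc=0
        "${check}" || rc=$?
        if [ "${rc}" -ne 0 ]; then
            echo "check failed (${rc}): ${check}" >&2
            return "${rc}"
        fi
    done
    return 0
}

# Example wiring (assumed layout):
#   run_checks .environment/gitleaks/run-gitleaks.sh || exit $?
```

Because git only proceeds with the commit when the hook exits with 0, returning the failing check's exit code is enough to block the commit.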

Note that the .environment/pre-commit.runner.sh and .environment/gitleaks/run-gitleaks.sh scripts can both be invoked manually in the event you should want to do that (e.g. to test that your modifications to the config file are effectual).

The gitleaks tool is set up to write to both of these files at the same time (both are listed in .gitignore):

  • ${REPO_ROOT}/gitleaks.log: its human readable output
  • ${REPO_ROOT}/gitleaks.report.json: a json file containing either null (the literal "null") or json content that lets you hunt down the violations.
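
A quick way to tell a clean report from one with findings is to compare the file content against the literal null. The sketch below assumes the report path is passed as an argument; has_findings is a hypothetical helper, and treating a missing file as an error (exit code 2) is an assumption of this sketch.

```shell
# has_findings: succeed (exit 0) when the given report file holds anything
# other than the literal "null"; whitespace around the value is ignored.
# Exit code 1 means "no findings", exit code 2 means the report is missing.
# (Hypothetical helper for illustration; not part of this PR.)
has_findings() {
    local report="$1" content
    [ -f "${report}" ] || return 2
    # Strip whitespace only for the comparison; the file itself is untouched.
    content="$(tr -d '[:space:]' < "${report}")"
    [ -n "${content}" ] && [ "${content}" != "null" ]
}
```

For example, `has_findings "${REPO_ROOT}/gitleaks.report.json" && echo "inspect the report"` would only print when the report contains JSON content.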

Some stuff outside of .environment/

This introduces a new GitHub action that runs on a schedule, defined in .github/run_gitleaks.yml: this action will run over the entire history of the repository SINCE (and including) the specified commit. In other words, it runs on the interval [<since_commit_hash>, HEAD]. The reason we specify a boundary is that we have had some commits that contain 'leaks'. Those have since been mitigated (and can be found again by disabling the allowlists in the gitleaks-config.toml file), and we don't want them to show up as false positives.
In other words, we consider the specified commit as an LKG (Last Known Good) and work forward from there, assuming that there are no violations moving forward.
In the future, if we do have a violation, this will trip and we will have to fix it, which means that after fixing it, we'll have to update the commit hash specified in the run_gitleaks.yml file as well. This is intentional because now we have a clear audit trail of all the different points in time that we designated an LKG.
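
The inclusive interval [<since_commit_hash>, HEAD] can be illustrated with plain git, independent of how the workflow passes the hash to gitleaks. This is a sketch under the assumption that the LKG commit has a parent (it is not the root commit); commits_since_inclusive is a hypothetical helper.

```shell
# commits_since_inclusive: print the commit hashes from <since> up to HEAD,
# oldest first, including <since> itself.
# "<since>^" names the first parent of <since>, so "<since>^..HEAD" keeps
# <since> in the range. This fails when <since> has no parent.
# (Hypothetical helper for illustration; not part of this PR.)
commits_since_inclusive() {
    local since="$1"
    git rev-list --reverse "${since}^..HEAD"
}
```

Moving the LKG forward after a fix then simply shortens this list.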

Known limitations

  • Hooks are client side: every developer must install these hooks into every clone to get protection; however, once installed, additionally enabled checks for those installed hooks will not require intervention (cf pre-commit.runner.sh invoked from the pre-commit hook)
  • Hooks can be circumvented: I can still commit sensitive values if I really want to by passing --no-verify to my git commit invocation or by uninstalling the hook (but this leaves a trail, obviously)
  • Files with both staged and unstaged changes may result in 'unintuitive' (but explainable) behavior:
    • you have a file foo with content that trips a rule; you stage this file
    • you run the pre-commit hook which successfully trips and fails
    • You fix the file to no longer contain the sensitive value; but you do not stage the file
    • you re-run gitleaks
    • gitleaks succeeds
    • You could now commit the staged version of the file which still contains the sensitive value and gitleaks will not help you here
    • explanation: gitleaks looks at the file listing of the files you have staged and then scans those files in your work tree. It does NOT scan the file as-staged. This is a limitation of the tool as it currently stands
  • Your best way to get feedback is on the command line; if your IDE/Development Environment does not play nice with this, then you may get a sub-prime UX
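
The staged/unstaged mismatch described above can be detected with plain git before committing. The sketch below is independent of gitleaks; both helper names are hypothetical, and paths are assumed to be relative to the repository root.

```shell
# staged_differs_from_worktree: succeed (exit 0) when the work-tree copy of
# a file differs from the copy in the index (the staged version).
# "git diff --quiet" compares work tree against index and exits with 1 when
# there are differences.
# (Hypothetical helpers for illustration; not part of this PR.)
staged_differs_from_worktree() {
    ! git diff --quiet -- "$1"
}

# staged_content: print the version of a file as stored in the index,
# which is what "git commit" would record.
staged_content() {
    git show ":$1"
}
```

If staged_differs_from_worktree reports a difference for a file that previously tripped a rule, re-staging the file (git add) before re-running the hook makes the scanned content and the committed content agree.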

Outstanding work

  • Define the right regex'es that we need in the gitleaks configuration file
    • SendGrid
    • Twilio
    • Okta
    • Azure (help needed with regex)
    • IPv(4|6) addresses
  • Run gitleaks on a schedule as a GitHub Action

@td-usds td-usds self-assigned this Jul 20, 2021
@td-usds td-usds added the DevOps label Jul 21, 2021
@td-usds td-usds changed the title from "Introduction of gitleaks as a pre-commit git hook to prevent secrets from getting committed once activated" to "#1645 Introduction of gitleaks as a pre-commit git hook to prevent secrets from getting committed once activated" Jul 21, 2021
@td-usds td-usds changed the title from "#1645 Introduction of gitleaks as a pre-commit git hook to prevent secrets from getting committed once activated" to "Introduction of gitleaks as a pre-commit git hook to prevent secrets from getting committed once activated" Jul 21, 2021

@ronaldheft-gov ronaldheft-gov left a comment

Concept is looking good. Ran into some issues running on my Mac.

@td-usds td-usds linked an issue Jul 28, 2021 that may be closed by this pull request
    - uses: actions/checkout@v2
      with:
        # Fetch the full history
        fetch-depth: 0

See https://github.com/actions/checkout/blob/main/README.md:

    # Number of commits to fetch. 0 indicates all history for all branches and tags.
    # Default: 1
    fetch-depth: ''


sonarcloud bot commented Jul 29, 2021

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 0 (rating A)
  • Coverage: no coverage information
  • Duplication: 0.0%

Comment on lines +34 to +55
    regexes = [
        '(?i)@cdc.local',
        '(?i)@email.com',
        '(?i)@organization.tld',
        '(?i)a@cdc.gov',
        '(?i)adhelpdsk@cdc.gov',
        '(?i)data@cdc.gov',
        '(?i)e.ripley@weylandyutani.com',
        '(?i)jbrush@avantecenters.com',
        '(?i)jj@phd.gov',
        '(?i)joe.jones@az.pima.gov',
        '(?i)local@test.com',
        '(?i)noreply@cdc.gov',
        '(?i)prime@cdc.gov',
        '(?i)qom6@cdc.gov',
        '(?i)qtv1@cdc.gov',
        '(?i)qva8@cdc.gov',
        '(?i)reportstream@cdc.gov',
        '(?i)support@simplereport.gov',
        '(?i)usds@cdc.gov',
        '(?i)usds@omb.eop.gov',
    ]

Looked up where these emails are. Interesting some are in docs. Not sure if we should have a client's email in a doc, but that's for a separate PR.
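
When maintaining allowlist patterns like the ones above, it can help to try a pattern against sample strings before committing it. The sketch below uses grep -E with -i standing in for the (?i) inline flag, since grep's extended regular expressions do not understand (?i) while gitleaks uses Go's regexp syntax; results for simple patterns like these should agree, but that is an assumption. matches_allowlist is a hypothetical helper.

```shell
# matches_allowlist: succeed (exit 0) when <text> matches <pattern>,
# ignoring case. The pattern is written without the (?i) prefix because
# grep -E does not support inline flags; -i provides the same behaviour.
# (Hypothetical helper for illustration; not part of this PR.)
matches_allowlist() {
    local pattern="$1" text="$2"
    printf '%s\n' "${text}" | grep -Eiq -e "${pattern}"
}

# Note: an unescaped "." matches any character, so the pattern '@cdc.local'
# also matches "@cdcxlocal"; '@cdc\.local' matches only the literal dot.
```

For instance, `matches_allowlist '@cdc\.local' 'someone@CDC.LOCAL'` succeeds, while the same pattern does not match 'someone@cdcxlocal'.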

@@ -0,0 +1,281 @@
title = "PRIME ReportStream Gitleaks Configuration"

Config looks good.

@td-usds td-usds merged commit fbf8cc7 into master Jul 29, 2021
@td-usds td-usds deleted the td/gitleaks branch July 29, 2021 16:19
@td-usds td-usds linked an issue Jul 29, 2021 that may be closed by this pull request
JosiahSiegel added a commit that referenced this pull request Jan 6, 2022
JosiahSiegel added a commit that referenced this pull request Jan 11, 2022
…t-trust' into josiahsiegel/#3775-deploy-frontend-efficiency
JosiahSiegel added a commit that referenced this pull request Jan 12, 2022
* closes #1647 sign docker image

* run dct steps if key vault set

* do not sign image if env.USE_DCT != true
Labels
DevOps, security
Development

Successfully merging this pull request may close these issues.

Secret scanner for GitHub on commit
3 participants