Introduction of gitleaks as a pre-commit git hook to prevent secrets from getting committed once activated #1674
…ks to scan your staged changes for secrets
…ludes a check that the file you are running is indeed marked as +x
…ge underneath us, instead of using 'latest'
Concept is looking good. Ran into some issues running on my Mac.
…d, we use 2 regular arrays that we map to one another using indices
…is a binary in Linux, so to avoid a clash, I'm using 'note')
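One of the commits above switches to "2 regular arrays that we map to one another using indices". Below is a minimal sketch of that technique. It assumes the motivation is the Mac issue mentioned in the review: the `/bin/bash` shipped with macOS is version 3.2, which predates associative arrays (`declare -A` arrived in Bash 4.0). The array names, the `lookup_hook_file` helper and the `pre-push` entry are hypothetical, not taken from the PR:

```bash
#!/usr/bin/env bash
# Bash 3.2-compatible "map": two indexed arrays kept in lockstep,
# so HOOK_NAMES[i] corresponds to HOOK_FILES[i].
# Names and values are hypothetical examples, not the PR's actual data.
HOOK_NAMES=("pre-commit" "pre-push")
HOOK_FILES=("pre-commit.hook.sh" "pre-push.hook.sh")

# lookup_hook_file NAME: print the file mapped to NAME; return 1 if absent.
lookup_hook_file() {
    local wanted="$1"
    local i
    for ((i = 0; i < ${#HOOK_NAMES[@]}; i++)); do
        if [[ "${HOOK_NAMES[i]}" == "${wanted}" ]]; then
            printf '%s\n' "${HOOK_FILES[i]}"
            return 0
        fi
    done
    return 1
}
```

The price of this approach is a linear scan per lookup and the need to keep both arrays the same length, which is acceptable for a handful of hooks.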
```yaml
- uses: actions/checkout@v2
  with:
    # Fetch the full history
    fetch-depth: 0
```
See https://github.com/actions/checkout/blob/main/README.md:

```yaml
# Number of commits to fetch. 0 indicates all history for all branches and tags.
# Default: 1
fetch-depth: ''
```
Also making cleanslate callable from anywhere
Kudos, SonarCloud Quality Gate passed! 0 Bugs, No Coverage information
```toml
regexes = [
    '(?i)@cdc.local',
    '(?i)@email.com',
    '(?i)@organization.tld',
    '(?i)a@cdc.gov',
    '(?i)adhelpdsk@cdc.gov',
    '(?i)data@cdc.gov',
    '(?i)e.ripley@weylandyutani.com',
    '(?i)jbrush@avantecenters.com',
    '(?i)jj@phd.gov',
    '(?i)joe.jones@az.pima.gov',
    '(?i)local@test.com',
    '(?i)noreply@cdc.gov',
    '(?i)prime@cdc.gov',
    '(?i)qom6@cdc.gov',
    '(?i)qtv1@cdc.gov',
    '(?i)qva8@cdc.gov',
    '(?i)reportstream@cdc.gov',
    '(?i)support@simplereport.gov',
    '(?i)usds@cdc.gov',
    '(?i)usds@omb.eop.gov',
]
```
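To reason locally about what an allowlist like this suppresses, here is a rough Bash approximation. It assumes (an assumption about gitleaks' behavior, not taken from its documentation) that a finding is ignored when its matched text matches any allowlist regex. POSIX `grep -E` does not understand Go's inline `(?i)` flag, so the sketch strips it and uses `grep -i` instead. `is_allowlisted` is a hypothetical helper and only a subset of the patterns is included:

```bash
#!/usr/bin/env bash
# Rough approximation of an allowlist check (hypothetical helper, not part of gitleaks).
# A subset of the patterns above, kept verbatim including the "(?i)" prefix.
ALLOWLIST_REGEXES=(
    '(?i)@cdc.local'
    '(?i)noreply@cdc.gov'
    '(?i)support@simplereport.gov'
)

# is_allowlisted TEXT: return 0 if TEXT matches any allowlist regex, 1 otherwise.
is_allowlisted() {
    local text="$1"
    local prefix='(?i)'
    local pattern
    for pattern in "${ALLOWLIST_REGEXES[@]}"; do
        # Drop Go's inline case-insensitive flag (quoted, so it is removed literally);
        # grep -i provides the case-insensitivity instead.
        pattern=${pattern#"${prefix}"}
        if printf '%s\n' "${text}" | grep -Eiq -- "${pattern}"; then
            return 0
        fi
    done
    return 1
}
```

For example, `is_allowlisted "NoReply@CDC.gov"` succeeds. Note that `.` is a regex metacharacter here, so `@cdc.local` also matches `user@cdcXlocal`; writing `\.` would make such patterns match only a literal dot.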
Looked up where these emails are. Interestingly, some are in docs. Not sure we should have a client's email in a doc, but that's for a separate PR.
`@@ -0,0 +1,281 @@`

```toml
title = "PRIME ReportStream Gitleaks Configuration"
```
Config looks good.
…t-trust' into josiahsiegel/#3775-deploy-frontend-efficiency
* closes #1647: sign Docker image
* run DCT steps if key vault set
* do not sign image if `env.USE_DCT != true`
See also #1645
Introduction
This PR introduces gitleaks as a git pre-commit hook (to be set up by each individual developer), combined with scripts both to set up your hooks and to run that hook in a stand-alone fashion.
It also makes gitleaks run on a schedule over (a subset of) the repository's history (see below for more info and explanation).
The purpose of this work is to lower the probability of a developer inadvertently committing sensitive values, by having a commit-time check that prevents this from happening (i.e. it prevents the commit from being created). Since ours is a public repository, GitHub automatically scans it for committed secrets, and if it finds any, it reaches out to the issuer to take appropriate action. It is up to the issuer to decide which action, if any, it deems fit; this could be "revoke this secret", e.g. after checking "is this an actual live secret or something that just matches our pattern?". However, this check happens after the fact, at a point when it is already too late: the secret has been committed and pushed up for everyone to see.
We therefore introduce a mechanism that prevents the commit in the first place: a git pre-commit hook. It adds one check, run before each commit, that scans all files git has marked as "staged" for secrets (to keep it fast and snappy; see "Known limitations" below as well), and it prevents the commit from being created if the tool finds hits for any of the defined patterns.
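Git aborts the commit whenever the `pre-commit` hook exits with a non-zero status, which is what makes this a hard gate rather than a warning. Below is a minimal sketch of a runner in the spirit of `.environment/pre-commit.runner.sh` (described further down): it runs every script in a `CHECKS_TO_RUN` list, refuses to run any that isn't executable, and fails if any check fails. The `run_checks` function and the choice to keep going after a failure (so you see every problem at once) are assumptions, not necessarily what the PR's runner does:

```bash
#!/usr/bin/env bash
# Sketch of a pre-commit runner (hypothetical; not the PR's actual script).
# Runs each check in turn and returns non-zero if any check is missing,
# not executable, or fails. Git aborts the commit on a non-zero hook exit.

# run_checks CHECK...: run every given check script; return 1 if any failed.
run_checks() {
    local check
    local failed=0
    for check in "$@"; do
        if [[ ! -f "${check}" || ! -x "${check}" ]]; then
            echo "pre-commit: '${check}' is missing or not executable (+x)" >&2
            failed=1
            continue
        fi
        if ! "${check}"; then
            echo "pre-commit: check '${check}' failed" >&2
            failed=1
        fi
    done
    return "${failed}"
}

# Example wiring (hypothetical path):
# CHECKS_TO_RUN=("${REPO_ROOT}/.environment/gitleaks/run-gitleaks.sh")
# run_checks "${CHECKS_TO_RUN[@]}" || exit 1
```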
* The hooks are also installed upon an invocation of `prime-router/cleanslate.sh`, to make sure you get into a known good state.
* Git hooks are also explained in the `getting-started.md` document.
* Most of the work is stored in the `.environment/` directory at the root of the repository, the rationale being that it helps you set up your environment. I'm not married to that name...
* NOTE: this needs a security review and/or careful consideration, because those who install the hook "trust" that the code it calls is something they want to run. There is an attack scenario where an ill-wisher changes the code in the runner (to do something bad), which will then be run by every developer. We need to consider this carefully and keep a close eye on changes to these hook runners!
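The trust concern in the NOTE follows from how the indirection works: the installed hook is only a thin launcher, and whatever `.environment/pre-commit.runner.sh` contains in your current checkout is what runs on every commit. Below is a minimal sketch of installing such a pass-through hook. The `install_pre_commit_hook` function, the generated hook body and the use of `git rev-parse --show-toplevel` are illustrative assumptions, not the PR's actual `githooks.sh` or `pre-commit.hook.sh`:

```bash
#!/usr/bin/env bash
# Hypothetical installer for a pass-through pre-commit hook.
# install_pre_commit_hook HOOKS_DIR: write HOOKS_DIR/pre-commit and mark it +x.
install_pre_commit_hook() {
    local hooks_dir="$1"
    mkdir -p "${hooks_dir}"
    # Quoted heredoc delimiter: nothing is expanded now; the hook resolves
    # the repository root and runner path each time git invokes it.
    cat > "${hooks_dir}/pre-commit" <<'EOF'
#!/usr/bin/env bash
# Dumb pass-through: hand control to the runner in the working tree.
REPO_ROOT="$(git rev-parse --show-toplevel)" || exit 1
exec "${REPO_ROOT}/.environment/pre-commit.runner.sh" "$@"
EOF
    chmod +x "${hooks_dir}/pre-commit"
}

# Usage (from inside a clone):
# install_pre_commit_hook "$(git rev-parse --git-path hooks)"
```

Because the hook resolves the runner at commit time, editing the runner changes behavior for every developer without a reinstall. That is the convenience this design buys, and also exactly why changes to the runner deserve review.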
Some things to look at (all inside `.environment/`):

* `./githooks.sh`: a helper script to install/enable or remove the git hook configuration. This script needs to be run by each and every committer to the repository, in each and every clone. Run `./githooks.sh` without arguments to learn more about its options. It is set up so that if we want to take advantage of more git hooks in the future (e.g. `pre-push`, `pre-merge-commit`, etc.), we can easily have this script install them for us.
* `pre-commit.hook.sh`: the file that gets "installed" as `.git/hooks/pre-commit`. It is a delegation script that just hands control over to the real implementation of the pre-commit hook, which is stored in `.environment/pre-commit.runner.sh`. This indirection means that once the hook is installed, we can add more actions to the pre-commit hook by modifying the runner file, without reinstalling the hook itself: you just commit "more" into the runner file and your previously installed hook picks it up. This file is a dumb pass-through; changes to it require re-running `githooks.sh` to be reflected in your clones.
* `pre-commit.runner.sh`: called by the pre-commit hook (which in turn is installed at `.git/hooks/pre-commit` from the `pre-commit.hook.sh` file), this file actually runs the different checks we want to enforce before committing. In its current shape, it runs gitleaks solely on your uncommitted changes. It does this by invoking every file listed in the `${CHECKS_TO_RUN}` variable; each of these must be marked as +x (executable). Adding a check in the future is as simple as creating a new shell script and adding it to the `${CHECKS_TO_RUN}` variable.
* `./gitleaks/` directory: contains the code that invokes the Gitleaks tool on your uncommitted changes. There is the `run-gitleaks.sh` script, which runs gitleaks as a container and bind-mounts your repository into it, and there is `gitleaks-config.toml`, which contains the rules gitleaks scans for. We will have to maintain these rules to make sure we have the necessary coverage of sensitive values. On completion, the tool returns either 0 (success: no leaks) or non-zero (failure: leaks). This exit code is passed back up to the `git commit` invocation, which only proceeds if the pre-commit hook returns 0.

Note that the `.environment/pre-commit.runner.sh` and `.environment/gitleaks/run-gitleaks.sh` scripts can both be invoked manually should you want to (e.g. to test that your modifications to the config file take effect).

The gitleaks tool is set up to report into both of these files at the same time (both are listed in `.gitignore`):

* `${REPO_ROOT}/gitleaks.log`: its human-readable output
* `${REPO_ROOT}/gitleaks.report.json`: a JSON file containing either `null` (the literal "`null`") or JSON content that lets you hunt down the violations

Some stuff outside of `.environment/`
This introduces a new GitHub Action that runs on a schedule, defined in `.github/run_gitleaks.yml`. This action runs over the entire history of the repository SINCE (and including) the specified commit; in other words, it runs on the interval `[<since_commit_hash>, HEAD]`. We specify a boundary because some earlier commits contain "leaks". Those have since been mitigated (and can be found again by disabling the allowlists in the `gitleaks-config.toml` file), and we don't want them to show up as false positives. In other words, we consider the specified commit an LKG (Last Known Good) and work forward from there, assuming that there are no violations from that point on.
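The interval `[<since_commit_hash>, HEAD]` can be expressed with plain git revision syntax: everything reachable from `HEAD`, minus everything reachable from the parents of the since-commit, which keeps the since-commit itself in the range. Below is a minimal sketch, assuming you want the list of commits to feed to a scanner. `commits_since` is a hypothetical helper, and this is not necessarily how the scheduled action selects commits:

```bash
#!/usr/bin/env bash
# commits_since SINCE [REPO_DIR]: print commit hashes in [SINCE, HEAD], oldest first.
# "SINCE^@" means "all parents of SINCE"; excluding them keeps SINCE itself.
# If SINCE is a root commit it has no parents, so the whole history is listed.
commits_since() {
    local since="$1"
    local repo_dir="${2:-.}"
    git -C "${repo_dir}" rev-list --reverse HEAD --not "${since}^@"
}
```

Using `^@` rather than `^` also handles a merge commit as the LKG correctly, since it excludes the history behind every parent rather than only the first one.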
In the future, if we do have a violation, this check will trip and we will have to fix it, which means that after fixing it, we'll also have to update the commit hash specified in the `run_gitleaks.yml` file. This is intentional, because it gives us a clear audit trail of all the points in time that we designated as an LKG.

Known limitations
* … `pre-commit.runner.sh` invoked from the `pre-commit` hook)
* … `--no-verify` to my `git commit` invocation or by uninstalling the hook (but this leaves a trail, obviously)
* … `foo` with content that trips a rule; you stage this file

Outstanding work

* SendGrid
* Twilio
* Okta
* IPv(4|6) addresses
* Run gitleaks on a schedule as a GitHub Action