Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making checkov between 70% to 25% faster 🚀 #6740

Open
tpvasconcelos opened this issue Sep 29, 2024 · 7 comments
Open

Making checkov between 70% to 25% faster 🚀 #6740

tpvasconcelos opened this issue Sep 29, 2024 · 7 comments
Labels
contribution requested This is a great feature idea, but we will need a contribution to get it added to Checkov. enhancement New feature or request

Comments

@tpvasconcelos
Copy link

tpvasconcelos commented Sep 29, 2024

Description

checkov always eagerly loads all runners and checks on all CLI invocations, regardless of whether they are needed or not. I guess this was not always an issue but currently it seems to add quite the overhead. For instance, a simple checkov --version or checkov --help on a 4-core 8Gb-memory Gitpod instance takes just over 2 seconds. Most of this time is spent importing the hundreds of python modules and checks.

If you agree that this is a welcomed improvement, I've done some digging into how this could be addressed and am proposing an incremental solution in this pull request. My results show reduced runtimes of:

  • around -45% reduction for simple CLI calls like checkov --version
  • around -40% reduction of single-framework runs (e.g. checkov --framework=ansible -d .)
  • very small improvement (-6%) even when for invocations that trigger all checks, which means that the lazy-loading logic does not provide any noticeable overhead.

The current changes in the pull request already pass all current unit tests but if this is a desired feature, I'll need to go over it more carefully to make sure it is ready for a full review.

Motivation

Firstly, it would be nice to not have to wait over 2 seconds for the output of checkov --help.

More seriously, when running multiple checkov checks in a CI/CD pipeline, the time checkov takes to load starts to add up, mainly when using checkov with pre-commit. Consider the following pre-commit config for example:

  - repo: https://github.com/bridgecrewio/checkov
    rev: 3.2.255
    hooks:
      - id: checkov_diff
        name: "checkov (ansible)"
        entry: checkov --framework ansible
        files: ^infra/tf/data/ansible/.+\.ya?ml$
      - id: checkov_diff
        name: "checkov (dockerfile)"
        entry: checkov --framework dockerfile
        files: (\.Dockerfile|Dockerfile)$
      - id: checkov_diff
        name: "checkov (github_actions)"
        entry: checkov --framework github_actions
        files: ^\.github/.+\.ya?ml$
      - id: checkov_diff
        name: "checkov (json, yaml)"
        entry: checkov --framework json,yaml
        files: \.(json|ya?ml)$
      - id: checkov_diff
        name: "checkov (terraform)"
        entry: checkov --framework terraform --var-file=infra/tf/terraform.tfvars
        files: ^infra/tf/.+\.(tf|tfvars)$
      - id: checkov_diff
        name: "checkov (k8s)"
        entry: checkov --framework kubernetes
        files: ^nfra/k8s/.+\.ya?ml$

Benchmarking

The following benchmarks were run both on my local machine (M3 mac) and on a 4-core/8Gb-memory Gitpod instance. The results can vary quite a bit on different Gitpod workspace instances, even when requesting the same resources. For this reason, the benchmarks bellow were run on the same workspace instance.

I used hyperfine to help me gather the benchmark statistics and pandas to correlate and render them in HTML.

Requirements

(expandable section)

After booting the Gitpod instance, I ran the following commands:

# Setup Python env
pyenv install 3.8.20
pyenv global 3.8.20
python3.8 -m pip install pipenv
pipenv --python 3.8
pipenv install
# Install hyperfine
wget https://github.com/sharkdp/hyperfine/releases/download/v1.16.1/hyperfine_1.16.1_amd64.deb
sudo dpkg -i hyperfine_1.16.1_amd64.deb
# and pandas...
pipenv install pandas

On my local machine (macOS) I also had to install hyperfine and pandas with:

brew install hyperfine
pipenv install pandas

Isolating the effect of the change

(expandable section)

The changes proposed in this pull request should not have any impact on the actual execution of the checks and checkov Runners. The effects are only present before the runs are triggered. You can verify this yourself by running some local tests.

To properly isolate the behaviour changed in this PR and remove any extra sources of noise, I patched the BaseRunner.run() to simply return an empty Report right away.

This can be achieved by creating a new entry point under checkov/main_patched.py with the following code:

from __future__ import annotations

from typing import TypeVar, Generic

import checkov.common.runners.base_runner
from checkov.common.output.report import Report

_Context = TypeVar("_Context", bound="dict[Any, Any]|None")
_Definitions = TypeVar("_Definitions", bound="dict[Any, Any]|None")
_GraphManager = TypeVar("_GraphManager", bound="GraphManager[Any, Any]|None")

class PatchedBaseRunner(
    checkov.common.runners.base_runner.BaseRunner,
    Generic[_Definitions, _Context, _GraphManager]
):
    def __new__(cls, *args, **kwargs):
        def noop_run(self, *args, **kwargs):
            return Report(check_type=self.check_type)
        cls.run = noop_run
        return super().__new__(cls, *args, **kwargs)

checkov.common.runners.base_runner.BaseRunner = PatchedBaseRunner

if __name__ == '__main__':
    from checkov.main import Checkov
    import sys

    ckv = Checkov()
    sys.exit(ckv.run())

Test that it works by running:

pipenv run python -m checkov.main_patched --version

Running the benchmark

(expandable section)

I executed the following script to generate the results displayed in the next section:

#!/usr/bin/env bash
set -o errexit  # Exit on error
set -o pipefail # Use last non-zero exit code in a pipeline
set -o nounset  # Disallow expansion of unset variables

cmd='python -m checkov.main_patched'
common_checkov_flags='--quiet --compact --skip-results-upload --skip-download'

bench(){
  local branch="${1}"
  local csv_file="bench_checkov_${branch}.csv"
  git checkout "${branch}"
  echo "Benchmarking ${branch}..."
  hyperfine \
    --warmup 3 \
    --min-runs 10 \
    --max-runs 20 \
    --time-unit second \
    --shell=none \
    --export-csv "${csv_file}" \
    "${cmd} ${common_checkov_flags} --version" \
    "${cmd} ${common_checkov_flags} --list" \
    "${cmd} ${common_checkov_flags} --framework=openapi -d tests/openapi/" \
    "${cmd} ${common_checkov_flags} --framework=ansible -d tests/ansible/examples/" \
    "${cmd} ${common_checkov_flags} --framework=terraform -d tests/terraform/checks/data" \
    "${cmd} ${common_checkov_flags} -d tests/"

  sed -i "s|${cmd} ${common_checkov_flags} ||g" "${csv_file}"
  sed -i 's|command|argv|g' "${csv_file}"
}

bench "main"
bench "lazy-cherry"

python <<EOF
import pandas as pd
drop_cols = ["median", "user", "system", "min", "max"]
df1 = pd.read_csv("bench_checkov_main.csv", index_col="argv").drop(columns=drop_cols)
df2 = pd.read_csv("bench_checkov_lazy-cherry.csv", index_col="argv").drop(columns=drop_cols)
df = pd.concat({"Eager": df1, "Lazy": df2}, axis=1)
# Round means and stds to 3 decimal places
df.loc[:, (slice(None), "mean")] = df.loc[:, (slice(None), "mean")].applymap(lambda x: round(x, 3))
df.loc[:, (slice(None), "stddev")] = df.loc[:, (slice(None), "stddev")].applymap(lambda x: round(x, 3))
# Calculate percentage difference
df.loc[:, ("pct", "mean")] = (df["Lazy"]["mean"] - df["Eager"]["mean"]) / df["Eager"]["mean"] * 100
df.loc[:, ("pct", "mean")] = df["pct"]["mean"].apply(lambda x: f"{'🔻' if x < 0 else '🔺'}{round(x)}%")
# Format index
df.index = df.index.map(lambda x: f"<code>{x}</code>")
# Break up long lines
df.index = df.index.str.replace(" -d ", "</code><br><code>-d ")
print(df.to_markdown(tablefmt="github"))
df.to_html("bench_checkov.html", border=0, escape=False, justify="left")
EOF

# Fix table's text-alignment
sed -i 's|<th>|<th align="left">|g' bench_checkov.html
sed -i 's|<td>|<td align="left">|g' bench_checkov.html

echo "HTML table was saved to ./bench_checkov.html"

Results

M3 - macOS

Eager Lazy pct
argv mean stddev mean stddev mean
--version 0.999 0.034 0.552 0.013 🔻-45%
--list 2.156 0.068 1.878 0.057 🔻-13%
--framework=openapi
-d tests/openapi/
0.940 0.022 0.597 0.030 🔻-36%
--framework=ansible
-d tests/ansible/examples/
0.941 0.037 0.593 0.018 🔻-37%
--framework=terraform
-d tests/terraform/checks/data
0.953 0.032 0.767 0.015 🔻-20%
-d tests/ 0.989 0.008 0.929 0.022 🔻-6%

4-core/8Gb-memory Gitpod instance

Eager Lazy pct
argv mean stddev mean stddev mean
--version 2.103 0.022 1.242 0.040 🔻-41%
--list 3.643 0.106 3.342 0.051 🔻-8%
--framework=openapi
-d tests/openapi/
2.177 0.065 1.273 0.026 🔻-42%
--framework=ansible
-d tests/ansible/examples/
2.166 0.059 1.245 0.037 🔻-43%
--framework=terraform
-d tests/terraform/checks/data
2.113 0.060 1.759 0.037 🔻-17%
-d tests/ 2.242 0.035 2.112 0.042 🔻-6%
@tpvasconcelos
Copy link
Author

If you feel like the examples I used for the benchmarks are not very representative, let me know and I can run a few more

@tpvasconcelos
Copy link
Author

I have also started working on a similar pull request. See tpvasconcelos#3 for more details.

Updating the results table above with this new improvement, shows the following results on my M3 mac:

Eager Lazy pct
argv mean stddev mean stddev mean
--version 0.999 0.034 0.287 0.002 🔻-71%
--list 2.156 0.068 1.664 0.010 🔻-23%
--framework=openapi
-d tests/openapi/
0.940 0.022 0.303 0.013 🔻-68%
--framework=ansible
-d tests/ansible/examples/
0.941 0.037 0.299 0.002 🔻-68%
--framework=terraform
-d tests/terraform/checks/data
0.953 0.032 0.543 0.010 🔻-43%
-d tests/ 0.989 0.008 0.755 0.005 🔻-24%

@tpvasconcelos
Copy link
Author

If I run the pre-commit hook defined in the Motivation section above in a private repository of mine, it only takes 1.38 seconds when running from this branch. While it takes a 2.77 seconds when running against the latest v3.2.255 release. i.e., -50% improvement

@tpvasconcelos
Copy link
Author

Hey @gruebel @tsmithv11! Apologies for pinging you directly here but I wanted to check whether this is something you'd be interested in for the checkov project.

In short, the patch I'm suggesting makes checkov between 70% faster (when executing simple commands like checkov --help) but also as much as 25% faster when running the much more expensive checkov -d . command.

If this is something that interests your team, I'd be happy to provide more context where needed, run additional benchmarks, and make the PR ready for review.

Thanks in advance! 🚀

@tpvasconcelos tpvasconcelos changed the title Performance improvements by lazy loading checks and runners Making checkov between 70% to 25% faster 🚀 Oct 8, 2024
@bo156
Copy link
Contributor

bo156 commented Nov 10, 2024

@tpvasconcelos First of all, thank you for this amazing suggestion 💯
I completely agree with you that commands like checkov --help shouldn't be affected from the many imports we have to load all python modules.

However, I'm hesitant about your specific PR as it introduces a specific mechanism of lazy loading with references to the python internal mechanisms (like references to the stack-frames).
The main drawbacks from our side for using those mechanisms are:

  1. Adding extra complexity to the project, which requires specific understanding of the importing mechanism.
  2. From my experience, affecting the imports in such a way might cause unintended side effects. For example, I believe this might cause debugging to be harder in checkov as it will add the code for "computing" the lazy imports as part of the stack-frames.

Instead, I think you raised an important point which is the fact that we don't need to import all of the code of the checks for a lot of usages in checkov.
What I suggest, is trying to remove the imports to all of the checks from the relevant __init__ files and test the performance in this case. It might require following the imports in other locations as well to make sure there is no other reference to them.

@tpvasconcelos Is that something you are willing to consider/try to implement?

@bo156 bo156 added enhancement New feature or request contribution requested This is a great feature idea, but we will need a contribution to get it added to Checkov. labels Nov 10, 2024
@tpvasconcelos
Copy link
Author

Hey @bo156 thanks for your response! Just letting you know that I did not forget about this. I simply haven't been able to prioritise going over my PR to check whether I can properly address your concerns. I will try to do this in the coming days/weeks.

In the meantime, do you think you could take a look at:

PR Context
#6738 I have not opened a PR against this repo because my PR currently depends on this one. This should be straightforward to review but please take a look at #6737 for more context.
#6739 This is a one line change that should take you 15 seconds to review 😉

Thanks in advance!

@fatbasstard
Copy link

Looking forward to this, our CheckOV run is now almost taking 9 minutes (on our biggest repo) so would great to see some performance improvements...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contribution requested This is a great feature idea, but we will need a contribution to get it added to Checkov. enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants