Commit

feat(fs): use git commit hash as cache key for clean repositories (#8278)

Signed-off-by: knqyf263 <knqyf263@gmail.com>
knqyf263 authored Jan 27, 2025
1 parent aec8885 commit b5062f3
Showing 18 changed files with 345 additions and 98 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -26,6 +26,7 @@ thumbs.db
coverage.txt
integration/testdata/fixtures/images
integration/testdata/fixtures/vm-images
+internal/gittest/testdata/test-repo

# SBOMs generated during CI
/bom.json
6 changes: 2 additions & 4 deletions docs/docs/configuration/cache.md
@@ -51,9 +51,7 @@ It supports three types of backends for this cache:
- TTL can be configured via `--cache-ttl`

### Local File System
-The local file system backend is the default choice for container and VM image scans.
-When scanning container images, it stores analysis results on a per-layer basis, using layer IDs as keys.
-This approach enables faster scans of the same container image or different images that share layers.
+The local file system backend is the default choice for container image, VM image and repository scans.

!!! note
Internally, this backend uses [BoltDB][boltdb], which has an important limitation: only one process can access the cache at a time.
@@ -63,7 +61,7 @@ This approach enables faster scans of the same container image or different imag
### Memory
The memory backend stores analysis results in memory, which means the cache is discarded when the process ends.
This makes it useful in scenarios where caching is not required or desired.
-It serves as the default for repository, filesystem and SBOM scans and can also be employed for container image scans when caching is unnecessary.
+It serves as the default for filesystem and SBOM scans and can also be employed for container image scans when caching is unnecessary.

To use the memory backend for a container image scan, you can use the following command:

2 changes: 1 addition & 1 deletion docs/docs/references/configuration/cli/trivy_repository.md
@@ -19,7 +19,7 @@ trivy repository [flags] (REPO_PATH | REPO_URL)

```
--branch string pass the branch name to be scanned
---cache-backend string [EXPERIMENTAL] cache backend (e.g. redis://localhost:6379) (default "memory")
+--cache-backend string [EXPERIMENTAL] cache backend (e.g. redis://localhost:6379) (default "fs")
--cache-ttl duration cache TTL when using redis as cache backend
--cf-params strings specify paths to override the CloudFormation parameters files
--check-namespaces strings Rego namespaces
6 changes: 6 additions & 0 deletions docs/docs/target/container_image.md
@@ -463,6 +463,12 @@ trivy image --compliance docker-cis-1.6.0 [YOUR_IMAGE_NAME]
## Authentication
Please reference [this page](../advanced/private-registries/index.md).

+## Scan Cache
+When scanning container images, it stores analysis results in the cache, using the image ID and the layer IDs as the key.
+This approach enables faster scans of the same container image or different images that share layers.
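
Editorial illustration (not part of the commit): the effect of keying results by layer ID can be sketched with a small in-memory map. This is a minimal sketch under stated assumptions, not Trivy's implementation: `layerCache`, `analyzeImage`, and the layer IDs are hypothetical names invented for the example, and the stored value is a placeholder for a real analysis result.

```go
package main

import "fmt"

// layerCache is a hypothetical in-memory stand-in for a scan cache
// that maps a layer ID to its stored analysis result.
type layerCache map[string]string

// analyzeImage stores a result for every layer that is missing from the
// cache and returns the IDs of the layers it actually had to analyze.
func analyzeImage(cache layerCache, layerIDs []string) []string {
	var analyzed []string
	for _, id := range layerIDs {
		if _, ok := cache[id]; ok {
			continue // cache hit: reuse the stored result
		}
		cache[id] = "result for " + id // placeholder for real analysis output
		analyzed = append(analyzed, id)
	}
	return analyzed
}

func main() {
	cache := layerCache{}

	// Two images that share the same base layer.
	first := analyzeImage(cache, []string{"sha256:base", "sha256:app-a"})
	second := analyzeImage(cache, []string{"sha256:base", "sha256:app-b"})

	fmt.Println(first)  // [sha256:base sha256:app-a]
	fmt.Println(second) // [sha256:app-b]
}
```

The second scan only analyzes the layer that the first image did not contain, which is the saving the paragraph above describes.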

+More details are available in the [cache documentation](../configuration/cache.md#scan-cache-backend).

## Options
### Scan Image on a specific Architecture and OS
By default, Trivy loads an image on a "linux/amd64" machine.
10 changes: 10 additions & 0 deletions docs/docs/target/filesystem.md
@@ -91,3 +91,13 @@ $ trivy fs --scanners license /path/to/project
## SBOM generation
Trivy can generate SBOM for local projects.
See [here](../supply-chain/sbom.md) for the detail.

+## Scan Cache
+When scanning local projects, it doesn't use the cache by default.
+However, when the local project is a git repository with clean status and the cache backend other than the memory one is enabled, it stores analysis results, using the latest commit hash as the key.
+
+```shell
+$ trivy fs --cache-backend fs /path/to/git/repo
+```
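
Editorial illustration (not part of the commit): the conditions described above can be read as a small decision function. This is a simplified sketch, not Trivy's code: `repoState` and `cacheKey` are hypothetical names, the commit hash is made up, and a real cache key may incorporate more than the commit hash.

```go
package main

import "fmt"

// repoState is a hypothetical summary of what a scanner would learn
// from inspecting a local directory.
type repoState struct {
	IsGitRepo  bool
	Clean      bool   // no uncommitted or untracked changes
	HeadCommit string // hash of the latest commit
}

// cacheKey returns the key to store results under and whether
// persistent caching applies at all.
func cacheKey(backend string, s repoState) (string, bool) {
	if backend == "memory" {
		return "", false // results are discarded when the process ends
	}
	if !s.IsGitRepo || !s.Clean || s.HeadCommit == "" {
		return "", false // uncommitted files would be missed by a commit-based key
	}
	return s.HeadCommit, true
}

func main() {
	clean := repoState{IsGitRepo: true, Clean: true, HeadCommit: "0123abc"}
	dirty := repoState{IsGitRepo: true, Clean: false, HeadCommit: "0123abc"}

	key, ok := cacheKey("fs", clean)
	fmt.Printf("%q %v\n", key, ok) // "0123abc" true

	key, ok = cacheKey("fs", dirty)
	fmt.Printf("%q %v\n", key, ok) // "" false

	key, ok = cacheKey("memory", clean)
	fmt.Printf("%q %v\n", key, ok) // "" false
}
```

Refusing to cache a dirty working tree is the conservative choice: two different sets of uncommitted changes would otherwise share one commit hash and therefore one cached result.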

+More details are available in the [cache documentation](../configuration/cache.md#scan-cache-backend).
6 changes: 6 additions & 0 deletions docs/docs/target/repository.md
@@ -109,6 +109,12 @@ $ trivy repo --scanners license (REPO_PATH | REPO_URL)
Trivy can generate SBOM for code repositories.
See [here](../supply-chain/sbom.md) for the detail.

+## Scan Cache
+When scanning git repositories, it stores analysis results in the cache, using the latest commit hash as the key.
+Note that the cache is not used when the repository is dirty, otherwise Trivy will miss the files that are not committed.
+
+More details are available in the [cache documentation](../configuration/cache.md#scan-cache-backend).

## References
The following flags and environmental variables are available for remote git repositories.

8 changes: 8 additions & 0 deletions docs/docs/target/vm.md
@@ -182,6 +182,14 @@ $ trivy vm --scanners license [YOUR_VM_IMAGE]
Trivy can generate SBOM for VM images.
See [here](../supply-chain/sbom.md) for the detail.

+## Scan Cache
+When scanning AMI or EBS snapshots, it stores analysis results in the cache, using the snapshot ID.
+Scanning the same snapshot several times skips analysis if the cache is already available.
+
+When scanning local files, it doesn't use the cache by default.
+
+More details are available in the [cache documentation](../configuration/cache.md#scan-cache-backend).

## Supported Architectures

### Virtual machine images
47 changes: 47 additions & 0 deletions internal/gittest/server.go
@@ -5,7 +5,9 @@ package gittest
import (
	"errors"
	"net/http/httptest"
+	"os"
	"path/filepath"
+	"runtime"
	"testing"
	"time"

@@ -59,6 +61,51 @@ func NewServer(t *testing.T, repo, dir string) *httptest.Server {
	return httptest.NewServer(service)
}

+// NewServerWithRepository creates a git server with an existing repository
+func NewServerWithRepository(t *testing.T, repo, dir string) *httptest.Server {
+	// Create a bare repository
+	bareDir := t.TempDir()
+	gitDir := filepath.Join(bareDir, repo+".git")
+
+	// Clone the existing repository as a bare repository
+	r, err := git.PlainClone(gitDir, true, &git.CloneOptions{
+		URL: dir,
+		Tags: git.AllTags,
+	})
+	require.NoError(t, err)
+
+	// Fetch all remote branches and create local branches
+	err = r.Fetch(&git.FetchOptions{
+		RefSpecs: []config.RefSpec{
+			"+refs/remotes/origin/*:refs/heads/*",
+		},
+		Tags: git.AllTags,
+	})
+	if err != nil && !errors.Is(err, git.NoErrAlreadyUpToDate) {
+		require.NoError(t, err)
+	}
+
+	// Set up a git server
+	service := gitkit.New(gitkit.Config{Dir: bareDir})
+	err = service.Setup()
+	require.NoError(t, err)
+
+	return httptest.NewServer(service)
+}
+
+// NewTestServer creates a git server with the local copy of "github.com/aquasecurity/trivy-test-repo".
+// If the test repository doesn't exist, it suggests running 'mage test:unit'.
+func NewTestServer(t *testing.T) *httptest.Server {
+	_, filePath, _, _ := runtime.Caller(0)
+	dir := filepath.Join(filepath.Dir(filePath), "testdata", "test-repo")
+
+	if _, err := os.Stat(dir); os.IsNotExist(err) {
+		require.Fail(t, "test-repo not found. Please run 'mage test:unit' to set up the test fixtures")
+	}
+
+	return NewServerWithRepository(t, "test-repo", dir)
+}
+
func Clone(t *testing.T, ts *httptest.Server, repo, worktree string) *git.Repository {
	cloneOptions := git.CloneOptions{
		URL: ts.URL + "/" + repo + ".git",
43 changes: 43 additions & 0 deletions internal/gittest/testdata/fixture.go
@@ -0,0 +1,43 @@
+package gittest
+
+import (
+	"log/slog"
+	"path/filepath"
+	"runtime"
+
+	"github.com/go-git/go-git/v5"
+	"github.com/magefile/mage/target"
+	"golang.org/x/xerrors"
+)
+
+const (
+	repoURL = "https://github.com/aquasecurity/trivy-test-repo/"
+	repoDir = "test-repo" // subdirectory for the cloned repository
+)
+
+// Fixtures clones a Git repository for unit tests
+func Fixtures() error {
+	_, filePath, _, _ := runtime.Caller(0)
+	dir := filepath.Dir(filePath)
+	cloneDir := filepath.Join(dir, repoDir)
+
+	// Check if the directory already exists and is up to date
+	if updated, err := target.Path(cloneDir, filePath); err != nil {
+		return err
+	} else if !updated {
+		return nil
+	}
+
+	slog.Info("Cloning...", slog.String("url", repoURL))
+
+	// Clone the repository with all branches and tags
+	_, err := git.PlainClone(cloneDir, false, &git.CloneOptions{
+		URL: repoURL,
+		Tags: git.AllTags,
+	})
+	if err != nil {
+		return xerrors.Errorf("error cloning repository: %w", err)
+	}
+
+	return nil
+}
8 changes: 5 additions & 3 deletions magefiles/magefile.go
@@ -16,10 +16,12 @@ import (
	"github.com/magefile/mage/sh"
	"github.com/magefile/mage/target"

-	//mage:import rpm
-	rpm "github.com/aquasecurity/trivy/pkg/fanal/analyzer/pkg/rpm/testdata"
	// Trivy packages should not be imported in Mage (see https://github.com/aquasecurity/trivy/pull/4242),
	// but this package doesn't have so many dependencies, and Mage is still fast.
+	//mage:import gittest
+	gittest "github.com/aquasecurity/trivy/internal/gittest/testdata"
+	//mage:import rpm
+	rpm "github.com/aquasecurity/trivy/pkg/fanal/analyzer/pkg/rpm/testdata"
	"github.com/aquasecurity/trivy/pkg/log"
)

@@ -286,7 +288,7 @@ func compileWasmModules(pattern string) error {

// Unit runs unit tests
func (t Test) Unit() error {
-	mg.Deps(t.GenerateModules, rpm.Fixtures)
+	mg.Deps(t.GenerateModules, rpm.Fixtures, gittest.Fixtures)
	return sh.RunWithV(ENV, "go", "test", "-v", "-short", "-coverprofile=coverage.txt", "-covermode=atomic", "./...")
}

2 changes: 0 additions & 2 deletions pkg/commands/app.go
@@ -478,8 +478,6 @@ func NewRepositoryCommand(globalFlags *flag.GlobalFlagGroup) *cobra.Command {

	repoFlags.ScanFlagGroup.DistroFlag = nil // `repo` subcommand doesn't support scanning OS packages, so we can disable `--distro`

-	repoFlags.CacheFlagGroup.CacheBackend.Default = string(cache.TypeMemory) // Use memory cache by default
-
	cmd := &cobra.Command{
		Use: "repository [flags] (REPO_PATH | REPO_URL)",
		Aliases: []string{"repo"},
5 changes: 5 additions & 0 deletions pkg/fanal/artifact/artifact.go
@@ -14,6 +14,7 @@ import (
)

type Option struct {
+	Type Type
	AnalyzerGroup analyzer.Group // It is empty in OSS
	DisabledAnalyzers []analyzer.Type
	DisabledHandlers []types.HandlerType
@@ -30,6 +31,10 @@ type Option struct {
	FileChecksum bool // For SPDX
	DetectionPriority types.DetectionPriority

+	// Original is the original target location, e.g. "github.com/aquasecurity/trivy"
+	// Currently, it is used only for remote git repositories
+	Original string
+
	// Git repositories
	RepoBranch string
	RepoCommit string