Commit
perf: remove tree sitter walking (#1206)
* feat: go ast

* wip

* refactor: move over to new tree (scopes not implemented)

* perf: use node for executing rules

* build: update to go 1.21

* refactor: remove per language composition

* refactor: remove old tree package

* test: fixup detector tests

* refactor: move pattern implementation into separate type

* feat: add back dataflow

* fix: fixup a few FIXMEs

* feat: implement cursor scope

* chore: favour golang slices pkg over ssoroka

* feat: aliased nodes

* fix: use correct query param name

* refactor: better name for strict/static cursor scope

* perf: improve cache performance

* refactor: reorganize packages

* fix: crashes

* fix: query performance regression

* perf: store query results on nodes

* perf: cache tweaks

* chore: log tree and query ids for trace

* refactor: add file context

* chore: improve trace output

* fix: don't analyze anonymous nodes

* fix: update correct slice in builder alias method

* fix: fixes from testing rules

* wip: further improvements

* fix: include receiver in .to_ call dataflow

* fix: lookup vars in js call functions

* fix: use name node for import variable

* feat: better javascript spread element support

* feat: more reflexive methods

* fix: variable nodes

* fix: add getBytes reflexive method

* refactor: add comment on memory reuse

* chore: add more trace logging

* refactor: fix some imports

* refactor: use normal slices pkg

* test: fix tests

* refactor: simplify scanner objects

* perf: further performance improvement

* test: fix remaining tests

* refactor: add scanner rule type

* perf: more performance improvements

* fix: allow multiple string detections

* feat: variable imports

* test: add tests for filters

* fix: fixes from testing

* fix: performance regression

* chore: add local scan options to envrc example

* feat: add more java reflexive methods

* fix: ensure worker processes exit if parent dies

* fix: built-in rule filter matches

* fix: add more ruby reflexive methods

* fix: ignoring of minified files

* fix: more ruby reflexive

* fix: typo in rule validation message

* test: update remaining tests

* refactor: fix linting issues

* fix: returning of datatype detections

* fix: make node dump stable

* test: ensures env variables don't conflict with tests

---------

Co-authored-by: Cédric Fabianski <cfabianski@me.com>
didroe and cfabianski authored Sep 14, 2023
1 parent 058d751 commit 5e0772d
Showing 1,705 changed files with 19,489 additions and 9,888 deletions.
5 changes: 4 additions & 1 deletion .envrc.example
@@ -6,4 +6,7 @@ export REDIS_INIT="true"
 export GOOGLE_MAX_ATTEMPT="5"
 export BEARER_EXECUTABLE_PATH="./bearer"
 export GITHUB_WORKSPACE="/path/to/bearer/project"
-export SCAN_DIR=/Users/username/OWASP
+export SCAN_DIR=/Users/username/OWASP
+export BEARER_DISABLE_DEFAULT_RULES=true
+export BEARER_EXTERNAL_RULE_DIR=../bearer-rules/rules
+export BEARER_FORCE=true
4 changes: 2 additions & 2 deletions .gitattributes
@@ -1,2 +1,2 @@
-pkg/parser/sitter/**/*.c linguist-generated
-pkg/parser/sitter/**/*.cc linguist-generated
+internal/parser/sitter/**/*.c linguist-generated
+internal/parser/sitter/**/*.cc linguist-generated
2 changes: 1 addition & 1 deletion .github/actions/linux-build/Dockerfile
@@ -18,7 +18,7 @@ RUN apt-get update && \
 apt-get update && \
 apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin git && \
 \
-wget --output-document=/tmp/go.tar.gz https://go.dev/dl/go1.19.8.linux-amd64.tar.gz && \
+wget --output-document=/tmp/go.tar.gz https://go.dev/dl/go1.21.0.linux-amd64.tar.gz && \
 tar --extract --gunzip --file=/tmp/go.tar.gz --directory=/usr/local && \
 ln -s /usr/local/go/bin/go /usr/local/bin/ && \
 \
2 changes: 1 addition & 1 deletion .github/workflows/canary.yml
@@ -75,7 +75,7 @@ jobs:
 - name: Set up Go
 uses: actions/setup-go@v4
 with:
-go-version: 1.19
+go-version: 1.21
 - name: Setup Gon
 run: brew install mitchellh/gon/gon
 - name: Import Code-Signing Certificates
2 changes: 1 addition & 1 deletion .github/workflows/command_doc_check.yml
@@ -19,7 +19,7 @@ jobs:
 - name: Set up Go
 uses: actions/setup-go@v4
 with:
-go-version: 1.19
+go-version: 1.21
 - name: Generate command docs
 run: go run ./scripts/gen-doc-yaml.go
 - name: Check no uncommited changes
2 changes: 1 addition & 1 deletion .github/workflows/e2e_test.yml
@@ -19,7 +19,7 @@ jobs:
 - name: Set up Go
 uses: actions/setup-go@v4
 with:
-go-version: 1.19
+go-version: 1.21
 - name: Build binary for integration tests
 run: go build -a -o ./bearer ./cmd/bearer/main.go
 - name: Run tests
2 changes: 1 addition & 1 deletion .github/workflows/lint.yml
@@ -14,7 +14,7 @@ jobs:
 steps:
 - uses: actions/setup-go@v4
 with:
-go-version: 1.19
+go-version: 1.21
 - uses: actions/checkout@v4
 - name: golangci-lint
 uses: golangci/golangci-lint-action@v3
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -75,7 +75,7 @@ jobs:
 - name: Set up Go
 uses: actions/setup-go@v4
 with:
-go-version: 1.19
+go-version: 1.21
 - name: Setup Gon
 run: brew install mitchellh/gon/gon
 - name: Import Code-Signing Certificates
6 changes: 2 additions & 4 deletions .github/workflows/test.yml
@@ -20,8 +20,6 @@ jobs:
 - name: Set up Go
 uses: actions/setup-go@v4
 with:
-go-version: 1.19
+go-version: 1.21
 - name: Run package tests
-run: go test -v ./pkg/...
-- name: Run detector tests
-run: go test -v ./new/detector/...
+run: go test -v ./internal/...
8 changes: 4 additions & 4 deletions README.md
@@ -41,7 +41,7 @@ Bearer CLI scans your source code for:
 * A08: Data Integrity Failures (e.g. Deserialization of Untrusted Data).
 * A09: Security Logging and Monitoring Failures (e.g. Insertion of Sensitive Information into Log File).
 * A10: Server-Side Request Forgery (SSRF).
-
+

 *Note: all the rules and their code patterns are accessible through the [documentation](https://docs.bearer.com/reference/rules/).*

 * **Privacy risks** with the ability to detect [sensitive data flow](https://docs.bearer.com/explanations/discovery-and-classification/) such as the use of PII, PHI in your app, and [components](https://docs.bearer.com/reference/recipes/) processing sensitive data (e.g. databases like pgSQL, third-party APIs such as OpenAI, Sentry, etc.). This helps generate a [privacy report](https://docs.bearer.com/guides/privacy/) relevant for:
@@ -133,7 +133,7 @@ curl -sfL https://raw.githubusercontent.com/Bearer/bearer/main/contrib/install.s
 <details>
 <summary>Docker</summary>

-Bearer CLI is also available as a Docker image on [Docker Hub](https://hub.docker.com/r/bearer/bearer) and [ghcr.io](https://github.com/Bearer/bearer/pkgs/container/bearer).
+Bearer CLI is also available as a Docker image on [Docker Hub](https://hub.docker.com/r/bearer/bearer) and [ghcr.io](https://github.com/bearer/bearer/internals/container/bearer).

 With docker installed, you can run the following command with the appropriate paths in place of the examples.

@@ -243,9 +243,9 @@ We believe that by linking security issues with a clear business impact and risk
 In addition, by being Free and Open, extendable by design, and built with a great developer UX in mind, we bet you will see the difference for yourself.
-### What is the privacy scanner?
+### What is the privacy scanner?
-In addition of detecting security flaws in your code, Bearer CLI allows you to automate the evidence gathering process needed to generate a privacy report for your compliance team.
+In addition of detecting security flaws in your code, Bearer CLI allows you to automate the evidence gathering process needed to generate a privacy report for your compliance team.
 When you run Bearer CLI on your codebase, it discovers and classifies data by identifying patterns in the source code. Specifically, it looks for data types and matches against them. Most importantly, it never views the actual values—it just can’t—but only the code itself. If you want to learn more, here is the [longer explanation](https://docs.bearer.com/explanations/discovery-and-classification/).
2 changes: 1 addition & 1 deletion api/fetch_ignores.go
@@ -3,7 +3,7 @@ package api
 import (
 "encoding/json"

-ignoretypes "github.com/bearer/bearer/pkg/util/ignore/types"
+ignoretypes "github.com/bearer/bearer/internal/util/ignore/types"
 )

 type CloudIgnoreData struct {
4 changes: 2 additions & 2 deletions cmd/bearer/main.go
@@ -3,8 +3,8 @@ package main
 import (
 "github.com/bearer/bearer/cmd/bearer/build"

-"github.com/bearer/bearer/pkg/commands"
-"github.com/bearer/bearer/pkg/util/output"
+"github.com/bearer/bearer/internal/commands"
+"github.com/bearer/bearer/internal/util/output"
 )

 func main() {
6 changes: 3 additions & 3 deletions docs/_data/datatypes.js
@@ -74,10 +74,10 @@ function sortData(typesFile, catsFile, groupsFile) {
 }
 // example();
 module.exports = async function () {
-let dataTypes = await fetchData("../pkg/classification/db/data_types/");
-let dataCats = await fetchData("../pkg/classification/db/data_categories/");
+let dataTypes = await fetchData("../internal/classification/db/data_types/");
+let dataCats = await fetchData("../internal/classification/db/data_categories/");
 let groupings = await fetchFile(
-"../pkg/classification/db/category_grouping.json"
+"../internal/classification/db/category_grouping.json"
 );
 return sortData(dataTypes, dataCats, groupings);
 };
4 changes: 2 additions & 2 deletions docs/_data/recipes.js
@@ -16,7 +16,7 @@ async function fetchData(dir) {
 return {
 ...data,
 id: path.basename(file, ".json"),
-source: `/pkg/classification/db/recipes/${file}`,
+source: `/internal/classification/db/recipes/${file}`,
 };
 })
 );
@@ -26,6 +26,6 @@ async function fetchData(dir) {
 }
 }
 module.exports = async function () {
-let recipes = await fetchData("../pkg/classification/db/recipes/");
+let recipes = await fetchData("../internal/classification/db/recipes/");
 return recipes;
 };
6 changes: 3 additions & 3 deletions docs/contributing/code.md
@@ -8,7 +8,7 @@ If you're interested in contributing code to Bearer CLI, this guide will help yo

 ## Set up Bearer CLI locally

-Bearer CLI is written in [Go](https://www.go.dev), so you'll need golang v1.19 or greater installed. Installation instructions for your architecture can be found at [golang downloads page](https://go.dev/dl/).
+Bearer CLI is written in [Go](https://www.go.dev), so you'll need golang v1.21 or greater installed. Installation instructions for your architecture can be found at [golang downloads page](https://go.dev/dl/).

 Next, confirm the installation by running the following command:

@@ -65,13 +65,13 @@ go test ./...
 Running classification tests:

 ```bash
-go test ./pkg/classification/... -count=1
+go test ./internal/classification/... -count=1
 ```

 Running a single specific test:

 ```bash
-go test -run ^TestSchema$ ./pkg/classification/schema -count=1
+go test -run ^TestSchema$ ./internal/classification/schema -count=1
 ```

 ### Integration testing
10 changes: 5 additions & 5 deletions docs/contributing/recipes.md
@@ -6,11 +6,11 @@ title: Add or update a recipe

 Recipes are part of how Bearer CLI makes connections between your code and other sources. These are things like data stores, APIs, and internal services. They work by providing information about endpoints, API base urls, package information, etc.

-Recipes are located at `bearer/pkg/classification/db/recipes/`.
+Recipes are located at `bearer/internal/classification/db/recipes/`.

 ```md
 .
-pkg/
+internal/
 │ └ classification/
 │ └ db/
 │ └ recipes/
@@ -29,7 +29,7 @@ Each recipe consists of a `JSON` file containing the following properties:
 - `package_manager` (string): The package manager that manages the package, such as npm, go, etc.
 - `uuid`: A unique identifier to distinguish the recipe from others. See below for [generating a new uuid](#generating-a-uuid).
 - `sub_type` (string): The subtype of the earlier `type` property.
-- `external_service` subtypes:
+- `external_service` subtypes:
 - `third_party`
 - `data_store` subtypes:
 - `database`
@@ -42,7 +42,7 @@ Each recipe consists of a `JSON` file containing the following properties:
 - `internal_service` subtypes:
 - `message_bus`

-If any of the existing properties and available values don't meet the needs of your new recipe, [open a new issue]({{meta.sourcePath}}/issues/new/choose). You can view the existing recipes [in the GitHub repo]({{meta.sourcePath}}/tree/main/pkg/classification/db/recipes).
+If any of the existing properties and available values don't meet the needs of your new recipe, [open a new issue]({{meta.sourcePath}}/issues/new/choose). You can view the existing recipes [in the GitHub repo]({{meta.sourcePath}}/tree/main/internal/classification/db/recipes).

 ## Generating a UUID

@@ -62,4 +62,4 @@ uuidgen | tr "[:upper:]" "[:lower:]"

 ## Commiting the new recipe

-To contribute the new recipe to Bearer CLI, refer to the [Contributing Code guide](/contributing/code/).
+To contribute the new recipe to Bearer CLI, refer to the [Contributing Code guide](/contributing/code/).
6 changes: 3 additions & 3 deletions docs/explanations/reports.md
@@ -11,7 +11,7 @@ Bearer CLI can generate various types of reports about your codebase, all from t
 - Usage: `bearer scan . --report security`
 - Default format: `json`

-The security report allows you to quickly see security risks and vulnerabilities found in your codebase using a security [scanner type](/explanations/scanners) (SAST by default).
+The security report allows you to quickly see security risks and vulnerabilities found in your codebase using a security [scanner type](/explanations/scanners) (SAST by default).

 For each violation, the report includes the affected file and, when possible, the line of code and a snippet of the surrounding code. Here's an excerpt from the security report run on the [OWASP Juice Shop app](https://github.com/juice-shop/juice-shop):

@@ -111,7 +111,7 @@ By default, Bearer CLI maps all subjects to “User”, but you can override thi
 bearer scan . --report=privacy --data-subject-mapping=/path/to/mappings.json
 ```

-The custom map file should follow the format used by [subject_mapping.json]({{meta.sourcePath}}/blob/main/pkg/classification/db/subject_mapping.json). Replace a key’s value with the higher-level subject you’d like to associate it with. Some examples might include Customer, Employee, Client, Patient, etc. Bearer CLI will use your replacement file instead of the default, so make sure to include any and all subjects you want reported.
+The custom map file should follow the format used by [subject_mapping.json]({{meta.sourcePath}}/blob/main/internal/classification/db/subject_mapping.json). Replace a key’s value with the higher-level subject you’d like to associate it with. Some examples might include Customer, Employee, Client, Patient, etc. Bearer CLI will use your replacement file instead of the default, so make sure to include any and all subjects you want reported.

 ## Data Flow Report

@@ -185,4 +185,4 @@ If we look at the `db/schema.rb` file mentioned in the report, we can see that e

 ## Next steps

-For additional options on generating reports, selecting format types, and writing the output to a file, see the [command reference](/reference/commands/) documentation.
+For additional options on generating reports, selecting format types, and writing the output to a file, see the [command reference](/reference/commands/) documentation.
4 changes: 2 additions & 2 deletions docs/explanations/scanners.md
@@ -4,7 +4,7 @@ title: Scanner Types

 # Scanner Types

-Bearer CLI comes with two types of security scanners, SAST (default) and Secrets.
+Bearer CLI comes with two types of security scanners, SAST (default) and Secrets.

 ## SAST Scanner

@@ -51,7 +51,7 @@ Detected: Password in URL
 File: ../../OWASP/NodeGoat/README.md:59
 ```

-You can see a full list of [built-in patterns](https://github.com/Bearer/bearer/blob/main/pkg/detectors/gitleaks/gitlab_config.toml).
+You can see a full list of [built-in patterns](https://github.com/Bearer/bearer/blob/main/internal/detectors/gitleaks/gitlab_config.toml).

 ⚠️ Secret detection patterns are not configurable today. If this is something you'd like to see, please open an [issue](https://github.com/Bearer/bearer/issues).

2 changes: 1 addition & 1 deletion docs/guides/custom-rule.md
@@ -34,7 +34,7 @@ the matched code.
 - `id`: A unique identifier. Internal rules are named `lang_framework_rule_name`. For rules targeting the language core, `lang` is used instead of a framework name. For example `ruby_lang_logger` and `ruby_rails_logger`. For custom rules, you may consider appending your org name.
 - `description`: A brief, one-sentence description of the rule. The best practice is to make this an actionable “rule” phrase, such as “Do X” or “Do not do X in Y”.
 - `cwe_id`: The associated list of [CWE](https://cwe.mitre.org/) identifiers. (Optional)
-- `associated_recipe`: Links the rule to a [recipe]({{meta.sourcePath}}/tree/main/pkg/classification/db/recipes). Useful for associating a rule with a third party. Example: “Sentry” (Optional)
+- `associated_recipe`: Links the rule to a [recipe]({{meta.sourcePath}}/tree/main/internal/classification/db/recipes). Useful for associating a rule with a third party. Example: “Sentry” (Optional)
 - `remediation_message`: Used for internal rules, this builds the documentation page for a rule. (Optional)
 - `documentation_url`: Used to pass custom documentation URL for the security report. This can be useful for linking to your own internal documentation or policies. By default, all rules in the main repo will automatically generate a link to the rule on [docs.bearer.com](/). (Optional)
 - `auxiliary`: Allows you to define helper rules and detectors to make pattern-building more robust. Auxiliary rules contain a unique `id` and their own `patterns` in the same way rules do. You’re unlikely to use this regularly. See the [weak_encryption](https://github.com/Bearer/bearer-rules/blob/main/ruby/lang/weak_encryption.yml) rule for examples. In addition, see our advice on how to avoid [variable joining](#variable-joining) in auxiliary rules. (Optional)
6 changes: 3 additions & 3 deletions docs/guides/privacy.md
@@ -14,7 +14,7 @@ Bearer CLI's [privacy report type](/explanations/reports/#privacy-report) allows

 ## Getting started

-If you haven't already, install Bearer CLI using the instructions on the [installation page](/reference/installation/) or the [quick start](/quickstart/).
+If you haven't already, install Bearer CLI using the instructions on the [installation page](/reference/installation/) or the [quick start](/quickstart/).

 To run your first privacy report, navigate to the project root and use the `bearer scan` command with the `--report privacy` flag:

@@ -110,7 +110,7 @@ This will allow team members to import the report into spreadsheets or their pre

 ## Subject mapping

-Bearer CLI uses "User" as the default data subject. To override this, you can copy the [subject_mapping.json](https://github.com/bearer/bearer/blob/main/pkg/classification/db/subject_mapping.json) and customize it to your needs. Then, use the `--data-subject-mapping` flag to use your mappings instead. This will use your supplied mapping file instead of the default.
+Bearer CLI uses "User" as the default data subject. To override this, you can copy the [subject_mapping.json](https://github.com/bearer/bearer/blob/main/internal/classification/db/subject_mapping.json) and customize it to your needs. Then, use the `--data-subject-mapping` flag to use your mappings instead. This will use your supplied mapping file instead of the default.

 ```bash
 bearer scan . --report privacy --data-subject-mapping /path/to/mappings.json
@@ -120,4 +120,4 @@ This is useful when your team has different terms for data subjects, or multiple

 ## Next steps

-For more ways to make the most of our Bearer CLI, see our guide on [configuring the scan](/guides/configure-scan/) and the [commands reference](/reference/commands/). Need additional help? [Open an issue]({{meta.links.issues}}) or join our [Discord community]({{meta.links.discord}}).
+For more ways to make the most of our Bearer CLI, see our guide on [configuring the scan](/guides/configure-scan/) and the [commands reference](/reference/commands/). Need additional help? [Open an issue]({{meta.links.issues}}) or join our [Discord community]({{meta.links.discord}}).
4 changes: 2 additions & 2 deletions docs/reference/commands.njk
@@ -3,7 +3,7 @@ title: Commands
 layout: layouts/doc.njk
 ---
 {# Welcome :wave:. The content of this page is automatically generated based on Bearer's CLI help files.
-They can be found here: https://github.com/Bearer/bearer/tree/main/pkg/commands
+They can be found here: https://github.com/Bearer/bearer/tree/main/internal/commands
 #}

 {% set items = [bearer_scan, bearer_init, bearer_ignore_add, bearer_ignore_show, bearer_ignore_remove, bearer_ignore_pull, bearer_ignore_migrate, bearer_version] %}
@@ -60,4 +60,4 @@ Bearer CLI offers a number of commands to use and customize the CLI to your need
 <p>In addition to the primary <code>{{ item.name | trim }}</code> command, you can also use <code>{{ item.aliases | trim}}</code> in place of it.
 </p>
 {% endif %}
-{% endfor %}
+{% endfor %}
6 changes: 3 additions & 3 deletions docs/reference/datatypes.njk
@@ -3,11 +3,11 @@ title: Data Types
 layout: layouts/doc.njk
 ---
 {# This content of this page is generated. To edit individual types or category text,
-edit the files located at https://github.com/Bearer/bearer/tree/main/pkg/classification/db #}
+edit the files located at https://github.com/Bearer/bearer/tree/main/internal/classification/db #}
 {% renderTemplate "liquid,md",
 datatypes %}
 # Supported Data Types
-Bearer CLI supports {{counts.types}} data types including Personal Data (PD), Sensitive Personal Data, Personally Identifiable Information (PII), and Protected Health Information (PHI).
+Bearer CLI supports {{counts.types}} data types including Personal Data (PD), Sensitive Personal Data, Personally Identifiable Information (PII), and Protected Health Information (PHI).

 The following is a catagorized list of the supported data types.
 {% endrenderTemplate %}
@@ -22,4 +22,4 @@ The following is a catagorized list of the supported data types.
 {% endfor %}
 </ul>
 {% endfor %}
-{% endfor %}
+{% endfor %}
2 changes: 1 addition & 1 deletion docs/reference/installation.md
@@ -66,7 +66,7 @@ sudo yum -y install bearer

 ### Docker

-Bearer CLI is also available as a Docker image on [Docker Hub](https://hub.docker.com/r/bearer/bearer) and [ghcr.io](https://github.com/Bearer/bearer/pkgs/container/bearer).
+Bearer CLI is also available as a Docker image on [Docker Hub](https://hub.docker.com/r/bearer/bearer) and [ghcr.io](https://github.com/bearer/bearer/internals/container/bearer).

 With docker installed, you can run the following command with the appropriate paths in place of the examples.

2 changes: 1 addition & 1 deletion e2e/flags/report_flags_test.go
@@ -6,7 +6,7 @@ import (
 "testing"

 "github.com/bearer/bearer/e2e/internal/testhelper"
-"github.com/bearer/bearer/pkg/util/tmpfile"
+"github.com/bearer/bearer/internal/util/tmpfile"
 "github.com/bradleyjkemp/cupaloy"
 )
