Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Performance Regression Testing [Rust] #3602

Merged
merged 81 commits into from
Aug 11, 2021
Merged
Show file tree
Hide file tree
Changes from 76 commits
Commits
Show all changes
81 commits
Select commit Hold shift + click to select a range
9f62ec2
add rust performance runner
Jul 20, 2021
9e3da39
draft of regression testing workflow
Jul 20, 2021
ee58d27
use a profiles.yml file
Jul 21, 2021
d0773f3
warmup fs caches
Jul 21, 2021
4ff3f6d
moved config dir out of projects dir
Jul 21, 2021
75d3d87
fix warmup syntax
Jul 21, 2021
1d7e834
draft of comparison
Jul 22, 2021
a29367b
read files and parse json
Jul 22, 2021
108b55b
add regression type
Jul 22, 2021
c0e2023
update profiles dir path
Jul 22, 2021
34e2c4f
fix results output dir
Jul 22, 2021
a45c9d0
add new arg for runner for branch name
Jul 22, 2021
4c06689
create a tuple of measurements to group results by
Jul 22, 2021
c8bc25d
add struct
Jul 22, 2021
37b31d1
add derived traits to structs
Jul 27, 2021
4ddba7e
--wip--
Jul 27, 2021
45bb955
still wip but compiles
Jul 28, 2021
d5662ef
wrap up comparsion logic
Aug 2, 2021
80951ae
fmt
Aug 2, 2021
5466d47
add test for exception display
Aug 2, 2021
fa78102
add comparison to workflow
Aug 2, 2021
8a5e9b7
add some output to comparitor
Aug 2, 2021
429396a
move step dependencies around
Aug 2, 2021
f4775d7
change job name
Aug 2, 2021
6ba837d
fix download artifact name
Aug 2, 2021
3d4a82c
add comments
Aug 2, 2021
2f7ab2d
unify rust apps as subcommands
Aug 3, 2021
868fd64
download correct artifact
Aug 3, 2021
95116db
fix measure call
Aug 3, 2021
f1f99a2
add name to action
Aug 3, 2021
03332b2
more gracefully exit than unwrapping
Aug 3, 2021
b02875a
add debug line
Aug 3, 2021
c182c05
error when there are no results to process
Aug 3, 2021
37e8625
point to the downloaded artifacts correctly
Aug 3, 2021
80244a0
add error for groups that are not two elements large
Aug 3, 2021
faff8c0
group by run only
Aug 3, 2021
757614d
add stddev regression
Aug 3, 2021
ed91ded
branches named dev and baseline
Aug 3, 2021
fe24dd4
return all calculations not just regressions
Aug 3, 2021
873e971
move io operations to main
Aug 3, 2021
3004969
move measure io to main
Aug 3, 2021
4e020c3
enforce branch names at calculation time
Aug 3, 2021
5891b59
better variable names
Aug 3, 2021
935c138
type gymnastics
Aug 3, 2021
64bf9c8
test fix
Aug 3, 2021
252280b
write out results as artifact
Aug 3, 2021
bad0198
minor readme changes
Aug 3, 2021
580b1fd
minor print change
Aug 3, 2021
1f189f5
added small paragraph to readme
Aug 3, 2021
486afa9
upload results even when regressions are detected
Aug 3, 2021
ba9d76b
make final json human readable
Aug 4, 2021
287c4d2
revamp exception hierarchy
Aug 4, 2021
2f9907b
remove one more unwrap
Aug 4, 2021
3c8daac
refactor for simpler flow
Aug 4, 2021
94c6cf1
remove some clones
Aug 4, 2021
2782a33
minor refactor
Aug 4, 2021
b77eff8
errors to warnings in ci
Aug 4, 2021
0b4689f
address PR feedback
Aug 4, 2021
4b1c6b5
fix spacing
Aug 4, 2021
fe0b9e7
refactor to use references better
Aug 4, 2021
02007b3
more reference handling
Aug 4, 2021
751ea92
make the happy path use references and clone in the exception paths
Aug 4, 2021
44a8f6a
more reference improvements
Aug 4, 2021
02839ec
iter instead of into_iter
Aug 4, 2021
88acf07
remove clone
Aug 4, 2021
2d3d1b0
rename to dummy project
Aug 5, 2021
6c8de62
up stddev threshold
Aug 5, 2021
d9b02fb
add lots of comments
Aug 5, 2021
52a0fde
fmt
Aug 5, 2021
3d9e54d
add fmt check
Aug 5, 2021
5f9ed1a
run twice a day
Aug 5, 2021
3e1e171
add test for regression calculation
Aug 5, 2021
2068dd5
fmt
Aug 5, 2021
ef63319
add comment
Aug 5, 2021
8707270
point to manifest
Aug 5, 2021
06cc0c5
changelog
Aug 5, 2021
b90b3a9
avoid abandoned destructors by refactoring usage of exit
Aug 6, 2021
cd6894a
add to future work
Aug 9, 2021
355b0c4
remove unnecessary newline printing
Aug 9, 2021
8609c02
minor change
Aug 9, 2021
1fe5375
add more future work
Aug 9, 2021
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
181 changes: 181 additions & 0 deletions .github/workflows/performance.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,181 @@

name: Performance Regression Testing
# Schedule triggers
on:
# TODO this is just while developing
pull_request:
branches:
- 'develop'
- 'performance-regression-testing'
schedule:
# runs twice a day at 10:05am and 10:05pm
- cron: '5 10,22 * * *'
# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:

jobs:

# checks fmt of runner code
# purposefully not a dependency of any other job
# will block merging, but not prevent developing
fmt:
name: Cargo fmt
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
- run: rustup component add rustfmt
- uses: actions-rs/cargo@v1
with:
command: fmt
args: --manifest-path performance/runner/Cargo.toml --all -- --check

# runs any tests associated with the runner
# these tests make sure the runner logic is correct
test-runner:
name: Test Runner
runs-on: ubuntu-latest
env:
# turns errors into warnings
RUSTFLAGS: "-D warnings"
steps:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
- uses: actions-rs/cargo@v1
with:
command: test
args: --manifest-path performance/runner/Cargo.toml

# build an optimized binary to be used as the runner in later steps
build-runner:
needs: [test-runner]
name: Build Runner
runs-on: ubuntu-latest
env:
RUSTFLAGS: "-D warnings"
steps:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
- uses: actions-rs/cargo@v1
with:
command: build
args: --release --manifest-path performance/runner/Cargo.toml
- uses: actions/upload-artifact@v2
with:
name: runner
path: performance/runner/target/release/runner

# run the performance measurements on the current or default branch
measure-dev:
needs: [build-runner]
name: Measure Dev Branch
runs-on: ubuntu-latest
steps:
- name: checkout dev
uses: actions/checkout@v2
- name: Setup Python
uses: actions/setup-python@v2.2.2
with:
python-version: '3.8'
- name: install dbt
run: pip install -r dev-requirements.txt -r editable-requirements.txt
- name: install hyperfine
run: wget https://github.com/sharkdp/hyperfine/releases/download/v1.11.0/hyperfine_1.11.0_amd64.deb && sudo dpkg -i hyperfine_1.11.0_amd64.deb
- uses: actions/download-artifact@v2
with:
name: runner
- name: change permissions
run: chmod +x ./runner
- name: run
run: ./runner measure -b dev -p ${{ github.workspace }}/performance/projects/
- uses: actions/upload-artifact@v2
with:
name: dev-results
path: performance/results/

# run the performance measurements on the release branch which we use
# as a performance baseline. This part takes by far the longest, so
# we do everything we can first so the job fails fast.
# -----
# we need to checkout dbt twice in this job: once for the baseline dbt
# version, and once to get the latest regression testing projects,
# metrics, and runner code from the develop or current branch so that
# the calculations match for both versions of dbt we are comparing.
measure-baseline:
needs: [build-runner]
name: Measure Baseline Branch
runs-on: ubuntu-latest
steps:
- name: checkout latest
uses: actions/checkout@v2
with:
ref: '0.20.latest'
- name: Setup Python
uses: actions/setup-python@v2.2.2
with:
python-version: '3.8'
- name: move repo up a level
run: mkdir ${{ github.workspace }}/../baseline/ && cp -r ${{ github.workspace }} ${{ github.workspace }}/../baseline
- name: "[debug] ls new dbt location"
run: ls ${{ github.workspace }}/../baseline/dbt/
# installation creates egg-links so we have to preserve source
- name: install dbt from new location
run: cd ${{ github.workspace }}/../baseline/dbt/ && pip install -r dev-requirements.txt -r editable-requirements.txt
# checkout the current branch to get all the target projects
# this deletes the old checked out code which is why we had to copy before
- name: checkout dev
uses: actions/checkout@v2
- name: install hyperfine
run: wget https://github.com/sharkdp/hyperfine/releases/download/v1.11.0/hyperfine_1.11.0_amd64.deb && sudo dpkg -i hyperfine_1.11.0_amd64.deb
- uses: actions/download-artifact@v2
with:
name: runner
- name: change permissions
run: chmod +x ./runner
- name: run runner
run: ./runner measure -b baseline -p ${{ github.workspace }}/performance/projects/
- uses: actions/upload-artifact@v2
with:
name: baseline-results
path: performance/results/

# detect regressions on the output generated from measuring
# the two branches. Exits with non-zero code if a regression is detected.
calculate-regressions:
needs: [measure-dev, measure-baseline]
name: Compare Results
runs-on: ubuntu-latest
steps:
- uses: actions/download-artifact@v2
with:
name: dev-results
- uses: actions/download-artifact@v2
with:
name: baseline-results
- name: "[debug] ls result files"
run: ls
- uses: actions/download-artifact@v2
with:
name: runner
- name: change permissions
run: chmod +x ./runner
- name: run calculation
run: ./runner calculate -r ./
# always attempt to upload the results even if there were regressions found
- uses: actions/upload-artifact@v2
if: ${{ always() }}
with:
name: final-calculations
path: ./final_calculations.json
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
- Handle whitespace after a plus sign on the project config ([#3526](https://github.com/dbt-labs/dbt/pull/3526))

### Under the hood
- Add performance regression testing [#3602](https://github.com/dbt-labs/dbt/pull/3602)
- Improve default view and table materialization performance by checking relational cache before attempting to drop temp relations ([#3112](https://github.com/fishtown-analytics/dbt/issues/3112), [#3468](https://github.com/fishtown-analytics/dbt/pull/3468))
- Add optional `sslcert`, `sslkey`, and `sslrootcert` profile arguments to the Postgres connector. ([#3472](https://github.com/fishtown-analytics/dbt/pull/3472), [#3473](https://github.com/fishtown-analytics/dbt/pull/3473))
- Move the example project used by `dbt init` into `dbt` repository, to avoid cloning an external repo ([#3005](https://github.com/fishtown-analytics/dbt/pull/3005), [#3474](https://github.com/fishtown-analytics/dbt/pull/3474), [#3536](https://github.com/fishtown-analytics/dbt/pull/3536))
Expand Down
16 changes: 16 additions & 0 deletions performance/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# Performance Regression Testing
This directory includes dbt project setups to test on and a test runner written in Rust which runs specific dbt commands on each of the projects. Orchestration is done via the GitHub Action workflow in `/.github/workflows/performance.yml`. The workflow is scheduled to run every night, but it can also be triggered manually.

The github workflow hardcodes our baseline branch for performance metrics as `0.20.latest`. As future versions become faster, this branch will be updated to hold us to those new standards.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need some way to keep this up to date. We already have 0.21.latest, though there hasn't yet been a final release from that branch.

I'm wondering if it would be preferable to install the latest stable release of dbt, rather than checking out a specific branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we could do that instead. Definitely less manual than this approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should switch to this approach after a version or two so we know we haven't introduced any slippage in the metrics we're measuring. 🤔


## Adding a new dbt project
Just make a new directory under `performance/projects/`. It will automatically be picked up by the tests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we really want to check in the thousands of files associated with the performance tests? Could we put the projects we want to test into separate repos?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. we absolutely could do that. The advantage would be that we don't have to download all those files when we checkout dbt, but it's also another repository to manage. I just measured that a dbt project with 2k files is roughly 6.3k to download which I don't consider a deal breaker. But I'm honestly open to both approaches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this offers general support for performance-testing any project. I'm thinking about specific projects that we've known to parse slowly, and how great it would be to stress-test a putative improvement in a more-rigorous manner.

As far as running this nightly: If we're going to use generated projects, I'd be open to including the project-generation code in this repo, and running the project generation as a step in the performance workflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtcohen6 that is a good next step, however most places that do this kind of performance work (and publish it) usually check in projects to prove the metrics aren't changing because the project is changing. We could go against the grain on that one, but I'm inclined to check it in if it doesn't cause any tangible issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok cool, I buy that (and remember you having told me this). I'm all for doing this in a way that follows existing practice.


## Adding a new dbt command
In `runner/src/measure.rs::measure` add a metric to the `metrics` Vec. The Github Action will handle recompilation if you don't have the rust toolchain installed.

## Future work
- add more projects to test different configurations that have been known bottlenecks
- add more dbt commands to measure
- possibly using the uploaded json artifacts to store these results so they can be graphed over time
- reading new metrics from a file so no one has to edit rust source to add them to the suite
1 change: 1 addition & 0 deletions performance/project_config/.user.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
id: 5d0c160e-f817-4b77-bce3-ffb2e37f0c9b
12 changes: 12 additions & 0 deletions performance/project_config/profiles.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
default:
target: dev
outputs:
dev:
type: postgres
host: localhost
user: dummy
password: dummy_password
port: 5432
dbname: dummy
schema: dummy
threads: 4
38 changes: 38 additions & 0 deletions performance/projects/01_dummy_project/dbt_project.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@

# Name your package! Package names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'my_new_package'
version: 1.0.0
config-version: 2

# This setting configures which "profile" dbt uses for this project. Profiles contain
# database connection information, and should be configured in the ~/.dbt/profiles.yml file
profile: 'default'

# These configurations specify where dbt should look for different types of files.
# The `source-paths` config, for example, states that source models can be found
# in the "models/" directory. You probably won't need to change these!
source-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
data-paths: ["data"]
macro-paths: ["macros"]

target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
- "target"
- "dbt_modules"

# You can define configurations for models in the `source-paths` directory here.
# Using these configurations, you can enable or disable models, change how they
# are materialized, and more!

# In this example config, we tell dbt to build all models in the example/ directory
# as views (the default). These settings can be overridden in the individual model files
# using the `{{ config(...) }}` macro.
models:
my_new_package:
# Applies to all files under models/example/
example:
materialized: view
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
select 1 as id
11 changes: 11 additions & 0 deletions performance/projects/01_dummy_project/models/path_0/node_0.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
models:
- columns:
- name: id
tests:
- unique
- not_null
- relationships:
field: id
to: node_0
name: node_0
version: 2
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
select 1 as id
union all
select * from {{ ref('node_0') }}
11 changes: 11 additions & 0 deletions performance/projects/01_dummy_project/models/path_0/node_1.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
models:
- columns:
- name: id
tests:
- unique
- not_null
- relationships:
field: id
to: node_0
name: node_1
version: 2
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
select 1 as id
union all
select * from {{ ref('node_0') }}
11 changes: 11 additions & 0 deletions performance/projects/01_dummy_project/models/path_0/node_2.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
models:
- columns:
- name: id
tests:
- unique
- not_null
- relationships:
field: id
to: node_0
name: node_2
version: 2
38 changes: 38 additions & 0 deletions performance/projects/02_dummy_project/dbt_project.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@

# Name your package! Package names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'my_new_package'
version: 1.0.0
config-version: 2

# This setting configures which "profile" dbt uses for this project. Profiles contain
# database connection information, and should be configured in the ~/.dbt/profiles.yml file
profile: 'default'

# These configurations specify where dbt should look for different types of files.
# The `source-paths` config, for example, states that source models can be found
# in the "models/" directory. You probably won't need to change these!
source-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
data-paths: ["data"]
macro-paths: ["macros"]

target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
- "target"
- "dbt_modules"

# You can define configurations for models in the `source-paths` directory here.
# Using these configurations, you can enable or disable models, change how they
# are materialized, and more!

# In this example config, we tell dbt to build all models in the example/ directory
# as views (the default). These settings can be overridden in the individual model files
# using the `{{ config(...) }}` macro.
models:
my_new_package:
# Applies to all files under models/example/
example:
materialized: view
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
select 1 as id
11 changes: 11 additions & 0 deletions performance/projects/02_dummy_project/models/path_0/node_0.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
models:
- columns:
- name: id
tests:
- unique
- not_null
- relationships:
field: id
to: node_0
name: node_0
version: 2
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
select 1 as id
union all
select * from {{ ref('node_0') }}
11 changes: 11 additions & 0 deletions performance/projects/02_dummy_project/models/path_0/node_1.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
models:
- columns:
- name: id
tests:
- unique
- not_null
- relationships:
field: id
to: node_0
name: node_1
version: 2
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
select 1 as id
union all
select * from {{ ref('node_0') }}
11 changes: 11 additions & 0 deletions performance/projects/02_dummy_project/models/path_0/node_2.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
models:
- columns:
- name: id
tests:
- unique
- not_null
- relationships:
field: id
to: node_0
name: node_2
version: 2
Loading