-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Performance Regression Testing [Rust] #3602
Changes from 76 commits
9f62ec2
9e3da39
ee58d27
d0773f3
4ff3f6d
75d3d87
1d7e834
a29367b
108b55b
c0e2023
34e2c4f
a45c9d0
4c06689
c8bc25d
37b31d1
4ddba7e
45bb955
d5662ef
80951ae
5466d47
fa78102
8a5e9b7
429396a
f4775d7
6ba837d
3d4a82c
2f7ab2d
868fd64
95116db
f1f99a2
03332b2
b02875a
c182c05
37e8625
80244a0
faff8c0
757614d
ed91ded
fe24dd4
873e971
3004969
4e020c3
5891b59
935c138
64bf9c8
252280b
bad0198
580b1fd
1f189f5
486afa9
ba9d76b
287c4d2
2f9907b
3c8daac
94c6cf1
2782a33
b77eff8
0b4689f
4b1c6b5
fe0b9e7
02007b3
751ea92
44a8f6a
02839ec
88acf07
2d3d1b0
6c8de62
d9b02fb
52a0fde
3d9e54d
5f9ed1a
3e1e171
2068dd5
ef63319
8707270
06cc0c5
b90b3a9
cd6894a
355b0c4
8609c02
1fe5375
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,181 @@ | ||
|
||
name: Performance Regression Testing | ||
# Schedule triggers | ||
on: | ||
# TODO this is just while developing | ||
pull_request: | ||
branches: | ||
- 'develop' | ||
- 'performance-regression-testing' | ||
schedule: | ||
# runs twice a day at 10:05am and 10:05pm | ||
- cron: '5 10,22 * * *' | ||
# Allows you to run this workflow manually from the Actions tab | ||
workflow_dispatch: | ||
|
||
jobs: | ||
|
||
# checks fmt of runner code | ||
# purposefully not a dependency of any other job | ||
# will block merging, but not prevent developing | ||
fmt: | ||
name: Cargo fmt | ||
runs-on: ubuntu-latest | ||
steps: | ||
- uses: actions/checkout@v2 | ||
- uses: actions-rs/toolchain@v1 | ||
with: | ||
profile: minimal | ||
toolchain: stable | ||
override: true | ||
- run: rustup component add rustfmt | ||
- uses: actions-rs/cargo@v1 | ||
with: | ||
command: fmt | ||
args: --manifest-path performance/runner/Cargo.toml --all -- --check | ||
|
||
# runs any tests associated with the runner | ||
# these tests make sure the runner logic is correct | ||
test-runner: | ||
name: Test Runner | ||
runs-on: ubuntu-latest | ||
env: | ||
# turns errors into warnings | ||
RUSTFLAGS: "-D warnings" | ||
steps: | ||
- uses: actions/checkout@v2 | ||
- uses: actions-rs/toolchain@v1 | ||
with: | ||
profile: minimal | ||
toolchain: stable | ||
override: true | ||
- uses: actions-rs/cargo@v1 | ||
with: | ||
command: test | ||
args: --manifest-path performance/runner/Cargo.toml | ||
|
||
# build an optimized binary to be used as the runner in later steps | ||
build-runner: | ||
needs: [test-runner] | ||
name: Build Runner | ||
runs-on: ubuntu-latest | ||
env: | ||
RUSTFLAGS: "-D warnings" | ||
steps: | ||
- uses: actions/checkout@v2 | ||
- uses: actions-rs/toolchain@v1 | ||
with: | ||
profile: minimal | ||
toolchain: stable | ||
override: true | ||
- uses: actions-rs/cargo@v1 | ||
with: | ||
command: build | ||
args: --release --manifest-path performance/runner/Cargo.toml | ||
- uses: actions/upload-artifact@v2 | ||
with: | ||
name: runner | ||
path: performance/runner/target/release/runner | ||
|
||
# run the performance measurements on the current or default branch | ||
measure-dev: | ||
needs: [build-runner] | ||
name: Measure Dev Branch | ||
runs-on: ubuntu-latest | ||
steps: | ||
- name: checkout dev | ||
uses: actions/checkout@v2 | ||
- name: Setup Python | ||
uses: actions/setup-python@v2.2.2 | ||
with: | ||
python-version: '3.8' | ||
- name: install dbt | ||
run: pip install -r dev-requirements.txt -r editable-requirements.txt | ||
- name: install hyperfine | ||
run: wget https://github.com/sharkdp/hyperfine/releases/download/v1.11.0/hyperfine_1.11.0_amd64.deb && sudo dpkg -i hyperfine_1.11.0_amd64.deb | ||
- uses: actions/download-artifact@v2 | ||
with: | ||
name: runner | ||
- name: change permissions | ||
run: chmod +x ./runner | ||
- name: run | ||
run: ./runner measure -b dev -p ${{ github.workspace }}/performance/projects/ | ||
- uses: actions/upload-artifact@v2 | ||
with: | ||
name: dev-results | ||
path: performance/results/ | ||
|
||
# run the performance measurements on the release branch which we use | ||
# as a performance baseline. This part takes by far the longest, so | ||
# we do everything we can first so the job fails fast. | ||
# ----- | ||
# we need to checkout dbt twice in this job: once for the baseline dbt | ||
# version, and once to get the latest regression testing projects, | ||
# metrics, and runner code from the develop or current branch so that | ||
# the calculations match for both versions of dbt we are comparing. | ||
measure-baseline: | ||
needs: [build-runner] | ||
name: Measure Baseline Branch | ||
runs-on: ubuntu-latest | ||
steps: | ||
- name: checkout latest | ||
uses: actions/checkout@v2 | ||
with: | ||
ref: '0.20.latest' | ||
- name: Setup Python | ||
uses: actions/setup-python@v2.2.2 | ||
with: | ||
python-version: '3.8' | ||
- name: move repo up a level | ||
run: mkdir ${{ github.workspace }}/../baseline/ && cp -r ${{ github.workspace }} ${{ github.workspace }}/../baseline | ||
- name: "[debug] ls new dbt location" | ||
run: ls ${{ github.workspace }}/../baseline/dbt/ | ||
# installation creates egg-links so we have to preserve source | ||
- name: install dbt from new location | ||
run: cd ${{ github.workspace }}/../baseline/dbt/ && pip install -r dev-requirements.txt -r editable-requirements.txt | ||
# checkout the current branch to get all the target projects | ||
# this deletes the old checked out code which is why we had to copy before | ||
- name: checkout dev | ||
uses: actions/checkout@v2 | ||
- name: install hyperfine | ||
run: wget https://github.com/sharkdp/hyperfine/releases/download/v1.11.0/hyperfine_1.11.0_amd64.deb && sudo dpkg -i hyperfine_1.11.0_amd64.deb | ||
- uses: actions/download-artifact@v2 | ||
with: | ||
name: runner | ||
- name: change permissions | ||
run: chmod +x ./runner | ||
- name: run runner | ||
run: ./runner measure -b baseline -p ${{ github.workspace }}/performance/projects/ | ||
- uses: actions/upload-artifact@v2 | ||
with: | ||
name: baseline-results | ||
path: performance/results/ | ||
|
||
# detect regressions on the output generated from measuring | ||
# the two branches. Exits with non-zero code if a regression is detected. | ||
calculate-regressions: | ||
needs: [measure-dev, measure-baseline] | ||
name: Compare Results | ||
runs-on: ubuntu-latest | ||
steps: | ||
- uses: actions/download-artifact@v2 | ||
with: | ||
name: dev-results | ||
- uses: actions/download-artifact@v2 | ||
with: | ||
name: baseline-results | ||
- name: "[debug] ls result files" | ||
run: ls | ||
- uses: actions/download-artifact@v2 | ||
with: | ||
name: runner | ||
- name: change permissions | ||
run: chmod +x ./runner | ||
- name: run calculation | ||
run: ./runner calculate -r ./ | ||
# always attempt to upload the results even if there were regressions found | ||
- uses: actions/upload-artifact@v2 | ||
if: ${{ always() }} | ||
with: | ||
name: final-calculations | ||
path: ./final_calculations.json |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
# Performance Regression Testing | ||
This directory includes dbt project setups to test on and a test runner written in Rust which runs specific dbt commands on each of the projects. Orchestration is done via the GitHub Action workflow in `/.github/workflows/performance.yml`. The workflow is scheduled to run every night, but it can also be triggered manually. | ||
|
||
The github workflow hardcodes our baseline branch for performance metrics as `0.20.latest`. As future versions become faster, this branch will be updated to hold us to those new standards. | ||
|
||
## Adding a new dbt project | ||
Just make a new directory under `performance/projects/`. It will automatically be picked up by the tests. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm wondering if we really want to check in the thousands of files associated with the performance tests? Could we put the projects we want to test into separate repos? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes. we absolutely could do that. The advantage would be that we don't have to download all those files when we checkout dbt, but it's also another repository to manage. I just measured that a dbt project with 2k files is roughly 6.3k to download which I don't consider a deal breaker. But I'm honestly open to both approaches. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like that this offers general support for performance-testing any project. I'm thinking about specific projects that we've known to parse slowly, and how great it would be to stress-test a putative improvement in a more-rigorous manner. As far as running this nightly: If we're going to use generated projects, I'd be open to including the project-generation code in this repo, and running the project generation as a step in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jtcohen6 that is a good next step, however most places that do this kind of performance work (and publish it) usually check in projects to prove the metrics aren't changing because the project is changing. We could go against the grain on that one, but I'm inclined to check it in if it doesn't cause any tangible issues. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok cool, I buy that (and remember you having told me this). I'm all for doing this in a way that follows existing practice. |
||
|
||
## Adding a new dbt command | ||
In `runner/src/measure.rs::measure` add a metric to the `metrics` Vec. The Github Action will handle recompilation if you don't have the rust toolchain installed. | ||
|
||
## Future work | ||
- add more projects to test different configurations that have been known bottlenecks | ||
- add more dbt commands to measure | ||
- possibly using the uploaded json artifacts to store these results so they can be graphed over time | ||
- reading new metrics from a file so no one has to edit rust source to add them to the suite |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
id: 5d0c160e-f817-4b77-bce3-ffb2e37f0c9b |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
default: | ||
target: dev | ||
outputs: | ||
dev: | ||
type: postgres | ||
host: localhost | ||
user: dummy | ||
password: dummy_password | ||
port: 5432 | ||
dbname: dummy | ||
schema: dummy | ||
threads: 4 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
|
||
# Name your package! Package names should contain only lowercase characters | ||
# and underscores. A good package name should reflect your organization's | ||
# name or the intended use of these models | ||
name: 'my_new_package' | ||
version: 1.0.0 | ||
config-version: 2 | ||
|
||
# This setting configures which "profile" dbt uses for this project. Profiles contain | ||
# database connection information, and should be configured in the ~/.dbt/profiles.yml file | ||
profile: 'default' | ||
|
||
# These configurations specify where dbt should look for different types of files. | ||
# The `source-paths` config, for example, states that source models can be found | ||
# in the "models/" directory. You probably won't need to change these! | ||
source-paths: ["models"] | ||
analysis-paths: ["analysis"] | ||
test-paths: ["tests"] | ||
data-paths: ["data"] | ||
macro-paths: ["macros"] | ||
|
||
target-path: "target" # directory which will store compiled SQL files | ||
clean-targets: # directories to be removed by `dbt clean` | ||
- "target" | ||
- "dbt_modules" | ||
|
||
# You can define configurations for models in the `source-paths` directory here. | ||
# Using these configurations, you can enable or disable models, change how they | ||
# are materialized, and more! | ||
|
||
# In this example config, we tell dbt to build all models in the example/ directory | ||
# as views (the default). These settings can be overridden in the individual model files | ||
# using the `{{ config(...) }}` macro. | ||
models: | ||
my_new_package: | ||
# Applies to all files under models/example/ | ||
example: | ||
materialized: view |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
select 1 as id |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
models: | ||
- columns: | ||
- name: id | ||
tests: | ||
- unique | ||
- not_null | ||
- relationships: | ||
field: id | ||
to: node_0 | ||
name: node_0 | ||
version: 2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
select 1 as id | ||
union all | ||
select * from {{ ref('node_0') }} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
models: | ||
- columns: | ||
- name: id | ||
tests: | ||
- unique | ||
- not_null | ||
- relationships: | ||
field: id | ||
to: node_0 | ||
name: node_1 | ||
version: 2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
select 1 as id | ||
union all | ||
select * from {{ ref('node_0') }} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
models: | ||
- columns: | ||
- name: id | ||
tests: | ||
- unique | ||
- not_null | ||
- relationships: | ||
field: id | ||
to: node_0 | ||
name: node_2 | ||
version: 2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
|
||
# Name your package! Package names should contain only lowercase characters | ||
# and underscores. A good package name should reflect your organization's | ||
# name or the intended use of these models | ||
name: 'my_new_package' | ||
version: 1.0.0 | ||
config-version: 2 | ||
|
||
# This setting configures which "profile" dbt uses for this project. Profiles contain | ||
# database connection information, and should be configured in the ~/.dbt/profiles.yml file | ||
profile: 'default' | ||
|
||
# These configurations specify where dbt should look for different types of files. | ||
# The `source-paths` config, for example, states that source models can be found | ||
# in the "models/" directory. You probably won't need to change these! | ||
source-paths: ["models"] | ||
analysis-paths: ["analysis"] | ||
test-paths: ["tests"] | ||
data-paths: ["data"] | ||
macro-paths: ["macros"] | ||
|
||
target-path: "target" # directory which will store compiled SQL files | ||
clean-targets: # directories to be removed by `dbt clean` | ||
- "target" | ||
- "dbt_modules" | ||
|
||
# You can define configurations for models in the `source-paths` directory here. | ||
# Using these configurations, you can enable or disable models, change how they | ||
# are materialized, and more! | ||
|
||
# In this example config, we tell dbt to build all models in the example/ directory | ||
# as views (the default). These settings can be overridden in the individual model files | ||
# using the `{{ config(...) }}` macro. | ||
models: | ||
my_new_package: | ||
# Applies to all files under models/example/ | ||
example: | ||
materialized: view |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
select 1 as id |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
models: | ||
- columns: | ||
- name: id | ||
tests: | ||
- unique | ||
- not_null | ||
- relationships: | ||
field: id | ||
to: node_0 | ||
name: node_0 | ||
version: 2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
select 1 as id | ||
union all | ||
select * from {{ ref('node_0') }} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
models: | ||
- columns: | ||
- name: id | ||
tests: | ||
- unique | ||
- not_null | ||
- relationships: | ||
field: id | ||
to: node_0 | ||
name: node_1 | ||
version: 2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
select 1 as id | ||
union all | ||
select * from {{ ref('node_0') }} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
models: | ||
- columns: | ||
- name: id | ||
tests: | ||
- unique | ||
- not_null | ||
- relationships: | ||
field: id | ||
to: node_0 | ||
name: node_2 | ||
version: 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need some way to keep this up to date. We already have
0.21.latest
, though there hasn't yet been a final release from that branch.I'm wondering if it would be preferable to install the latest stable release of dbt, rather than checking out a specific branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we could do that instead. Definitely less manual than this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should switch to this approach after a version or two so we know we haven't introduced any slippage in the metrics we're measuring. 🤔