A/B tests with package sync + repeats #355
Conversation
A/B tests evidence: https://github.com/coiled/coiled-runtime/actions/runs/3091876943

This is ready for review.
Most of these changes look reasonable to me, though I haven't gone through in detail. My main question is whether we should just further simplify the tests workflow matrix rather than pushing these lumpy `include` blocks around.
```diff
@@ -22,19 +22,65 @@ defaults:
     shell: bash -l {0}

 jobs:
-  runtime:
-    name: Runtime - ${{ matrix.os }}, Python ${{ matrix.python-version }}, Runtime ${{ matrix.runtime-version }}
+  tests:
```
Over in #279 I proposed doing something similar to this, but without a new `category` matrix item. What would you think about just running the tests as a single job, and letting xdist do the rest?
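As a rough illustration, a single-job setup could look something like the step below, assuming pytest-xdist is installed in the environment (the step name, path, and flags are illustrative, not taken from this repo):

```yaml
# Hypothetical single test step: no category matrix axis; pytest-xdist
# spreads the whole suite across all available CPUs on the runner.
- name: Run all tests
  run: pytest -n auto tests
```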
I'd say it's reasonable, and it would save some time. Some non-trivial engineering is needed - mind if I do it in a follow-up PR?
Sure, I don't think anything here makes that refactor harder.
```yaml
python-version: ["3.9"]
runtime-version: ["upstream", "latest", "0.0.4", "0.1.0"]
category: [runtime, benchmarks, stability]
```
So maybe we just don't do this and all of the extra `include` logic.
Even if you run all tests together, you'll still need a wordy `include` block.
Instead of this:
```yaml
matrix:
  os: [ubuntu-latest]
  python-version: ["3.9"]
  category: [runtime, benchmarks, stability]
  runtime-version: [upstream, latest, "0.0.4", "0.1.0"]
  include:
    # Run stability tests on Python 3.8
    - category: stability
      python-version: "3.8"
      runtime-version: upstream
      os: ubuntu-latest
    ...
```
it will look like this:
```yaml
matrix:
  os: [ubuntu-latest]
  python-version: ["3.9"]
  pytest_args: [tests]
  runtime-version: [upstream, latest, "0.0.4", "0.1.0"]
  include:
    # Run stability tests on Python 3.8
    - pytest_args: tests/stability
      python-version: "3.8"
      runtime-version: upstream
      os: ubuntu-latest
    ...
```
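For context, the `pytest_args` value would presumably be consumed by the test step roughly like this (a sketch under that assumption, not an excerpt from this PR):

```yaml
# Hypothetical consuming step: `tests` runs the whole suite,
# while `tests/stability` runs only the stability subset.
- name: Run tests
  run: pytest ${{ matrix.pytest_args }}
```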
In the next PR, I want to merge ab_tests.yaml into tests.yaml. In that PR I'll dynamically generate the whole matrix with `discover_ab_environments.py` (to be renamed); the matrix for non-A/B tests will be generated from parameters in `ci/config.yaml` (now `AB_environments/config.yaml`).
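A hedged sketch of the standard GitHub Actions pattern for a dynamically generated matrix, which is presumably the shape that next PR would take (job names, the script location, and the JSON output handling are assumptions, not taken from this PR):

```yaml
# Hypothetical two-job layout: a small job runs the discovery script and
# publishes the matrix as JSON; the test job consumes it via fromJson().
jobs:
  discover:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - uses: actions/checkout@v3
      - id: set-matrix
        run: echo "matrix=$(python discover_ab_environments.py)" >> $GITHUB_OUTPUT
  tests:
    needs: discover
    strategy:
      matrix: ${{ fromJson(needs.discover.outputs.matrix) }}
    ...
```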
Yeah, I was proposing just running the whole test suite on every matrix value. Perhaps that's overkill, though.
> In the next PR, I want to merge ab_tests.yaml into tests.yaml. In that PR I'll dynamically generate the whole matrix with `discover_ab_environments.py` (to be renamed); the matrix for non-A/B tests will be generated from parameters in `ci/config.yaml` (now `AB_environments/config.yaml`).
I'm a little concerned about the complexity of generating bespoke test matrices, and what it will mean for local testing. I was hoping to make the test matrix way simpler.
I haven't gone through in great detail, but this looks good to me from a high level.
- `repeat: N` setting, which causes every A/B test runtime to be rerun N times
- `test_null_hypothesis: true` setting, which creates a verbatim clone of AB_baseline
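A minimal sketch of how those two settings might look in the config file, assuming plain top-level keys (the exact layout of `AB_environments/config.yaml` is not shown in this excerpt):

```yaml
# Hypothetical AB_environments/config.yaml excerpt
repeat: 5                   # rerun every A/B test runtime 5 times
test_null_hypothesis: true  # add a verbatim clone of AB_baseline to the runs
```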

Out of scope: