Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A/B tests with package sync + repeats #355

Merged
merged 11 commits into from
Sep 22, 2022
Merged

Conversation

crusaderky
Copy link
Contributor

@crusaderky crusaderky commented Sep 20, 2022

  • Supersedes Repeat A/B tests #324
  • Repair A/B tests after Package Sync #235
  • Add throttling to prevent AWS DoS protection from kicking in
  • Refactor regular (non-A/B) tests
  • Introduce repeat: N setting, which causes every A/B test runtime to be rerun N times
  • Overhaul the A/B performance reports to display statistical data instead of exact
  • Introduce test_null_hypothesis: true setting, which creates a verbatim clone of AB_baseline

Out of scope:

  • Merge tests.yaml with ab_tests.yaml. This will happen in a future PR.

@crusaderky crusaderky marked this pull request as draft September 20, 2022 16:25
@crusaderky crusaderky self-assigned this Sep 20, 2022
@crusaderky
Copy link
Contributor Author

@crusaderky
Copy link
Contributor Author

This is ready for review

@crusaderky crusaderky marked this pull request as ready for review September 21, 2022 13:30
Copy link
Contributor

@ian-r-rose ian-r-rose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of these changes look reasonable to me, though I haven't gone through in detail. My main question is whether we should just further simplify the tests workflow matrix rather than pushing these lumpy include blocks around.

@@ -22,19 +22,65 @@ defaults:
shell: bash -l {0}

jobs:
runtime:
name: Runtime - ${{ matrix.os }}, Python ${{ matrix.python-version }}, Runtime ${{ matrix.runtime-version }}
tests:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Over in #279 I proposed doing something similar to this, but without a new category matrix item. What would you think about just running the tests as a single job, and letting xdist do the rest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it's reasonable, and it would save some time. Some non-trivial engineering is needed - mind if I do it in a successive PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I don't think anything here makes that refactor harder.

python-version: ["3.9"]
runtime-version: ["upstream", "latest", "0.0.4", "0.1.0"]
category: [runtime, benchmarks, stability]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So maybe we just don't do this and all of the extra include logic.

Copy link
Contributor Author

@crusaderky crusaderky Sep 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if you run all tests together, you'll still need a wordy include paragraph.

Instead of this:

      matrix:
        os: [ubuntu-latest]
        python-version: ["3.9"]
        category: [runtime, benchmarks, stability]
        runtime-version: [upstream, latest, "0.0.4", "0.1.0"]
        include:
          # Run stability tests on Python 3.8
          - category: stability
            python-version: "3.8"
            runtime-version: upstream
            os: ubuntu-latest
       ...

it will look like this:

      matrix:
        os: [ubuntu-latest]
        python-version: ["3.9"]
        pytest_args: [tests]
        runtime-version: [upstream, latest, "0.0.4", "0.1.0"]
        include:
          # Run stability tests on Python 3.8
          - pytest_args: tests/stability
            python-version: "3.8"
            runtime-version: upstream
            os: ubuntu-latest
         ...

In the next PR, I want to merge ab_tests.yaml into tests.yaml. In that PR I'll dynamically generate the whole matrix with discover_ab_environments.py (to be renamed); the matrix for non-A/B tests will be generated from parameters in ci/config.yaml (now AB_environments/config.yaml)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was proposing just running the whole test suite on every matrix value. Perhaps overkill, however.

In the next PR, I want to merge ab_tests.yaml into tests.yaml. In that PR I'll dynamically generate the whole matrix with discover_ab_environments.py (to be renamed); the matrix for non-A/B tests will be generated from parameters in ci/config.yaml (now AB_environments/config.yaml)

I'm a little concerned about the complexity of generating bespoke test matrices, and what it will mean for local testing. I was hoping to make the test matrix way simpler.

Copy link
Contributor

@ian-r-rose ian-r-rose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't gone through in great detail, but this looks good to me from a high level

@crusaderky crusaderky merged commit b14536d into main Sep 22, 2022
@crusaderky crusaderky deleted the guido/AB_package_sync branch September 22, 2022 15:31
@crusaderky crusaderky mentioned this pull request Sep 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants