[Enhancement] Support for A/B Tests #299
This request may be related to #239
A/B testing sounds very useful. I'm a little concerned that allowing multiple tests inside a single `k6 run` execution is going to complicate the tool a bit, but maybe I'm paranoid. I think the …
In my mind, the variants would also be a field inside options, and would be a …
@liclac I don't think "A/B Tests" supersedes #239 "Conditional flows based on percentages". #239 intends to provide an API for conditional flows, which could support A/B tests among other use cases. Conditional flows are a handy utility when you need the test load to navigate different flows. In that case, the user is interested in randomizing the load, not in an A/B test comparison; for example, Gatling provides randomSwitch. I am skeptical because this request seems very specific: is it a common case to run a test that compares several environments simultaneously?
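For concreteness, here is a minimal sketch of what such percentage-based flow selection could look like in a plain k6 script today; the 70/30 split, the flow functions, and the URLs are illustrative assumptions, not an existing API:

```js
import http from 'k6/http';

// Two hypothetical user flows; the URLs are placeholders.
function browseFlow() {
  http.get('https://test.k6.io/');
}

function checkoutFlow() {
  http.get('https://test.k6.io/checkout');
}

export default function () {
  // Route roughly 70% of iterations through one flow and 30% through the
  // other, similar in spirit to Gatling's randomSwitch.
  if (Math.random() < 0.7) {
    browseFlow();
  } else {
    checkoutFlow();
  }
}
```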
@ppcano Thanks for jumping in on the discussion! I agree that #239 seems like a different approach. Regarding your question:
What I had in mind for an A/B test was not to test multiple environments simultaneously, but rather to first execute a complete k6 test suite with one variant, and then execute it again with a different variant. This way, we don't have to worry about each variant having an effect on the other. Basically, this is to automate what I've been doing manually: running two different k6 test suites in serial and changing some attribute to see whether it affected performance. In the most recent case, it was to see whether adding stronger encryption increased latency.
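As a sketch of that manual serial workflow: the same script can be run twice with k6's `-e` flag, which populates `__ENV`. The `VARIANT` name and the endpoints below are illustrative:

```js
// Run the same script twice in serial, e.g.:
//   k6 run -e VARIANT=weak-cipher script.js
//   k6 run -e VARIANT=strong-cipher script.js
import http from 'k6/http';

// Hypothetical endpoints, one per variant under comparison.
const targets = {
  'weak-cipher': 'https://weak.example.com/',
  'strong-cipher': 'https://strong.example.com/',
};

export default function () {
  // __ENV holds the variables passed in via -e on the command line.
  http.get(targets[__ENV.VARIANT]);
}
```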
@liclac Adding the ability to support multiple console reporters (like Mocha does) may also be a way to implement it. A possible solution could be to provide a custom reporter focused on first-level group comparisons.
I am not at all convinced that the default output should support this case "at this moment". Allowing for third-party innovation is usually a good approach to testing and validating new ideas; if an idea becomes mature and stable, it can then be integrated into the core.
This looks to me like a different flow that, I believe, we are not currently supporting: you mean that if you run the test for 5 minutes, the first 2.5 minutes use variant A and the rest use variant B?
Could you describe what your worries are? Is there any problem with simultaneously sending 50% of the load to variant A and the other 50% to variant B? Thanks for the input.
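A sketch of the simultaneous 50/50 split being asked about, using k6's built-in `__VU` identifier to assign each VU a variant; the URL and the tag name are illustrative:

```js
import http from 'k6/http';

export default function () {
  // Even-numbered VUs hit variant A, odd-numbered VUs hit variant B.
  const variant = __VU % 2 === 0 ? 'A' : 'B';
  http.get(`https://example.com/?variant=${variant}`, {
    // Tag the metrics so the two variants can be compared afterwards.
    tags: { variant },
  });
}
```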
I do like the idea of changing the output when A/B testing from the current end-of-test summary to something… preferably less clunky, but conveying the same information for each variant.
This should be mostly solved by #1007: you would be able to run different scenarios, each containing a different set of environment variables that could alter the test behavior, giving us multiple variants and A/B testing. And since we plan to tag the metrics from the different scenarios appropriately (TBD), we should be able to distinguish between the metrics generated by the different variants if an external output like InfluxDB or Load Impact Insights is used. So, the only remaining questions are whether there's something we can do to improve the UX of using the feature in this way, and whether we should improve the end-of-test summary to visualize the differences between the variants in one of the ways proposed above.
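For illustration, a sketch of that scenario-based approach using the `scenarios` option from #1007 (available since k6 v0.27), with per-scenario `env` and `tags`; the executor choice, durations, VU counts, and the `VARIANT` name are illustrative:

```js
import http from 'k6/http';

export const options = {
  scenarios: {
    variant_a: {
      executor: 'constant-vus',
      vus: 10,
      duration: '5m',
      env: { VARIANT: 'A' },   // injected into __ENV for this scenario
      tags: { variant: 'A' },  // applied to all metrics from this scenario
    },
    variant_b: {
      executor: 'constant-vus',
      vus: 10,
      duration: '5m',
      env: { VARIANT: 'B' },
      tags: { variant: 'B' },
    },
  },
};

export default function () {
  // Both scenarios share one function; behavior differs via __ENV.
  http.get(`https://example.com/?variant=${__ENV.VARIANT}`);
}
```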
I'm closing this, since between groups, …
I'm using k6 to conduct some A/B tests, and I thought it would be nice to have built-in support for this.
Here's what I'm thinking...
k6 would perform a separate run for each variant and inject the env variables from each variant into `__ENV`. That way, you could see at a glance which variant performed better in which categories.
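A sketch of what the proposed built-in support might look like; note that the `variants` option below is hypothetical (it is this proposal itself, not an existing k6 feature):

```js
import http from 'k6/http';

export const options = {
  // Hypothetical option: k6 would run the whole test once per variant,
  // exposing each variant's values through __ENV.
  variants: {
    plain: { ENCRYPTION: 'off' },
    encrypted: { ENCRYPTION: 'strong' },
  },
};

export default function () {
  // __ENV.ENCRYPTION would hold the current variant's value.
  http.get(`https://example.com/?encryption=${__ENV.ENCRYPTION}`);
}
```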
I'd be happy to take a stab at implementing this, if it seems useful.
cc @liclac