[Enhancement] Support for A/B Tests #299

Closed · geekdave opened this issue Aug 25, 2017 · 10 comments
Labels: evaluation needed (proposal needs to be validated or tested before fully implementing it in k6), help wanted

Comments

@geekdave (Contributor) commented Aug 25, 2017

I'm using k6 to conduct some A/B tests, and I thought it would be nice to have built-in support for this.

Here's what I'm thinking...

  1. Define variants in the script like so:
export let variants = { 
    uncompressed: {
        TARGET_HOST: "old-clunky-server.mycompany.com"
    },
    gzipped: {
        TARGET_HOST: "new-shiny-server.mycompany.com"
    }
}

k6 would perform a separate run for each variant, and inject the env variables from each variant into __ENV.

  2. Set up your test to consume the variant ENV variables to modify test behavior like so:
const baseURL = "https://" + __ENV.TARGET_HOST;
  3. Generate a side-by-side comparison in the output (made-up numbers & math here):
http_req_duration.....: 
  uncompressed.... avg=55.83ms max=606.29ms med=30.45ms min=4ms p(90)=135.75ms p(95)=197.82ms
  gzipped......... avg=54.65ms max=371.28ms med=22.4ms min=3.83ms p(90)=122.38ms p(95)=186.39ms

              gzipped | uncompressed
  avg     2.3% faster |
  p(90)   4.2% faster |
  p(95)   6.2% faster |

http_req_connecting...: 
  uncompressed.... avg=13.45µs max=3.35ms med=0s min=0s p(90)=0s p(95)=0s
  gzipped......... avg=17.23µs max=9.35ms med=0s min=0s p(90)=0s p(95)=0s

              gzipped | uncompressed
  avg                 | 0.4% faster
  p(90)               | 0.1% faster
  p(95)               | 0.2% faster

That way, you could see at a glance which variant performed better in which categories.

I'd be happy to take a stab at implementing this, if it seems useful.

cc @liclac

@ppcano (Contributor) commented Aug 25, 2017

This request may be related to #239

@ragnarlonn commented

A/B testing sounds very useful. I'm a little concerned that allowing multiple tests inside a single "k6 run" execution is going to complicate the tool a bit, but maybe I'm paranoid. Perhaps the variants should not be a separate exported global, but part of options?

@liclac (Contributor) commented Aug 28, 2017

I like this idea a lot. It feels like a more practical application for #239, and would supersede it.

The tech for this is really easy - just run the tests in a loop over all the variants. The tricky part would be presenting the information in a good way… @ppcano

@liclac (Contributor) commented Aug 28, 2017

In my mind, the variants would also be a field inside options, and would be a map[string]Options, which would mean you could override any option you want. We'd need a new env option for setting environment variables.
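To make this concrete, here is a minimal sketch of what such a script could look like; the variants field inside options and the per-variant env option are hypothetical and do not exist in k6 today:

export let options = {
    vus: 10,
    duration: "1m",
    // hypothetical: each variant is a partial Options object that overrides the defaults
    variants: {
        uncompressed: {
            env: { TARGET_HOST: "old-clunky-server.mycompany.com" },
        },
        gzipped: {
            env: { TARGET_HOST: "new-shiny-server.mycompany.com" },
            // any other option could also be overridden per variant
        },
    },
};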

@ppcano (Contributor) commented Aug 28, 2017

@liclac I don't think "A/B Tests" supersedes #239 "Conditional flows based on percentages".

#239 intends to provide an API for conditional flows which could support "A/B Tests" amongst other use cases.

Conditional flows are a handy utility if you need the test load to navigate different flows. In that case, the user is interested in randomizing the load, but not in an "A/B Tests" comparison. For example, Gatling provides randomSwitch.

I am skeptical about this request, as it seems very specific. Is it a common case to run a test to compare several environments simultaneously?

@geekdave (Contributor, Author) commented Aug 28, 2017

@ppcano Thanks for jumping in on the discussion! I agree that #239 seems like a different approach. Regarding your question:

Is it a common case to run a test to compare several environments simultaneously?

What I had in mind for an A/B test was not to test multiple environments simultaneously but rather to first execute a complete k6 test suite with one variant, and then execute it again with a different variant.

This way, we don't have to worry about each variant having an effect on the other.

Basically this is to automate what I've been doing manually, which is running two different k6 test suites back to back and changing some attribute to see whether it affects performance. In the most recent case it was to see whether adding stronger encryption increased latency.

@ppcano (Contributor) commented Aug 29, 2017

@liclac Adding the ability to support multiple console reporters (like Mocha does) may also be a way to implement this.

A possible solution could be to provide a custom reporter focusing on first-level group comparisons, for example:

// Alternative 1: split the load across VUs manually
if (isOdd(__VU)) {
  group('uncompressed', function () { /* ... */ });
} else {
  group('gzipped', function () { /* ... */ });
}

// Alternative 2: a hypothetical randomSwitch helper, as in Gatling
randomSwitch(function () {
  this.case('50', function () {
    group('uncompressed', function () { /* ... */ });
  });
  this.case('50', function () {
    group('gzipped', function () { /* ... */ });
  });
});

I am not convinced at all about making the default output support this case "at this moment". Allowing for third-party innovation is usually a good approach to test and validate new ideas. If the idea becomes mature and stable, it can then be integrated into the core.

@geekdave

but rather to first execute a complete k6 test suite with one variant, and then execute it again with a different variant.

This looks to me like a different flow that, I think, we are not currently supporting.

You mean that if you run the test for 5 minutes, the first 2.5 minutes go to variant A and the rest to variant B?

we don't have to worry about each variant having an effect on the other.

Could you describe what your worries are?

Is there any problem with simultaneously sending 50% of the load to variant A and the other 50% to variant B?

Thanks for the input.

@liclac (Contributor) commented Sep 1, 2017

I do like the idea of, when A/B testing, changing the output from:

http_req_duration.....: avg=355.57µs min=85.53µs  med=0s max=57.24ms  p(90)=104.34µs p(95)=144.51µs

To something… preferably less clunky than, but conveying the same information as:

http_req_duration    avg        min        med        max        p(90)      p(95)
----------------------------------------------------------------------------------
variant a            355.57µs   85.53µs    0s         57.24ms    104.34µs   144.51µs
variant b            296.21µs   87.36µs    148.45µs   14.05ms    330.6µs    390.24µs
diff                 -59.36µs   -1.83µs    +148.45µs  -43.19ms   +226.26µs  +221.49µs

na-- added the "evaluation needed" label (proposal needs to be validated or tested before fully implementing it in k6) on Jul 11, 2019
@na-- (Member) commented Jul 11, 2019

This should be mostly solved by #1007 - you would be able to run different scenarios, each containing different environment variable sets that could alter the test behavior, giving us multiple variants and A/B testing. And since we plan to tag the metrics from the different scenarios appropriately (TBD), we should be able to distinguish between the metrics generated by the different variants if an external output like InfluxDB or Load Impact Insights is used.

So, the only remaining questions are if there's something we can do to improve the UX of using the feature in such a way, and if we should improve the end-of-test summary to visualize the differences between the variants in one of the proposed ways above.
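For illustration, here is a rough sketch of how such a scenario-based A/B setup could look, using the scenarios API that eventually shipped; the scenario names, executor settings, and host names are placeholders:

import http from "k6/http";

export let options = {
    scenarios: {
        variant_a: {
            executor: "constant-vus",
            vus: 10,
            duration: "1m",
            // per-scenario environment variables select the target host
            env: { TARGET_HOST: "old-clunky-server.mycompany.com" },
        },
        variant_b: {
            executor: "constant-vus",
            vus: 10,
            duration: "1m",
            env: { TARGET_HOST: "new-shiny-server.mycompany.com" },
        },
    },
};

export default function () {
    // metrics from each scenario carry a scenario tag, so an external
    // output can distinguish the two variants
    http.get("https://" + __ENV.TARGET_HOST + "/");
}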

@na-- (Member) commented Jan 21, 2021

I'm closing this, since between groups, scenarios (since k6 v0.27.0) and the new ability to completely customize the end-of-test summary with JavaScript via handleSummary() in k6 v0.30.0 (#1768), there should be enough capabilities in k6 for people to implement something like this, if they want it. Explicitly tracking certain submetrics might help, and we'll hopefully add an easy way to do that in k6 v0.31.0 or soon after, but there's a workaround for it even now (#1321 (comment)).
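As a concrete illustration, here is a minimal handleSummary() sketch along those lines; the variant_a/variant_b scenario names are assumptions, and the per-scenario submetrics only appear in the summary data if they are tracked, e.g. via thresholds as in the linked workaround:

export function handleSummary(data) {
    // assumed submetric keys; they exist only if these submetrics are tracked
    let a = data.metrics["http_req_duration{scenario:variant_a}"];
    let b = data.metrics["http_req_duration{scenario:variant_b}"];
    let out = "http_req_duration comparison\n";
    if (a && b) {
        for (let stat of ["avg", "med", "p(90)", "p(95)"]) {
            let diff = b.values[stat] - a.values[stat]; // duration values are in milliseconds
            out += "  " + stat + ": variant_a=" + a.values[stat].toFixed(2) + "ms" +
                   " variant_b=" + b.values[stat].toFixed(2) + "ms" +
                   " diff=" + diff.toFixed(2) + "ms\n";
        }
    }
    return { stdout: out }; // replaces the default end-of-test summary on the terminal
}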

na-- closed this as completed on Jan 21, 2021