[Enhancement] Support for A/B Tests #299

Closed · geekdave opened this issue Aug 25, 2017 · 10 comments
Labels: evaluation needed (proposal needs to be validated or tested before fully implementing it in k6), help wanted

Comments

@geekdave (Contributor) commented Aug 25, 2017

I'm using k6 to conduct some A/B tests, and I thought it would be nice to have built-in support for this.

Here's what I'm thinking...

  1. Define variants in the script like so:
export let variants = { 
    uncompressed: {
        TARGET_HOST: "old-clunky-server.mycompany.com"
    },
    gzipped: {
        TARGET_HOST: "new-shiny-server.mycompany.com"
    }
}

k6 would perform a separate run for each variant, and inject the env variables from each variant into __ENV.

  2. Set up your test to consume the variant ENV variables to modify test behavior like so:
const baseURL = "https://" + __ENV.TARGET_HOST;
  3. Generate a side-by-side comparison in the output (made-up numbers & math here):
http_req_duration.....: 
  uncompressed.... avg=55.83ms max=606.29ms med=30.45ms min=4ms p(90)=135.75ms p(95)=197.82ms
  gzipped......... avg=54.65ms max=371.28ms med=22.4ms min=3.83ms p(90)=122.38ms p(95)=186.39ms

              gzipped | uncompressed
  avg     2.3% faster |
  p(90)   4.2% faster |
  p(95)   6.2% faster |

http_req_connecting...: 
  uncompressed.... avg=13.45µs max=3.35ms med=0s min=0s p(90)=0s p(95)=0s
  gzipped......... avg=17.23µs max=9.35ms med=0s min=0s p(90)=0s p(95)=0s

              gzipped | uncompressed
  avg                 | 0.4% faster
  p(90)               | 0.1% faster
  p(95)               | 0.2% faster

That way, you could see at a glance which variant performed better in which categories.

I'd be happy to take a stab at implementing this, if it seems useful.

cc @liclac

@ppcano (Contributor) commented Aug 25, 2017

This request may be related to #239

@ragnarlonn commented

A/B testing sounds very useful. I'm a little concerned that allowing multiple tests inside a single "k6 run" execution is going to complicate the tool a bit, but maybe I'm paranoid. Perhaps the variants should not be a separate exported global, but part of options?

@liclac (Contributor) commented Aug 28, 2017

I like this idea a lot. It feels like a more practical application for #239, and would supersede it.

The tech for this is really easy - just run the tests in a loop over all the variants. The tricky part would be presenting the information in a good way… @ppcano

@liclac (Contributor) commented Aug 28, 2017

In my mind, the variants would also be a field inside options, and would be a map[string]Options, which would mean you could override any option you want. We'd need a new env option for setting environment variables.
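To make this concrete, here is a minimal sketch of what such a script could look like; the variants field inside options and the per-variant env option are hypothetical and do not exist in k6 today:

export let options = {
    vus: 10,
    duration: "1m",
    // hypothetical: each variant is a partial Options object that overrides the defaults
    variants: {
        uncompressed: {
            env: { TARGET_HOST: "old-clunky-server.mycompany.com" },
        },
        gzipped: {
            env: { TARGET_HOST: "new-shiny-server.mycompany.com" },
            // any other option could also be overridden per variant
        },
    },
};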

@ppcano (Contributor) commented Aug 28, 2017

@liclac I don't think "A/B Tests" supersedes #239 "Conditional flows based on percentages".

#239 intends to provide an API for conditional flows which could support "A/B Tests" amongst other use cases.

Conditional flows are a handy utility if you need the test load to navigate different flows. In that case, the user is interested in randomizing the load, but not in an "A/B Tests" comparison. For example, Gatling provides randomSwitch.

I am skeptical about this request, as it seems very specific. Is it a common case to run a test to compare several environments simultaneously?

@geekdave (Contributor, Author) commented Aug 28, 2017

@ppcano Thanks for jumping in on the discussion! I agree that #239 seems like a different approach. Regarding your question:

Is it a common case to run a test to compare several environments simultaneously?

What I had in mind for an A/B test was not to test multiple environments simultaneously but rather to first execute a complete k6 test suite with one variant, and then execute it again with a different variant.

This way, we don't have to worry about each variant having an effect on the other.

Basically this is to automate what I've been doing manually, which is running two different k6 test suites back to back and changing some attribute to see whether it affects performance. In the most recent case it was to see whether adding stronger encryption increased latency.

@ppcano (Contributor) commented Aug 29, 2017

@liclac Adding the ability to support multiple console reporters (like Mocha does) may also be a way to implement this.

A possible solution could be to provide a custom reporter focusing on first-level group comparisons, for example:

// Alternative 1: split the load across VUs manually
if (isOdd(__VU)) {
  group('uncompressed', function () { /* ... */ });
} else {
  group('gzipped', function () { /* ... */ });
}

// Alternative 2: a hypothetical randomSwitch helper, as in Gatling
randomSwitch(function () {
  this.case('50', function () {
    group('uncompressed', function () { /* ... */ });
  });
  this.case('50', function () {
    group('gzipped', function () { /* ... */ });
  });
});

I am not convinced at all about making the default output support this case "at this moment". Allowing for third-party innovation is usually a good approach to test and validate new ideas. If the idea becomes mature and stable, it can then be integrated into the core.

@geekdave

but rather to first execute a complete k6 test suite with one variant, and then execute it again with a different variant.

This looks to me like a different flow that, I think, we are not currently supporting.

You mean that if you run the test for 5 minutes, the first 2.5 minutes go to variant A and the rest to variant B?

we don't have to worry about each variant having an effect on the other.

Could you describe what your worries are?

Is there any problem with simultaneously sending 50% of the load to variant A and the other 50% to variant B?

Thanks for the input.

@liclac (Contributor) commented Sep 1, 2017

I do like the idea of, when A/B testing, changing the output from:

http_req_duration.....: avg=355.57µs min=85.53µs  med=0s max=57.24ms  p(90)=104.34µs p(95)=144.51µs

To something… preferably less clunky than, but conveying the same information as:

http_req_duration    avg        min        med        max        p(90)      p(95)
----------------------------------------------------------------------------------
variant a            355.57µs   85.53µs    0s         57.24ms    104.34µs   144.51µs
variant b            296.21µs   87.36µs    148.45µs   14.05ms    330.6µs    390.24µs
diff                 -59.36µs   -1.83µs    +148.45µs  -43.19ms   +226.26µs  +221.49µs

na-- added the "evaluation needed" label (proposal needs to be validated or tested before fully implementing it in k6) on Jul 11, 2019
@na-- (Member) commented Jul 11, 2019

This should be mostly solved by #1007 - you would be able to run different scenarios, each containing different environment variable sets that could alter the test behavior, giving us multiple variants and A/B testing. And since we plan to tag the metrics from the different scenarios appropriately (TBD), we should be able to distinguish between the metrics generated by the different variants if an external output like InfluxDB or Load Impact Insights is used.

So, the only remaining questions are if there's something we can do to improve the UX of using the feature in such a way, and if we should improve the end-of-test summary to visualize the differences between the variants in one of the proposed ways above.
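For illustration, here is a rough sketch of how such a scenario-based A/B setup could look, using the scenarios API that eventually shipped; the scenario names, executor settings, and host names are placeholders:

import http from "k6/http";

export let options = {
    scenarios: {
        variant_a: {
            executor: "constant-vus",
            vus: 10,
            duration: "1m",
            // per-scenario environment variables select the target host
            env: { TARGET_HOST: "old-clunky-server.mycompany.com" },
        },
        variant_b: {
            executor: "constant-vus",
            vus: 10,
            duration: "1m",
            env: { TARGET_HOST: "new-shiny-server.mycompany.com" },
        },
    },
};

export default function () {
    // metrics from each scenario carry a scenario tag, so an external
    // output can distinguish the two variants
    http.get("https://" + __ENV.TARGET_HOST + "/");
}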

@na-- (Member) commented Jan 21, 2021

I'm closing this, since between groups, scenarios (since k6 v0.27.0) and the new ability to completely customize the end-of-test summary with JavaScript via handleSummary() in k6 v0.30.0 (#1768), there should be enough capabilities in k6 for people to implement something like this, if they want it. Explicitly tracking certain submetrics might help, and we'll hopefully add an easy way to do that in k6 v0.31.0 or soon after, but there's a workaround for it even now (#1321 (comment)).
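As a concrete illustration, here is a minimal handleSummary() sketch along those lines; the variant_a/variant_b scenario names are assumptions, and the per-scenario submetrics only appear in the summary data if they are tracked, e.g. via thresholds as in the linked workaround:

export function handleSummary(data) {
    // assumed submetric keys; they exist only if these submetrics are tracked
    let a = data.metrics["http_req_duration{scenario:variant_a}"];
    let b = data.metrics["http_req_duration{scenario:variant_b}"];
    let out = "http_req_duration comparison\n";
    if (a && b) {
        for (let stat of ["avg", "med", "p(90)", "p(95)"]) {
            let diff = b.values[stat] - a.values[stat]; // duration values are in milliseconds
            out += "  " + stat + ": variant_a=" + a.values[stat].toFixed(2) + "ms" +
                   " variant_b=" + b.values[stat].toFixed(2) + "ms" +
                   " diff=" + diff.toFixed(2) + "ms\n";
        }
    }
    return { stdout: out }; // replaces the default end-of-test summary on the terminal
}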

na-- closed this as completed on Jan 21, 2021