
Add k6 benchmarking to run on PRs #392

Merged: 9 commits merged on Dec 7, 2020


@joe-elliott (Member) commented Dec 3, 2020

What this PR does:

  • Moves k6 scripts to a bench folder
  • Consolidates testing utils for both e2e and k6 benchmarking
  • Fixes the readiness check in the e2e tests (it was accepting 500s as ready)
  • Adjusts k6 load test to only run for 1m
  • Adds benchmarking to Makefile and CI

The two tests create output like the following (the write-path stress test is shown):


          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: /shared/stress_test_write_path.js
     output: -

  scenarios: (100.00%) 2 scenarios, 6 max VUs, 1m30s max duration (incl. graceful stop):
           * steadyStateCheck: 1 looping VUs for 1m0s (exec: steadyStateCheck, gracefulStop: 30s)
           * writePath: Up to 5 looping VUs for 1m0s over 1 stages (gracefulRampDown: 5s, exec: writePath, gracefulStop: 30s)


    ✓ distributor is status 200
    ✓ ingester is status 200
    ✗ write is status 202
     ↳  88% — ✓ 106787 / ✗ 14024

    checks.....................: 88.39%  ✓ 106811 ✗ 14024
    ✓ { type:steady }..........: 100.00% ✓ 24     ✗ 0    
    ✓ { type:write }...........: 88.39%  ✓ 106787 ✗ 14024
    data_received..............: 11 MB   181 kB/s
    data_sent..................: 39 MB   652 kB/s
    http_req_blocked...........: avg=2.55µs   min=919ns    med=2.31µs   max=417.44µs p(90)=3.5µs   p(95)=4.32µs 
    http_req_connecting........: avg=4ns      min=0s       med=0s       max=138.95µs p(90)=0s      p(95)=0s     
  ✓ http_req_duration..........: avg=773.81µs min=134.24µs med=640.7µs  max=648.78ms p(90)=1.06ms  p(95)=1.33ms 
    http_req_receiving.........: avg=26.06µs  min=8.4µs    med=22.58µs  max=7.81ms   p(90)=38.81µs p(95)=46.5µs 
    http_req_sending...........: avg=17.93µs  min=4.59µs   med=16.43µs  max=7.1ms    p(90)=23.83µs p(95)=28.35µs
    http_req_tls_handshaking...: avg=0s       min=0s       med=0s       max=0s       p(90)=0s      p(95)=0s     
    http_req_waiting...........: avg=729.81µs min=108.2µs  med=597.69µs max=648.73ms p(90)=999.4µs p(95)=1.26ms 
    http_reqs..................: 120835  2012.713649/s
    iteration_duration.........: avg=1.73ms   min=518.58µs med=1.11ms   max=5s       p(90)=1.66ms  p(95)=1.99ms 
    iterations.................: 120823  2012.513769/s
    vus........................: 5       min=2    max=5  
    vus_max....................: 6       min=6    max=6  

These results will give us a good record of performance changes for each PR.

cc @dgzlopes

Signed-off-by: Joe Elliott <number101010@gmail.com>

@annanay25 (Contributor) left a comment

Good stuff! I carefully combed through all the changes. These are some nice metrics. Can we add a section to the README about what these metrics mean and which ones we should watch for in every CI run?

integration/bench/smoke_test.js (outdated, resolved)
s := cortex_e2e.NewConcreteService(
"k6",
k6Image,
cortex_e2e.NewCommandWithoutEntrypoint("sh", "-c", "sleep 3600"),

Contributor:

Do we need the sleep here? This will only start up once Tempo is up and running, i.e. after this:

	require.NoError(t, s.StartAndWaitReady(tempo))

Member Author:

We just need a process to start and not exit. If you run

docker run loadimpact/k6:latest

then it exits immediately. So we start this k6 container with `sleep` so it just stays around for a while, and then use exec:

https://github.com/grafana/tempo/pull/392/files/8f5c168ffb88850487e9932fc779df9da03caf8d#diff-4f32d9cc23405c3bef7b0d0233346dd096936a420e6499b8981bf851f995f0d8R45

to kick off our tests.
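
The pattern described here (keep the container alive with a long-running `sleep`, then run k6 inside it with exec) can be sketched with the plain Docker CLI. This is a hedged approximation, not the cortex e2e code from the PR: the container name `k6` and the `dockerStartArgs`/`dockerExecArgs` helpers are hypothetical, mounting the scripts under `/shared` is omitted, and the sketch only prints the commands instead of running them.

```go
package main

import "fmt"

// dockerStartArgs builds a hypothetical `docker run` invocation that replaces
// the image's k6 entrypoint with `sh -c "sleep 3600"`, so the container keeps
// running instead of exiting immediately.
func dockerStartArgs(name, image string) []string {
	return []string{"run", "-d", "--name", name, "--entrypoint", "sh", image, "-c", "sleep 3600"}
}

// dockerExecArgs builds a `docker exec` invocation that starts a k6 test
// inside the already running container.
func dockerExecArgs(name, script string) []string {
	return []string{"exec", name, "k6", "run", script}
}

func main() {
	for _, args := range [][]string{
		dockerStartArgs("k6", "loadimpact/k6:latest"),
		dockerExecArgs("k6", "/shared/stress_test_write_path.js"),
	} {
		// Printed (with each argument quoted) rather than executed,
		// so the sketch runs without Docker.
		fmt.Printf("docker %q\n", args)
	}
}
```

The `sleep` process is only a placeholder that keeps the container alive; each test is a separate exec, so both the smoke test and the stress test can run in the same container.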

integration/bench/stress_test_write_path.js (2 threads, outdated, resolved)

@dgzlopes (Member) left a comment

This looks awesome :)

Added some comments, but nothing too important.

I also think that having some docs about the metrics is a good idea.

integration/bench/stress_test_write_path.js (resolved)
integration/bench/smoke_test.js (outdated, resolved)

@joe-elliott (Member Author) commented

> These are some nice metrics. Can we add a section to the README about what these metrics mean and which ones we should watch for in every CI run?

I'm not too concerned about this being perfectly defined. I really just want a regular record of the performance of our write/query paths for every PR. That way, if/when we do have performance issues, we can at least go back through our PRs and get some idea of what may have happened. This is very much a "first pass" at this feature, and a lot of work could go into improving these tests to better exercise the query path or compaction.

…l CI

@dgzlopes (Member) commented Dec 5, 2020

I like the new changes! LGTM :)

@annanay25 (Contributor) left a comment

Nice, I like the separate workflow for benchmarks. :)
