Custom metric threshold calculation using wrong statistics #2390

Closed
timterrill opened this issue Feb 15, 2022 · 7 comments · Fixed by #2426

Brief summary

When experimenting with the differences between custom metric thresholds and thresholds on tagged requests, I discovered that the custom metric thresholds were not using the custom metric data; instead, they were using the overall statistics from the entire test (http_req_duration).

This caused thresholds for the custom metric to fail while the thresholds on the tagged requests passed. In my test example I have 2 requests, each of which has a unique custom metric and a unique tag:

  • homepage request:
    • custom metric = trend_homepage
    • request tag = name:tag_homepage
  • pi request:
    • custom metric = trend_pi
    • request tag = name:tag_pi

The metrics for the tag and the custom metric should be the same, so the thresholds for each request should both pass or both fail.

k6 version

k6 v0.34.1 ((devel), go1.17, darwin/amd64)

OS

macOS 12.1

Docker version and image (if applicable)

No response

Steps to reproduce the problem

To reproduce this issue,

  1. Download https://gist.github.com/timterrill/e91c36a579916b697cb7af061bbcc432
  2. Run with the following command:

$> k6 run k6-threshold-bug.js

NOTE: Depending on the actual response times of the 2 endpoints, the threshold may not actually fail, but repeated runs should reproduce it.

Expected behaviour

The output I expect from this run is as follows (I modified the 2 rows for { trend_XXXX } to be what I expected):

...

   ✓ http_req_duration..............: count=217 min=18.19ms avg=21.26ms  p(98)=26.92ms  p(99.9)=31.21ms  max=31.69ms
       { expected_response:true }...: count=217 min=18.19ms avg=21.26ms  p(98)=26.92ms  p(99.9)=31.21ms  max=31.69ms
     ✓ { name:tag_homepage }........: count=214 min=18.19ms avg=21.25ms  p(98)=26.96ms  p(99.9)=31.22ms  max=31.69ms
     ✓ { name:tag_pi }..............: count=3   min=20.92ms avg=21.41ms  p(98)=22.09ms  p(99.9)=22.12ms  max=22.12ms
     ✓ { trend_homepage }...........: count=214 min=18.19ms avg=21.25ms  p(98)=26.96ms  p(99.9)=31.22ms  max=31.69ms
     ✓ { trend_pi }.................: count=3   min=20.92ms avg=21.41ms  p(98)=22.09ms  p(99.9)=22.12ms  max=22.12ms

...

     trend_homepage.................: count=214 min=18.19ms avg=21.25ms  p(98)=26.96ms  p(99.9)=31.22ms  max=31.69ms
     trend_pi.......................: count=3   min=20.92ms avg=21.41ms  p(98)=22.09ms  p(99.9)=22.12ms  max=22.12ms

Notice how trend_pi and the threshold rows for trend_pi and name:tag_pi all had 3 requests and the same statistics as the custom metric. The http_req_duration row had 217 requests (which represents all requests for the test run).

Also the name:tag_pi and custom metric trend_pi thresholds BOTH passed (as expected).

Actual behaviour

This is the actual output of my run:

[screenshot: 2022-02-15 k6-bug]

And the part of the summary output to compare against the above section:

ISSUE: The threshold for { trend_pi } failed, and both { trend_XXX } rows had the wrong data.

...

   ✓ http_req_duration..............: count=217 min=18.19ms avg=21.26ms  p(98)=26.92ms  p(99.9)=31.21ms  max=31.69ms
       { expected_response:true }...: count=217 min=18.19ms avg=21.26ms  p(98)=26.92ms  p(99.9)=31.21ms  max=31.69ms
     ✓ { name:tag_homepage }........: count=214 min=18.19ms avg=21.25ms  p(98)=26.96ms  p(99.9)=31.22ms  max=31.69ms
     ✓ { name:tag_pi }..............: count=3   min=20.92ms avg=21.41ms  p(98)=22.09ms  p(99.9)=22.12ms  max=22.12ms
     ✓ { trend_homepage }...........: count=217 min=18.19ms avg=21.26ms  p(98)=26.92ms  p(99.9)=31.21ms  max=31.69ms
     ✗ { trend_pi }.................: count=217 min=18.19ms avg=21.26ms  p(98)=26.92ms  p(99.9)=31.21ms  max=31.69ms

...

     trend_homepage.................: count=214 min=18.19ms avg=21.25ms  p(98)=26.96ms  p(99.9)=31.22ms  max=31.69ms
     trend_pi.......................: count=3   min=20.92ms avg=21.41ms  p(98)=22.09ms  p(99.9)=22.12ms  max=22.12ms

Notice how trend_pi had 3 requests, and the threshold row for name:tag_pi also had 3 requests with the same statistics as the custom metric, but the threshold calculation row for trend_pi had 217 requests and the same statistics as http_req_duration has for the entire test run (THIS IS THE ISSUE).

Also, the name:tag_pi threshold passed while the custom metric trend_pi threshold did not. That should be impossible, because the simple script makes a single HTTP request tagged with name:tag_pi, and the custom metric trend_pi just tracks the http_req_duration of that request.

timterrill added the bug label Feb 15, 2022

efdknittlfrank commented Feb 16, 2022

To define a threshold on a custom metric, you have to reference the custom metric itself. A custom metric is a standalone metric, independent of the http_req_duration metric. Filtering a metric on a tag that does not exist (and therefore has no value) seems to be a no-op.

You actually want to define your thresholds as:

export const options = {
        thresholds: {
            'trend_homepage': [
                { threshold: 'p(99.99) < 100', abortOnFail: false }
            ],
            'trend_pi': [
                { threshold: 'p(99.99) < 24', abortOnFail: false }
            ],
        },
}

efdknittlfrank commented Feb 16, 2022

Also note that the tags you apply to your custom metric are rather redundant; they always contain the same value.

import http from 'k6/http';
import { Trend } from 'k6/metrics';

const piTrend = new Trend('trend_pi', true);

function piRequest(data, metric) {
  const name = 'tag_pi';
  const params = { tags: { name } };
  const res = http.request('GET', 'https://test.k6.io/pi.php?decimals=2', {}, params);
  metric.add(res.timings.duration, { requestName: name, metric: 'timings.duration' });
}

Filtering on the tags will output identical metric values for all combinations:

export const options = {
  thresholds: {
    'trend_pi': [ 'p(99.99) < 100' ],
    'trend_pi{requestName:tag_pi}': [ 'p(99.99) < 100' ],
    'trend_pi{metric:timings.duration}': [ 'p(99.99) < 100' ],
    'trend_pi{requestName:tag_pi,metric:timings.duration}': [ 'p(99.99) < 100' ],
  },
};

na-- (Member) commented Feb 16, 2022

Hmm @efdknittlfrank, while you are correct, I think @timterrill found a k6 bug here.

Notice how trend_pi had 3 requests, and the threshold row for name:tag_pi also had 3 requests with the same statistics as the custom metric, but the threshold calculation row for trend_pi had 217 requests and the same statistics as http_req_duration has for the entire test run (THIS IS THE ISSUE).

Yes, this looks like a problem, though I created a simpler example, since the current one has a lot of (probably unnecessary) complexity. Consider this much simpler test run:

import http from 'k6/http';

export const options = {
  thresholds: {
    'http_reqs': ['count == 2'],
    'http_reqs{key}': ['count == 1'],
    'http_reqs{key:value}': ['count == 1'],
    'http_reqs{nonexistent_key}': ['count == 0'],
  }
}

export default function (data) {
  http.get('https://httpbin.test.k6.io/anything?r=1', { tags: {} })
  http.get('https://httpbin.test.k6.io/anything?r=2', { tags: { 'key': 'value' } })
}

It will result in the following summary (cropped to only the relevant parts):

...
   ✓ http_reqs......................: 2      2.507944/s
     ✓ { key:value }................: 1      1.253972/s
     ✓ { key }......................: 1      1.253972/s
     ✗ { nonexistent_key }..........: 2      2.507944/s
...

My suspicion is that since metric tags are meant to be key:value, having a threshold on http_reqs{nonexistent_key} is the same as having a threshold on http_reqs{nonexistent_key:''}. And since Go maps return an empty string when you try to access a map key that doesn't exist (unless you explicitly check whether the key is missing), this might be the problem, i.e. k6 will basically treat it as if it was a threshold on the whole metric, since that tag is never used... 😞
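
For illustration, here is a minimal, self-contained Go sketch (hypothetical code, not the actual k6 implementation) of how that zero-value behaviour could produce exactly this result: if the sub-metric check compares tag values with a plain map index, a threshold tag like nonexistent_key:"" matches every sample, because indexing a Go map with a missing key returns the empty string.

package main

import "fmt"

// matchesNaive mimics a sub-metric check that indexes the sample's tag map
// directly. For a missing key, Go returns the zero value (""), so a threshold
// tag with an empty wanted value also "matches" samples that don't carry the
// tag at all.
func matchesNaive(sampleTags, thresholdTags map[string]string) bool {
    for key, want := range thresholdTags {
        if sampleTags[key] != want { // "" == "" when the key is absent
            return false
        }
    }
    return true
}

func main() {
    sample := map[string]string{"key": "value"}           // tags on one HTTP request
    threshold := map[string]string{"nonexistent_key": ""} // http_reqs{nonexistent_key:""}
    fmt.Println(matchesNaive(sample, threshold))           // prints "true", so the sample is counted
}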

We are currently in the midst of refactoring how thresholds are evaluated (#2356, by @oleiade), so this might already be fixed, or at least easier to fix, after that PR. In any case, it seems like a valid bug, thank you for reporting it! 🙇

na-- added the high prio label Feb 16, 2022
na-- added this to the v0.37.0 milestone Feb 16, 2022
na-- (Member) commented Feb 16, 2022

There is also a bug (or bugs) similar to this in k6 cloud... 😞 Data crunching for thresholds is done somewhat differently there, since potentially multiple instances can generate data. My original example above, with the thresholds defined like this:

{
  'http_reqs': ['count == 2'],
  'http_reqs{key}': ['count == 1'],
  'http_reqs{key:value}': ['count == 1'],
  'http_reqs{nonexistent_key}': ['count == 0'],
}

Resulted in the cloud backend not parsing the ones with no key values and reporting just these two thresholds:

[screenshot: t1]

However, when I added : to these thresholds, like this:

thresholds: {
  'http_reqs': ['count == 2'],
  'http_reqs{key:}': ['count == 1'],
  'http_reqs{key:value}': ['count == 1'],
  'http_reqs{nonexistent_key:}': ['count == 0'],
}

I got this, which includes the k6 bug, but also includes a bug for the http_reqs{key:} threshold:

[screenshot: t2]

So yeah, this is probably going to be somewhat complicated to fix fully...

na-- added the cloud label Feb 16, 2022
mstoykov (Contributor) commented

@na-- this likely should be (and already is in that PR) a "syntax error"

na-- (Member) commented Feb 16, 2022

@mstoykov, maybe, partially, though that would be kind of a breaking change and deserves its own separate evaluation...

But again, the problem persists even if I use the proper syntax for thresholds that want to check if a metric tag exists but has an empty value... These should be valid thresholds, right?

import http from 'k6/http';

export const options = {
  thresholds: {
    'http_reqs': ['count == 2'],
    'http_reqs{key:""}': ['count == 1'],
    'http_reqs{key:value}': ['count == 1'],
    'http_reqs{nonexistent_key:""}': ['count == 0'],
  }
}

export default function (data) {
  http.get('https://httpbin.test.k6.io/anything?r=1', { tags: {} })
  http.get('https://httpbin.test.k6.io/anything?r=2', { tags: { 'key': 'value' } })
}

And yet I still get the same wrong results:

   ✓ http_reqs......................: 2      0.962599/s
     ✓ { key:"" }...................: 1      0.481299/s
     ✓ { key:value }................: 1      0.481299/s
     ✗ { nonexistent_key:"" }.......: 2      0.962599/s

FWIW, this is the part where threshold definitions like http_reqs{key} are basically parsed as the equivalent of http_reqs{key:}:

k6/stats/stats.go

Lines 506 to 515 in c4b88f5

parts := strings.SplitN(kv, ":", 2)
key := strings.TrimSpace(strings.Trim(parts[0], `"'`))
if len(parts) != 2 {
    tags[key] = ""
    continue
}
value := strings.TrimSpace(strings.Trim(parts[1], `"'`))
tags[key] = value

And IMO that part is somewhat fine... If we ignore the fact that the UX is slightly confusing, I am not sure it should be treated as a syntax error 🤷‍♂️ The cloud treats threshold definitions like http_reqs{key} a bit differently, but that's a somewhat separate issue.

The bigger problem on the k6 OSS side is with the data crunching, when determining whether a certain metric sample is part of a specific threshold sub-metric. That one is apparently completely broken, since it somehow thinks there are 2 HTTP requests with a nonexistent_key:"" tag...
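
For illustration only (a hypothetical sketch, not the actual k6 code, and not necessarily what the refactoring PR does), the data-crunching side could avoid that trap by using Go's two-value map lookup, so that a sample only matches when the tag key actually exists on it, even when the wanted value is empty:

// matchesStrict is a stricter version of the check sketched earlier: the
// two-value lookup means a key that is absent from the sample never satisfies
// the threshold tag, even when the wanted value is the empty string.
func matchesStrict(sampleTags, thresholdTags map[string]string) bool {
    for key, want := range thresholdTags {
        got, ok := sampleTags[key]
        if !ok || got != want {
            return false
        }
    }
    return true
}

With that check, the http_reqs{nonexistent_key:""} sub-metric from the example above would count 0 samples and the count == 0 threshold would pass.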

na-- modified the milestones: v0.37.0, v0.38.0 Feb 16, 2022
timterrill (Author) commented

Thanks @efdknittlfrank and @na-- for the immediate action on this!

FWIW, I was just testing out the two techniques for setting thresholds and was not going to use both in a given test, as I knew that was redundant, but I wondered which way I'd like best.

Thanks for pointing out that I was using the wrong syntax for the trend_XXX custom metric thresholds, and I have fixed and tested that on my end.

I'm glad though that it did identify a bug ;)

My $0.02 on a fix: maybe ignore empty tag values, as I can't imagine how they're useful? If that happens, someone likely made a mistake (like I did), so possibly even generate an error message like "Can't set threshold on an undefined tag-value"?
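
For illustration, a hypothetical Go sketch (not k6 code) of this suggestion: reject empty tag values when the threshold definition is parsed, and report an error instead of silently matching everything:

package thresholds

import "fmt"

// validateThresholdTags is a hypothetical parse-time check in the spirit of
// the suggestion above: refuse sub-metric tags whose value is empty.
func validateThresholdTags(metricName string, tags map[string]string) error {
    for key, value := range tags {
        if value == "" {
            return fmt.Errorf("can't set threshold on an undefined tag-value: %s{%s}", metricName, key)
        }
    }
    return nil
}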
