Add explicit tracking and ignoring of metrics and sub-metrics #1321

Open
na-- opened this issue Jan 28, 2020 · 28 comments

Comments

@na--
Member

na-- commented Jan 28, 2020

k6 measures a lot of things, and it can generate quite a lot of data points for a variety of different metrics. Not everything we measure is important to everyone, and we want to measure even more things - #877, #880, #888, #921, #1311, and all of the non-HTTP protocols we want to add...

Unfortunately, currently there's no way for a user to tell k6 that they're not interested in some of the things it measures (#570). That's a problem, because measuring and reporting useless things adds a significant overhead both to k6 (#764), and to any custom outputs that users pipe the metrics data to (#1060).

On top of that, some of the current default metrics just add clutter and confusion to the end-of-test summary report (#1319), without a lot of benefit to most users. All of the http_req_* "minor" metrics probably shouldn't be enabled by default, instead we should probably have only http_req_duration, which is sending + waiting + receiving. Maybe we should also expose the composite metric for the connection establishment, but I doubt most people would care about http_req_tls_handshaking individually, most of the time. group_duration is also somewhat useless when you have a variety of different groups and don't use an external metrics output... Unfortunately, we currently can't remove any of these metrics by default, since we don't have a mechanism for users to enable them if they want/need them...

There's also no way besides thresholds for users to force k6 to calculate and display information about sub-metrics, i.e. specific subsets of the data points of the main metrics. This point was recently raised by a user in the community forum, and unfortunately, thresholds is the only way to tell k6 to keep track of sub-metrics right now 😞

So, I think I've listed all of the connected issues I can remember... My proposed solution, as suggested by the issue title, is to allow users to whitelist metrics and sub-metrics. k6 should have a default whitelist, but we should allow users to set their own. Here is a suggestion for the simplest possible approach we can offer, though I think this might not be the best solution:

export let options = {
    // ...
    metrics: [
        "iterations",
        "iteration_duration",
        "http_req_duration",
        "http_req_duration{status:200}",
        "http_req_duration{staticAsset:yes}",
        "data_sent",
        "data_received",
    ],
};

This would work... somewhat, but it has some big drawbacks:

  • If users just want to add a single extra metric to be displayed in addition to the ones we show by default, they'd have to list all of the defaults here, plus the one they actually want. This is as far from good UX as you can get...
  • Worse, if we ever add new metrics in a newer k6 version, say http_req_errors (Proposal for adding http_req_errors metric #1311), most users who had already configured metrics themselves won't be aware of them, since those metrics would be excluded - even if we show them by default in new k6 versions, that default would be overwritten by the explicit list...
  • They also have to add all of their custom metrics, which can get tedious. Worse, if they don't specify this metrics option, our default metrics handling somehow has to make an exception for them and track and show the custom metrics' values anyway... After all, if a user went to the trouble of measuring something manually with a custom metric, they probably care about it and want to see it in the results...

That said, I'm not firmly against still having such an option - it allows users to explicitly and exactly specify the metrics they want. But, if we do have it, I'd like us to heavily discourage its usage, and offer a better alternative for 99% of use cases.

I'd argue that the vast majority of users and use cases would be satisfied if we just had a simple way of adding metrics to the list of default metrics. So, I think we should add a simple new trackMetrics() function (I'd appreciate it if someone can think of a better name...) to the k6/metrics package, which can be used like this:

import { trackMetrics } from "k6/metrics";

trackMetrics("http_req_connecting", "http_req_tls_handshaking");
trackMetrics("http_req_duration{status:200}", "http_req_duration{staticAsset:yes}", /* ... */ );

Some details:

  • This won't overwrite the default metrics that we'd ship with k6, but will non-destructively add to them
  • With the help of the spread operator or apply(), you can easily pass an array as well.
  • For completeness' sake, we should probably also add an untrackMetric(name) function, just so that users aren't forced to use the metrics option directly.
  • Each custom metric constructor should, by default, implicitly call trackMetric(). This way, all custom metrics would be measured and shown in the test results. Not sure if we need to add a parameter to these constructors to disable this behavior, or if users can just call untrackMetric().
  • Each threshold should also implicitly call trackMetric() for the metric or sub-metric it was based on.
  • Outputs should probably also be able to specify certain metrics that they need, which would also implicitly be added to the default list if that output is enabled. For example, the group_duration metric doesn't make a lot of sense if you just run k6 locally, but it's very useful for the cloud output. That said, this addition feels optional, not a requirement, so I think it should be overrideable by untrackMetric("group_duration").
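Putting the bullets above together, here is a rough sketch of how this could look in a script. Note that trackMetrics()/untrackMetric() are only proposed names that don't exist yet, and the implicit tracking by custom metric constructors is part of this proposal:

import { Counter, trackMetrics, untrackMetric } from "k6/metrics"; // trackMetrics/untrackMetric are proposed, not existing APIs

// A custom metric constructor would implicitly track its metric:
const myErrors = new Counter("my_errors");

// Arrays can be passed with the spread operator:
const extraMetrics = ["http_req_connecting", "http_req_tls_handshaking"];
trackMetrics(...extraMetrics);

// And default metrics that aren't interesting could be removed again:
untrackMetric("group_duration");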

Implementation-wise, all of these function calls would probably matter only in the goja runtime that's executing the init context to get the exported options, but I'm not sure this is something worth optimizing.

Alternatively, we may be able to go off the beaten path and side-step the current k6 configuration entirely. I'm not sure if this would make the configuration mess (#883) worse, or if it won't affect it at all, but I'm hopeful it's the latter... In any case, we can probably mostly handle the metrics whitelist as a localized per-VU thing that doesn't touch the global configuration. So, no global metrics key in the exported options, just some local per-VU logic that decides which metrics it's going to emit.

The biggest benefit of the alternative approach I can think of is that we'd get a ton of flexibility. Different VUs will be able to emit different metrics. Even better, a VU could temporarily stop emitting certain metrics, and later resume that emission, which has been requested before: #1298, #884 (comment) , #891 (comment). The biggest issues with this approach I can think of are:

  • How would sub-metrics tracking and un-tracking work? I think we'd need significant refactoring of the way we track sub-metrics with this approach... Though that might not be a bad thing (Refactor the current way metrics are defined #572)
  • We would have no way of knowing which metrics are tracked globally, which might be somewhat problematic in the cloud execution. I don't think this is a huge issue, but I might be wrong. And there might also be a hybrid approach, where we have a global metrics option, but VUs can control metric emission locally. 🤷‍♂️

Open questions:

  • Which approach is better? I think the second non-global approach at least deserves an investigation. Or is a hybrid approach also possible?
  • What hidden implementation challenges does this involve? I don't want to mess up the k6 config any more than it already is (Configuration issues #883)...
  • Is this the best name or format for the option?
  • How should we handle the partial aggregation of HTTP metrics we send to the cloud, if we don't measure some of them by default anymore?
  • Should we invest in developing a more powerful rule definition for sub-metrics? And if so, when? I explained the problems and limitations of the current approach in Allow more flexible sub-metrics and better UX in thresholds #1313, and despite what I mentioned in that issue, improving these rules can be done separately from improvements of the threshold parsing (Improve threshold parsing and errors #961) - they're not actually connected. That said, I think these improvements should probably be done in a separate PR in the future, but we should be conscious that the current way of defining sub-metrics has to be improved and plan accordingly...
@markjmeier

markjmeier commented Jan 28, 2020

I think the trackMetric with an untrackMetric functionality makes the most sense to me. To me, it should mean:

  1. untrackMetric() = Throw away, don't save the metric. We need to be careful with submetrics though. The total value should be a part of the main metric. e.g. if blocked has a value and we untrack it, the value of blocked is still a part of the total response time and should be included in that value. You just lose visibility to the granular value.
  2. trackMetric() = I specifically care about this metric and want it highlighted at the end of test summary and/or in the LI cloud(analysis tab). I'm tracking it, therefore it's more important to me and should be highlighted (custom metrics are currently added to the small charts in analysis tab automatically).

In short, use untrack to remove defaults from view, and track to add more things needing to be tracked.

Reading the original question on the community board, would a trend custom metric work here? That emits at the end of the test and shows in the terminal as part of the end-of-test summary.
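For reference, a minimal sketch of that workaround, using a custom Trend metric with a per-endpoint tag (the metric and tag names are just examples):

import http from "k6/http";
import { Trend } from "k6/metrics";

// One custom Trend, split per endpoint via tags, instead of many separate metrics:
const endpointDuration = new Trend("endpoint_duration", true);

export default function () {
  const res = http.get("https://test.k6.io/news.php");
  endpointDuration.add(res.timings.duration, { endpoint: "news" });
}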

@na--
Member Author

na-- commented Jan 29, 2020

I think the trackMetric with an untrackMetric functionality makes the most sense to me. To me, it should mean:

  1. untrackMetric() = Throw away, don't save the metric. We need to be careful with submetrics though. The total value should be a part of the main metric. e.g. if blocked has a value and we untrack it, the value of blocked is still a part of the total response time and should be included in that value. You just lose visibility to the granular value.

Yeah, sub-metrics would require some more thought... For example, if you untrack the "parent" metric, but explicitly track a few submetrics, should that work? My instinctive answer is "no", but I can see pros and cons for both approaches, so we need to consider that a bit more carefully.

And maybe we should have separate functions that deal with submetrics? This would also make implementing the rule matching enhancements from #1313 easier... Or, better yet, expose a proper object with methods to query, track and untrack things?

  2. trackMetric() = I specifically care about this metric and want it highlighted at the end of test summary and/or in the LI cloud(analysis tab). I'm tracking it, therefore it's more important to me and should be highlighted (custom metrics are currently added to the small charts in analysis tab automatically).

This makes sense, though it would probably greatly complicate the implementation, both in k6 and in the cloud. We'd need some mechanism to communicate these changes to the cloud, some sort of a meta-metric perhaps? And if we implement the metric tracking/untracking as a local per-VU thing, so that you can untrack some metrics midway through the VU execution, it would be difficult to square that with what we display in the test run view...

Reading the original question on the community board, would a trend custom metric work here? That emits at the end of the test. Maybe I'm missing something.

You're right, this is a valid workaround, I posted it in the forum. Not sure what you mean by "That emits at the end of the test." though 😕

@mstoykov
Contributor

I don't think the "flexible" approach is actually what we want(or at least I see no real benefits to it):

  1. What is the point of you being able to say:
const res = http.get("something");
if (res.status == 404) {
  trackMetric('http_req_duration{"url": "' + res.url + '"}');
}

I mean, I guess it is powerful, but do you think that will work? It won't (or it is going to miss data). And if you can't do that, having it as an option localizes it in one place, instead of letting it be hard to find in big code bases.

  2. In the above example ... what if I made an error and defined the metric with incorrect syntax (http_req_duration{"url": "something})... what should k6 do - throw a warning or stop the execution?
    2.1. I guess if you define a (sub-)metric that seems to be with correct syntax, but never gets any data you will want some warning? This really doesn't change with the "flexible" approach but is still an interesting question. We definitely can check in both cases that a given metric is defined and abort the execution as we clearly won't be able to provide any data about the metric the user wanted.

  3. Are we going to provide the information that metric "something" was actually requested for adding in the final output to any "output"? I mean the cloud one will probably be interested that someone was interested in the http_req_duration{"tag": "staticAsset"}. And other outputs ... might be interested as well.

IMO this being another option (or options) has a lot more benefits:

  1. you can configure it in the config file and reuse it for all your projects
  2. will be a lot more easy to find and won't add a bunch of new code and the problem that even if you can only use trackMetric in the initContext
import { trackMetric } from "k6/metrics";

// metric not defined up to here
trackMetric("custom_metric{tag:something}");

Should trackMetric error out when it's called and no such metric is defined? Or will we allow the metric to be defined after we have called trackMetric, in which case we won't be able to (easily) tell the user where trackMetric was called with an unregistered metric?

All of those are no problem in single-file, 50-60 line codebases, but we do have a lot of people with arguably whole frameworks around k6, to which this will add ... IMO, unneeded complexity.
3. I propose we have 3 new option keys, instead of 1 new option key and 2 functions:

{
  "tracked_metrics": ["metric1","metric2"]
}

means "track only the metrics metric1 and metric2, and nothing else", with a good default value:

{
  "additionally_track_metrics": ["metric3","metric4"]
}

which means "in addition to the metrics that are already tracked ... track those as well". This will return an error if combined with the above "tracked_metrics",
and finally

{
  "do_not_track_metrics": ["metric5"]
}

can be combined with additionally_track_metrics (but not if they share the same key), but not with tracked_metrics. This way most people will only use additionally_track_metrics and do_not_track_metrics. All of those will be in the same place (we might even add a metrics option key on top of those ...?). Obviously all the names I proposed are up for discussion ... preferably with something less verbose :D
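For instance, the common case described above might look like this (using the proposed option names, which are explicitly not final; the listed metrics are just examples):

{
  "additionally_track_metrics": ["http_req_waiting", "my_custom_counter"],
  "do_not_track_metrics": ["http_req_blocked", "group_duration"]
}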

Implementation algorithm:

  1. We parse the options and check for syntax errors.
    1.1. If there are errors, report them and stop the execution.
  2. If there are no syntax errors: check that all the metrics (and sub-metrics) have their base metric defined.
    2.1. If not, we can print a very nice message telling which ones are not valid and what the currently defined metrics are, with a small explanation about sub-metrics and possibly a link to the documentation.
  3. Profit !!!

I agree that this adds complexity to the configuration ... but not really any more than is already there, and I think this complexity is a lot better than adding 1 key and then 2 functions that we need to take care of on top of it. This can also easily be expected in the cloud output, at least, as all the options are sent either way.

I also think that this should be considered with the future metric aggregation in mind.

Also, I would like to point out that the cloud output might need 1 or more of the metrics either way in order to function properly, and I am against the user being able to break the cloud output when they use it, so we will probably need a mechanism to tell them that they can't disable "http_req_duration", for example. Also, this probably should prevent people from disabling metrics that are used in thresholds?

@imiric
Contributor

imiric commented Jan 29, 2020

I agree with the drawbacks of the first proposal; those seem like deal breakers.

The second functional approach is interesting, but it will only complicate scripts and our implementation if we decide to support untrackMetric(), and it won't give the flexibility of changing default tracked metrics (though I suppose those could be untracked if needed).

AFAICS, this is a global runtime setting, and as such it makes more sense to me for it to be configured via a CLI flag / env var. For this to work simply, we probably need to structure the metric names better and for the config option to support wildcards or regex.

Consider:

k6 run --metrics 'iter*,http_req/*,data*' ...

I think this makes intuitive sense as to what is being tracked. Note that we could structure child metrics with / to make this point clearer.

I don't think adding the functionality of untracking metrics mid-test would be worth the added complexity.

@na--
Member Author

na-- commented Jan 29, 2020

@mstoykov,

I don't think the "flexible" approach is actually what we want(or at least I see no real benefits to it):

  1. What is the point of you being able to say:
const res = http.get("something");
if (res.status == 404) {
  trackMetric('http_req_duration{"url": "' + res.url + '"}');
}

I mean, I guess it is powerful, but do you think that will work? It won't (or it is going to miss data). And if you can't do that, having it as an option localizes it in one place, instead of letting it be hard to find in big code bases.

Hmm seems like I didn't explain my "localized tracking and untracking" idea properly... The above example doesn't make sense, because http.get() would have already sent the metrics it generated. Something like this is what I envision if we have localized tracking:

untrackMetric("http_req_duration");

// A region of code that makes a bunch of requests you don't want
// to track. There are many reasons users may want that - some
// setup() code that's not important, or requests that they know
// are slow and don't want to pollute the rest of the metrics, or
// expected errors, etc.
http.get("https://myapi.com/some-super-slow-call-I-don't-care-about")
// ...

trackMetric("http_req_duration");
  2. In the above example ... what if I made an error and defined the metric with incorrect syntax (http_req_duration{"url": "something})... what should k6 do - throw a warning or stop the execution?

As I mentioned above, we probably should have separate methods for submetrics, so we don't rely quite so much on arcane syntax. In any case, I think we should just do the normal thing in such a situation and throw an exception 😉.

2.1. I guess if you define a (sub-)metric that seems to be with correct syntax, but never gets any data you will want some warning? This really doesn't change with the "flexible" approach but is still an interesting question. We definitely can check in both cases that a given metric is defined and abort the execution as we clearly won't be able to provide any data about the metric the user wanted.

I'm not sure aborting the execution makes sense - after all, a metric can receive data points even at the very end of the execution, in the teardown() 😉.

And while in a lot of cases it would be good UX to emit a warning if a metric was (explicitly or implicitly) tracked, but didn't get any data, there are cases where it doesn't make sense... For example, say that you have an error Counter, but it only receives values when there's an error. We obviously shouldn't show an error/warning if the error counter didn't receive any data points... 😄

So, I think the reasonable approach here is that, if the user implicitly or explicitly tracked a metric, we should show it in the end-of-test summary, even if it's to show 0. Pending any explicit template manipulations, of course (#1319).
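To illustrate that case, here is a tiny sketch of such an error Counter, which may legitimately finish the test with zero samples (the metric name and the check are just placeholders):

import http from "k6/http";
import { Counter } from "k6/metrics";

const errorCounter = new Counter("my_errors");

export default function () {
  const res = http.get("https://test.k6.io/");
  if (res.status >= 400) {
    // Only ever called on failures, so a clean test run emits no samples at all.
    errorCounter.add(1);
  }
}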

  3. Are we going to provide the information that metric "something" was actually requested for adding in the final output to any "output"? I mean the cloud one will probably be interested that someone was interested in the http_req_duration{"tag": "staticAsset"}. And other outputs ... might be interested as well.

I think yes... eventually. It would be better UX, but it seems like we can implement it as a separate PR, after we implement the rest. That's what I meant by "This makes sense, though it would probably greatly complicate the implementation, both in k6 and in the cloud." above...

IMO this being another option (or options) has a lot more benefits:

  1. you can configure it in the config file and reuse it for all your projects

If you mean "configure an array and reuse that", this has all of the drawbacks I mentioned above under "but it has some big drawbacks"... Even if we use a proper option to configure this, I'd still like to heavily discourage this approach.

And if you mean trackMetrics()/untrackMetrics(), then users can reuse their configuration, regardless of which approach we choose - a proper global option or a local per-VU thing. Here's how it can work in both approaches:

import { trackMetrics } from "k6/metrics";
import { myImportantMetricsArray } from "./some-common-file.js"

trackMetrics(...myImportantMetricsArray);

The only difference is that with the local per-VU approach, you have the added flexibility.

  1. will be a lot more easy to find and won't add a bunch of new code and the problem that even if you can only use trackMetric in the initContext

I don't understand what you mean here, sorry.

import { trackMetric } from "k6/metrics";

// metric not defined up to here
trackMetric("custom_metric{tag:something}");

Should trackMetric error out when it's called and no such metric is defined? Or will we allow the metric to be defined after we have called trackMetric, in which case we won't be able to (easily) tell the user where trackMetric was called with an unregistered metric?

trackMetrics is in a sense "registering" this metric... The way metrics work in k6, calling a custom metric constructor at the moment doesn't do or register anything globally... So, I think this should change, and trackMetrics (and custom metrics constructors that would implicitly call it) would actually create a record somewhere. As I said, I think a nice balance should be that, if you explicitly or implicitly track a metric, it will be displayed in the end-of-test summary (and in the cloud, eventually).

All of those are no problem in single-file, 50-60 line codebases, but we do have a lot of people with arguably whole frameworks around k6, to which this will add ... IMO, unneeded complexity.
3. I propose we have 3 new option keys, instead of 1 new option key and 2 functions:

{
  "tracked_metrics": ["metric1","metric2"]
}

means "track only the metrics metric1 and metric2, and nothing else", with a good default value:

{
  "additionally_track_metrics": ["metric3","metric4"]
}

which means "in addition to the metrics that are already tracked ... track those as well". This will return an error if combined with the above "tracked_metrics",
and finally

{
  "do_not_track_metrics": ["metric5"]
}

can be combined with additionally_track_metrics (but not if they share the same key), but not with tracked_metrics. This way most people will only use additionally_track_metrics and do_not_track_metrics. All of those will be in the same place (we might even add a metrics option key on top of those ...?). Obviously all the names I proposed are up for discussion ... preferably with something less verbose :D

Implementation algorithm:

1. We parse the `options` and check for syntax errors.
   1.1. If there are errors, report them and stop the execution.

2. If there are no syntax errors: check that all the metrics (and sub-metrics) have their base metric defined.
   2.1. If not, we can print a very nice message telling which ones are not valid and what the currently defined metrics are, with a small explanation about sub-metrics and possibly a link to the documentation.

3. Profit !!!

I agree that this adds complexity to the configuration ... but not really any more than is already there, and I think this complexity is a lot better than adding 1 key and then 2 functions that we need to take care of on top of it. This can also easily be expected in the cloud output, at least, as all the options are sent either way.

I very much dislike this approach... 😅 To me it seems more complicated for users, more complicated in terms of #883, and less flexible than either of the approaches that use functions. And I'm not exaggerating the user-facing complexity - it literally took me 3 readings to understand what you're proposing, and I'm still probably misunderstanding something. For example:

This will return an error if combined with the above "tracked_metrics"

Wat, why?

I also think that this should be considered with the future metric aggregation in mind.

Agree, though I don't currently see a lot of overlap between these things. That is, I don't see how any metric whitelist/blacklist option, or even its lack, would affect the aggregation. The most relevant thing I can think of is the aforementioned meta-metric that could be used to notify outputs of the existence of metrics/sub-metrics. This probably could be reused for aggregation "announcements"/options...

Also, I would like to point out that the cloud output might need 1 or more of the metrics either way in order to function properly, and I am against the user being able to break the cloud output when they use it, so we will probably need a mechanism to tell them that they can't disable "http_req_duration", for example.

Please refer to the "Outputs should probably also be able to specify certain metrics that they need" part of my initial proposal...

Also, this probably should prevent people from disabling metrics that are used in thresholds?

Ditto, "Each threshold should also implicitly call trackMetric()..." in the original post/

@imiric,

The second functional approach is interesting, but it will only complicate scripts and our implementation if we decide to support untrackMetric(), and it won't give the flexibility of changing default tracked metrics (though I suppose those could be untracked if needed).

Whether we're using a global option or a local per-VU metric tracking, the complexity, performance, and logic of the code wouldn't be substantially changed. Only, in one case you'd execute that logic somewhat centrally, and in the other case you'd do it independently in each VU. Thinking about it some more, the local option might even be a bit more performant...

Also, we're not doing something wildly dissimilar even now. Refer to the stats.PushIfNotDone(ctx, state.Samples, ...) calls sprinkled in the k6 code. The ctx and state arguments in these invocations are VU-level variables...

AFAICS, this is a global runtime setting, and as such it makes more sense to me for it to be configured via a CLI flag / env var.

I'm not against having a global CLI flag / env var, but this has the same drawbacks as my first global metrics option proposal.

For this to work simply, we probably need to structure the metric names better and for the config option to support wildcards or regex.

Consider:

k6 run --metrics 'iter*,http_req/*,data*' ...

I think this makes intuitive sense as to what is being tracked. Note that we could structure child metrics with / to make this point clearer.

I'd have very much liked if we could do something like this... 😞 Unfortunately, because metrics in k6 scripts can originate organically in the middle of the script, we can't know all of their names in advance. "Knowing the metric names in advance" is half of the idea for having an explicit whitelist...

So, since we can't know all possible metric names before the script starts, this approach would be very costly in terms of CPU usage, because we'd have to save all of these wildcard patterns without resolving them. Then, whenever a metric is emitted, we'd have to validate its name against them. Even if we used tries, it seems to me like this would be too CPU-heavy when you have to do it potentially several million times per test run 😞 The alternative of having a simple whitelist/blacklist would just be a fast lookup in a small map.

I don't think adding the functionality of untracking metrics mid-test would be worth the added complexity.

I'm more and more of the opposite opinion... 😅 The thing is that the more I consider the alternatives, the more appealing the local per-VU metric tracking and untracking becomes... Or, at the most, the hybrid approach.

My reasons are twofold:

  1. Implementing it is not going to be complicated. My only concern is how we would deal with sub-metrics, and most of that is not the actual implementation, but what would be the most consistent and logical interface we can present to users...
  2. We would eventually want to add functionality to suppress metric emission anyway... There are way too many valid use cases for something like it. So, the question is, should we add some other, completely different, mechanism to implement something like that (Ideas for improving the group() built-in function #884 (comment) for example, or even @sniku's ideas in Proposal for adding http_req_errors metric #1311), or should we have a single consistent, flexible and logical interface for dealing with all "metrics emission" use cases? To me, the latter is the obviously preferable choice, both in terms of implementation complexity for us (way less corner cases), and in terms of UX (users only need to understand one relatively simple thing).

@gygitlab

Hey. I was the one who asked the question over on Community.

Thanks for raising this and driving it forward. To explain a little more about where this is coming from: the problem we were trying to solve was to succinctly show more detail about the different endpoints being called in the same tests, so we can see which ones are slower, etc... The tag + threshold functionality does this nicely for us, without having to add a lot of custom metrics for each endpoint (and then multiples thereof for each data point we wanted to track).

All of the http_req_* "minor" metrics probably shouldn't be enabled by default, instead we should probably have only http_req_duration, which is sending + waiting + receiving

Also agree with this except that we're also using http_req_waiting to monitor time to first byte so it would be good to have this kept or a TTFB metric added instead.

Many thanks again for driving this. Really happy with k6 and excited to see where it goes next.

@na--
Member Author

na-- commented Jan 29, 2020

Thanks for chiming in!

To explain a little more about where this is coming from: the problem we were trying to solve was to succinctly show more detail about the different endpoints being called in the same tests, so we can see which ones are slower, etc... The tag + threshold functionality does this nicely for us, without having to add a lot of custom metrics for each endpoint (and then multiples thereof for each data point we wanted to track).

Do you think the approach of enumerating these endpoints and calling something like trackMetric("http_req_ttfb{<url-or-some-endpoint-tag>:<endpoint-url-or-id>}"); for each endpoint would work well for you? Or do you have any better suggestions?
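For example, with the proposed (still hypothetical) trackMetric() API and an endpoint tag, that could look something like this - the endpoint names and tag are just illustrative:

import { trackMetric } from "k6/metrics"; // proposed in this issue, not an existing API

const endpoints = ["login", "search", "checkout"]; // example endpoint tag values
for (const endpoint of endpoints) {
  trackMetric(`http_req_ttfb{endpoint:${endpoint}}`);
}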

All of the http_req_* "minor" metrics probably shouldn't be enabled by default, instead we should probably have only http_req_duration, which is sending + waiting + receiving

Also agree with this except that we're also using http_req_waiting to monitor time to first byte so it would be good to have this kept or a TTFB metric added instead.

TTFB definitely seems like a more useful metric than most of the ones we show right now... We currently measure and display the fundamental parts of the HTTP request making process, connecting, tls_handshaking, sending, waiting and receiving, in that order. We also show blocked, which is somewhat separate, is often misinterpreted, and isn't very useful right now... though it might become more so in the future (#1169).

The only composite metric we currently show is http_req_duration, which is sending + waiting + receiving. This is useful precisely because it ignores all connection establishment times, making it very comparable across all requests in a load test. If you haven't disabled the default k6 connection reuse, the initial requests would have some non-zero connecting and tls_handshaking times, while the subsequent requests would have these metrics be 0. So, if you compare the total times of these requests, you'd be comparing apples to oranges, and that's one of the reasons http_req_duration exists - you could compare its values across all requests in your load test.

That said, it's obvious that network delays shouldn't be ignored. So I'd be perfectly fine with adding a new composite metric we show by default called http_req_ttfb. It should probably be connecting + tls_handshaking + sending + waiting, since I think that's the most common TTFB interpretation, right? This would still be very useful, just with the understanding that connection reuse would likely heavily affect it.

But I think we, by default, should display the "minor" constituent metrics only if users explicitly request it, or have thresholds based on them. I think that, in most use cases, you'd only want to look at these if you're debugging an issue? And it'd be much easier to initially spot an issue with the composite metrics, since they can surface them much more easily and visibly, I think... And if http_req_waiting is precisely what you want and need anyway, having a single trackMetric("http_req_waiting") in the script hopefully isn't a huge burden 😅
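In the meantime, such a TTFB metric can be approximated at the script level with a custom Trend built from the existing timings (a sketch; the metric name is just an example, and the connection reuse caveats above still apply):

import http from "k6/http";
import { Trend } from "k6/metrics";

// Time to first byte: everything up to the first byte of the response body.
const httpReqTTFB = new Trend("http_req_ttfb", true);

export default function () {
  const t = http.get("https://test.k6.io/").timings;
  httpReqTTFB.add(t.connecting + t.tls_handshaking + t.sending + t.waiting);
}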

@na--
Member Author

na-- commented Jan 29, 2020

We can very easily add such a helper function as a simple and fool-proof way to track/untrack metrics per-VU, so that there wouldn't be any issues with surviving exceptions and code bugs. It could be a helper API like this:

withUntrackedMetrics(["http_req_duration{status:403}", "http_req_ttfb"], function() {
   // some user code that doesn't want to measure the metrics above, 
   // but may throw an exception without messing the state
})

Basically, this is like group(), but for tracking/untracking metrics. I'd argue that if group isn't complicated, this isn't complicated either 😉

@gygitlab

Do you think the approach of enumerating these endpoints and calling something like trackMetric("http_req_ttfb{<url-or-some-endpoint-tag>:<endpoint-url-or-id>}"); for each endpoint would work well for you? Or do you have any better suggestions?

Hmm that seems fine to me at first glance yeah.

But I think we, by default, should display the "minor" constituent metrics only if users explicitly request it, or have thresholds based on them. I think that, in most use cases, you'd only want to look at these if you're debugging an issue? And it'd be much easier to initially spot an issue with the composite metrics, since they can surface them much more easily and visibly, I think... And if http_req_waiting is precisely what you want and need anyway, having a single trackMetric("http_req_waiting") in the script hopefully isn't a huge burden 😅

Yeah, I can agree on this. What I was meaning, though, was that you remove the minor metrics and leave http_req_duration, but also bring in a new TTFB metric to go alongside it, as I'd consider those to be the real metrics most folks will want to follow.

@TamiTakamiya
Contributor

I am also very interested in this feature, primarily because the disk usage of our InfluxDB increases by > 1 GB/hour when our test is running. Unless we pay extra attention, the disk usage could reach the limit (we set it to 100 GB) at any time, like at midnight or on weekends. I hope this feature will be implemented in the near future. Thanks.

While waiting, I am thinking of making a workaround for the InfluxDB disk usage issue, i.e. just suppressing the writing of some metrics, specified through an environment variable, to InfluxDB. Once I have built such a workaround, I will push the code to my forked k6 repository.

@TamiTakamiya
Contributor

I was able to build the workaround described above: TamiTakamiya@566b0e6

It simply suppresses saving the metrics specified in the environment variable K6_INFLUXDB_METRICS_SUPPRESSED into InfluxDB. As the metrics are still collected internally and appear in the summary report, it won't be a long-term solution, but at least it resolves our immediate issue.

@na--
Member Author

na-- commented Aug 27, 2020

I described a very partial workaround here, for anyone who doesn't want their setup() and teardown() metrics to pollute the end-of-test summary. Implementing this issue, or something like it, is still vitally important, so we can fix the root cause of the problem, but the per-scenario tags in k6 v0.27.0 offer some wiggle room in the meantime...

@na--
Member Author

na-- commented Jan 12, 2021

Another use case for giving users the ability to say "keep track of all submetrics based on some tags": https://community.k6.io/t/how-to-get-failure-statistics/1247

If users could specify that k6 should "keep track of all sets of http_reqs sub-metrics based on the error_code (or status + error) tags", then they'd be able to generate such an HTML report now that we have #1768
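Until then, the only way to force k6 to track such sub-metrics is the thresholds workaround mentioned at the top of this issue - for example, with effectively no-op thresholds (the specific tag values below are just examples):

export let options = {
  thresholds: {
    // Dummy thresholds whose only purpose is to make k6 track these sub-metrics:
    "http_reqs{status:200}": ["count>=0"],
    "http_reqs{status:500}": ["count>=0"],
  },
};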

@na-- na-- added this to the v0.39.0 milestone Apr 26, 2022
@ppcano
Contributor

ppcano commented May 19, 2022

I came here from "Exclude http requests made in the setup and teardown functions", because I wanted to know how some requests that interact with a third-party service could be ignored.

Given the k6 built-in metrics, the selected HTTP request(s) should not affect the general data_* and http_req_* metrics.

I read a few proposals. Subscribing to the issue to be informed about the final API proposal.

@BjornPaulsson

Subscribed to the issue.
We would need a solution for this as soon as possible where I work.
Is there any timeline for this?

@na--
Member Author

na-- commented Nov 3, 2022

No firm plans yet, sorry 😞 Though the metric refactoring we did in the recently released k6 v0.41.0 should hopefully make implementing this feature significantly easier 🤞

I think the biggest blocker right now is the fact that we don't have a good proposal for what the JS API should look like 🤔

@sebastianpujina

Subscribed to the issue. Any plans for this? 🤞

@codebien
Contributor

codebien commented May 9, 2023

Hi @sebastianpujina, this feature is one of our high priorities in our backlog for the Metrics area. In the next cycle, we could consider starting to work on a JavaScript API for it.

Remember to upvote the issue using the thumbs-up reaction so it will be balanced appropriately during the evaluation phase.

Currently, there isn't any update to @na--'s #1321 (comment).

@metaturso

We can very easily add such a helper function as a simple and fool-proof way to track/untrack metrics per-VU, so that there wouldn't be any issues with surviving exceptions and code bugs. It could be a helper API like this:

withUntrackedMetrics(["http_req_duration{status:403}", "http_req_ttfb"], function() {
   // some user code that doesn't want to measure the metrics above, 
   // but may throw an exception without messing the state
})

Having the ability to disable or discard specific metrics collected within a block directly from the JavaScript API would be incredibly useful. Are there any plans to implement this feature, or something along those lines?

One practical example where this would be beneficial is implementing a function that retries a request until it succeeds. This would allow us to avoid hardcoding the status code in threshold names, e.g. http_req_duration{name:foo bar, status:200}, and keep the retry logic nicely tucked away inside the retry function.
