Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support non-indexable high-cardinality metric tags / metadata #2584

Closed
na-- opened this issue Jul 4, 2022 · 6 comments · Fixed by #2726
Closed

Support non-indexable high-cardinality metric tags / metadata #2584

na-- opened this issue Jul 4, 2022 · 6 comments · Fixed by #2726
Assignees
Labels
breaking change for PRs that need to be mentioned in the breaking changes section of the release notes performance refactor
Milestone

Comments

@na--
Copy link
Member

na-- commented Jul 4, 2022

Currently, the k6 system metric tags are a mix of high-cardinality ones (url, error, vu and iter) and relatively low-cardinality ones (everything else). This has already been a problem multiple times and has forced us to adopt some hacks to work around the issues, e.g. these workarounds in the built-in outputs: 1, 2, 3). On top of that, even the current high-cardinality tags are going be a big problem for the upcoming time series changes (#2580), but we'll need to add even more and higher cardinality tags for things like tracing (#2128), to store the trace and span IDs. Finally, k6 has no information whether user-defined custom tags have high or low cardinality.

So, because of all of these reasons, we'll need to have some way to differentiate between two distinct types of metric sample tags/metadata:

  1. low-cardinality tags that can be used for indexing... so, in the time series mentioned above, in sub-metric thresholds, in certain outputs that do aggregation based on tags, etc.
  2. and tags with high cardinality, containing metadata that's often unique for every measurement / data point

Different outputs will be free to treat these high-cardinality tags/metadata however they want - they can discard it (probably the behavior of outputs like statsd, datadod, prometheus, etc. that work with time series), sample it (e.g. to export some tracing information), transform it (e.g. in influxdb this will correspond to data point fields) or output it directly like they do to tags (e.g. json, csv and other similar outputs can just write it to the disk, since we don't do any indexing or aggregation, just dump raw data). This will finally allow us to deprecate the hacks we currently have.

The exact API changes on how we'll add this for custom tags in k6 are still under consideration, but the currently favored approach is to extend the tags API to support both strings and objects, something like this:

http.get('http://httpbin.test.k6.io/', {
  tags: {
    my_tag: {
      value: "I'm a a non-indexable tag",
      index: false
    },
    more_tag: "backwards-compatible indexable tag",
  }
);

Unfortunately, we will probably have to do a minor breaking change in the k6 core and transform some of the currently problematic (because of their high cardinality) k6 system tags like vu, iter, url into non-indexable ones by default... So thresholds based on them will no longer work. Warnings about this very likely breaking change were added to k6 v0.39.0 (#2583) for scripts that have thresholds on them, and we mention them in the v0.39.0 release notes (#2581).

@na--
Copy link
Member Author

na-- commented Aug 18, 2022

While working on the latest iteration of this, I realized a few issues with the plan to make url an non-indexed tag... 😞 However, I think I have an easy to implement suggestion for a better alternative 🤞

Here are the problems I saw:

  1. The biggest problem is the k6 cloud actually expects a url tag and we can't unilaterally remove this from k6 😞 and probably k6 users who have already configured other external outputs
  2. The second biggest problem is that we don't emit a name tag for websocket metrics, just a url one, in either the current k6/ws code or in the new xk6-websockets one 😞 so we can't unilaterally say that the url is now going to be non-indexed metadata for all metrics...
  3. The third biggest problem is a UX one, setting a threshold on http_req_duration{url:whatever} is kind of nice and logical, whereas name is a k6 term that people first need to be taught about...
  4. Finally, while the k6/net/grpc API has a name tag, its url tag is actually pretty low-cardinality, so it's not a problem for atlas 🎉

So, my proposal to fix all of these issues is this:

  • keep url as an normal (index-able) tag, but always set it equal to the value of the name tag, if name was specified (which is basically what we currently do in the cloud output)
  • if name was not specified, then keep the current behavior and set both url and name to the cleaned-up URL
  • so, the url and name tag values will always be identical, but users will be able to avoid memory runoff effects from high-cardinality metrics by manually setting the name tag
  • however, if the URL value is different from the name tag, then we will save the raw high-cardinality URL value in a new non-indexed raw_url metadata entry, so it's still accessible to users that want it (e.g. in the JSON output)

This will leave us with (mostly) sensible defaults and with an easy way for users to avoid high-cardinality metrics. It will also minimize the disruption to users who already have thresholds on url sub-metrics, only some of them will need to adjust them.

The code for this even looks relatively clean 😅

enabledTags := t.state.Options.SystemTags
cleanURL := URL{u: unfReq.request.URL, URL: unfReq.request.URL.String()}.Clean()
// After k6 v0.41.0, the `name` and `url` tags have the exact same values:
nameTagValue, nameTagManuallySet := ctm.Tags.Get(metrics.TagName.String())
if !nameTagManuallySet {
// If the user *didn't* manually set a `name` tag value and didn't use
// the http.url`` template literal helper to have k6 automatically set
// it (see `lib/netext/httpext.MakeRequest()`), we will use the cleaned
// URL value as the value of both `name` and `url` tags.
ctm.SetSystemTagOrMetaIfEnabled(enabledTags, metrics.TagName, cleanURL)
ctm.SetSystemTagOrMetaIfEnabled(enabledTags, metrics.TagURL, cleanURL)
} else {
// However, if the user set the `name` tag value somehow, we will use
// whatever they set as the value of the `url` tags too, to prevent
// high-cardinality values in the indexed tags.
ctm.SetSystemTagOrMetaIfEnabled(enabledTags, metrics.TagURL, nameTagValue)
// However, if the actual clean-URL value would have been different than
// the `name` value, we will also attach it as the `raw_url` non-indexed
// tag (i.e. add it to the Sample.Metadata).
if nameTagValue != cleanURL {
ctm.SetSystemTagOrMetaIfEnabled(enabledTags, metrics.TagRawURL, cleanURL)
}
}

And if everyone agrees with this approach, we should probably merge a small PR before k6 v0.40.0 is released that rolls back these parts of #2583:

if _, ok := sm.Tags.Get("url"); ok {
me.logger.Warnf("Thresholds like '%s', based on the high-cardinality 'url' metric tag, "+
"are deprecated and will not be supported in future k6 releases. "+
"To prevent breaking changes and reduce bugs, use the 'name' metric tag instead, see"+
"URL grouping (https://k6.io/docs/using-k6/http-requests/#url-grouping) for more information.", name,
)
}

k6/cmd/integration_test.go

Lines 318 to 320 in b89a3aa

assert.True(t, testutils.LogContains(logs, logrus.WarnLevel,
"Thresholds like 'http_req_duration{url:https://test.k6.io}', based on the high-cardinality 'url' metric tag, are deprecated",
))

(edit: created PR for v0.40.0 to remove the warning: #2655)

@mstoykov
Copy link
Contributor

Can you remember the reasons we were particularly against this in the original discussion?

@na--
Copy link
Member Author

na-- commented Aug 18, 2022

No... 😞 I tried to find some discussion on the topic, but I don't think we delved that much into it, we mostly discussed other details.

na-- added a commit that referenced this issue Aug 18, 2022
This was mistakenly added in k6 v0.39.0 (#2583), but it turns out we probably don't need it (see #2584 (comment))
@codebien
Copy link
Contributor

I'm not getting when the url as an indexable tag will be useful per-se, all the cases seem covered by the name tag and when it will be different we will move to Metadata.
For the WebSocket case, I guess is it just a feature that we can add for consistency? Regarding the thresholds, I guess in that case the user would set the name so I'm not sure the UX is better because the threshold will use the low-cardinality value so the name.

@na--
Copy link
Member Author

na-- commented Aug 18, 2022

To put it another way - with #2584 (comment), we get all of the benefits of #2584 (comment), but with a much smaller breaking change.

We get a way to have low-cardinality URLs, users just have to set the name tag or use the http.url helper. But we don't break the majority of existing tests that might use a threshold or sub-metric with the url tag. If a script has just a handful of URLs, you don't need to mess with name, it should just work as it is with url, since it will be low-cardinality. And if it isn't, users will have an easy way to fix it with name.

So, only the people with many URLs or with dynamic URLs need to bother with name, everyone else is fine and doesn't need to change anything. And most people with such scripts probably already use name, so they also don't have to do anything. Even more importantly, the k6 cloud continues to work exactly as before, no changes needed there before we can merge #2654.

In summary, it seems to me that it's a much smaller breaking change with basically the same benefits.

I'm not sure the UX is better

Setting a threshold on http_req_duration{url:https://test.k6.io/whatever} seems like a much more understandable thing for a beginner k6 user than setting a threshold on http_req_duration{name:https://test.k6.io/whatever}. URL is a valid concept outside of k6, while a "name" is not. And for the majority of scripts, this will probably be fine, since they don't have thousands of URLs. The rest will need to understand URL grouping and use name, but this is basically already the case if you want to have a threshold on them... The only change is that now the url tag will always be equal to name, instead of just some of the time.

For the WebSocket case, I guess is it just a feature that we can add for consistency?

Sure, I am not against adding it, but we don't have it yet. So, unless we add it right now and release it in v0.40.0, in both websocket APIs, there will be no transitional k6 version that has both name and url websocket tags for people to gracefully migrate. But it's a moot point, since with my suggestion we don't need such a version 😅

@na--
Copy link
Member Author

na-- commented Oct 6, 2022

After some lengthy internal discussions, we didn't reach consensus if the JS APIs from #2654 are the best one for the job. I'll try to do a longer public summary later, but suffice it to say that there were some strong and valid internal objections when it came to its UX... 😅

However, the need to support some sort of high-cardinality metrics metadata is still there and needs to be handled, so we decided to do these things for now:

  1. Split apart the url == name tag fixes that reduce the cardinality of the url tag from Support non-indexable high-cardinality metric tags/metadata #2654 and merge them separately. This was already done. First as a PR (atlas: url tag value always equal to the name tag #2703) on top of the atlas branch (Refactor metric samples to use Atlas for tags and have time series #2594). Merging it there allowed us to merge the atlas branch in master without a worry that the k6 built-in system tags will cause high cardinality issues.
  2. Spit apart other mostly refactoring and validation parts from Support non-indexable high-cardinality metric tags/metadata #2654, like ApplyCustomUserTags(), but without any Metadata or JS API changes, and merge them separately in master.
  3. Make another PR that adds support for Metadata in every metrics.Sample Go struct, but without any direct access to it from the JS APIs. So, no custom user-specified metadata, for now. However, this Metadata map[string]string value will be usable by the k6 core code (e.g. for things like iter, vu and maybe raw_url) and by xk6 extensions!
  4. (Everything ⬆️ would hopefully make it in k6 v0.41.0 at the end of the month, everything below would be nice to have, but unlikely to make it in time.)
  5. Iterate over possible JS APIs and try to deliver a more user-friendly and less error-prone API than non-indexable tags, i.e. something better than tags: { foo: {value: 'bar', index: false}}. We have some ideas we'll iterate on internally and share in the issue later, but it's likely we'll start with some k6/experimental stand-alone API, instead of trying to thread it in every JS HTTP/WS/gRPC/etc. call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change for PRs that need to be mentioned in the breaking changes section of the release notes performance refactor
Projects
None yet
3 participants