-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How is the cardinality limit applied when attributes are being filtered? #3798
Comments
The limit is intended to be against the amount of in memory storage you have. Given that, I believe the filtering MUST be done on your hot-path. The intention is to avoid Denial-of-Service attacks via high cardinality inputs that impact metrics (as many users will attach attributes that come from requests, even our semconv recommend this). I don't think we have any rules about forcing filtering to be in the hot path or afterwards. To me this means you have to limit before you hit any in-memory storage. Whether this means, in Go, you start doing filtering in the hot path, is up to you. I'd argue that the con of having less cardinality than your limit in some cases is better than the alternative of running out of memory :) |
@jsuereth did you mean "Given that, I believe the [limiting] MUST be done on your hot-path"? Not necessarily the filtering, right? |
👍 |
Yep! Limiting MUST be on hot path, you can pick how you filter for best UX + performance. |
@jsuereth what are you thoughts on potential inconsistencies across languages here? If one implementation filters prior to the limiting and another filters afterwards they could produce different outputs. Is consistency here going to be an issue? |
I think regarding cardinality limits we're talking about error scenarios and worst-case behavior. We already have a lot of inconssitencies in how failures are handled due to runtime limitations. We try to be consistent, but when it comes to extraordinary/error scenarios, I think some inconsistencies between SDKs is ok. |
Another way to phrase it -> I think users, if given a choice, would prefer lower o11y overhead per-language over perfect consistency. |
Java checks whether the cardinality exceeds the limit after attribute filtering has occurred. I think this is the right behavior because its the least surprising to users, e.g. its surprising if I set my cardinality limit to 100 but only see 20 series. Both the attribute filter and cardinality limit are mechanisms to manage cardinality. The should work together and not be at odds with each other. |
Doesn't this mean that Java needs to filter every measurement the user makes though? And do so in the "hot path" of telemetry recording? |
Yes. |
I've clarified the SDK cardinality limit in #3856. The spec now says "For a given metric, the cardinality limit is a hard limit on the number of metric points that can be collected during a collection cycle". I also sent another editorial PR to clean up metric points #3906. @MrAlias Do you think this issue can be marked as resolved? (I've provided more info regarding how I want to address a set of problems in #3866). |
@MrAlias is this complete? |
I think this can be closed with the conclusion that Filtering of attributes should be done before applying limits. This requires filtering of attributes in hot path. Some languages may vary in implementation (for perf reasons), but that is considered okay. |
@MrAlias please let me know if you want to reopen it. I haven't seen an update from you after #3798 (comment). |
The conclusion that @jsuereth mentioned above is different than this. This issue may need to be reopened and a change made to the specification if there is still confusion about the permissiveness of what is allowed. |
What part do you see misalignment? Is that one of the below, or different? Filtering of attributes should be done before applying limits (Separately, I am curious to learn more on why filtering in hot-path is considered expensive. If unwanted tags are not dropped in hot path early enough, then one has to process entire set of incoming attributes (sort, de-dup,hash-lookup, all of which have a cost proportional to the number of attributes), and hence it would be less performant...So it feels counter-intuitive how filtering in hot path is more expensive. We have done some optimizations in OTel .NET to do that (https://github.com/open-telemetry/opentelemetry-dotnet/pull/3864/files/3f47828869a6bc42478bd5468b864382d147cff1#r1013461213), and I am in need of implementing the same in a performant way in OTel Rust too, so want to check all possible ways to get high perf We can chat about this offline if that is easier. |
I mistook your statement as meaning there was a normative recommendation to filter prior to applying limits. I think if you are making a non-normative recommendation there, then things look aligned. If this is supposed to be normative, I think we need to codify that in the specification. |
This processing is done outside of the hot-path. It is not performed when the attribute/span is recorded. If designed correctly this can be done concurrently while the API continues to record. |
Got it. Do you think its best if we make a normative statement in the spec like "If SDK is enforcing cardinality limits, it SHOULD be done after filtering of attributes, if any". (It felt obvious to me that filtering should be done prior to cardinality capping (maybe because I was only thinking from .NET implementation!) but based on this issue/discussion, it may be worth adding it to spec!) |
I'm fine either way. The Go SIG was taking the comments in this issue and the lack of specification language to mean there was a decided and intentional ambiguity. If we want to change that and make a normative recommendation, that would be fine as well. The Go SIG would need to evaluate more objectively the trade-offs and decide if we would want to comply with the recommendation. I support unifying behavior, but I also support @jsuereth's point that this is an error situation and ambiguity here is also fine. |
Re-opening since seeing so many comments on a closed issue gives me anxiety 😅. |
I would personally prefer this:
e.g. we're using this behavior in Java instrumentation: #3546 (comment) I also think it gives users the cleanest way to recover from a cardinality limit error condition, by dropping a (high-cardinality) attribute via metric view and getting full fidelity of all the remaining attributes |
I'll send a PR with that wording, and then close this issue with the PR. (assuming PR gets approved by all!) |
I'm fine if we want to have additional clarification in the spec. |
When a view is applied to a metric pipeline that contains an attribute filter, how should the cardinality limit be applied? Prior to filtering attributes, or post?
The answer we come up with will affect the output of user data. For example, if measurements for the following attributes are made:
{path: "/", code: 200, query: ""}
{path: "/", code: 400, query: "user=bob"}
{path: "/admin", code: 200, query: ""}
If an attribute filter is applied so that only the
path
attribute is retained and cardinality limit of3
is set, if the filtering is applied prior to checking the cardinality limit the following attributes will be kept on the output metric streams:{path: "/"}
{path: "admin"}
However, if the cardinality limit is applied prior to filtering the following attributes will be kept on the output metric streams:
{path: "/"}
{otel.metric.overflow: true}
Filter before limiting
Given the cardinality limit feature was added to limit the number of resources and SDK uses during measurements, if filtering is to be applied prior to the cardinality limit being applied it will need to be done in the "hot path" of making a measurement.
Pros:
Cons:
Limit before filtering
Limiting without doing any filtering means the filtering process can be delayed to the collection of metric streams. This is a substantial performance issue given the filtering process will need to run at most
M
times (forM
being the number of distinct attributes recorded) rather thanN
times (forN
being the number of measurements made).Pros:
Cons
otel.metric.overflow
attribute in some cases)The text was updated successfully, but these errors were encountered: