
Specify cardinality limiting and attribute filtering interaction #3803

Closed
wants to merge 3 commits

Conversation

@MrAlias (Contributor) commented Jan 4, 2024

Fixes #3798

Clarify that the specification allows implementations to choose the approach that works best for them when implementing cardinality limiting while attributes are being filtered.

cc @jsuereth @jack-berg

@MrAlias added the spec:metrics label Jan 4, 2024
@MrAlias requested review from a team January 4, 2024 17:14
Comment on lines +780 to +785
#### Interaction with filtering

It is left unspecified if attribute filtering from a user provided view needs
to be applied before or after applying the cardinality limit. While this can
lead to inconsistent telemetry across implementations, this will only happen in
an error scenario where a cardinality limit is used.
A Contributor commented:

I thought through this --
I couldn't find which aspect of "attribute filtering" and views you are concerned about, and I'd like to understand it.

The key phrase in #2960, to me, is:

Regardless of aggregation temporality, the SDK MUST ensure that every metric event is reflected in exactly one Aggregator, which is either an Aggregator associated with the correct attribute set or an aggregator associated with the overflow attribute set.

The line you're referring to, I think, is:

Implementations MAY accept additional attribute filtering functionality for this parameter.

In my mental model for this, the view is responsible for choosing which aggregators accumulate which metric event. Custom filtering might change the logic used by a View to choose aggregators, but doesn't change the View's role.

The relationship between views and cardinality limits as defined (i.e., either the correct aggregator or the overflow aggregator is selected) is, to me, decoupled from attribute filtering as performed by views.

There is, on purpose (IMO), a lot left unspecified about how cardinality limits are to be applied, because it is very implementation-dependent.

Would you be equally happy with a more general declaration, such as:

Metric cardinality limits are meant as a self-protection feature. Implementations are free to use any definition that works for their design, subject to two constraints: (1) each metric event is aggregated by exactly one aggregator, and (2) when overflow is happening, the `otel.metric.overflow` attribute appears on at least one timeseries.

What inconsistency are you considering? Does the inconsistency somehow not happen without attribute filtering?

Does this happen because attribute filters, as described here, are really being used as measurement processors? I think it's time for OTel to add a measurement processor specification with a clear relationship to cardinality limits. Measurement processors MUST be applied before aggregation so that they are an effective and correct control against cardinality limits.
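
For illustration, a measurement processor of the kind proposed here could take a shape like the following. This is a hypothetical sketch in Go; no such interface exists in the specification or any SDK today, and every name in it is invented:

```go
package main

import "fmt"

// Measurement is a simplified metric event (hypothetical type).
type Measurement struct {
	Value float64
	Attrs map[string]string
}

// MeasurementProcessor transforms or drops a measurement before it
// reaches aggregation. Running before aggregation is what would make
// it an effective control against cardinality limits.
type MeasurementProcessor interface {
	// Process returns the measurement to aggregate and whether to keep it.
	Process(m Measurement) (Measurement, bool)
}

// dropQuery is an example processor that removes a high-cardinality
// attribute before any aggregator (and any cardinality limit) sees it.
type dropQuery struct{}

func (dropQuery) Process(m Measurement) (Measurement, bool) {
	delete(m.Attrs, "query")
	return m, true
}

func main() {
	m := Measurement{Value: 1, Attrs: map[string]string{"path": "/", "query": "user=bob"}}
	out, keep := dropQuery{}.Process(m)
	fmt.Println(keep, out.Attrs) // true map[path:/]
}
```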

@MrAlias (Contributor, Author) commented Jan 8, 2024

What inconsistency are you considering? Does the inconsistency somehow not happen without attribute filtering?

I have laid out an example of the inconsistency in #3798:

For example, if measurements for the following attributes are made:

  1. {path: "/", code: 200, query: ""}
  2. {path: "/", code: 400, query: "user=bob"}
  3. {path: "/admin", code: 200, query: ""}

Suppose an attribute filter is applied so that only the path attribute is retained, and a cardinality limit of 3 is set. If the filtering is applied prior to checking the cardinality limit, the following attributes will be kept on the output metric streams:

  • {path: "/"}
  • {path: "/admin"}

However, if the cardinality limit is applied prior to filtering, the following attributes will be kept on the output metric streams:

  • {path: "/"}
  • {otel.metric.overflow: true}
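
To make the divergence concrete, here is a runnable sketch of the two orderings. This is illustrative Go, not SDK code, and it assumes the common convention that one slot under the limit is reserved for the overflow series:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// attrs is a simplified attribute set.
type attrs map[string]string

// id renders an attribute set as a stable series identity.
func id(a attrs) string {
	keys := make([]string, 0, len(a))
	for k := range a {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, len(keys))
	for i, k := range keys {
		parts[i] = k + "=" + a[k]
	}
	return "{" + strings.Join(parts, ", ") + "}"
}

// filterPath mimics a view filter that retains only the path attribute.
func filterPath(a attrs) attrs { return attrs{"path": a["path"]} }

// outputSeries returns the distinct output series identities, with one
// slot under limit reserved for the overflow series. filterFirst picks
// whether the view filter runs before or after the limit check.
func outputSeries(ms []attrs, limit int, filterFirst bool) []string {
	kept := map[string]attrs{}
	overflow := false
	for _, m := range ms {
		a := m
		if filterFirst {
			a = filterPath(a) // filter, then count against the limit
		}
		k := id(a)
		if _, ok := kept[k]; ok {
			continue
		}
		if len(kept) < limit-1 {
			kept[k] = a
		} else {
			overflow = true // limit hit: route to the overflow series
		}
	}
	out := map[string]bool{}
	for _, a := range kept {
		if !filterFirst {
			a = filterPath(a) // filtering after limiting merges survivors
		}
		out[id(a)] = true
	}
	if overflow {
		out["{otel.metric.overflow=true}"] = true
	}
	names := make([]string, 0, len(out))
	for n := range out {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	ms := []attrs{
		{"path": "/", "code": "200", "query": ""},
		{"path": "/", "code": "400", "query": "user=bob"},
		{"path": "/admin", "code": "200", "query": ""},
	}
	fmt.Println("filter then limit:", outputSeries(ms, 3, true))
	// filter then limit: [{path=/admin} {path=/}]
	fmt.Println("limit then filter:", outputSeries(ms, 3, false))
	// limit then filter: [{otel.metric.overflow=true} {path=/}]
}
```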

@MrAlias (Contributor, Author) commented:

Would you be equally happy with a more general declaration, such as:

Metric cardinality limits are meant as a self-protection feature. Implementations are free to use any definition that works for their design, subject to two constraints: (1) each metric event is aggregated by exactly one aggregator, and (2) when overflow is happening, the `otel.metric.overflow` attribute appears on at least one timeseries.

I think this could work. I am worried that it would be too vague for readers, though; they may not understand that it relates to attribute filtering.

A Member commented:

Hmm... I'm not sure I agree that it should be left unspecified. The view attribute filter mechanism is meant to be a user control for cardinality, so I think it would be surprising to users if its interaction with this other cardinality-limiting control were ambiguous.

I imagine the workflow being something like:

  • Run the app with default cardinality limits and views.
  • Exceed the cardinality limit and notice via the otel.metric.overflow series and application logs.
  • Add an attribute filter to remove the problematic attributes.
  • Expect to see that the otel.metric.overflow series is gone.

If the cardinality limit applies before view attribute filtering, then view attribute filtering isn't an effective tool to prevent overflow. With a cardinality limit of 2000, I could end up seeing only 1000 series exported and still see the otel.metric.overflow series. This would be quite surprising.

Note that @dashpole brought this up in #2960, where cardinality limits were originally added:

Being able to correct cardinality problems via Views seems essential, and while this is a step forward, this feels like a big omission. Are views necessarily "late in the pipeline"?

I believe the conclusion (based on comment, comment) was to imply that view attribute filtering should happen before cardinality limits are applied.
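
For reference, the user control in question looks like this in the Go SDK. This is a sketch: the instrument name and attribute key are illustrative, and the view API shown is `go.opentelemetry.io/otel/sdk/metric` as of the time of this discussion:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	// A view that keeps only the path attribute on the matched
	// instrument; whether this filtering runs before or after the
	// cardinality limit is exactly what this PR is trying to pin down.
	view := sdkmetric.NewView(
		sdkmetric.Instrument{Name: "http.server.request.count"},
		sdkmetric.Stream{AttributeFilter: attribute.NewAllowKeysFilter("path")},
	)
	mp := sdkmetric.NewMeterProvider(sdkmetric.WithView(view))
	defer func() { _ = mp.Shutdown(context.Background()) }()
}
```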

@MrAlias (Contributor, Author) commented Jan 8, 2024

If the cardinality limit applies before view attribute filtering, then view attribute filtering isn't an effective tool to prevent overflow. With a cardinality limit of 2000, I could end up seeing only 1000 series exported and still see the otel.metric.overflow series. This would be quite surprising.

As @jsuereth stated, this was discussed in the issue this PR resolves (#3798):

[...] regarding cardinality limits we're talking about error scenarios and worst-case behavior. We already have a lot of inconsistencies in how failures are handled due to runtime limitations. We try to be consistent, but when it comes to extraordinary/error scenarios, I think some inconsistencies between SDKs are ok.

@MrAlias (Contributor, Author) commented:

If the cardinality limit applies before view attribute filtering, then view attribute filtering isn't an effective tool to prevent overflow.

Agreed, but I don't think it was an effective tool to begin with. Many implementations do not filter on measurement, so filtering is not preventing memory overflows there.

I also don't think we can say those implementations are wrong to not do this. Filtering on measurement is a poorly scaling computation pattern.

I think the only thing that can be said about attribute filtering is that it is an effective tool to prevent overflowing the data sent to a backend. That is, in many implementations it is an effective cost-reduction tool, not a memory-limiting tool.

A Contributor commented:

I think the most compelling argument I heard in our discussion today was around user expectations. If users think that attribute filtering would lead to memory limiting, then we have an issue and an argument to specify semantics here to match user expectations.

However, if possible, I'd prefer to allow room for experimentation in implementations. I can see three designs today:

  1. A PubSub/RingBuffer-style hot path where measurements are pushed into a queue and some aggregation is responsible for pulling them off (OpenCensus did this; I believe C# works like this in some fashion).
  2. "Fast" access to local memory storage from the hot path, where data is directly written to memory (possibly via "fast" atomics, etc.). Java has this design, and it tries to quickly reduce contention between threads recording values via two-stage locking (one lock to reach a particular timeseries, another to write data into that timeseries). We migrated the Java SDK to this style after benchmarking vs. OpenCensus.
  3. A two-stage aggregation process where known storage is allocated ahead of time and metric writes are flushed quickly, and then a follow-up aggregation collects the first set of aggregations/timeseries, performing filtering / further aggregation as needed (@jmacd's design).

In all of these designs we're trying to trade off "slowdown in the hot path" against "total memory consumption". I think there's still room for us to explore here, so I don't want too rigid a specification.
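
As a rough illustration of the first design above (illustrative Go, not any SDK's actual code): the hot path only enqueues, and a single consumer owns the aggregation state:

```go
package main

import (
	"fmt"
	"sync"
)

type event struct {
	series string
	value  float64
}

func main() {
	// Hot path: producers push measurements into a bounded queue and
	// return immediately; contention is limited to the channel.
	queue := make(chan event, 1024)

	sums := map[string]float64{}
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Cold path: a single consumer owns the aggregation state, so
		// no per-series locking is needed.
		for e := range queue {
			sums[e.series] += e.value
		}
	}()

	queue <- event{series: `{path="/"}`, value: 1}
	queue <- event{series: `{path="/"}`, value: 1}
	queue <- event{series: `{path="/admin"}`, value: 1}
	close(queue)
	wg.Wait()
	fmt.Println(sums) // map[{path="/"}:2 {path="/admin"}:1]
}
```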

Can we guarantee that:

  • Cardinality limits => memory out-of-bounds protections?

Are we sure that:

  • Users assume attribute filters will also limit memory consumption for timeseries?

A Member commented:

Users assume attribute filters will also limit memory consumption for timeseries?

I think so, see open-telemetry/opentelemetry-java-instrumentation#10119 for an example

github-actions bot commented:

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions bot added the Stale label Jan 17, 2024
@MrAlias removed the Stale label Jan 17, 2024
github-actions bot commented:

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@MrAlias (Contributor, Author) commented Jan 30, 2024

SIG meeting follow-up on this:

  • Myself, @jsuereth, @jmacd, and @jack-berg would like to find a time to discuss this synchronously, develop a plan of action, and report back to the spec SIG (via meeting or PR).

I plan to touch base with the people listed via Slack in a few hours to find a time that works.

github-actions bot commented Feb 7, 2024

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions bot commented:

Closed as inactive. Feel free to reopen if this PR is still being worked on.

github-actions bot closed this Feb 14, 2024
@trask (Member) commented Feb 14, 2024

/unstale /reopen 🤞

I'm interested in this from the perspective of #3785, where I think it's important to apply the attributes "filter" advisory before applying the cardinality limit.

Labels: spec:metrics (Related to the specification/metrics directory), Stale

Successfully merging this pull request may close this issue: How is the cardinality limit applied when attributes are being filtered?

8 participants