-
Notifications
You must be signed in to change notification settings - Fork 893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limitation on number of attributes, links, events and behavior #182
Comments
Seems like there’s a couple of options:
The third option is the most similar to OpenTracing imo. I think the second option might be the best because there’s always going to be some sort of maximum, might as well put it in a well-known place. |
I'd have a slight preference for the third option: If the limits are user-adjustable via API, aren't we mixing different personas? An application developer (who would set the limit) might not know the exporter and ultimatively the backend that uses the data. The backend might have difficulties with 10 or with 10k attributes. It should be a concern of the exporter/backend only. |
That's a good point. |
3 also seems like the most reasonable choice to me, though I'm not extremely opinionated about this topic |
3 is not a good option because the intent for this is to not grow memory unbounded, this is not to protect the exporter or the backend, is to help Application owners to control how much memory their instrumentation uses. I think the best compromise is 2, the limit is adjustable via the SDK not via the API (so an instrumentation dev, as you pointed does not know the backend), but the Application owner knows the backend in almost all the situations. |
2 and 3 are almost identical in a sense that if exporter can configure this SDK setting, than it is the same. Multiple exporters will just need to set the max of options and then do additional trimming on their side. There is no way to prevent exporters to configure SDK as long as this setting is available for end user. |
Removed this from the 0.3 milestone as it's not a blocker. We can add this in the 0.4 milestone. |
Do we also want to impose/allow to configure some limits on the size of attributes values? |
For attribute and label keys, there is discussion about a maximum length (among other things) on #504. |
I would argue that 2 provides a significant advantage by also enabling limiting of events enforceable by the SDK itself, which ensures that one cannot accidentally flood memory with hundreds of thousands of attributes before the trace reaches the exporter. For that reason alone I would argue for #2. Even if exporters end up configuring it, it also covers the don't-cause-memory-issues use case. Other idea: ulimit-style limits.As another idea too, you could make the maximum count work similar to ulimits in Linux: the value starts at infinite, you can set it to be lower, but can't set it higher. This ensures that the most attributes that will ever be passed to an exporter is a count that all exporters support. But reading it through, it sounds like it's just overcomplicating things significantly. Most likely someone can configure their SDK appropriately with complete knowledge of their backends. |
I suggest to remove |
+1 on making this |
From the issue triage mtg today, i'm changing the label to |
@tigrannajaryan |
Would setting some relatively arbitrary limit be acceptable, in that case? I do agree that having a sane limit will save some real production headaches, and loss of telemetry (e.g. span attributes) is more desirable than an app crashing due to OOM. Based on what I've seen 128 would be a good threshold (i've never seen someone go beyond that), but it's very anecdotal. |
@toumorokoshi Exactly. I have no idea on what number can be deemed safe for "all" use cases since I fear we know our users not well enough, however. I've seen requests on our Gitter that people intend to use spans for representing long-running actions that span over multiple hours (days?) and I could expect a higher number of events on these. We should probably aim high to not prevent "valid" (who's to decide what's valid anyway) edge cases and only have it as a fallback for error conditions to prevent production crashes. If the limit we decide on turns out to be too low for certain use cases we haven't thought of, we can always go higher or allow overriding by users in a truly non-breaking manner. @tigrannajaryan @carlosalberto @SergeyKanzhelev since you voted for moving this out of GA - what do you think? |
@arminru I am certainly not against having this capability and believe it is useful but we need to make some sacrifices to cut the scope and deliver the GA.
Some behavioral changes are likely inevitable when we release new versions. Formally, one can call this a breaking change but it is not the sort of the breaking change I think we should worry the most. If we can avoid hard breaking changes that make it impossible to recompile your app with a new version of OpenTelemetry because function signatures are different I think that will likely be good enough. If we want to absolutely guarantee no changes in the behavior we can set the default to infinity when we introduce the Span data limiting and trimming capability. [UPDATE] Just to make it clear: I don't advocate for weak compatibility guarantees, but I think we need to strike the right balance between being careful and being able to postpone functionality to after GA. It is very difficult to be completely certain that we will not need to change the behavior after the GA. If we set the absolute certainty of non-breaking behavioral changes after GA as the criteria for the GA then we will slow down a lot and I don't think we can meet our GA timelines. |
I would say if we can't come to a fast consensus, then postponing makes sense. Although I think this could be summed up into two changes to the spec:
And to @arminru's point,we just choose a number that's well beyond 99.99% of use cases. Don't want to throw numbers out without data, but 1000 seems well above the needed amount. |
If that would be acceptable, I'm happy to file a PR. |
Sounds good, please do it. |
Fixes open-telemetry#182 To help safeguard against a common error, adding default limits to the number of events per span in the SDK.
* SDK Trace: Adding span collection limits Fixes #182 To help safeguard against a common error, adding default limits to the number of events per span in the SDK. * Adding to the changelog * Addressing feedback Adding optional additional configurations for span limits. Modifying the unmlimited value to -1 to enable restricting collections to 0 elements. * Addressing feedback - modifying logging messaging to SHOULD, to imply configurability - clarifying collections types rather than using "each" * Update specification/trace/sdk.md Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com> * Clarifying limits are in elements, not bytes Ensuring there is no ambiguation between limits being in bytes, or numbers of elements in a collection. * Removing customizable span collection limit Addressing feedback arguing for a non-configurable limit, as is the case with other limits present in OpenTelemetry. * fixing lint * Update specification/trace/sdk.md Co-authored-by: Nikita Salnikov-Tarnovski <gnikem@gmail.com> * Update specification/trace/sdk.md Co-authored-by: Sergey Kanzhelev <S.Kanzhelev@live.com> * adding collection limits for attributes, links Adding collection limits for spans in the compliance matrix for attributes and links as well as events. * Making collection limits optional Addressing feedback to make collection limits optional for now. The motivation is to allow innovation around safe SDK behavior, while providing a guideline and awareness for the issue. Reducing the logging on discarded attributes, events, or links to once per process to further reduce spam. * Updating log recommendations. Giving the SDK implementors more freedom in how they choose to log, while setting guidelines on the frequency. * adding back limits to spec-compliance-matrix To help make consumers aware of which SDKS implement span collection limits. * Apply suggestions from code review Co-authored-by: Nikita Salnikov-Tarnovski <gnikem@gmail.com> Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com> Co-authored-by: Nikita Salnikov-Tarnovski <gnikem@gmail.com> Co-authored-by: Sergey Kanzhelev <S.Kanzhelev@live.com>
* Introduce remote-parent OTEP
* Introduce remote-parent OTEP
* SDK Trace: Adding span collection limits Fixes open-telemetry#182 To help safeguard against a common error, adding default limits to the number of events per span in the SDK. * Adding to the changelog * Addressing feedback Adding optional additional configurations for span limits. Modifying the unmlimited value to -1 to enable restricting collections to 0 elements. * Addressing feedback - modifying logging messaging to SHOULD, to imply configurability - clarifying collections types rather than using "each" * Update specification/trace/sdk.md Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com> * Clarifying limits are in elements, not bytes Ensuring there is no ambiguation between limits being in bytes, or numbers of elements in a collection. * Removing customizable span collection limit Addressing feedback arguing for a non-configurable limit, as is the case with other limits present in OpenTelemetry. * fixing lint * Update specification/trace/sdk.md Co-authored-by: Nikita Salnikov-Tarnovski <gnikem@gmail.com> * Update specification/trace/sdk.md Co-authored-by: Sergey Kanzhelev <S.Kanzhelev@live.com> * adding collection limits for attributes, links Adding collection limits for spans in the compliance matrix for attributes and links as well as events. * Making collection limits optional Addressing feedback to make collection limits optional for now. The motivation is to allow innovation around safe SDK behavior, while providing a guideline and awareness for the issue. Reducing the logging on discarded attributes, events, or links to once per process to further reduce spam. * Updating log recommendations. Giving the SDK implementors more freedom in how they choose to log, while setting guidelines on the frequency. * adding back limits to spec-compliance-matrix To help make consumers aware of which SDKS implement span collection limits. * Apply suggestions from code review Co-authored-by: Nikita Salnikov-Tarnovski <gnikem@gmail.com> Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com> Co-authored-by: Nikita Salnikov-Tarnovski <gnikem@gmail.com> Co-authored-by: Sergey Kanzhelev <S.Kanzhelev@live.com>
* Introduce remote-parent OTEP
In Java SDK implementation, there is a limit and behavior is to discard in FIFO.
The limitations are not part of the spec and behavior when there are more than max is not defined.
The text was updated successfully, but these errors were encountered: