Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limitation on number of attributes, links, events and behavior #182

Closed
lmolkova opened this issue Jul 13, 2019 · 20 comments · Fixed by #942
Closed

Limitation on number of attributes, links, events and behavior #182

lmolkova opened this issue Jul 13, 2019 · 20 comments · Fixed by #942
Labels
area:sdk Related to the SDK needs discussion Need more information before all suitable labels can be applied release:after-ga Not required before GA release, and not going to work on before GA spec:trace Related to the specification/trace directory

Comments

@lmolkova
Copy link
Contributor

In Java SDK implementation, there is a limit and behavior is to discard in FIFO.

The limitations are not part of the spec and behavior when there are more than max is not defined.

@austinlparker
Copy link
Member

Seems like there’s a couple of options:

  • Spec defines a hard max limit, unconfigurable.
  • Spec defines a default max, user-adjustable.
  • Spec defines no limit, exporters can select a limit (config of this limit is an exporter-specific concern)

The third option is the most similar to OpenTracing imo. I think the second option might be the best because there’s always going to be some sort of maximum, might as well put it in a well-known place.

@bogdandrutu bogdandrutu added the area:sdk Related to the SDK label Jul 16, 2019
@discostu105
Copy link
Member

I'd have a slight preference for the third option:

If the limits are user-adjustable via API, aren't we mixing different personas? An application developer (who would set the limit) might not know the exporter and ultimatively the backend that uses the data. The backend might have difficulties with 10 or with 10k attributes. It should be a concern of the exporter/backend only.

@austinlparker
Copy link
Member

That's a good point.

@mtwo
Copy link
Member

mtwo commented Jul 16, 2019

3 also seems like the most reasonable choice to me, though I'm not extremely opinionated about this topic

@iredelmeier iredelmeier added the needs discussion Need more information before all suitable labels can be applied label Jul 30, 2019
@bogdandrutu
Copy link
Member

3 is not a good option because the intent for this is to not grow memory unbounded, this is not to protect the exporter or the backend, is to help Application owners to control how much memory their instrumentation uses.

I think the best compromise is 2, the limit is adjustable via the SDK not via the API (so an instrumentation dev, as you pointed does not know the backend), but the Application owner knows the backend in almost all the situations.

@bogdandrutu bogdandrutu added this to the Alpha v0.3 milestone Sep 30, 2019
@SergeyKanzhelev
Copy link
Member

2 and 3 are almost identical in a sense that if exporter can configure this SDK setting, than it is the same. Multiple exporters will just need to set the max of options and then do additional trimming on their side. There is no way to prevent exporters to configure SDK as long as this setting is available for end user.

@jmacd jmacd removed this from the Alpha v0.3 milestone Jan 22, 2020
@jmacd
Copy link
Contributor

jmacd commented Jan 22, 2020

Removed this from the 0.3 milestone as it's not a blocker. We can add this in the 0.4 milestone.

@bogdandrutu bogdandrutu added the spec:trace Related to the specification/trace directory label Jun 12, 2020
@carlosalberto carlosalberto added the release:required-for-ga Must be resolved before GA release, or nice to have before GA label Jul 2, 2020
@andrewhsu andrewhsu added the priority:p2 Medium priority level label Jul 17, 2020
@iNikem
Copy link
Contributor

iNikem commented Jul 27, 2020

Do we also want to impose/allow to configure some limits on the size of attributes values?

@Oberon00
Copy link
Member

For attribute and label keys, there is discussion about a maximum length (among other things) on #504.

@toumorokoshi
Copy link
Member

2 and 3 are almost identical in a sense that if exporter can configure this SDK setting, than it is the same. Multiple exporters will just need to set the max of options and then do additional trimming on their side. There is no way to prevent exporters to configure SDK as long as this setting is available for end user.

I would argue that 2 provides a significant advantage by also enabling limiting of events enforceable by the SDK itself, which ensures that one cannot accidentally flood memory with hundreds of thousands of attributes before the trace reaches the exporter.

For that reason alone I would argue for #2. Even if exporters end up configuring it, it also covers the don't-cause-memory-issues use case.

Other idea: ulimit-style limits.

As another idea too, you could make the maximum count work similar to ulimits in Linux: the value starts at infinite, you can set it to be lower, but can't set it higher. This ensures that the most attributes that will ever be passed to an exporter is a count that all exporters support.

But reading it through, it sounds like it's just overcomplicating things significantly. Most likely someone can configure their SDK appropriately with complete knowledge of their backends.

@tigrannajaryan
Copy link
Member

I suggest to remove release:required-for-ga label since this is an additive, non-breaking change.

@carlosalberto
Copy link
Contributor

+1 on making this release:after-ga .

@andrewhsu
Copy link
Member

From the issue triage mtg today, i'm changing the label to release:after-ga since it looks like from the comments this can be punted.

@andrewhsu andrewhsu added release:after-ga Not required before GA release, and not going to work on before GA and removed priority:p2 Medium priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Sep 9, 2020
@arminru
Copy link
Member

arminru commented Sep 10, 2020

@tigrannajaryan
It depends on the outcome of this discussion if this is an additive, non-breaking change.
If some users are used to recording 10k events on a span and after updating the SDK suddenly 9k of them are dropped because a "sanity limit" of 1k is introduced to the SDK (to limit memory usage in case of usage errors as @bogdandrutu pointed out, for example), this would be a breaking change in my opinion.
If the spec would clearly state, that at least n number of attributes, links and events (each) are supported to be recorded and exported, this would already give users some kind of guarantee and consumers know what they will have to expect at least.
Having a user-configurable limit to go beyond (or above) the sanity limit imposed by the SDK would be additive and non-breaking.

@toumorokoshi
Copy link
Member

Would setting some relatively arbitrary limit be acceptable, in that case? I do agree that having a sane limit will save some real production headaches, and loss of telemetry (e.g. span attributes) is more desirable than an app crashing due to OOM.

Based on what I've seen 128 would be a good threshold (i've never seen someone go beyond that), but it's very anecdotal.

@arminru
Copy link
Member

arminru commented Sep 10, 2020

@toumorokoshi Exactly. I have no idea on what number can be deemed safe for "all" use cases since I fear we know our users not well enough, however. I've seen requests on our Gitter that people intend to use spans for representing long-running actions that span over multiple hours (days?) and I could expect a higher number of events on these. We should probably aim high to not prevent "valid" (who's to decide what's valid anyway) edge cases and only have it as a fallback for error conditions to prevent production crashes. If the limit we decide on turns out to be too low for certain use cases we haven't thought of, we can always go higher or allow overriding by users in a truly non-breaking manner.

@tigrannajaryan @carlosalberto @SergeyKanzhelev since you voted for moving this out of GA - what do you think?

@tigrannajaryan
Copy link
Member

tigrannajaryan commented Sep 10, 2020

since you voted for moving this out of GA - what do you think?

@arminru I am certainly not against having this capability and believe it is useful but we need to make some sacrifices to cut the scope and deliver the GA.

If some users are used to recording 10k events on a span and after updating the SDK suddenly 9k of them are dropped because a "sanity limit" of 1k is introduced to the SDK (to limit memory usage in case of usage errors as @bogdandrutu pointed out, for example), this would be a breaking change in my opinion.

Some behavioral changes are likely inevitable when we release new versions. Formally, one can call this a breaking change but it is not the sort of the breaking change I think we should worry the most. If we can avoid hard breaking changes that make it impossible to recompile your app with a new version of OpenTelemetry because function signatures are different I think that will likely be good enough.

If we want to absolutely guarantee no changes in the behavior we can set the default to infinity when we introduce the Span data limiting and trimming capability.

[UPDATE] Just to make it clear: I don't advocate for weak compatibility guarantees, but I think we need to strike the right balance between being careful and being able to postpone functionality to after GA. It is very difficult to be completely certain that we will not need to change the behavior after the GA. If we set the absolute certainty of non-breaking behavioral changes after GA as the criteria for the GA then we will slow down a lot and I don't think we can meet our GA timelines.

@toumorokoshi
Copy link
Member

I would say if we can't come to a fast consensus, then postponing makes sense.

Although I think this could be summed up into two changes to the spec:

  • set a common limit across events, links, attributes
  • add the need for logging if any of the above are added and discarded.

And to @arminru's point,we just choose a number that's well beyond 99.99% of use cases. Don't want to throw numbers out without data, but 1000 seems well above the needed amount.

@toumorokoshi
Copy link
Member

If that would be acceptable, I'm happy to file a PR.

@tigrannajaryan
Copy link
Member

If that would be acceptable, I'm happy to file a PR.

Sounds good, please do it.

toumorokoshi added a commit to toumorokoshi/opentelemetry-specification that referenced this issue Sep 13, 2020
Fixes open-telemetry#182

To help safeguard against a common error, adding default limits
to the number of events per span in the SDK.
SergeyKanzhelev added a commit that referenced this issue Oct 14, 2020
* SDK Trace: Adding span collection limits

Fixes #182

To help safeguard against a common error, adding default limits
to the number of events per span in the SDK.

* Adding to the changelog

* Addressing feedback

Adding optional additional configurations for span limits.

Modifying the unmlimited value to -1 to enable restricting collections
to 0 elements.

* Addressing feedback

- modifying logging messaging to SHOULD, to imply configurability
- clarifying collections types rather than using "each"

* Update specification/trace/sdk.md

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>

* Clarifying limits are in elements, not bytes

Ensuring there is no ambiguation between limits being in bytes, or
numbers of elements in a collection.

* Removing customizable span collection limit

Addressing feedback arguing for a non-configurable limit,
as is the case with other limits present in OpenTelemetry.

* fixing lint

* Update specification/trace/sdk.md

Co-authored-by: Nikita Salnikov-Tarnovski <gnikem@gmail.com>

* Update specification/trace/sdk.md

Co-authored-by: Sergey Kanzhelev <S.Kanzhelev@live.com>

* adding collection limits for attributes, links

Adding collection limits for spans in the compliance matrix
for attributes and links as well as events.

* Making collection limits optional

Addressing feedback to make collection limits optional for now. The
motivation is to allow innovation around safe SDK behavior, while
providing a guideline and awareness for the issue.

Reducing the logging on discarded attributes, events, or links to
once per process to further reduce spam.

* Updating log recommendations.

Giving the SDK implementors more freedom in how they choose to log,
while setting guidelines on the frequency.

* adding back limits to spec-compliance-matrix

To help make consumers aware of which SDKS implement span
collection limits.

* Apply suggestions from code review

Co-authored-by: Nikita Salnikov-Tarnovski <gnikem@gmail.com>

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Co-authored-by: Nikita Salnikov-Tarnovski <gnikem@gmail.com>
Co-authored-by: Sergey Kanzhelev <S.Kanzhelev@live.com>
TuckTuckFloof pushed a commit to TuckTuckFloof/opentelemetry-specification that referenced this issue Oct 15, 2020
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this issue Oct 21, 2024
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this issue Oct 23, 2024
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this issue Oct 31, 2024
* SDK Trace: Adding span collection limits

Fixes open-telemetry#182

To help safeguard against a common error, adding default limits
to the number of events per span in the SDK.

* Adding to the changelog

* Addressing feedback

Adding optional additional configurations for span limits.

Modifying the unmlimited value to -1 to enable restricting collections
to 0 elements.

* Addressing feedback

- modifying logging messaging to SHOULD, to imply configurability
- clarifying collections types rather than using "each"

* Update specification/trace/sdk.md

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>

* Clarifying limits are in elements, not bytes

Ensuring there is no ambiguation between limits being in bytes, or
numbers of elements in a collection.

* Removing customizable span collection limit

Addressing feedback arguing for a non-configurable limit,
as is the case with other limits present in OpenTelemetry.

* fixing lint

* Update specification/trace/sdk.md

Co-authored-by: Nikita Salnikov-Tarnovski <gnikem@gmail.com>

* Update specification/trace/sdk.md

Co-authored-by: Sergey Kanzhelev <S.Kanzhelev@live.com>

* adding collection limits for attributes, links

Adding collection limits for spans in the compliance matrix
for attributes and links as well as events.

* Making collection limits optional

Addressing feedback to make collection limits optional for now. The
motivation is to allow innovation around safe SDK behavior, while
providing a guideline and awareness for the issue.

Reducing the logging on discarded attributes, events, or links to
once per process to further reduce spam.

* Updating log recommendations.

Giving the SDK implementors more freedom in how they choose to log,
while setting guidelines on the frequency.

* adding back limits to spec-compliance-matrix

To help make consumers aware of which SDKS implement span
collection limits.

* Apply suggestions from code review

Co-authored-by: Nikita Salnikov-Tarnovski <gnikem@gmail.com>

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Co-authored-by: Nikita Salnikov-Tarnovski <gnikem@gmail.com>
Co-authored-by: Sergey Kanzhelev <S.Kanzhelev@live.com>
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this issue Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sdk Related to the SDK needs discussion Need more information before all suitable labels can be applied release:after-ga Not required before GA release, and not going to work on before GA spec:trace Related to the specification/trace directory
Projects
None yet
Development

Successfully merging a pull request may close this issue.