Skip to content

Commit

Permalink
SDK Trace: Adding span collection limits (#942)
Browse files Browse the repository at this point in the history
* SDK Trace: Adding span collection limits

Fixes #182

To help safeguard against a common error, adding default limits
to the number of events per span in the SDK.

* Adding to the changelog

* Addressing feedback

Adding optional additional configurations for span limits.

Modifying the unmlimited value to -1 to enable restricting collections
to 0 elements.

* Addressing feedback

- modifying logging messaging to SHOULD, to imply configurability
- clarifying collections types rather than using "each"

* Update specification/trace/sdk.md

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>

* Clarifying limits are in elements, not bytes

Ensuring there is no ambiguation between limits being in bytes, or
numbers of elements in a collection.

* Removing customizable span collection limit

Addressing feedback arguing for a non-configurable limit,
as is the case with other limits present in OpenTelemetry.

* fixing lint

* Update specification/trace/sdk.md

Co-authored-by: Nikita Salnikov-Tarnovski <gnikem@gmail.com>

* Update specification/trace/sdk.md

Co-authored-by: Sergey Kanzhelev <S.Kanzhelev@live.com>

* adding collection limits for attributes, links

Adding collection limits for spans in the compliance matrix
for attributes and links as well as events.

* Making collection limits optional

Addressing feedback to make collection limits optional for now. The
motivation is to allow innovation around safe SDK behavior, while
providing a guideline and awareness for the issue.

Reducing the logging on discarded attributes, events, or links to
once per process to further reduce spam.

* Updating log recommendations.

Giving the SDK implementors more freedom in how they choose to log,
while setting guidelines on the frequency.

* adding back limits to spec-compliance-matrix

To help make consumers aware of which SDKS implement span
collection limits.

* Apply suggestions from code review

Co-authored-by: Nikita Salnikov-Tarnovski <gnikem@gmail.com>

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Co-authored-by: Nikita Salnikov-Tarnovski <gnikem@gmail.com>
Co-authored-by: Sergey Kanzhelev <S.Kanzhelev@live.com>
  • Loading branch information
4 people authored Oct 14, 2020
1 parent 6647ba3 commit 507884f
Show file tree
Hide file tree
Showing 4 changed files with 37 additions and 3 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,8 @@ New:
([#981](https://github.com/open-telemetry/opentelemetry-specification/pull/981))
- Define PropagationOnly Span to simplify active Span logic in Context
([#994](https://github.com/open-telemetry/opentelemetry-specification/pull/994))
- Add limits to the number of attributes, events, and links in SDK Spans
([#942](https://github.com/open-telemetry/opentelemetry-specification/pull/942))
- Add Metric SDK specification (partial): covering terminology and Accumulator component
([#626](https://github.com/open-telemetry/opentelemetry-specification/pull/626))
- Add `Shutdown` function to `*Provider` SDK ([#1074](https://github.com/open-telemetry/opentelemetry-specification/pull/1074))
Expand Down
6 changes: 6 additions & 0 deletions spec-compliance-matrix.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,9 @@ status of the feature is not known.
|IsRecording becomes false after End | | | | | | | | | | |
|Set status | + | + | + | + | + | + | + | + | + | + |
|Safe for concurrent calls | + | + | + | [-](https://github.com/open-telemetry/opentelemetry-python/issues/1157) | + | + | + | + | + | + |
|events collection size limit | | | | | | | | | | |
|attribute collection size limit | | | | | | | | | | |
|links collection size limit | | | | | | | | | | |
|[Span attributes](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/api.md#set-attributes)|
|SetAttribute | + | + | + | + | + | + | + | + | + | + |
|Set order preserved | + | - | + | + | + | + | + | + | + | + |
Expand Down Expand Up @@ -127,6 +130,9 @@ status of the feature is not known.
|OTEL_EXPORTER_JAEGER_* | | | | [-](https://github.com/open-telemetry/opentelemetry-python/issues/1056) | + | - | - | | - | - |
|OTEL_EXPORTER_ZIPKIN_* | | | | + | | - | - | | - | - |
|OTEL_EXPORTER | | | | [-](https://github.com/open-telemetry/opentelemetry-python/issues/1155) | | | | | | |
|OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT | | | | | | | | | | |
|OTEL_SPAN_EVENT_COUNT_LIMIT | | | | | | | | | | |
|OTEL_SPAN_LINK_COUNT_LIMIT | | | | | | | | | | |

## Exporters

Expand Down
8 changes: 8 additions & 0 deletions specification/sdk-environment-variables.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,14 @@ Additional values can be specified in the respective SDK's documentation, in cas
| OTEL_BSP_MAX_QUEUE_SIZE | Maximum queue size | 2048 | |
| OTEL_BSP_MAX_EXPORT_BATCH_SIZE | Maximum batch size | 512 | Must be less than or equal to OTEL_BSP_MAX_QUEUE_SIZE |

## Span Collection Limits

| Name | Description | Default | Notes |
| ------------------------------- | ------------------------------------ | ------- | ----- |
| OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT | Maximum allowed span attribute count | 1000 | |
| OTEL_SPAN_EVENT_COUNT_LIMIT | Maximum allowed span event count | 1000 | |
| OTEL_SPAN_LINK_COUNT_LIMIT | Maximum allowed span link count | 1000 | |

## OTLP Exporter

See [OpenTelemetry Protocol Exporter Configuration Options](./protocol/exporter.md).
Expand Down
24 changes: 21 additions & 3 deletions specification/trace/sdk.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
* [Sampling](#sampling)
* [Tracer Provider](#tracer-provider)
* [Additional Span Interfaces](#additional-span-interfaces)
* [Limits on Span Collections](#limits-on-span-collections)
* [Span Processor](#span-processor)
* [Span Exporter](#span-exporter)

Expand Down Expand Up @@ -35,7 +36,7 @@ The OpenTelemetry API has two properties responsible for the data collection:
specification](https://www.w3.org/TR/trace-context/#sampled-flag). This flag indicates that the `Span` has been
`sampled` and will be exported. [Span Exporters](#span-exporter) MUST
receive those spans which have `Sampled` flag set to true and they SHOULD NOT receive the ones
that do not.
that do not.

The flag combination `SampledFlag == false` and `IsRecording == true`
means that the current `Span` does record information, but most likely the child
Expand Down Expand Up @@ -240,7 +241,7 @@ Thus, the SDK specification defines sets of possible requirements for
It must also be able to reliably determine whether the Span has ended
(some languages might implement this by having an end timestamp of `null`,
others might have an explicit `hasEnded` boolean).

A function receiving this as argument might not be able to modify the Span.

Note: Typically this will be implemented with a new interface or
Expand All @@ -261,7 +262,24 @@ Thus, the SDK specification defines sets of possible requirements for
that the [span creation API](api.md#span-creation) returned (or will return) to the user
(for example, the `Span` could be one of the parameters passed to such a function,
or a getter could be provided).


## Limits on Span Collections

Erroneous code can add unintended attributes, events, and links to a span. If
these collections are unbounded, they can quickly exhaust available memory,
resulting in crashes that are difficult to recover from safely.

To protect against such errors, SDK Spans MAY discard attributes, links, and
events that would increase the number of elements of each collection beyond
the recommended limit of 1000 elements. SDKs MAY provide a way to change this limit.

If there is a configurable limit, the SDK SHOULD honor the environment variables
specified in [SDK environment variables](../sdk-environment-variables.md#span-collection-limits).

There SHOULD be a log emitted to indicate to the user that an attribute, event,
or link was discarded due to such a limit. To prevent excessive logging, the log
should not be emitted once per span, or per discarded attribute, event, or links.

## Span processor

Span processor is an interface which allows hooks for span start and end method
Expand Down

0 comments on commit 507884f

Please sign in to comment.