From 507884f02618a91247fe3634a8c6c5ac99a32a40 Mon Sep 17 00:00:00 2001 From: Yusuke Tsutsumi Date: Wed, 14 Oct 2020 09:07:50 -0700 Subject: [PATCH] SDK Trace: Adding span collection limits (#942) * SDK Trace: Adding span collection limits Fixes #182 To help safeguard against a common error, adding default limits to the number of events per span in the SDK. * Adding to the changelog * Addressing feedback Adding optional additional configurations for span limits. Modifying the unmlimited value to -1 to enable restricting collections to 0 elements. * Addressing feedback - modifying logging messaging to SHOULD, to imply configurability - clarifying collections types rather than using "each" * Update specification/trace/sdk.md Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com> * Clarifying limits are in elements, not bytes Ensuring there is no ambiguation between limits being in bytes, or numbers of elements in a collection. * Removing customizable span collection limit Addressing feedback arguing for a non-configurable limit, as is the case with other limits present in OpenTelemetry. * fixing lint * Update specification/trace/sdk.md Co-authored-by: Nikita Salnikov-Tarnovski * Update specification/trace/sdk.md Co-authored-by: Sergey Kanzhelev * adding collection limits for attributes, links Adding collection limits for spans in the compliance matrix for attributes and links as well as events. * Making collection limits optional Addressing feedback to make collection limits optional for now. The motivation is to allow innovation around safe SDK behavior, while providing a guideline and awareness for the issue. Reducing the logging on discarded attributes, events, or links to once per process to further reduce spam. * Updating log recommendations. Giving the SDK implementors more freedom in how they choose to log, while setting guidelines on the frequency. * adding back limits to spec-compliance-matrix To help make consumers aware of which SDKS implement span collection limits. * Apply suggestions from code review Co-authored-by: Nikita Salnikov-Tarnovski Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com> Co-authored-by: Nikita Salnikov-Tarnovski Co-authored-by: Sergey Kanzhelev --- CHANGELOG.md | 2 ++ spec-compliance-matrix.md | 6 ++++++ specification/sdk-environment-variables.md | 8 ++++++++ specification/trace/sdk.md | 24 +++++++++++++++++++--- 4 files changed, 37 insertions(+), 3 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index da51a64ba8c..0d231aa6a6e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -45,6 +45,8 @@ New: ([#981](https://github.com/open-telemetry/opentelemetry-specification/pull/981)) - Define PropagationOnly Span to simplify active Span logic in Context ([#994](https://github.com/open-telemetry/opentelemetry-specification/pull/994)) +- Add limits to the number of attributes, events, and links in SDK Spans + ([#942](https://github.com/open-telemetry/opentelemetry-specification/pull/942)) - Add Metric SDK specification (partial): covering terminology and Accumulator component ([#626](https://github.com/open-telemetry/opentelemetry-specification/pull/626)) - Add `Shutdown` function to `*Provider` SDK ([#1074](https://github.com/open-telemetry/opentelemetry-specification/pull/1074)) diff --git a/spec-compliance-matrix.md b/spec-compliance-matrix.md index ead8b51d4fe..77d50404d5c 100644 --- a/spec-compliance-matrix.md +++ b/spec-compliance-matrix.md @@ -43,6 +43,9 @@ status of the feature is not known. |IsRecording becomes false after End | | | | | | | | | | | |Set status | + | + | + | + | + | + | + | + | + | + | |Safe for concurrent calls | + | + | + | [-](https://github.com/open-telemetry/opentelemetry-python/issues/1157) | + | + | + | + | + | + | +|events collection size limit | | | | | | | | | | | +|attribute collection size limit | | | | | | | | | | | +|links collection size limit | | | | | | | | | | | |[Span attributes](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/api.md#set-attributes)| |SetAttribute | + | + | + | + | + | + | + | + | + | + | |Set order preserved | + | - | + | + | + | + | + | + | + | + | @@ -127,6 +130,9 @@ status of the feature is not known. |OTEL_EXPORTER_JAEGER_* | | | | [-](https://github.com/open-telemetry/opentelemetry-python/issues/1056) | + | - | - | | - | - | |OTEL_EXPORTER_ZIPKIN_* | | | | + | | - | - | | - | - | |OTEL_EXPORTER | | | | [-](https://github.com/open-telemetry/opentelemetry-python/issues/1155) | | | | | | | +|OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT | | | | | | | | | | | +|OTEL_SPAN_EVENT_COUNT_LIMIT | | | | | | | | | | | +|OTEL_SPAN_LINK_COUNT_LIMIT | | | | | | | | | | | ## Exporters diff --git a/specification/sdk-environment-variables.md b/specification/sdk-environment-variables.md index 9ee48d896eb..7245634b4fb 100644 --- a/specification/sdk-environment-variables.md +++ b/specification/sdk-environment-variables.md @@ -22,6 +22,14 @@ Additional values can be specified in the respective SDK's documentation, in cas | OTEL_BSP_MAX_QUEUE_SIZE | Maximum queue size | 2048 | | | OTEL_BSP_MAX_EXPORT_BATCH_SIZE | Maximum batch size | 512 | Must be less than or equal to OTEL_BSP_MAX_QUEUE_SIZE | +## Span Collection Limits + +| Name | Description | Default | Notes | +| ------------------------------- | ------------------------------------ | ------- | ----- | +| OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT | Maximum allowed span attribute count | 1000 | | +| OTEL_SPAN_EVENT_COUNT_LIMIT | Maximum allowed span event count | 1000 | | +| OTEL_SPAN_LINK_COUNT_LIMIT | Maximum allowed span link count | 1000 | | + ## OTLP Exporter See [OpenTelemetry Protocol Exporter Configuration Options](./protocol/exporter.md). diff --git a/specification/trace/sdk.md b/specification/trace/sdk.md index 09451c1a7a1..6a315d11b6f 100644 --- a/specification/trace/sdk.md +++ b/specification/trace/sdk.md @@ -7,6 +7,7 @@ * [Sampling](#sampling) * [Tracer Provider](#tracer-provider) * [Additional Span Interfaces](#additional-span-interfaces) +* [Limits on Span Collections](#limits-on-span-collections) * [Span Processor](#span-processor) * [Span Exporter](#span-exporter) @@ -35,7 +36,7 @@ The OpenTelemetry API has two properties responsible for the data collection: specification](https://www.w3.org/TR/trace-context/#sampled-flag). This flag indicates that the `Span` has been `sampled` and will be exported. [Span Exporters](#span-exporter) MUST receive those spans which have `Sampled` flag set to true and they SHOULD NOT receive the ones - that do not. + that do not. The flag combination `SampledFlag == false` and `IsRecording == true` means that the current `Span` does record information, but most likely the child @@ -240,7 +241,7 @@ Thus, the SDK specification defines sets of possible requirements for It must also be able to reliably determine whether the Span has ended (some languages might implement this by having an end timestamp of `null`, others might have an explicit `hasEnded` boolean). - + A function receiving this as argument might not be able to modify the Span. Note: Typically this will be implemented with a new interface or @@ -261,7 +262,24 @@ Thus, the SDK specification defines sets of possible requirements for that the [span creation API](api.md#span-creation) returned (or will return) to the user (for example, the `Span` could be one of the parameters passed to such a function, or a getter could be provided). - + +## Limits on Span Collections + +Erroneous code can add unintended attributes, events, and links to a span. If +these collections are unbounded, they can quickly exhaust available memory, +resulting in crashes that are difficult to recover from safely. + +To protect against such errors, SDK Spans MAY discard attributes, links, and +events that would increase the number of elements of each collection beyond +the recommended limit of 1000 elements. SDKs MAY provide a way to change this limit. + +If there is a configurable limit, the SDK SHOULD honor the environment variables +specified in [SDK environment variables](../sdk-environment-variables.md#span-collection-limits). + +There SHOULD be a log emitted to indicate to the user that an attribute, event, +or link was discarded due to such a limit. To prevent excessive logging, the log +should not be emitted once per span, or per discarded attribute, event, or links. + ## Span processor Span processor is an interface which allows hooks for span start and end method