From 3cfd5ada680c3177c147e1894dcb0b4b1ce5e6c9 Mon Sep 17 00:00:00 2001
From: Jay DeLuca
Date: Tue, 5 Mar 2024 11:57:55 -0500
Subject: [PATCH 1/2] Rework metric signal overview (#3916)

---
 specification/overview.md | 98 ++++++++++++++++++---------------------
 1 file changed, 46 insertions(+), 52 deletions(-)

diff --git a/specification/overview.md b/specification/overview.md
index ca6ef858fdb..c8e2a302db1 100644
--- a/specification/overview.md
+++ b/specification/overview.md
@@ -22,10 +22,9 @@ weight: 1
   * [Links between spans](#links-between-spans)
 - [Metric Signal](#metric-signal)
   * [Recording raw measurements](#recording-raw-measurements)
-    + [Measure](#measure)
-    + [Measurement](#measurement)
-  * [Recording metrics with predefined aggregation](#recording-metrics-with-predefined-aggregation)
+    + [Instruments](#instruments)
   * [Metrics data model and SDK](#metrics-data-model-and-sdk)
+    + [Views](#views)
 - [Log Signal](#log-signal)
   * [Data model](#data-model)
 - [Baggage Signal](#baggage-signal)
@@ -218,65 +217,53 @@ scenarios.
 
 ## Metric Signal
 
-OpenTelemetry allows to record raw measurements or metrics with predefined
-aggregation and a [set of attributes](./common/README.md#attribute).
+OpenTelemetry allows recording raw measurements or metrics with predefined
+aggregations and a [set of attributes](common/README.md#attribute).
 
-Recording raw measurements using OpenTelemetry API allows to defer to end-user
-the decision on what aggregation algorithm should be applied for this metric as
-well as defining attributes (dimensions). It will be used in client libraries like
-gRPC to record raw measurements "server_latency" or "received_bytes". So end
-user will decide what type of aggregated values should be collected out of these
-raw measurements. It may be simple average or elaborate histogram calculation.
-
-Recording of metrics with the pre-defined aggregation using OpenTelemetry API is
-not less important. It allows to collect values like cpu and memory usage, or
-simple metrics like "queue length".
+Using the OpenTelemetry API to record raw measurements gives end-users the
+flexibility to choose the aggregation algorithm for a given metric. This functionality
+is particularly useful in client libraries such as gRPC, where it enables the
+recording of raw measurements like "server_latency" or "received_bytes." End-users
+then have the autonomy to decide on the aggregation method for these raw measurements,
+options for which range from straightforward averages to more complex histogram calculations.
 
 ### Recording raw measurements
 
-The main classes used to record raw measurements are `Measure` and
-`Measurement`. List of `Measurement`s alongside the additional context can be
-recorded using OpenTelemetry API. So user may define to aggregate those
-`Measurement`s and use the context passed alongside to define additional
-attributes of the resulting metric.
-
-#### Measure
-
-`Measure` describes the type of the individual values recorded by a library. It
-defines a contract between the library exposing the measurements and an
-application that will aggregate those individual measurements into a `Metric`.
-`Measure` is identified by name, description and a unit of values.
-
-#### Measurement
-
-`Measurement` describes a single value to be collected for a `Measure`.
-`Measurement` is an empty interface in API surface. This interface is defined in
-SDK.
+The primary components involved in recording raw measurements using the OpenTelemetry
+API are `Measurement`, `Instrument` and `Meter`. A `Meter` is obtained from a
+`MeterProvider` and used to create an `Instrument`, which is then responsible for capturing
+[measurements](metrics/api.md#measurement).
 
-### Recording metrics with predefined aggregation
-
-The base class for all types of pre-aggregated metrics is called `Metric`. It
-defines basic metric properties like a name and attributes. Classes inheriting from
-the `Metric` define their aggregation type as well as a structure of individual
-measurements or Points. API defines the following types of pre-aggregated
-metrics:
+```
++------------------+
+| MeterProvider    |                 +-----------------+             +--------------+
+|   Meter A        | Measurements... |                 | Metrics...  |              |
+|     Instrument X +-----------------> In-memory state +-------------> MetricReader |
+|     Instrument Y |                 |                 |             |              |
+|   Meter B        |                 +-----------------+             +--------------+
+|     Instrument Z |
+|     ...          |                 +-----------------+             +--------------+
+|     ...          | Measurements... |                 | Metrics...  |              |
+|     ...          +-----------------> In-memory state +-------------> MetricReader |
+|     ...          |                 |                 |             |              |
+|     ...          |                 +-----------------+             +--------------+
++------------------+
+```
 
-- Counter metric to report instantaneous measurement. Counter values can go
-  up or stay the same, but can never go down. Counter values cannot be
-  negative. There are two types of counter metric values - `double` and `long`.
-- Gauge metric to report instantaneous measurement of a numeric value. Gauges can
-  go both up and down. The gauges values can be negative. There are two types of
-  gauge metric values - `double` and `long`.
+#### Instruments
 
-API allows to construct the `Metric` of a chosen type. SDK defines the way to
-query the current value of a `Metric` to be exported.
+[Instruments](metrics/api.md#instrument) are used to report `Measurement`s, and are identified
+by a name, kind, description and a unit of values.
 
-Every type of a `Metric` has it's API to record values to be aggregated. API
-supports both - push and pull model of setting the `Metric` value.
+There are several types of metric instruments for specific use cases, such as counters for
+incrementing values, gauges for capturing current values, and histograms for capturing
+distributions of measurements. Instruments can be synchronous, meaning that they are invoked
+inline by application logic, or asynchronous where the user registers a callback
+function that is invoked on demand by the SDK.
 
 ### Metrics data model and SDK
 
-Metrics data model is [specified here](metrics/data-model.md) and is based on
+The Metrics data model is [specified here](metrics/data-model.md) and is based on
 [metrics.proto](https://github.com/open-telemetry/opentelemetry-proto/blob/master/opentelemetry/proto/metrics/v1/metrics.proto).
 This data model defines three semantics: An Event model used by the API, an
 in-flight data model used by the SDK and OTLP, and a TimeSeries model which
@@ -284,7 +271,7 @@ denotes how exporters should interpret the in-flight model.
 
 Different exporters have different capabilities (e.g. which data types are
 supported) and different constraints (e.g. which characters are allowed in attribute
-keys). Metrics is intended to be a superset of what's possible, not a lowest
+keys). Metrics is intended to be a superset of what's possible, not the lowest
 common denominator that's supported everywhere. All exporters consume data from
 Metrics Data Model via a Metric Producer interface defined in OpenTelemetry SDK.
 
@@ -297,6 +284,13 @@ from the backend.
 See [Metrics Data Model Specification](metrics/data-model.md) for more
 information.
 
+#### Views
+
+[Views](metrics/sdk.md#view) are configurations that specify how the data from an `Instrument` should be processed,
+aggregated, and exported. They can be applied globally through the `MeterProvider` or more
+specifically at the `Meter` level. A `View` allows the customization of metric data beyond the default
+collection behavior, enabling specific aggregations, transformations, and filtering of metrics.
+
 ## Log Signal
 
 ### Data model

From 4f71c1642e541220e293bacf8c0d58f1fbcd3111 Mon Sep 17 00:00:00 2001
From: jack-berg <34418638+jack-berg@users.noreply.github.com>
Date: Wed, 6 Mar 2024 10:40:29 -0600
Subject: [PATCH 2/2] Prohibit attribute value from evolving to contain complex types (#3858)

If we aren't going to accept complex attribute types (#2888), we should explicitly rule them out of future designs. Doing so cements the idea that attributes are "metadata" instead of "data", since if attributes were data, we would not want to artificially limit their structure. Once it's clear that attributes are metadata and restricted to a limited set of types, it's easy to determine that use cases which require complex types (like event payloads) should seek to put the data elsewhere (like in a log record body).

While I was in favor of supporting complex attribute types (#2888), I believe it's more important that we commit one way or the other. The uncertainty around the question of whether this type of evolution will occur has muddied the waters of several related conversations.

There was consensus on codifying this in the 1/30/24 spec SIG meeting. We should capitalize on this momentum and get this over the finish line. Stalling out just to revisit this same debate in the future is a bad use of time.
---
 specification/common/README.md | 35 +++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/specification/common/README.md b/specification/common/README.md
index a9d993b3b18..f4c22c3fdb0 100644
--- a/specification/common/README.md
+++ b/specification/common/README.md
@@ -16,6 +16,7 @@ path_base_for_github_subdir:
 - [Attribute](#attribute)
+  * [Standard Attribute](#standard-attribute)
   * [Attribute Limits](#attribute-limits)
     + [Configurable Parameters](#configurable-parameters)
     + [Exempt Entities](#exempt-entities)
@@ -33,7 +34,7 @@ An `Attribute` is a key-value pair, which MUST have the following properties:
 - The attribute key MUST be a non-`null` and non-empty string.
   - Case sensitivity of keys is preserved. Keys that differ in casing are treated as distinct keys.
-- The attribute value is either:
+- The attribute value is either[1]:
   - A primitive type: string, boolean, double precision floating point
     (IEEE 754-1985) or signed 64 bit integer.
   - An array of primitive type values. The array MUST be homogeneous,
     i.e., it MUST NOT contain values of different types.
@@ -65,6 +66,38 @@ See [Requirement Level](https://github.com/open-telemetry/semantic-conventions/b
 See [this document](attribute-type-mapping.md) to find out how to map values
 obtained outside OpenTelemetry into OpenTelemetry attribute values.
 
+**[1]**: NOTE: extending the set of attribute value types is a breaking change.
+This was decided after extensive debate, with arguments as follows:
+
+* Limiting the types of attribute values to a set which has proved sufficient
+  during several years of OpenTelemetry's development is a useful guardrail for
+  design. In taking additional value types off the table, we narrow the solution
+  space and have more productive design conversations.
+* We proposed extending support for complex value types and received significant
+  pushback. Removing the bounds significantly increases the burden on data
+  consumers. Adding additional simple value types doesn't cause the same level
+  of burden, but these can be encoded using existing primitive types. For
+  example, datetime can be encoded as a string or 64 bit integer.
+* Limiting attribute value types to primitives and arrays of primitives supports
+  OpenTelemetry's intent that attributes are metadata, and facilitates the
+  ability for data consumers to create search indexes and perform other
+  statistical analysis.
+
+### Standard Attribute
+
+Attributes are used in various places throughout the OpenTelemetry data model.
+We designate the [previous attribute section](#attribute) as the standard
+attribute definition, in order to facilitate more intuitive and consistent API /
+SDK design.
+
+The standard attribute definition SHOULD be used to represent attributes in data
+modeling unless there is a strong justification to diverge. For example, the Log
+Data Model has an extended [attributes](../logs/data-model.md#field-attributes)
+definition allowing values of [type `Any`](../logs/data-model.md#type-any). This
+reflects that LogRecord attributes are expected to model data produced from
+external log APIs, which do not necessarily have the same value type
+restrictions as the standard attribute definition.
+
 ### Attribute Limits
 
 Execution of erroneous code can result in unintended attributes. If there are no
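
The `common/README.md` diff above restricts attribute values. A value must be a primitive (string, boolean, double, or signed 64-bit integer) or a homogeneous array of one of those. Nested or map values are excluded, and a datetime has to be encoded as a string or 64-bit integer. The Python sketch below is a hypothetical validator for these rules; it is not taken from any OpenTelemetry SDK, and all function names are illustrative. It rests on several assumptions:

- Python `str`, `bool`, `float`, and `int` stand in for the four primitive types.
- `list` and `tuple` stand in for arrays.
- Empty arrays are accepted.
- Datetimes are encoded as integer nanoseconds since the Unix epoch. This is just one of the string / 64-bit integer options the footnote mentions.

```python
from datetime import datetime, timedelta, timezone

# Bounds for the "signed 64 bit integer" primitive type.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

_UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def primitive_kind(value):
    """Return the primitive kind of `value`, or None if it is not a primitive."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "int64" if INT64_MIN <= value <= INT64_MAX else None
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    return None


def is_valid_attribute_value(value):
    """A primitive, or a homogeneous array (list/tuple) of primitives."""
    if primitive_kind(value) is not None:
        return True
    if isinstance(value, (list, tuple)):
        kinds = {primitive_kind(item) for item in value}
        # Reject nested/complex elements (kind None) and mixed element types.
        return None not in kinds and len(kinds) <= 1
    # Maps, nested structures, None, bytes, etc. are not attribute values.
    return False


def is_valid_attribute_key(key):
    """Keys must be non-empty strings; case is preserved, so "A" != "a"."""
    return isinstance(key, str) and key != ""


def encode_datetime(dt):
    """Hypothetical encoding: timezone-aware datetime -> int nanoseconds since epoch."""
    micros = (dt - _UNIX_EPOCH) // timedelta(microseconds=1)
    return micros * 1000


if __name__ == "__main__":
    attrs = {
        "http.request.method": "GET",
        "retry.count": 3,
        "tags": ["a", "b"],
        "mixed": [1, "two"],           # rejected: not homogeneous
        "payload": {"user": "alice"},  # rejected: complex type
    }
    for key, value in attrs.items():
        print(key, is_valid_attribute_key(key) and is_valid_attribute_value(value))
```

The order of the checks matters. Python treats `True` as an `int`, so the `bool` test must come first. Without it, `[True, 1]` would wrongly pass as a homogeneous array.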