From c2772630cc91b38932c436d8b6ea633329f8ef5b Mon Sep 17 00:00:00 2001 From: Ashok Chandrasekar Date: Fri, 31 May 2024 07:28:46 +0000 Subject: [PATCH 01/13] Add LLM model server metrics This change adds common model server metrics that we want to standardize on. It starts of with two common latency metrics - time per output token and time to first token. --- docs/gen-ai/gen-ai-metrics.md | 75 +++++++++++++++++++++++++++++++++-- model/metrics/gen-ai.yaml | 16 ++++++++ 2 files changed, 88 insertions(+), 3 deletions(-) diff --git a/docs/gen-ai/gen-ai-metrics.md b/docs/gen-ai/gen-ai-metrics.md index 8e381c00db..31175f6bc3 100644 --- a/docs/gen-ai/gen-ai-metrics.md +++ b/docs/gen-ai/gen-ai-metrics.md @@ -2,10 +2,12 @@ linkTitle: Generative AI metrics ---> -# Semantic Conventions for Generative AI Client Metrics +# Semantic Conventions for Generative AI Metrics **Status**: [Experimental][DocumentStatus] +## Generative AI Client Metrics + The conventions described in this section are specific to Generative AI client applications. @@ -22,8 +24,6 @@ and attributes but more may be added in the future. -## Generative AI Client Metrics - The following metric instruments describe Generative AI operations. An operation may be a request to an LLM, a function call, or some other distinct action within a larger Generative AI workflow. @@ -179,6 +179,75 @@ Instrumentations SHOULD document the list of errors they report. + + + + + +## Generative AI Model Server Metrics + +The following metric instruments describe Generative AI model servers' +operational metrics. It includes both functional and performance metrics. + +### Metric: `gen_ai.server.latency.time_per_output_token` + +This metric is [recommended][MetricRecommended] to report the model server +latency in terms of mean time per token generated for any model servers which +support serving LLMs. + +For example, if a model server which serves LLMs reports latency information, +it SHOULD be used. 
+ +If instrumentation cannot obtain this information at a request level and break +it down into the buckets mentioned below, then it MUST NOT report this metric. + +This metric SHOULD be specified with [ExplicitBucketBoundaries] of +[0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0, 2.5]. + + + + + + + + +| Name | Instrument Type | Unit (UCUM) | Description | Stability | +| -------- | --------------- | ----------- | -------------- | --------- | +| `gen_ai.server.latency.time_per_output_token` | Histogram | `s` | Mean time per output token generated | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + + + + + + + +### Metric: `gen_ai.server.latency.time_per_output_token` + +This metric is [recommended][MetricRecommended] to report the model server +latency in terms of time spent to generate the first token of the response for +any modle servers which support serving LLMs. + +For example, if a model server which serves LLMs reports latency information, +it SHOULD be used. + +If instrumentation cannot obtain this information at a request level and break +it down into the buckets mentioned below, then it MUST NOT report this metric. + +This metric SHOULD be specified with [ExplicitBucketBoundaries] of +[0.001, 0.005, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0]. 
+ + + + + + + + +| Name | Instrument Type | Unit (UCUM) | Description | Stability | +| -------- | --------------- | ----------- | -------------- | --------- | +| `gen_ai.server.latency.time_to_first_token` | Histogram | `s` | Mean time/latency to generate first token | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + + diff --git a/model/metrics/gen-ai.yaml b/model/metrics/gen-ai.yaml index 8398e8f0c6..ff8ea14775 100644 --- a/model/metrics/gen-ai.yaml +++ b/model/metrics/gen-ai.yaml @@ -43,3 +43,19 @@ groups: The `error.type` SHOULD match the error code returned by the Generative AI provider or the client library, the canonical name of exception that occurred, or another low-cardinality error identifier. Instrumentations SHOULD document the list of errors they report. + - id: metric.gen_ai.server.latency.time_per_output_token + type: metric + metric_name: gen_ai.server.latency.time_per_output_token + brief: 'Mean time per output token generated' + instrument: histogram + unit: "s" + stability: experimental + extends: metric_attributes.gen_ai + - id: metric.gen_ai.server.latency.time_to_first_token + type: metric + metric_name: gen_ai.server.latency.time_to_first_token + brief: 'Mean time/latency to generate first token' + instrument: histogram + unit: "s" + stability: experimental + extends: metric_attributes.gen_ai From 3b81d8fa6dfdca2184526fd862338db62092b25d Mon Sep 17 00:00:00 2001 From: Ashok Chandrasekar Date: Fri, 31 May 2024 18:26:55 +0000 Subject: [PATCH 02/13] Add changelog --- .chloggen/1102.yaml | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) create mode 100755 .chloggen/1102.yaml diff --git a/.chloggen/1102.yaml b/.chloggen/1102.yaml new file mode 100755 index 0000000000..b094f4d184 --- /dev/null +++ b/.chloggen/1102.yaml @@ -0,0 +1,22 @@ +# Use this changelog template to create an entry for release notes. 
+# +# If your change doesn't affect end users you should instead start +# your pull request title with [chore] or use the "Skip Changelog" label. + +# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix' +change_type: + +# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db) +component: + +# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`). +note: + +# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists. +# The values here must be integers. +issues: [] + +# (Optional) One or more lines of additional information to render under the primary note. +# These lines will be padded with 2 spaces and then inserted directly into the document. +# Use pipe (|) for multiline entries. +subtext: From 57f9e895f65f33ed2288e19ab705449201113ec5 Mon Sep 17 00:00:00 2001 From: Ashok Chandrasekar Date: Fri, 31 May 2024 20:26:13 +0000 Subject: [PATCH 03/13] Address typos and clarify description --- .chloggen/1102.yaml | 8 ++++---- docs/gen-ai/gen-ai-metrics.md | 10 +++++----- model/metrics/gen-ai.yaml | 4 ++-- 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/.chloggen/1102.yaml b/.chloggen/1102.yaml index b094f4d184..b5f485d12f 100755 --- a/.chloggen/1102.yaml +++ b/.chloggen/1102.yaml @@ -4,17 +4,17 @@ # your pull request title with [chore] or use the "Skip Changelog" label. # One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix' -change_type: +change_type: enhancement # The name of the area of concern in the attributes-registry, (e.g. http, cloud, db) -component: +component: gen-ai # A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`). -note: +note: Add GenAI model server server metrics for measuring LLM serving latency # Mandatory: One or more tracking issues related to the change. 
You can use the PR number here if no issue exists. # The values here must be integers. -issues: [] +issues: [1102] # (Optional) One or more lines of additional information to render under the primary note. # These lines will be padded with 2 spaces and then inserted directly into the document. diff --git a/docs/gen-ai/gen-ai-metrics.md b/docs/gen-ai/gen-ai-metrics.md index 31175f6bc3..48ebfc6be7 100644 --- a/docs/gen-ai/gen-ai-metrics.md +++ b/docs/gen-ai/gen-ai-metrics.md @@ -192,7 +192,7 @@ operational metrics. It includes both functional and performance metrics. ### Metric: `gen_ai.server.latency.time_per_output_token` This metric is [recommended][MetricRecommended] to report the model server -latency in terms of mean time per token generated for any model servers which +latency in terms of time per token generated for any model servers which support serving LLMs. For example, if a model server which serves LLMs reports latency information, @@ -213,7 +213,7 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | Name | Instrument Type | Unit (UCUM) | Description | Stability | | -------- | --------------- | ----------- | -------------- | --------- | -| `gen_ai.server.latency.time_per_output_token` | Histogram | `s` | Mean time per output token generated | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.server.latency.time_per_output_token` | Histogram | `s` | Time per output token generated | ![Experimental](https://img.shields.io/badge/-experimental-blue) | @@ -221,11 +221,11 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of -### Metric: `gen_ai.server.latency.time_per_output_token` +### Metric: `gen_ai.server.latency.time_to_first_token` This metric is [recommended][MetricRecommended] to report the model server latency in terms of time spent to generate the first token of the response for -any modle servers which support serving LLMs. +any model servers which support serving LLMs. 
For example, if a model server which serves LLMs reports latency information, it SHOULD be used. @@ -245,7 +245,7 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | Name | Instrument Type | Unit (UCUM) | Description | Stability | | -------- | --------------- | ----------- | -------------- | --------- | -| `gen_ai.server.latency.time_to_first_token` | Histogram | `s` | Mean time/latency to generate first token | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.server.latency.time_to_first_token` | Histogram | `s` | Time to generate first token | ![Experimental](https://img.shields.io/badge/-experimental-blue) | diff --git a/model/metrics/gen-ai.yaml b/model/metrics/gen-ai.yaml index ff8ea14775..eb8b444e33 100644 --- a/model/metrics/gen-ai.yaml +++ b/model/metrics/gen-ai.yaml @@ -46,7 +46,7 @@ groups: - id: metric.gen_ai.server.latency.time_per_output_token type: metric metric_name: gen_ai.server.latency.time_per_output_token - brief: 'Mean time per output token generated' + brief: 'Time per output token generated' instrument: histogram unit: "s" stability: experimental @@ -54,7 +54,7 @@ groups: - id: metric.gen_ai.server.latency.time_to_first_token type: metric metric_name: gen_ai.server.latency.time_to_first_token - brief: 'Mean time/latency to generate first token' + brief: 'Time to generate first token' instrument: histogram unit: "s" stability: experimental From 479d2dcb864e222d2a97dc35d3cee93f7866f50d Mon Sep 17 00:00:00 2001 From: Ashok Chandrasekar Date: Mon, 3 Jun 2024 20:38:09 +0000 Subject: [PATCH 04/13] Add request_duration metric and fix description of other ones --- docs/gen-ai/gen-ai-metrics.md | 35 +++++++++++++++++++++++++++++++++-- model/metrics/gen-ai.yaml | 12 ++++++++++-- 2 files changed, 43 insertions(+), 4 deletions(-) diff --git a/docs/gen-ai/gen-ai-metrics.md b/docs/gen-ai/gen-ai-metrics.md index 48ebfc6be7..0b4549b40c 100644 --- a/docs/gen-ai/gen-ai-metrics.md +++ 
b/docs/gen-ai/gen-ai-metrics.md @@ -189,6 +189,37 @@ Instrumentations SHOULD document the list of errors they report. The following metric instruments describe Generative AI model servers' operational metrics. It includes both functional and performance metrics. +### Metric: `gen_ai.server.latency.request_duration` + +This metric is [recommended][MetricRecommended] to report the model server +latency in terms of time spent per request. + +For example, if a GenAI model server reports latency information, it SHOULD be +used. + +If instrumentation cannot obtain this information at a request level and break +it down into the buckets mentioned below, then it MUST NOT report this metric. + +This metric SHOULD be specified with [ExplicitBucketBoundaries] of +[1.0, 2.5, 5.0, 10.0, 15.0, 20.0, 30.0, 40.0, 50.0, 60.0]. + + + + + + + + +| Name | Instrument Type | Unit (UCUM) | Description | Stability | +| -------- | --------------- | ----------- | -------------- | --------- | +| `gen_ai.server.latency.request_duration` | Histogram | `s` | Time (end to end latency) to complete a request | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + + + + + + + ### Metric: `gen_ai.server.latency.time_per_output_token` This metric is [recommended][MetricRecommended] to report the model server @@ -213,7 +244,7 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | Name | Instrument Type | Unit (UCUM) | Description | Stability | | -------- | --------------- | ----------- | -------------- | --------- | -| `gen_ai.server.latency.time_per_output_token` | Histogram | `s` | Time per output token generated | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.server.latency.time_per_output_token` | Histogram | `s` | Time per output token generated for successful responses | ![Experimental](https://img.shields.io/badge/-experimental-blue) | @@ -245,7 +276,7 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | Name | 
Instrument Type | Unit (UCUM) | Description | Stability | | -------- | --------------- | ----------- | -------------- | --------- | -| `gen_ai.server.latency.time_to_first_token` | Histogram | `s` | Time to generate first token | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.server.latency.time_to_first_token` | Histogram | `s` | Time to generate first token for successful responses | ![Experimental](https://img.shields.io/badge/-experimental-blue) | diff --git a/model/metrics/gen-ai.yaml b/model/metrics/gen-ai.yaml index eb8b444e33..8fe0a2d4e2 100644 --- a/model/metrics/gen-ai.yaml +++ b/model/metrics/gen-ai.yaml @@ -43,10 +43,18 @@ groups: The `error.type` SHOULD match the error code returned by the Generative AI provider or the client library, the canonical name of exception that occurred, or another low-cardinality error identifier. Instrumentations SHOULD document the list of errors they report. + - id: metric.gen_ai.server.latency.request_duration + type: metric + metric_name: gen_ai.server.latency.request_duration + brief: 'Time (end to end latency) to complete a request' + instrument: histogram + unit: "s" + stability: experimental + extends: metric_attributes.gen_ai - id: metric.gen_ai.server.latency.time_per_output_token type: metric metric_name: gen_ai.server.latency.time_per_output_token - brief: 'Time per output token generated' + brief: 'Time per output token generated for successful responses' instrument: histogram unit: "s" stability: experimental @@ -54,7 +62,7 @@ groups: - id: metric.gen_ai.server.latency.time_to_first_token type: metric metric_name: gen_ai.server.latency.time_to_first_token - brief: 'Time to generate first token' + brief: 'Time to generate first token for successful responses' instrument: histogram unit: "s" stability: experimental From 1d2341fc4aae5620aeae5bd37bf48e9a05f1d1cf Mon Sep 17 00:00:00 2001 From: Ashok Chandrasekar Date: Mon, 3 Jun 2024 21:00:30 +0000 Subject: [PATCH 05/13] Add markdown 
toc --- docs/gen-ai/gen-ai-metrics.md | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/docs/gen-ai/gen-ai-metrics.md b/docs/gen-ai/gen-ai-metrics.md index 0b4549b40c..018125abc0 100644 --- a/docs/gen-ai/gen-ai-metrics.md +++ b/docs/gen-ai/gen-ai-metrics.md @@ -6,14 +6,6 @@ linkTitle: Generative AI metrics **Status**: [Experimental][DocumentStatus] -## Generative AI Client Metrics - -The conventions described in this section are specific to Generative AI client -applications. - -**Disclaimer:** These are initial Generative AI client metric instruments -and attributes but more may be added in the future. - @@ -21,9 +13,21 @@ and attributes but more may be added in the future. - [Generative AI Client Metrics](#generative-ai-client-metrics) - [Metric: `gen_ai.client.token.usage`](#metric-gen_aiclienttokenusage) - [Metric: `gen_ai.client.operation.duration`](#metric-gen_aiclientoperationduration) +- [Generative AI Model Server Metrics](#generative-ai-model-server-metrics) + - [Metric: `gen_ai.server.latency.request_duration`](#metric-gen_aiserverlatencyrequest_duration) + - [Metric: `gen_ai.server.latency.time_per_output_token`](#metric-gen_aiserverlatencytime_per_output_token) + - [Metric: `gen_ai.server.latency.time_to_first_token`](#metric-gen_aiserverlatencytime_to_first_token) +## Generative AI Client Metrics + +The conventions described in this section are specific to Generative AI client +applications. + +**Disclaimer:** These are initial Generative AI client metric instruments +and attributes but more may be added in the future. + The following metric instruments describe Generative AI operations. An operation may be a request to an LLM, a function call, or some other distinct action within a larger Generative AI workflow. 
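As an aside to the patches above: the three proposed histograms (request duration, time per output token, time to first token) and their explicit bucket boundaries can be illustrated with a minimal, hypothetical sketch of how a model server instrumentation might derive the recorded values and map them onto the proposed buckets. This sketch is not part of the patch series and not the OpenTelemetry SDK API; it uses the shortened metric names the series settles on later, and it assumes one common convention for time per output token (averaging over the decode phase, i.e. excluding the first token) which the patches themselves do not mandate.

```python
import bisect

# Explicit bucket boundaries proposed in this patch series (seconds).
REQUEST_DURATION_BUCKETS = [1.0, 2.5, 5.0, 10.0, 15.0, 20.0, 30.0, 40.0,
                            50.0, 60.0]
TIME_PER_OUTPUT_TOKEN_BUCKETS = [0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2,
                                 0.3, 0.4, 0.5, 0.75, 1.0, 2.5]
TIME_TO_FIRST_TOKEN_BUCKETS = [0.001, 0.005, 0.01, 0.02, 0.04, 0.06, 0.08,
                               0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0]


def server_latency_values(t_start, t_first_token, t_end, output_tokens):
    """Derive the three histogram values (seconds) for one successful request.

    Time per output token here averages over the decode phase only
    (excluding the first token) -- an assumption, not a requirement of
    these conventions.
    """
    return {
        "gen_ai.server.request_duration": t_end - t_start,
        "gen_ai.server.time_to_first_token": t_first_token - t_start,
        "gen_ai.server.time_per_output_token":
            (t_end - t_first_token) / max(output_tokens - 1, 1),
    }


def bucket_index(value, boundaries):
    """Bucket a value using upper-inclusive bounds, as OpenTelemetry
    explicit-bucket histograms do; index len(boundaries) is overflow."""
    return bisect.bisect_left(boundaries, value)
```

In a real instrumentation the bucketing would not be done by hand: the boundaries would be supplied to the SDK (for example via a metric view or the advisory bucket-boundaries hint, where the SDK supports it) and the instrumentation would only call `histogram.record(value, attributes)`.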
From b3117f432591d2697b75ae76e3121e9951628e08 Mon Sep 17 00:00:00 2001 From: Ashok Chandrasekar Date: Tue, 4 Jun 2024 22:45:44 -0700 Subject: [PATCH 06/13] Update .chloggen/1102.yaml to fix typo Co-authored-by: Liudmila Molkova --- .chloggen/1102.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.chloggen/1102.yaml b/.chloggen/1102.yaml index b5f485d12f..b80789491d 100755 --- a/.chloggen/1102.yaml +++ b/.chloggen/1102.yaml @@ -10,7 +10,7 @@ change_type: enhancement component: gen-ai # A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`). -note: Add GenAI model server server metrics for measuring LLM serving latency +note: Add GenAI model server metrics for measuring LLM serving latency # Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists. # The values here must be integers. From 8e10b93185827cc89afc34234cbd7002f7dbf38e Mon Sep 17 00:00:00 2001 From: Ashok Chandrasekar Date: Tue, 4 Jun 2024 22:48:06 -0700 Subject: [PATCH 07/13] Update docs/gen-ai/gen-ai-metrics.md metric description Co-authored-by: Liudmila Molkova --- docs/gen-ai/gen-ai-metrics.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/gen-ai/gen-ai-metrics.md b/docs/gen-ai/gen-ai-metrics.md index 018125abc0..009d249da1 100644 --- a/docs/gen-ai/gen-ai-metrics.md +++ b/docs/gen-ai/gen-ai-metrics.md @@ -216,7 +216,7 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | Name | Instrument Type | Unit (UCUM) | Description | Stability | | -------- | --------------- | ----------- | -------------- | --------- | -| `gen_ai.server.latency.request_duration` | Histogram | `s` | Time (end to end latency) to complete a request | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.server.latency.request_duration` | Histogram | `s` | Generative AI server request duration such as time-to-last byte or last output token 
| ![Experimental](https://img.shields.io/badge/-experimental-blue) | From c2506350390f517822f9f2ea306a7e542decacb8 Mon Sep 17 00:00:00 2001 From: Ashok Chandrasekar Date: Thu, 6 Jun 2024 21:42:41 +0000 Subject: [PATCH 08/13] Drop latency from the metric names --- docs/gen-ai/gen-ai-metrics.md | 24 ++++++++++++------------ model/metrics/gen-ai.yaml | 14 +++++++------- 2 files changed, 19 insertions(+), 19 deletions(-) diff --git a/docs/gen-ai/gen-ai-metrics.md b/docs/gen-ai/gen-ai-metrics.md index 009d249da1..c618d9ee61 100644 --- a/docs/gen-ai/gen-ai-metrics.md +++ b/docs/gen-ai/gen-ai-metrics.md @@ -14,9 +14,9 @@ linkTitle: Generative AI metrics - [Metric: `gen_ai.client.token.usage`](#metric-gen_aiclienttokenusage) - [Metric: `gen_ai.client.operation.duration`](#metric-gen_aiclientoperationduration) - [Generative AI Model Server Metrics](#generative-ai-model-server-metrics) - - [Metric: `gen_ai.server.latency.request_duration`](#metric-gen_aiserverlatencyrequest_duration) - - [Metric: `gen_ai.server.latency.time_per_output_token`](#metric-gen_aiserverlatencytime_per_output_token) - - [Metric: `gen_ai.server.latency.time_to_first_token`](#metric-gen_aiserverlatencytime_to_first_token) + - [Metric: `gen_ai.server.request_duration`](#metric-gen_aiserverrequest_duration) + - [Metric: `gen_ai.server.time_per_output_token`](#metric-gen_aiservertime_per_output_token) + - [Metric: `gen_ai.server.time_to_first_token`](#metric-gen_aiservertime_to_first_token) @@ -193,7 +193,7 @@ Instrumentations SHOULD document the list of errors they report. The following metric instruments describe Generative AI model servers' operational metrics. It includes both functional and performance metrics. -### Metric: `gen_ai.server.latency.request_duration` +### Metric: `gen_ai.server.request_duration` This metric is [recommended][MetricRecommended] to report the model server latency in terms of time spent per request. 
@@ -207,7 +207,7 @@ it down into the buckets mentioned below, then it MUST NOT report this metric. This metric SHOULD be specified with [ExplicitBucketBoundaries] of [1.0, 2.5, 5.0, 10.0, 15.0, 20.0, 30.0, 40.0, 50.0, 60.0]. - + @@ -216,7 +216,7 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | Name | Instrument Type | Unit (UCUM) | Description | Stability | | -------- | --------------- | ----------- | -------------- | --------- | -| `gen_ai.server.latency.request_duration` | Histogram | `s` | Generative AI server request duration such as time-to-last byte or last output token | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.server.request_duration` | Histogram | `s` | Generative AI server request duration such as time-to-last byte or last output token | ![Experimental](https://img.shields.io/badge/-experimental-blue) | @@ -224,7 +224,7 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of -### Metric: `gen_ai.server.latency.time_per_output_token` +### Metric: `gen_ai.server.time_per_output_token` This metric is [recommended][MetricRecommended] to report the model server latency in terms of time per token generated for any model servers which @@ -239,7 +239,7 @@ it down into the buckets mentioned below, then it MUST NOT report this metric. This metric SHOULD be specified with [ExplicitBucketBoundaries] of [0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0, 2.5]. 
- + @@ -248,7 +248,7 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | Name | Instrument Type | Unit (UCUM) | Description | Stability | | -------- | --------------- | ----------- | -------------- | --------- | -| `gen_ai.server.latency.time_per_output_token` | Histogram | `s` | Time per output token generated for successful responses | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.server.time_per_output_token` | Histogram | `s` | Time per output token generated for successful responses | ![Experimental](https://img.shields.io/badge/-experimental-blue) | @@ -256,7 +256,7 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of -### Metric: `gen_ai.server.latency.time_to_first_token` +### Metric: `gen_ai.server.time_to_first_token` This metric is [recommended][MetricRecommended] to report the model server latency in terms of time spent to generate the first token of the response for @@ -271,7 +271,7 @@ it down into the buckets mentioned below, then it MUST NOT report this metric. This metric SHOULD be specified with [ExplicitBucketBoundaries] of [0.001, 0.005, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0]. 
- + @@ -280,7 +280,7 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | Name | Instrument Type | Unit (UCUM) | Description | Stability | | -------- | --------------- | ----------- | -------------- | --------- | -| `gen_ai.server.latency.time_to_first_token` | Histogram | `s` | Time to generate first token for successful responses | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.server.time_to_first_token` | Histogram | `s` | Time to generate first token for successful responses | ![Experimental](https://img.shields.io/badge/-experimental-blue) | diff --git a/model/metrics/gen-ai.yaml b/model/metrics/gen-ai.yaml index 8fe0a2d4e2..3cfdb30c55 100644 --- a/model/metrics/gen-ai.yaml +++ b/model/metrics/gen-ai.yaml @@ -43,25 +43,25 @@ groups: The `error.type` SHOULD match the error code returned by the Generative AI provider or the client library, the canonical name of exception that occurred, or another low-cardinality error identifier. Instrumentations SHOULD document the list of errors they report. 
- - id: metric.gen_ai.server.latency.request_duration + - id: metric.gen_ai.server.request_duration type: metric - metric_name: gen_ai.server.latency.request_duration - brief: 'Time (end to end latency) to complete a request' + metric_name: gen_ai.server.request_duration + brief: 'Generative AI server request duration such as time-to-last byte or last output token' instrument: histogram unit: "s" stability: experimental extends: metric_attributes.gen_ai - - id: metric.gen_ai.server.latency.time_per_output_token + - id: metric.gen_ai.server.time_per_output_token type: metric - metric_name: gen_ai.server.latency.time_per_output_token + metric_name: gen_ai.server.time_per_output_token brief: 'Time per output token generated for successful responses' instrument: histogram unit: "s" stability: experimental extends: metric_attributes.gen_ai - - id: metric.gen_ai.server.latency.time_to_first_token + - id: metric.gen_ai.server.time_to_first_token type: metric - metric_name: gen_ai.server.latency.time_to_first_token + metric_name: gen_ai.server.time_to_first_token brief: 'Time to generate first token for successful responses' instrument: histogram unit: "s" From 993411add825e8de35ded1a896955517e926fbcd Mon Sep 17 00:00:00 2001 From: Ashok Chandrasekar Date: Thu, 6 Jun 2024 21:56:45 +0000 Subject: [PATCH 09/13] Render attribute table for the metrics --- docs/gen-ai/gen-ai-metrics.md | 117 +++++++++++++++++++++++++++++++++- 1 file changed, 114 insertions(+), 3 deletions(-) diff --git a/docs/gen-ai/gen-ai-metrics.md b/docs/gen-ai/gen-ai-metrics.md index c618d9ee61..8ad3ec53ca 100644 --- a/docs/gen-ai/gen-ai-metrics.md +++ b/docs/gen-ai/gen-ai-metrics.md @@ -14,9 +14,9 @@ linkTitle: Generative AI metrics - [Metric: `gen_ai.client.token.usage`](#metric-gen_aiclienttokenusage) - [Metric: `gen_ai.client.operation.duration`](#metric-gen_aiclientoperationduration) - [Generative AI Model Server Metrics](#generative-ai-model-server-metrics) - - [Metric: 
`gen_ai.server.request_duration`](#metric-gen_aiserverrequest_duration) - - [Metric: `gen_ai.server.time_per_output_token`](#metric-gen_aiservertime_per_output_token) - - [Metric: `gen_ai.server.time_to_first_token`](#metric-gen_aiservertime_to_first_token) + - [Metric: `gen_ai.server.request_duration`](#metric-gen_aiserverlatencyrequest_duration) + - [Metric: `gen_ai.server.time_per_output_token`](#metric-gen_aiserverlatencytime_per_output_token) + - [Metric: `gen_ai.server.time_to_first_token`](#metric-gen_aiserverlatencytime_to_first_token) @@ -219,6 +219,43 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | `gen_ai.server.request_duration` | Histogram | `s` | Generative AI server request duration such as time-to-last byte or last output token | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + + + + + + + + + + + + +| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | +|---|---|---|---|---|---| +| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client instrumentation. [1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `sever.address` is set. 
| ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a response was generated from. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [3] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | + +**[1]:** The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` is set to `openai` based on the instrumentation's best knowledge. + +**[2]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. + +**[3]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. + + + +`gen_ai.system` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. 
+ +| Value | Description | Stability | +|---|---|---| +| `openai` | OpenAI | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + + + @@ -251,6 +288,43 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | `gen_ai.server.time_per_output_token` | Histogram | `s` | Time per output token generated for successful responses | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + + + + + + + + + + + + +| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | +|---|---|---|---|---|---| +| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client instrumentation. [1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `sever.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a response was generated from. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. 
[3] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | + +**[1]:** The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` is set to `openai` based on the instrumentation's best knowledge. + +**[2]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. + +**[3]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. + + + +`gen_ai.system` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. + +| Value | Description | Stability | +|---|---|---| +| `openai` | OpenAI | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + + + @@ -283,6 +357,43 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | `gen_ai.server.time_to_first_token` | Histogram | `s` | Time to generate first token for successful responses | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + + + + + + + + + + + + +| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | +|---|---|---|---|---|---| +| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a request is being made to. 
| `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client instrumentation. [1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `sever.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a response was generated from. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [3] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | + +**[1]:** The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` is set to `openai` based on the instrumentation's best knowledge. + +**[2]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. + +**[3]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. + + + +`gen_ai.system` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. 
+ +| Value | Description | Stability | +|---|---|---| +| `openai` | OpenAI | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + + + From 28548d8d4d2ed7090e1e39a692aaa26a592a5485 Mon Sep 17 00:00:00 2001 From: Ashok Chandrasekar Date: Fri, 14 Jun 2024 06:37:37 +0000 Subject: [PATCH 10/13] Addressed error type, buckets and descriptions --- docs/attributes-registry/gen-ai.md | 4 +- docs/gen-ai/gen-ai-metrics.md | 99 +++++++++++++++--------------- model/metrics/gen-ai.yaml | 16 ++++- model/registry/gen-ai.yaml | 2 +- 4 files changed, 67 insertions(+), 54 deletions(-) diff --git a/docs/attributes-registry/gen-ai.md b/docs/attributes-registry/gen-ai.md index 36a69ad7dc..32f68e9c6b 100644 --- a/docs/attributes-registry/gen-ai.md +++ b/docs/attributes-registry/gen-ai.md @@ -11,7 +11,7 @@ This document defines the attributes used to describe telemetry in the context of Generative Artificial Intelligence (GenAI) Models requests and responses. | Attribute | Type | Description | Examples | Stability | -| ---------------------------------- | -------- | ------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------- | ---------------------------------------------------------------- | +| ---------------------------------- | -------- |--------------------------------------------------------------------------------------------------| ----------------------------------------------------------------------- | ---------------------------------------------------------------- | | `gen_ai.completion` | string | The full response received from the GenAI model. [1] | `[{'role': 'assistant', 'content': 'The capital of France is Paris.'}]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `gen_ai.operation.name` | string | The name of the operation being performed. 
| `chat`; `completion` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `gen_ai.prompt` | string | The full prompt sent to the GenAI model. [2] | `[{'role': 'user', 'content': 'What is the capital of France?'}]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | @@ -26,7 +26,7 @@ This document defines the attributes used to describe telemetry in the context o | `gen_ai.response.finish_reasons` | string[] | Array of reasons the model stopped generating tokens, corresponding to each generation received. | `["stop"]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `gen_ai.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `gen_ai.response.model` | string | The name of the model that generated the response. | `gpt-4-0613` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `gen_ai.system` | string | The Generative AI product as identified by the client instrumentation. [3] | `openai` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.system` | string | The Generative AI product as identified by the client or server instrumentation. [3] | `openai` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `gen_ai.token.type` | string | The type of token being counted. | `input`; `output` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `gen_ai.usage.completion_tokens` | int | The number of tokens used in the GenAI response (completion). | `180` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `gen_ai.usage.prompt_tokens` | int | The number of tokens used in the GenAI input or prompt. 
| `100` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | diff --git a/docs/gen-ai/gen-ai-metrics.md b/docs/gen-ai/gen-ai-metrics.md index 8ad3ec53ca..981b6c1a90 100644 --- a/docs/gen-ai/gen-ai-metrics.md +++ b/docs/gen-ai/gen-ai-metrics.md @@ -14,9 +14,9 @@ linkTitle: Generative AI metrics - [Metric: `gen_ai.client.token.usage`](#metric-gen_aiclienttokenusage) - [Metric: `gen_ai.client.operation.duration`](#metric-gen_aiclientoperationduration) - [Generative AI Model Server Metrics](#generative-ai-model-server-metrics) - - [Metric: `gen_ai.server.request_duration`](#metric-gen_aiserverlatencyrequest_duration) - - [Metric: `gen_ai.server.time_per_output_token`](#metric-gen_aiserverlatencytime_per_output_token) - - [Metric: `gen_ai.server.time_to_first_token`](#metric-gen_aiserverlatencytime_to_first_token) + - [Metric: `gen_ai.server.request_duration`](#metric-gen_aiserverrequest_duration) + - [Metric: `gen_ai.server.time_per_output_token`](#metric-gen_aiservertime_per_output_token) + - [Metric: `gen_ai.server.time_to_first_token`](#metric-gen_aiservertime_to_first_token) @@ -69,14 +69,14 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of [1, 4, 16, 64 -| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | -|---|---|---|---|---|---| -| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client instrumentation. 
[1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.token.type`](/docs/attributes-registry/gen-ai.md) | string | The type of token being counted. | `input`; `output` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `sever.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | -| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | +|---|---|-------------------------------------------------------------------------------------------------------------------|---|---|---| +| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.token.type`](/docs/attributes-registry/gen-ai.md) | string | The type of token being counted. 
| `input`; `output` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `sever.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [3] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | **[1]:** The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` is set to `openai` based on the instrumentation's best knowledge. @@ -142,14 +142,14 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of [ 0.01, 0.02, -| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | -|---|---|---|---|---|---| -| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client instrumentation. 
[1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [2] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` if the operation ended in an error | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | -| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [3] | `80`; `8080`; `443` | `Conditionally Required` If `sever.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | -| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | +|---|---|-------------------------------------------------------------------------------------------------------------------|---|---|---| +| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. 
[2] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` if the operation ended in an error | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
+| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [3] | `80`; `8080`; `443` | `Conditionally Required` If `server.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
+| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [4] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |

**[1]:** The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` is set to `openai` based on the instrumentation's best knowledge.

@@ -198,14 +198,8 @@ operational metrics. It includes both functional and performance metrics.

This metric is [recommended][MetricRecommended] to report the model server
latency in terms of time spent per request.

-For example, if a GenAI model server reports latency information, it SHOULD be
-used.
-
-If instrumentation cannot obtain this information at a request level and break
-it down into the buckets mentioned below, then it MUST NOT report this metric.
-
This metric SHOULD be specified with [ExplicitBucketBoundaries] of
-[1.0, 2.5, 5.0, 10.0, 15.0, 20.0, 30.0, 40.0, 50.0, 60.0].
+[0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48, 40.96, 81.92].
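As a note for reviewers, the roughly doubling boundaries proposed above can be sanity-checked with a small standalone sketch (plain Python, independent of any OpenTelemetry SDK; `bucket_index` is an illustrative helper, not part of these conventions). Boundaries are treated as upper-inclusive here, matching the OpenTelemetry metrics data model, so N boundaries define N + 1 buckets with the last one catching everything above the largest boundary:

```python
import bisect

# Boundaries proposed above for gen_ai.server.request_duration (seconds).
BOUNDARIES = [0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56,
              5.12, 10.24, 20.48, 40.96, 81.92]

def bucket_index(duration_s: float) -> int:
    """Return the bucket a request duration falls into.

    Bucket i covers (BOUNDARIES[i-1], BOUNDARIES[i]]; index
    len(BOUNDARIES) is the overflow (+Inf) bucket.
    """
    return bisect.bisect_left(BOUNDARIES, duration_s)
```

For example, a 50 ms request lands in the (0.04, 0.08] bucket, while anything slower than 81.92 s falls into the overflow bucket.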
@@ -235,17 +229,29 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of |---|---|---|---|---|---| | [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client instrumentation. [1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `sever.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [2] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` if the operation ended in an error | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [3] | `80`; `8080`; `443` | `Conditionally Required` If `sever.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a response was generated from. 
| `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [3] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [4] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | **[1]:** The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` is set to `openai` based on the instrumentation's best knowledge. -**[2]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. +**[2]:** The `error.type` SHOULD match the error code returned by the Generative AI service, +the canonical name of exception that occurred, or another low-cardinality error identifier. +Instrumentations SHOULD document the list of errors they report. -**[3]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. +**[3]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. 
+**[4]:** When observed from the client side, and when communicating through an intermediary, `server.address` SHOULD represent the server address behind any intermediaries, for example proxies, if it's available. + + + +`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. + +| Value | Description | Stability | +|---|---|---| +| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | `gen_ai.system` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. @@ -264,14 +270,12 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of ### Metric: `gen_ai.server.time_per_output_token` This metric is [recommended][MetricRecommended] to report the model server -latency in terms of time per token generated for any model servers which -support serving LLMs. - -For example, if a model server which serves LLMs reports latency information, -it SHOULD be used. - -If instrumentation cannot obtain this information at a request level and break -it down into the buckets mentioned below, then it MUST NOT report this metric. +latency in terms of time per token generated after the first token for any model +servers which support serving LLMs. It is measured by subtracting the time taken +to generate the first output token from the request duration and dividing the +rest of the duration by the number of output tokens generated after the first +token. This is important in measuring the performance of the decode phase of LLM +inference. This metric SHOULD be specified with [ExplicitBucketBoundaries] of [0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0, 2.5]. 
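The measurement rule spelled out in the description above can be written as a short sketch (plain Python; the function name and signature are illustrative, not part of these conventions): remove the first token's latency from the request duration, then average the remainder over the tokens generated after the first one.

```python
def time_per_output_token(request_duration_s: float,
                          time_to_first_token_s: float,
                          output_tokens: int) -> float:
    """Mean decode-phase time per token, excluding the first token.

    Subtracts the time taken to generate the first output token from
    the request duration, then divides by the number of output tokens
    generated after the first one.
    """
    if output_tokens < 2:
        raise ValueError("undefined for fewer than two output tokens")
    return (request_duration_s - time_to_first_token_s) / (output_tokens - 1)
```

For a 2.0 s request whose first token arrived after 0.5 s and that produced 11 output tokens, this yields (2.0 - 0.5) / 10 = 0.15 s per decoded token.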
@@ -285,7 +289,7 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | Name | Instrument Type | Unit (UCUM) | Description | Stability | | -------- | --------------- | ----------- | -------------- | --------- | -| `gen_ai.server.time_per_output_token` | Histogram | `s` | Time per output token generated for successful responses | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.server.time_per_output_token` | Histogram | `s` | Time per output token generated after the first token for successful responses | ![Experimental](https://img.shields.io/badge/-experimental-blue) | @@ -304,7 +308,7 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of |---|---|---|---|---|---| | [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client instrumentation. [1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `sever.address` is set. 
| ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a response was generated from. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [3] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | @@ -334,13 +338,10 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of This metric is [recommended][MetricRecommended] to report the model server latency in terms of time spent to generate the first token of the response for -any model servers which support serving LLMs. - -For example, if a model server which serves LLMs reports latency information, -it SHOULD be used. - -If instrumentation cannot obtain this information at a request level and break -it down into the buckets mentioned below, then it MUST NOT report this metric. +any model servers which support serving LLMs. It helps measure the time spent in +the queue and the prefill phase. It is important especially for streaming +requests. It is calculated at a request level and is reported as a histogram +using the buckets mentioned below. This metric SHOULD be specified with [ExplicitBucketBoundaries] of [0.001, 0.005, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0]. @@ -373,7 +374,7 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of |---|---|---|---|---|---| | [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. 
| `chat`; `completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client instrumentation. [1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `sever.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a response was generated from. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. 
[3] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | diff --git a/model/metrics/gen-ai.yaml b/model/metrics/gen-ai.yaml index 3cfdb30c55..d83698c067 100644 --- a/model/metrics/gen-ai.yaml +++ b/model/metrics/gen-ai.yaml @@ -16,6 +16,18 @@ groups: requirement_level: required - ref: gen_ai.operation.name requirement_level: required + - id: metric_attributes.gen_ai.server + type: attribute_group + brief: 'This group describes GenAI server metrics attributes' + extends: metric_attributes.gen_ai + attributes: + - ref: error.type + requirement_level: + conditionally_required: "if the operation ended in an error" + note: | + The `error.type` SHOULD match the error code returned by the Generative AI service, + the canonical name of exception that occurred, or another low-cardinality error identifier. + Instrumentations SHOULD document the list of errors they report. - id: metric.gen_ai.client.token.usage type: metric metric_name: gen_ai.client.token.usage @@ -50,11 +62,11 @@ groups: instrument: histogram unit: "s" stability: experimental - extends: metric_attributes.gen_ai + extends: metric_attributes.gen_ai.server - id: metric.gen_ai.server.time_per_output_token type: metric metric_name: gen_ai.server.time_per_output_token - brief: 'Time per output token generated for successful responses' + brief: 'Time per output token generated after the first token for successful responses' instrument: histogram unit: "s" stability: experimental diff --git a/model/registry/gen-ai.yaml b/model/registry/gen-ai.yaml index 394f053a21..2bbf16e7c4 100644 --- a/model/registry/gen-ai.yaml +++ b/model/registry/gen-ai.yaml @@ -25,7 +25,7 @@ groups: stability: experimental value: "cohere" brief: 'Cohere' - brief: The Generative AI product as identified by the client instrumentation. + brief: The Generative AI product as identified by the client or server instrumentation. 
note: > The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` From 2873d9f8a6a6153b025a7246ba6fd8909ea850ca Mon Sep 17 00:00:00 2001 From: Ashok Chandrasekar Date: Thu, 20 Jun 2024 21:49:06 +0000 Subject: [PATCH 11/13] Fix formatting after addressing conflicts --- docs/attributes-registry/gen-ai.md | 2 +- docs/gen-ai/gen-ai-metrics.md | 56 ++++++++++++++++++------------ docs/gen-ai/gen-ai-spans.md | 2 +- 3 files changed, 36 insertions(+), 24 deletions(-) diff --git a/docs/attributes-registry/gen-ai.md b/docs/attributes-registry/gen-ai.md index 32f68e9c6b..7030cb3bd3 100644 --- a/docs/attributes-registry/gen-ai.md +++ b/docs/attributes-registry/gen-ai.md @@ -11,7 +11,7 @@ This document defines the attributes used to describe telemetry in the context of Generative Artificial Intelligence (GenAI) Models requests and responses. | Attribute | Type | Description | Examples | Stability | -| ---------------------------------- | -------- |--------------------------------------------------------------------------------------------------| ----------------------------------------------------------------------- | ---------------------------------------------------------------- | +| ---------------------------------- | -------- | ------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------- | ---------------------------------------------------------------- | | `gen_ai.completion` | string | The full response received from the GenAI model. [1] | `[{'role': 'assistant', 'content': 'The capital of France is Paris.'}]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `gen_ai.operation.name` | string | The name of the operation being performed. 
| `chat`; `completion` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `gen_ai.prompt` | string | The full prompt sent to the GenAI model. [2] | `[{'role': 'user', 'content': 'What is the capital of France?'}]` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | diff --git a/docs/gen-ai/gen-ai-metrics.md b/docs/gen-ai/gen-ai-metrics.md index 981b6c1a90..1e01f1dd92 100644 --- a/docs/gen-ai/gen-ai-metrics.md +++ b/docs/gen-ai/gen-ai-metrics.md @@ -69,14 +69,14 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of [1, 4, 16, 64 -| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | -|---|---|-------------------------------------------------------------------------------------------------------------------|---|---|---| -| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.token.type`](/docs/attributes-registry/gen-ai.md) | string | The type of token being counted. | `input`; `output` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `sever.address` is set. 
| ![Stable](https://img.shields.io/badge/-stable-lightgreen) | -| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | +|---|---|---|---|---|---| +| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.token.type`](/docs/attributes-registry/gen-ai.md) | string | The type of token being counted. | `input`; `output` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `sever.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. 
| `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [3] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | **[1]:** The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` is set to `openai` based on the instrumentation's best knowledge. @@ -142,14 +142,14 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of [ 0.01, 0.02, -| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | -|---|---|-------------------------------------------------------------------------------------------------------------------|---|---|---| -| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. 
[2] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` if the operation ended in an error | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | -| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [3] | `80`; `8080`; `443` | `Conditionally Required` If `sever.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | -| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | +|---|---|---|---|---|---| +| [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [2] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` if the operation ended in an error | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. 
[3] | `80`; `8080`; `443` | `Conditionally Required` If `server.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | +| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [4] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | **[1]:** The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` is set to `openai` based on the instrumentation's best knowledge. @@ -228,14 +228,15 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | |---|---|---|---|---|---| | [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. 
| `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [2] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` if the operation ended in an error | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [3] | `80`; `8080`; `443` | `Conditionally Required` If `sever.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | -| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a response was generated from. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [4] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | **[1]:** The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` is set to `openai` based on the instrumentation's best knowledge. +For custom models, a custom friendly name SHOULD be used. 
If none of these options apply, the `gen_ai.system` SHOULD be set to `_OTHER`. **[2]:** The `error.type` SHOULD match the error code returned by the Generative AI service, the canonical name of exception that occurred, or another low-cardinality error identifier. @@ -258,7 +259,10 @@ Instrumentations SHOULD document the list of errors they report. | Value | Description | Stability | |---|---|---| +| `anthropic` | Anthropic | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `cohere` | Cohere | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `openai` | OpenAI | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `vertex_ai` | Vertex AI | ![Experimental](https://img.shields.io/badge/-experimental-blue) | @@ -307,13 +311,14 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | |---|---|---|---|---|---| | [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. 
[1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `sever.address` is set. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | -| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a response was generated from. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [3] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | **[1]:** The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` is set to `openai` based on the instrumentation's best knowledge. +For custom models, a custom friendly name SHOULD be used. If none of these options apply, the `gen_ai.system` SHOULD be set to `_OTHER`. **[2]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. 
@@ -325,7 +330,10 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | Value | Description | Stability | |---|---|---| +| `anthropic` | Anthropic | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `cohere` | Cohere | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `openai` | OpenAI | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `vertex_ai` | Vertex AI | ![Experimental](https://img.shields.io/badge/-experimental-blue) | @@ -373,13 +381,14 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | |---|---|---|---|---|---| | [`gen_ai.operation.name`](/docs/attributes-registry/gen-ai.md) | string | The name of the operation being performed. | `chat`; `completion` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [1] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [2] | `80`; `8080`; `443` | `Conditionally Required` If `sever.address` is set. 
| ![Stable](https://img.shields.io/badge/-stable-lightgreen) | -| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the LLM a response was generated from. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.response.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the model that generated the response. | `gpt-4-0613` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [3] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | **[1]:** The actual GenAI product may differ from the one identified by the client. For example, when using OpenAI client libraries to communicate with Mistral, the `gen_ai.system` is set to `openai` based on the instrumentation's best knowledge. +For custom models, a custom friendly name SHOULD be used. If none of these options apply, the `gen_ai.system` SHOULD be set to `_OTHER`. **[2]:** When observed from the client side, and when communicating through an intermediary, `server.port` SHOULD represent the server port behind any intermediaries, for example proxies, if it's available. 
@@ -391,7 +400,10 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | Value | Description | Stability | |---|---|---| +| `anthropic` | Anthropic | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `cohere` | Cohere | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | `openai` | OpenAI | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `vertex_ai` | Vertex AI | ![Experimental](https://img.shields.io/badge/-experimental-blue) | diff --git a/docs/gen-ai/gen-ai-spans.md b/docs/gen-ai/gen-ai-spans.md index d6837106df..d2fb54b987 100644 --- a/docs/gen-ai/gen-ai-spans.md +++ b/docs/gen-ai/gen-ai-spans.md @@ -46,7 +46,7 @@ These attributes track input data and metadata for a request to an GenAI model. | Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | |---|---|---|---|---|---| | [`gen_ai.request.model`](/docs/attributes-registry/gen-ai.md) | string | The name of the GenAI model a request is being made to. [1] | `gpt-4` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client instrumentation. [2] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| [`gen_ai.system`](/docs/attributes-registry/gen-ai.md) | string | The Generative AI product as identified by the client or server instrumentation. [2] | `openai` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`gen_ai.request.frequency_penalty`](/docs/attributes-registry/gen-ai.md) | double | The frequency penalty setting for the GenAI request. 
| `0.1` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`gen_ai.request.max_tokens`](/docs/attributes-registry/gen-ai.md) | int | The maximum number of tokens the model generates for a request. | `100` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | | [`gen_ai.request.presence_penalty`](/docs/attributes-registry/gen-ai.md) | double | The presence penalty setting for the GenAI request. | `0.1` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | From 2c8c8353a61bb13b7a0aaf2bc1493cba7eb535cc Mon Sep 17 00:00:00 2001 From: Ashok Chandrasekar Date: Thu, 20 Jun 2024 20:16:44 -0700 Subject: [PATCH 12/13] Apply suggestions from code review Co-authored-by: Drew Robbins Co-authored-by: Liudmila Molkova --- model/metrics/gen-ai.yaml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/model/metrics/gen-ai.yaml b/model/metrics/gen-ai.yaml index d83698c067..650bb15824 100644 --- a/model/metrics/gen-ai.yaml +++ b/model/metrics/gen-ai.yaml @@ -21,8 +21,8 @@ groups: brief: 'This group describes GenAI server metrics attributes' extends: metric_attributes.gen_ai attributes: - - ref: error.type - requirement_level: + - ref: error.type + requirement_level: conditionally_required: "if the operation ended in an error" note: | The `error.type` SHOULD match the error code returned by the Generative AI service, @@ -57,7 +57,7 @@ groups: Instrumentations SHOULD document the list of errors they report. 
- id: metric.gen_ai.server.request_duration type: metric - metric_name: gen_ai.server.request_duration + metric_name: gen_ai.server.request.duration brief: 'Generative AI server request duration such as time-to-last byte or last output token' instrument: histogram unit: "s" From 9cd3b0d88e55a31c0e8a75199203d4e6593538eb Mon Sep 17 00:00:00 2001 From: Ashok Chandrasekar Date: Fri, 21 Jun 2024 03:37:20 +0000 Subject: [PATCH 13/13] Fix formatting and naming --- docs/gen-ai/gen-ai-metrics.md | 10 +++++----- model/metrics/gen-ai.yaml | 4 ++-- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/gen-ai/gen-ai-metrics.md b/docs/gen-ai/gen-ai-metrics.md index 1e01f1dd92..9f8e165cff 100644 --- a/docs/gen-ai/gen-ai-metrics.md +++ b/docs/gen-ai/gen-ai-metrics.md @@ -14,7 +14,7 @@ linkTitle: Generative AI metrics - [Metric: `gen_ai.client.token.usage`](#metric-gen_aiclienttokenusage) - [Metric: `gen_ai.client.operation.duration`](#metric-gen_aiclientoperationduration) - [Generative AI Model Server Metrics](#generative-ai-model-server-metrics) - - [Metric: `gen_ai.server.request_duration`](#metric-gen_aiserverrequest_duration) + - [Metric: `gen_ai.server.request.duration`](#metric-gen_aiserverrequestduration) - [Metric: `gen_ai.server.time_per_output_token`](#metric-gen_aiservertime_per_output_token) - [Metric: `gen_ai.server.time_to_first_token`](#metric-gen_aiservertime_to_first_token) @@ -193,7 +193,7 @@ Instrumentations SHOULD document the list of errors they report. The following metric instruments describe Generative AI model servers' operational metrics. It includes both functional and performance metrics. -### Metric: `gen_ai.server.request_duration` +### Metric: `gen_ai.server.request.duration` This metric is [recommended][MetricRecommended] to report the model server latency in terms of time spent per request. @@ -201,7 +201,7 @@ latency in terms of time spent per request. 
This metric SHOULD be specified with [ExplicitBucketBoundaries] of [0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12,10.24, 20.48, 40.96, 81.92]. - + @@ -210,7 +210,7 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of | Name | Instrument Type | Unit (UCUM) | Description | Stability | | -------- | --------------- | ----------- | -------------- | --------- | -| `gen_ai.server.request_duration` | Histogram | `s` | Generative AI server request duration such as time-to-last byte or last output token | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `gen_ai.server.request.duration` | Histogram | `s` | Generative AI server request duration such as time-to-last byte or last output token | ![Experimental](https://img.shields.io/badge/-experimental-blue) | @@ -218,7 +218,7 @@ This metric SHOULD be specified with [ExplicitBucketBoundaries] of - + diff --git a/model/metrics/gen-ai.yaml b/model/metrics/gen-ai.yaml index 650bb15824..7c9979cf21 100644 --- a/model/metrics/gen-ai.yaml +++ b/model/metrics/gen-ai.yaml @@ -24,7 +24,7 @@ groups: - ref: error.type requirement_level: conditionally_required: "if the operation ended in an error" - note: | + note: | The `error.type` SHOULD match the error code returned by the Generative AI service, the canonical name of exception that occurred, or another low-cardinality error identifier. Instrumentations SHOULD document the list of errors they report. @@ -55,7 +55,7 @@ groups: The `error.type` SHOULD match the error code returned by the Generative AI provider or the client library, the canonical name of exception that occurred, or another low-cardinality error identifier. Instrumentations SHOULD document the list of errors they report. - - id: metric.gen_ai.server.request_duration + - id: metric.gen_ai.server.request.duration type: metric metric_name: gen_ai.server.request.duration brief: 'Generative AI server request duration such as time-to-last byte or last output token'