Request for comments: service.type and service.distro Resource attributes #554

tigrannajaryan · 2023-11-23T01:25:01Z

Problem Description

The service.name Resource attribute is currently defined as the "Logical name of the service". The expectation is that the operator of the service sets service.name to a value that describes the role of the service within the overall set of observable entities the operator has (within a service.namespace).

The Otel Collector sets service.name by default to the name of the executable (e.g. otelcorecol or otelcontribcol).

The operator can override the Collector's service.name using the service.telemetry.resource setting in the Collector's config file. This is typically expected in any non-trivial infrastructure, where the same Collector executable can be used as a locally running agent on a host, as a standalone gateway that serves as an intermediary between agents and backends, as part of the Kubernetes operator, etc. The roles in these cases are sufficiently different to warrant different logical names.
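
To make the default-versus-override behavior concrete, here is a minimal stdlib-only Go sketch. It models a Resource as a plain map of attributes; `buildResource` is a hypothetical helper (not the Collector's actual code), and the role names "host-agent" and "gateway" are made up for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// buildResource is a hypothetical helper: service.name defaults to the
// executable's file name, and any attributes supplied by the operator
// (in the Collector, via service.telemetry.resource) take precedence.
func buildResource(executablePath string, overrides map[string]string) map[string]string {
	res := map[string]string{
		// Default: the executable name, e.g. "otelcontribcol".
		"service.name": filepath.Base(executablePath),
	}
	// Operator-supplied attributes override the defaults.
	for k, v := range overrides {
		res[k] = v
	}
	return res
}

func main() {
	// The same executable deployed in two different logical roles.
	agent := buildResource("/usr/bin/otelcontribcol", map[string]string{"service.name": "host-agent"})
	gateway := buildResource("/usr/bin/otelcontribcol", map[string]string{"service.name": "gateway"})
	fallback := buildResource("/usr/bin/otelcontribcol", nil)

	fmt.Println(agent["service.name"])    // host-agent
	fmt.Println(gateway["service.name"])  // gateway
	fmt.Println(fallback["service.name"]) // otelcontribcol
}
```

Note that nothing in the resulting attributes says that "host-agent" and "gateway" are the same product, which is the gap the proposal addresses.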

However, there is currently no semantic convention for an attribute that specifies the type of a service that may play different logical roles in different places in the infrastructure, yet be identically produced, i.e. be the exact same executable. The executable file name can serve that purpose to some extent, but nothing prevents different service types from having the same executable file name, so it has poor uniqueness guarantees.

This issue (#396) talks a bit more about why we would want the type of an agent (the Otel Collector in our case) to be a well-defined semantic convention.

This issue (open-telemetry/opamp-spec#131) shows how the agent type would be useful in the context of agent management. It explains why it is important to tie the Resource of the agent's own telemetry to the attributes that identify that agent in the OpAMP protocol.

Proposal

This is a request for comments on adding the following Recommended, experimental Resource semantic conventions:

- service.type - an FQDN, written in reverse domain notation, that uniquely identifies the type of the service, e.g. io.opentelemetry.collector, io.redis, etc. Unlike the (service.namespace,service.name,service.instance.id) triplet, the (service.namespace,service.type,service.instance.id) triplet is not guaranteed to be globally unique.
- service.distro - an FQDN, written in reverse domain notation, that uniquely identifies the distribution of the service. Several distributions can belong to the same service.type. For example, the OpenTelemetry Collector has multiple known distributions, e.g. io.opentelemetry.collector-contrib, com.splunk.opentelemetry-collector, com.amazon.adot-collector, etc. [As an alternative to a reverse company FQDN, we may also require service.distro to point to the source code repo of the distro (e.g. a git/GitHub repo) for OSS agents.]
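
The uniqueness caveat follows from the existing convention: service.instance.id only has to be unique within a (service.namespace, service.name) pair, so two differently named services of the same type may reuse an instance id. The stdlib-only Go sketch below illustrates this; `Resource`, `identity` and `hasCollision` are hypothetical helpers, and the attribute values are invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Resource is a hypothetical, simplified model of a Resource's attributes.
type Resource map[string]string

// identity joins the values of the given attribute keys into one string.
// The "|" separator is a simplification for this sketch.
func identity(r Resource, keys ...string) string {
	parts := make([]string, len(keys))
	for i, k := range keys {
		parts[i] = r[k]
	}
	return strings.Join(parts, "|")
}

// hasCollision reports whether two or more resources share the same
// identity under the given keys.
func hasCollision(resources []Resource, keys ...string) bool {
	seen := make(map[string]bool)
	for _, r := range resources {
		id := identity(r, keys...)
		if seen[id] {
			return true
		}
		seen[id] = true
	}
	return false
}

func main() {
	// Two Collectors of the same type in different roles; each instance id
	// is only unique within its own service.name.
	resources := []Resource{
		{"service.namespace": "prod", "service.name": "agent", "service.type": "io.opentelemetry.collector", "service.instance.id": "1"},
		{"service.namespace": "prod", "service.name": "gateway", "service.type": "io.opentelemetry.collector", "service.instance.id": "1"},
	}
	fmt.Println(hasCollision(resources, "service.namespace", "service.name", "service.instance.id")) // false
	fmt.Println(hasCollision(resources, "service.namespace", "service.type", "service.instance.id")) // true
}
```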

Note that having a separate service.type allows OpAMP, if the operator wants, to manage agents of the same type in a similar way, even though their service.name values may differ because of the different logical roles they have.

Another example with NGINX: service.type will be set to com.nginx by NGINX developers, while service.name is set to "api-gateway" by the operator, denoting the logical role that the particular NGINX deployment serves in this particular system.
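
As a rough illustration of "manage the same type of agents in a similar way", the stdlib-only Go sketch below buckets agents by service.type regardless of the service.name each was given. The `Agent` struct and `groupByType` function are hypothetical; they are not part of OpAMP or any OpAMP library.

```go
package main

import (
	"fmt"
	"sort"
)

// Agent is a hypothetical record of what a management server might know
// about a connected agent.
type Agent struct {
	ServiceName string // operator-assigned logical role, e.g. "api-gateway"
	ServiceType string // product type set by its developers, e.g. "com.nginx"
}

// groupByType buckets agents by service.type so they can be managed alike,
// whatever service.name each one was given. Names are sorted per bucket.
func groupByType(agents []Agent) map[string][]string {
	groups := make(map[string][]string)
	for _, a := range agents {
		groups[a.ServiceType] = append(groups[a.ServiceType], a.ServiceName)
	}
	for _, names := range groups {
		sort.Strings(names) // sorts the shared backing array in place
	}
	return groups
}

func main() {
	agents := []Agent{
		{ServiceName: "host-agent", ServiceType: "io.opentelemetry.collector"},
		{ServiceName: "gateway", ServiceType: "io.opentelemetry.collector"},
		{ServiceName: "api-gateway", ServiceType: "com.nginx"},
	}
	groups := groupByType(agents)
	fmt.Println(groups["io.opentelemetry.collector"]) // [gateway host-agent]
	fmt.Println(groups["com.nginx"])                  // [api-gateway]
}
```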

I am looking for comments and thoughts on this proposal before I submit a PR.


tigrannajaryan · 2023-11-23T01:26:05Z

@open-telemetry/collector-approvers @open-telemetry/collector-contrib-approvers I would also like your opinion on this proposal.

jpkrohling · 2023-11-23T09:51:20Z

While I see the appeal of service.type, I'm not too keen on service.distro. I can't think of a workload that has different distributions other than the Collector. Do you know other examples that help validate that this is needed beyond the collector?

In any case, I do believe that the collector has to provide the distribution as part of the resource attributes for its own telemetry.

pyohannes · 2023-11-23T11:10:57Z

The main intent of semantic conventions is defining the shape of telemetry data, and I have some mixed feelings when I see them used in modelling parts of protocol specifications.

Do you know other examples that help validate that this is needed beyond the collector?

That's also something I'm wondering about: do we see value in putting those attributes on telemetry data? Or is the primary intent to satisfy requirements stemming from coupling the OpAmp spec with semantic conventions?

andrzej-stencel · 2023-11-23T13:06:13Z

I can't think of a workload that has different distributions other than the Collector. Do you know other examples that help validate that this is needed beyond the collector?

@jpkrohling here's an example from @tigrannajaryan:

For example PostgreSQL has a bunch of forks and derived databases which this attribute can indicate.

#396 (comment)

This is not to say I agree with this reasoning. Wouldn't all these different databases have a different service.type, if that existed?

I'm also worried that service.type is too vague. Different people might want to set it to different things in different contexts.

If this is needed for OpAMP (as this comment suggests: #396 (comment)), why not introduce an OpAMP-specific attribute like opamp.agent.type as proposed by @jack-berg here: open-telemetry/opamp-spec#131 (comment)?

jpkrohling · 2023-11-23T13:47:27Z

Wouldn't all these different databases have a different service.type, if that existed?

Yes. In the same vein, Elasticsearch vs. OpenSearch. As a user, I would likely prefer to have each fork on its own service.type. Collector core vs. contrib are not forks, they really are different distributions.

tigrannajaryan · 2023-11-23T17:17:40Z

The main intent of semantic conventions is defining the shape of telemetry data, and I have some mixed feelings when I see them used in modelling parts of protocol specifications.

To be clear: the only reason OpAMP is concerned with this is to direct the Collector to report useful telemetry about itself and to be identified in a way that is useful for the end user. There is no other use for agent type in any other capabilities that OpAMP supports.

tigrannajaryan · 2023-11-23T17:20:29Z

That's also something I'm wondering about: do we see value in putting those attributes on telemetry data? Or is the primary intent to satisfy requirements stemming from coupling the OpAmp spec with semantic conventions?

I think there is value in putting this in telemetry. If I am running a cache that I give the service name "foo", it is likely useful for me to also know that the cache is actually a Redis instance.

If we don't see the value of this then we should not add a service.type. Instead we can go with agent.type and other agent-specific conventions.

tigrannajaryan · 2023-11-23T17:21:54Z

If this is needed for OpAMP (as this comment suggests: #396 (comment)), why not introduce an OpAMP-specific attribute like opamp.agent.type as proposed by @jack-berg here: open-telemetry/opamp-spec#131 (comment)?

Yes, that's the alternative if we don't believe that service.type is generally useful for all services.

pyohannes · 2023-11-24T10:49:16Z

To be clear: the only reason OpAMP is concerned with this is to direct the Collector to report useful telemetry about itself and to be identified in a way that is useful for the end user. There is no other use for agent type in any other capabilities that OpAMP supports.

I was referring to the identification part, where semantic convention attributes seem to be part of the OpAMP spec.

From a non-OpAMP point of view, I find service.distro quite confusing. For collectors I think this would overlap with telemetry.distro.name, for non-collectors or non-agents I'm not sure what would go into this attribute.

I think there is value in putting this in telemetry. If I am running a cache that I give a service name "foo" it is likely useful for me to also know that that cache is actually a Redis instance.

That's definitely a valid use case (for service.type), as currently the only other way to achieve this is deducing the "type" from other (resource) attributes. Do you think it makes sense beyond third-party OTel-enabled services (e.g. redis, nginx, kafka, collector)?

tigrannajaryan · 2023-11-24T20:19:34Z

I was referring to the identification part, where semantic convention attributes seem to be part of the OpAMP spec.

Yes, we are in agreement here, that's what I meant by "to be identified in a way that is useful for the end user".

From a non-OpAMP point of view, I find service.distro quite confusing.

This is probably somewhat unique to the Otel Collector. If we look at Postgres, for example, there are a ton of Postgres-based databases, but I am not sure I would call them "distros"; perhaps they would be their own "service types".

Do you think it makes sense beyond third-party OTel-enabled services (e.g. redis, nginx, kafka, collector)?

Probably of quite limited use for first-party services. Something developed and used internally usually has a well-known internal name that is recorded in service.name, and that is likely enough. However, I can imagine that in big organizations a first-party service may be generic enough to be built once but deployed in many different places and roles, which would warrant a difference between service.type and service.name (e.g. if an org implements its own storage system, it may give it a common service.type but use different service.name values depending on what each deployment is used for). This is likely rare, but perhaps not impossible.

joaopgrassi · 2023-11-27T12:24:00Z

Envoy proxy can also be extended and built with custom extensions, essentially making it another "distro". For example, the Istio project uses its own distro of Envoy (https://github.com/istio/proxy).

jack-berg · 2023-11-27T19:44:55Z

I see value in having a service.type (not as sure about service.distro) outside of the OpAmp use case.

Speaking from the perspective of an observability vendor, I want to be able to identify the type of telemetry source and provide different observability experiences based on the data that type typically emits. For example, the signals that matter for a web service are not the same as for a kafka broker, a postgres instance, a redis instance, etc.

Currently, we have to rely on duck typing: the presence of certain attribute keys or values might suggest that the telemetry is coming from a particular type. This is brittle and often insufficient.

I like the idea of an optional FQDN service.type. The FQDN implies that it is hierarchical, allowing systems inspecting service.type to look for certain prefixes of interest while ignoring suffixes that may offer additional info. I like it being optional because I think service.type will tend to be most important in situations where you're using the collector to scrape observability data from some system. Whether it's a redis instance, a postgres instance, a kafka broker, or anything else, you want the telemetry to have a user-assigned logical service.name while still conveying the type of system.
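
The prefix idea can be sketched in a few lines of stdlib-only Go. This assumes service.type values are dot-separated reverse-domain names; `matchesTypePrefix`, `rule` and `experienceFor` are hypothetical helpers, and "io.redis.cluster" is an invented value used only to show a more specific suffix being ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesTypePrefix reports whether serviceType equals prefix or is nested
// under it. It compares whole dot-separated segments, so "io.redisx" does
// not match the prefix "io.redis".
func matchesTypePrefix(serviceType, prefix string) bool {
	return serviceType == prefix || strings.HasPrefix(serviceType, prefix+".")
}

// rule maps a service.type prefix to a hypothetical vendor "experience".
type rule struct {
	prefix     string
	experience string
}

// experienceFor returns the experience of the first matching rule (so more
// specific prefixes should be listed first) and falls back to "generic".
func experienceFor(serviceType string, rules []rule) string {
	for _, r := range rules {
		if matchesTypePrefix(serviceType, r.prefix) {
			return r.experience
		}
	}
	return "generic"
}

func main() {
	rules := []rule{
		{prefix: "io.redis", experience: "cache dashboard"},
		{prefix: "io.opentelemetry.collector", experience: "collector dashboard"},
	}
	fmt.Println(experienceFor("io.redis", rules))         // cache dashboard
	fmt.Println(experienceFor("io.redis.cluster", rules)) // cache dashboard
	fmt.Println(experienceFor("io.redisx", rules))        // generic
	fmt.Println(experienceFor("com.nginx", rules))        // generic
}
```

Matching on segment boundaries rather than raw string prefixes is the main design choice here; without it, an unrelated type that happens to share leading characters would be misclassified.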

tigrannajaryan · 2023-11-28T15:25:17Z

@jaronoff97 @jlegoff @arminru @andykellr any opinions on this?

jmacd · 2023-11-28T16:12:06Z

I support the two proposed new attributes. In this related work, open-telemetry/oteps#238, I've tried to establish a uniqueness requirement for resource attributes of some kind. The solution proposed in this issue will help. We want to be sure that no two agents with identical attributes handle a pipeline; otherwise they appear to be double-counting the items that pass through them both. If we have semantic conventions that ensure pipelines have unique stage identities, we can do more to automate pipeline monitoring.

jsuereth · 2023-11-28T19:25:42Z

My quick reaction to one comment is that I think the use case of OpAMP implies the use of service.role over service.type.

Specifically, the OTEL collector being used as an agent vs. a gateway seems more like a "role", and not something an FQDN would address.

tigrannajaryan · 2023-11-28T20:00:52Z

Specifically, the OTEL collector being used as an agent vs. a gateway seems more like a "role", and not something an FQDN would address.

@jsuereth I would expect the role to be recorded in service.name. It is defined as "Logical name of the service", which seems to be a fine location for what one would call a "role".

jaronoff97 · 2023-11-28T23:46:52Z

To make the OpAMP use case clear here, assume we have an OpAMP client that runs on the collector and reports the collector's config in the effective_config section. It's then implied that an OpAMP server that is able to read and unmarshal this effective configuration will want to do so using the collector's unmarshaling logic. This is relatively straightforward: a collector OpAMP client connects to a collector OpAMP server.

As OpAMP is adopted by more agents, however, this becomes more complicated. Because OpAMP allows for partial implementations, any agent type that implements the protocol can connect to our server, not just an OpAMP collector client. The next OpAMP client implementation is the OpAMP bridge in the operator, which reports on the Kubernetes configuration for a collector; this is decidedly different from our OpAMP collector client's effective configuration.

In the future, if an agent like Prometheus were to have an OpAMP client, our OpAMP server would want to unmarshal its configuration using Prometheus' unmarshal logic.

All of this necessitates a type that an OpAMP server can switch on to know what type of client it's talking to. Without wading too far into the discussion, I would posit that any collector will have the same configuration structure, the difference being who built the image. To that end, I would recommend that images that are functionally equivalent and share a configuration structure have the same service.type but a different service.distro, which (to me) says who exactly built the image you're using. For the OpAMP use case, this would leave service.role with the values client and server, which would let an end user understand how their OpAMP agents and servers are being run.
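
A minimal stdlib-only Go sketch of "a type that an OpAMP server can switch on": the server selects a config parser by service.type alone, so the vanilla and Splunk Collector distros share one parser, while an unknown type is rejected. `AgentIdentity`, `ConfigParser` and `parseEffectiveConfig` are hypothetical names, not part of the OpAMP spec or any OpAMP library, and "io.prometheus" is an invented type value.

```go
package main

import (
	"errors"
	"fmt"
)

// AgentIdentity is a hypothetical pair of attributes an OpAMP client could report.
type AgentIdentity struct {
	Type   string // e.g. "io.opentelemetry.collector"
	Distro string // e.g. "com.splunk.opentelemetry-collector"
}

// ConfigParser turns a raw effective config into something the server understands.
type ConfigParser func(raw []byte) (string, error)

var errUnknownType = errors.New("no parser registered for agent type")

// parseEffectiveConfig selects a parser by Type only: distributions that
// share a configuration structure also share a parser, whoever built them.
func parseEffectiveConfig(id AgentIdentity, raw []byte, parsers map[string]ConfigParser) (string, error) {
	parse, ok := parsers[id.Type]
	if !ok {
		return "", fmt.Errorf("%w: %q (distro %q)", errUnknownType, id.Type, id.Distro)
	}
	return parse(raw)
}

func main() {
	parsers := map[string]ConfigParser{
		// Placeholder parser; a real server would unmarshal the config here.
		"io.opentelemetry.collector": func(raw []byte) (string, error) {
			return fmt.Sprintf("collector config (%d bytes)", len(raw)), nil
		},
	}
	clients := []AgentIdentity{
		{Type: "io.opentelemetry.collector", Distro: "io.opentelemetry.collector-contrib"},
		{Type: "io.opentelemetry.collector", Distro: "com.splunk.opentelemetry-collector"},
		{Type: "io.prometheus", Distro: "io.prometheus"},
	}
	raw := []byte("receivers: {}")
	for _, id := range clients {
		out, err := parseEffectiveConfig(id, raw, parsers)
		fmt.Println(id.Distro, "->", out, err)
	}
}
```

The distro is still reported (it appears in the error message), but it does not influence parser selection, matching the suggestion that functionally equivalent images share a service.type.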

I'll tie it up in this example:

sequenceDiagram
    participant P as OpAMP-Client (Prometheus, vanilla)
    participant OTC as OpAMP-Client (OTC, vanilla)
    participant SOTC as OpAMP-Client (OTC, Splunk)
    participant OB as OpAMP-Client (Bridge, vanilla)
    participant OPS as OpAMP Server
    P->>+OPS: sends effective config
    OTC->>+OPS: sends effective config
    SOTC->>+OPS: sends effective config
    OB->>+OPS: sends effective config

    OPS-->>-OTC: Responds OK
    OPS-->>-SOTC: Responds OK
    OPS-->>-P: Responds OK
    OPS-->>-OB: Responds OK

We can see that each OpAMP client is able to report its role, type, and distro, which allows a single server to understand each request. All of this would be supported by the above proposal. Thank you @tigrannajaryan

tigrannajaryan · 2023-11-29T15:27:40Z

@jaronoff97 I am not sure I understand why we need "role" in the use case you described. Wouldn't "role" be always "client" for all of the connecting agents in the diagram you show?

jaronoff97 · 2023-11-29T15:35:11Z

yes – I don't think it's terribly useful for us outside of a reporting case, where you could imagine the client and server emitting the same metric (connections, for example) to an external telemetry vendor. Without the role, the vendor would receive

connections{service.type=OTC, service.distro=otel} 1 # from the client 
connections{service.type=OTC, service.distro=otel} 1 # from the server

however with a role it would be more descriptive:

connections{service.type=OTC, service.distro=otel, service.role=client} 1
connections{service.type=OTC, service.distro=otel, service.role=server} 1

This example is admittedly contrived, and there's a better use case for collectors IMO.

tigrannajaryan · 2023-11-29T15:56:06Z

The HTTP connection metrics already have their own distinct way of differentiating client and server metrics: they have different metric names, so there is no need for an attribute: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/http/http-metrics.md

jaronoff97 · 2023-11-29T16:04:09Z

Still, I think role would be useful here for the OpAMP supervisor case. The supervisor acts as a middleman, and without the role the telemetry would look like this:

flowchart TD
    A[OpAMP Extension] -->|http| B(OpAMP Supervisor)
    B -->|http| C{OpAMP Server}

# from the opamp extension's client
http.client.request.duration{service.type=OTC, service.distro=otel} 1
# from the opamp supervisor's server
http.server.request.duration{service.type=OTC, service.distro=otel} 1 
# from the opamp supervisor's client
http.client.request.duration{service.type=OTC, service.distro=otel} 1 
# from the opamp server
http.server.request.duration{service.type=OTC, service.distro=otel} 10

Without the role, the relationship between these related metrics is unclear. In this example the OpAMP server has high latency, but it's unclear whether that would affect the OpAMP extension or the OpAMP supervisor. Adding the role would make the relationship between these metrics explicit:

# from the opamp extension's client
http.client.request.duration{service.type=OTC, service.distro=otel, service.role="extension"} 1
# from the opamp supervisor's server
http.server.request.duration{service.type=OTC, service.distro=otel, service.role="supervisor"} 1 
# from the opamp supervisor's client
http.client.request.duration{service.type=OTC, service.distro=otel, service.role="supervisor"} 1 
# from the opamp server
http.server.request.duration{service.type=OTC, service.distro=otel, service.role="server"} 10
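
The collision can be checked mechanically: a series is identified by its metric name plus its attribute set, so without a distinguishing attribute the extension's and the supervisor's client metrics fold into one series. The stdlib-only Go sketch below counts distinct series for the four data points above. `point`, `seriesKey`, `distinctSeries` and `examplePoints` are hypothetical helpers, and service.role is only the attribute proposed in this comment, not an existing convention; passing "service.name" as the key shows that carrying the role in service.name, as suggested earlier in the thread, separates the series equally well.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// point is a hypothetical data point: a metric name plus its attributes.
type point struct {
	name  string
	attrs map[string]string
}

// seriesKey builds a deterministic identity from the metric name and the
// attributes sorted by key.
func seriesKey(p point) string {
	keys := make([]string, 0, len(p.attrs))
	for k := range p.attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(p.name)
	for _, k := range keys {
		fmt.Fprintf(&b, ",%s=%s", k, p.attrs[k])
	}
	return b.String()
}

// distinctSeries counts how many separate series the points fall into.
func distinctSeries(points []point) int {
	seen := make(map[string]struct{})
	for _, p := range points {
		seen[seriesKey(p)] = struct{}{}
	}
	return len(seen)
}

// examplePoints returns the four points from the example above. If roleKey
// is non-empty, each point also carries its role under that attribute key.
func examplePoints(roleKey string) []point {
	specs := []struct{ metric, role string }{
		{"http.client.request.duration", "extension"},
		{"http.server.request.duration", "supervisor"},
		{"http.client.request.duration", "supervisor"},
		{"http.server.request.duration", "server"},
	}
	pts := make([]point, 0, len(specs))
	for _, s := range specs {
		attrs := map[string]string{"service.type": "OTC", "service.distro": "otel"}
		if roleKey != "" {
			attrs[roleKey] = s.role
		}
		pts = append(pts, point{name: s.metric, attrs: attrs})
	}
	return pts
}

func main() {
	fmt.Println(distinctSeries(examplePoints("")))             // 2
	fmt.Println(distinctSeries(examplePoints("service.role"))) // 4
	fmt.Println(distinctSeries(examplePoints("service.name"))) // 4
}
```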

tigrannajaryan · 2023-11-29T16:17:10Z

I think Supervisor should use a different service.type for its own metrics, not the same as Collector.

cbandy · 2023-11-30T15:48:53Z

still, i think role would be useful here for the opamp supervisor case. The supervisor acts as a middleman and without the role the telemetry would look like this:

I don't know what I'm talking about, but this looks to me like maybe an example of how metrics have less fidelity than traces. There are three resources in the diagram, but no mention of their identities in the metric attributes/labels.

To make the OpAMP use case clear here, assume we have an OpAMP client that runs on the collector and reports the collector's config in the effective_config section.

The next OpAMP client implementation is the OpAMP bridge in the operator, which reports on the Kubernetes configuration for a collector; this is decidedly different from our OpAMP collector client's effective configuration.

If we look at Postgres, for example, there are a ton of Postgres-based databases, but I am not sure I would call them "distros"; perhaps they would be their own "service types".

I don't know what I'm talking about, but this sounds like OpAMP has a concept of what makes two things similar or different. Should the discussion here be about an attribute in an opamp namespace/domain?

tigrannajaryan · 2023-12-01T16:12:29Z

Should the discussion here be about an attribute in an opamp namespace/domain?

That is the alternative approach if we decide that service.type is not generally useful for all services. I see some voices supporting its usefulness, so I am going to submit a PR that adds it and continue the discussion there.

Contributes to open-telemetry#554
Contributes to open-telemetry#396
Contributes to open-telemetry/opamp-spec#131

Problem Description
===================

The `service.name` Resource attribute is [currently defined](https://github.com/open-telemetry/semantic-conventions/tree/main/docs/resource#service) as the "Logical name of the service". The expectation is that the operator of the service sets `service.name` to a value that describes the role of the service within the overall set of observable entities the operator has (within a `service.namespace`).

The Otel Collector [sets](https://github.com/open-telemetry/opentelemetry-collector/blob/7e3e725a2952728560b9f5f71867ad6358b1977f/service/service.go#L276) `service.name` by default to the name of the executable (e.g. otelcorecol or otelcontribcol).

The operator can override the Collector's `service.name` using the `service.telemetry.resource` setting in the Collector's config file. This is typically expected in any non-trivial infrastructure, where the same Collector executable can be used as a locally running agent on a host, as a standalone gateway that serves as an intermediary between agents and backends, as part of the Kubernetes operator, etc. The roles in these cases are sufficiently different to warrant different logical names.

However, there is currently no semantic convention for an attribute that specifies the type of a service that may play different logical roles in different places in the infrastructure, yet be identically produced, i.e. be the exact same executable. The executable file name can serve that purpose to some extent, but nothing prevents different service types from having the same executable file name, so it has poor uniqueness guarantees.

This [issue](open-telemetry#396) talks a bit more about why we would want the type of an agent (the Otel Collector in our case) to be a well-defined semantic convention.

This [issue](open-telemetry/opamp-spec#131) shows how the agent type would be useful in the context of agent management. It explains why it is important to tie the Resource of the agent's own telemetry to the attributes that identify that agent in the OpAMP protocol.

Changes
=======

This change adds `service.type` as a Recommended, experimental Resource semantic convention. The value is a string in reverse domain notation that uniquely identifies the type of the service (the type of the product deployed as the service), e.g. io.opentelemetry.collector, io.redis, etc. Unlike the (service.namespace,service.name,service.instance.id) triplet, the (service.namespace,service.type,service.instance.id) triplet is not guaranteed to be globally unique.

Having a separate `service.type` allows OpAMP, if desired by the operator, to manage agents of the same type in a similar way, even though their `service.name` values may differ because of the different logical roles they have.

An example unrelated to OpAMP, with NGINX: `service.type` would be set to "com.nginx", while `service.name` is set to "api-gateway", denoting the logical role that this particular NGINX deployment serves in this particular system.

tigrannajaryan · 2023-12-01T18:47:11Z

PR for service.type: #575


tigrannajaryan · 2024-02-20T21:38:58Z

All, the PR that adds service.type has been created, but I and others have doubts that this is the right approach. Please comment on the PR with arguments in favour of or against it.

tigrannajaryan · 2024-04-18T16:24:04Z

Closing this as "won't do". We will likely instead propose agent.* semantic conventions.

github-actions bot assigned reyang Nov 23, 2023

This was referenced Nov 23, 2023

Add agent resource type #396

Open

Opamp spec overloads definition of service.name open-telemetry/opamp-spec#131

Open

tigrannajaryan assigned tigrannajaryan and unassigned reyang Dec 1, 2023

tigrannajaryan mentioned this issue Dec 1, 2023

Add service.type experimental Resource attribute #575

Closed


github-actions bot added the Stale label Feb 8, 2024

joaopgrassi removed the Stale label Feb 14, 2024

tigrannajaryan closed this as completed Apr 18, 2024

This issue was closed.


tigrannajaryan commented Nov 23, 2023

tigrannajaryan commented Nov 23, 2023

jpkrohling commented Nov 23, 2023

pyohannes commented Nov 23, 2023

andrzej-stencel commented Nov 23, 2023 (edited)

jpkrohling commented Nov 23, 2023

tigrannajaryan commented Nov 23, 2023

tigrannajaryan commented Nov 23, 2023

tigrannajaryan commented Nov 23, 2023

pyohannes commented Nov 24, 2023

tigrannajaryan commented Nov 24, 2023

joaopgrassi commented Nov 27, 2023

jack-berg commented Nov 27, 2023

tigrannajaryan commented Nov 28, 2023

jmacd commented Nov 28, 2023 (edited)

jsuereth commented Nov 28, 2023

tigrannajaryan commented Nov 28, 2023

jaronoff97 commented Nov 28, 2023

tigrannajaryan commented Nov 29, 2023

jaronoff97 commented Nov 29, 2023

tigrannajaryan commented Nov 29, 2023

jaronoff97 commented Nov 29, 2023

tigrannajaryan commented Nov 29, 2023

cbandy commented Nov 30, 2023

tigrannajaryan commented Dec 1, 2023

tigrannajaryan commented Dec 1, 2023

tigrannajaryan commented Feb 20, 2024

tigrannajaryan commented Apr 18, 2024
