Request for comments: service.type and service.distro Resource attributes #554

Closed
tigrannajaryan opened this issue Nov 23, 2023 · 27 comments

@tigrannajaryan
Member

Problem Description

The service.name Resource attribute is currently defined as the "Logical name of the service". The expectation is that service.name will be set by the operator of the service to a value that describes the role of the service in the overall observable set of entities the operator has (within a service.namespace).

Otel Collector sets service.name by default to be the name of the executable (e.g. otelcorecol or otelcontribcol).

The Collector's service.name can be overridden by the operator using the service.telemetry.resource setting in the Collector's config file. This is typically expected in any non-trivial infrastructure where the same Collector executable can be used as a locally running agent on a host, as a standalone gateway that serves as an intermediary between agents and the backends, as part of a Kubernetes operator, etc. The roles in these cases are sufficiently different to warrant different logical names.

However, there is currently no semantic convention for an attribute that specifies the type of a service that may have different logical roles when used in different places in the infrastructure, yet be identically produced, i.e. be the exact same executable. The executable file name can serve that purpose to some extent, but nothing prevents different service types from having the same executable file name, so it has poor uniqueness guarantees.

This issue talks a bit more about why we would want the type of an agent (Otel Collector in our case) to be a well-defined semantic convention.

This issue shows how the agent type would be useful in the context of agent management. The issue talks about how important it is to tie the Resource of the agent's own telemetry to the attributes that identify that agent in the context of the OpAMP protocol.

Proposal

This is a request for comments for adding the following Recommended, experimental Resource semantic conventions:

  • service.type - an FQDN that uniquely identifies the type of the service, e.g. io.opentelemetry.collector, io.redis, etc. Unlike the (service.namespace,service.name,service.instance.id) triplet, the (service.namespace,service.type,service.instance.id) triplet is not guaranteed to be globally unique.
  • service.distro - an FQDN that uniquely identifies the distribution of the service. A number of distributions can belong to the same service.type. For example, OpenTelemetry Collector has multiple known distributions, e.g. io.opentelemetry.collector-contrib, com.splunk.opentelemetry-collector, com.amazon.adot-collector, etc. [As an alternative to a reverse company FQDN, we may also require service.distro to point to the source code repo of the distro (e.g. git/github repo) for OSS agents.]

Note that having a separate service.type allows OpAMP, if the operator wants it, to manage agents of the same type in a similar way even though their service.name values may differ because of the different logical roles they have.

Another example with NGINX: service.type will be set to com.nginx by NGINX developers, while service.name is set to "api-gateway" by the operator, denoting the logical role that the particular NGINX deployment serves in this particular system.
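To illustrate the split between the two attributes, here is a minimal sketch that groups a few made-up resources by service.type regardless of their service.name. The sample attribute values and the `group_by_service_type` helper are hypothetical, and service.type itself is only the attribute proposed in this issue, not an existing convention.

```python
from collections import defaultdict

# Hypothetical resources: service.type is the attribute proposed in this issue,
# service.name is the logical role chosen by the operator.
RESOURCES = [
    {"service.name": "api-gateway", "service.type": "com.nginx"},
    {"service.name": "static-assets", "service.type": "com.nginx"},
    {"service.name": "host-agent", "service.type": "io.opentelemetry.collector"},
    {"service.name": "otlp-gateway", "service.type": "io.opentelemetry.collector"},
]


def group_by_service_type(resources):
    """Map each service.type to the sorted service.name values that use it."""
    groups = defaultdict(list)
    for attrs in resources:
        # The attribute is optional, so resources without it go under "unknown".
        groups[attrs.get("service.type", "unknown")].append(attrs["service.name"])
    return {stype: sorted(names) for stype, names in groups.items()}


if __name__ == "__main__":
    for stype, names in group_by_service_type(RESOURCES).items():
        print(stype, names)
```

Grouping like this is what would let a management tool treat all NGINX or all Collector deployments alike even though the operator gave each one a different logical name.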

I am looking for comments and thoughts on this proposal before I submit a PR.

@tigrannajaryan
Member Author

@open-telemetry/collector-approvers @open-telemetry/collector-contrib-approvers I would also like your opinion on this proposal.

@jpkrohling
Member

While I see the appeal for service.type, I'm not too keen on service.distro. I can't think of a workload that has different distributions other than the Collector. Do you know other examples that help validate that this is needed beyond the collector?

In any case, I do believe that the collector has to provide the distribution as part of the resource attributes for its own telemetry.

@pyohannes
Contributor

The main intent of semantic conventions is defining the shape of telemetry data, and I have some mixed feelings when I see them used in modelling parts of protocol specifications.

> Do you know other examples that help validate that this is needed beyond the collector?

That's also something I'm wondering about: do we see value in putting those attributes on telemetry data? Or is the primary intent to satisfy requirements stemming from coupling the OpAmp spec with semantic conventions?

@andrzej-stencel
Member

andrzej-stencel commented Nov 23, 2023

> I can't think of a workload that has different distributions other than the Collector. Do you know other examples that help validate that this is needed beyond the collector?

@jpkrohling here's an example from @tigrannajaryan:

> For example PostgreSQL has a bunch of forks and derived databases which this attribute can indicate.

#396 (comment)

This is not to say I agree with this reasoning. Wouldn't all these different databases have a different service.type, if that existed?

I'm also worried that service.type is too vague. Different people might want to set it to different things in different contexts.

If this is needed for OpAMP (as this comment suggests: #396 (comment)), why not introduce an OpAMP-specific attribute like opamp.agent.type as proposed by @jack-berg here: open-telemetry/opamp-spec#131 (comment)?

@jpkrohling
Member

> Wouldn't all these different databases have a different service.type, if that existed?

Yes. In the same vein, Elasticsearch vs. OpenSearch. As a user, I would likely prefer to have each fork on its own service.type. Collector core vs. contrib are not forks, they really are different distributions.

@tigrannajaryan
Member Author

> The main intent of semantic conventions is defining the shape of telemetry data, and I have some mixed feelings when I see them used in modelling parts of protocol specifications.

To be clear: the only reason OpAMP is concerned with this is to direct the Collector to report useful telemetry about itself and to be identified in a way that is useful for the end user. There is no other use for agent type in any other capabilities that OpAMP supports.

@tigrannajaryan
Member Author

> That's also something I'm wondering about: do we see value in putting those attributes on telemetry data? Or is the primary intent to satisfy requirements stemming from coupling the OpAmp spec with semantic conventions?

I think there is value in putting this in telemetry. If I am running a cache that I give a service name "foo" it is likely useful for me to also know that that cache is actually a Redis instance.

If we don't see the value of this then we should not add a service.type. Instead we can go with agent.type and other agent-specific conventions.

@tigrannajaryan
Member Author

> If this is needed for OpAMP (as this comment suggests: #396 (comment)), why not introduce an OpAMP-specific attribute like opamp.agent.type as proposed by @jack-berg here: open-telemetry/opamp-spec#131 (comment)?

Yes, that's the alternate if we don't believe that service.type is generally useful for all services.

@pyohannes
Contributor

> To be clear: the only reason OpAMP is concerned with this is to direct the Collector to report useful telemetry about itself and to be identified in a way that is useful for the end user. There is no other use for agent type in any other capabilities that OpAMP supports.

I was referring to the identification part, where semantic convention attributes seem to be part of the OpAMP spec.

From a non-OpAMP point of view, I find service.distro quite confusing. For collectors I think this would overlap with telemetry.distro.name, for non-collectors or non-agents I'm not sure what would go into this attribute.

> I think there is value in putting this in telemetry. If I am running a cache that I give a service name "foo" it is likely useful for me to also know that that cache is actually a Redis instance.

That's definitely a valid use case (for service.type), as currently the only other way to achieve this is deducing the "type" from other (resource) attributes. Do you think it makes sense beyond third-party OTel-enabled services (e. g. redis, nginx, kafka, collector)?

@tigrannajaryan
Member Author

> I was referring to the identification part, where semantic convention attributes seem to be part of the OpAMP spec.

Yes, we are in agreement here, that's what I meant by "to be identified in a way that is useful for the end user".

> From a non-OpAMP point of view, I find service.distro quite confusing.

This is probably somewhat unique to Otel Collector. If we look at Postgres, for example, there are a ton of Postgres-based databases, but I am not sure I would call them a "distro"; perhaps they would be their own "service types".

> Do you think it makes sense beyond third-party OTel-enabled services (e. g. redis, nginx, kafka, collector)?

Probably of quite limited use for first-party services. Something developed and used internally in most cases has a well-known internal name that is recorded in service.name, which is likely enough. However, I can imagine that in big organizations you may see a first-party service that is generic enough to be built once but deployed in many different places and roles, which would warrant a difference between service.type and service.name (e.g. if an org implements their own storage system they may give it its own common service.type but use different service.names depending on what it is used for). This is likely rare, but perhaps not impossible.

@joaopgrassi
Member

Envoy proxy can also be extended and built with custom extensions, essentially being another "distro". For example, the Istio project uses its own distro of Envoy (https://github.com/istio/proxy).

@jack-berg
Member

I see value in having a service.type (not as sure about service.distro) outside of the OpAmp use case.

Speaking from the perspective of an observability vendor, I want to be able to identify the type of telemetry source and provide different observability experiences based on the data that type typically emits. For example, the signals that matter for a web service are not the same as for a kafka broker, or a postgres instance, or redis instance, etc etc.

Currently, we have to rely on duck typing: the presence of certain attribute keys or values might suggest that the telemetry is coming from a particular type. This is brittle and often insufficient.

I like the idea of an FQDN service.type, which is optional. The FQDN implies that it's hierarchical, allowing systems inspecting service.type to look for certain prefixes of interest while ignoring suffixes that may offer additional info. I like it being optional because I think service.type will tend to be most important in situations where you're using the collector to scrape observability from some system. Whether it's a redis instance, a postgres instance, a kafka broker, or anything else, you want the telemetry to have a user-assigned logical service.name while still conveying the type of system.
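The prefix idea could look roughly like the sketch below, which assumes service.type values are dot-separated, reverse-domain style strings. The `type_has_prefix` helper and the nested value `io.opentelemetry.collector.contrib` are hypothetical and only illustrate label-wise matching; nothing in the proposal defines such a helper or such a value.

```python
def type_has_prefix(service_type: str, prefix: str) -> bool:
    """Return True if service_type equals prefix or is nested under it.

    The comparison is done per dot-separated label rather than per character,
    so "io.redislabs" is not treated as being under "io.redis".
    """
    type_labels = service_type.split(".")
    prefix_labels = prefix.split(".")
    return type_labels[: len(prefix_labels)] == prefix_labels


if __name__ == "__main__":
    # Hypothetical nested value, used only to show the hierarchy.
    print(type_has_prefix("io.opentelemetry.collector.contrib", "io.opentelemetry.collector"))
    print(type_has_prefix("io.redislabs", "io.redis"))
```

Comparing whole labels instead of raw string prefixes avoids accidental matches between unrelated names that merely share leading characters.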

@tigrannajaryan
Member Author

@jaronoff97 @jlegoff @arminru @andykellr any opinions on this?

@jmacd
Contributor

jmacd commented Nov 28, 2023

I support the two proposed new attributes. In this related work, open-telemetry/oteps#238, I've tried to establish a uniqueness requirement for resource attributes of some kind. The solution proposed in this issue will help. We want to be sure that no two agents with identical attributes handle a pipeline, otherwise they appear to be double-counting the items that pass through them both. If we have semantic conventions that ensure pipelines have unique stage identities, we can do more to automate pipeline monitoring.
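As a rough illustration of the uniqueness concern, the sketch below flags agents whose identifying attributes collide. Which attributes make up an identity is an assumption here: the default `IDENTITY_KEYS` tuple and the `duplicate_identities` helper are hypothetical and are not taken from this issue or from the referenced OTEP.

```python
from collections import Counter

# Assumed identity for a pipeline stage; the real attribute set is undecided.
IDENTITY_KEYS = ("service.namespace", "service.type", "service.instance.id")


def duplicate_identities(resources, keys=IDENTITY_KEYS):
    """Return the identity tuples that are shared by more than one resource."""
    counts = Counter(tuple(attrs.get(k) for k in keys) for attrs in resources)
    return [identity for identity, n in counts.items() if n > 1]


if __name__ == "__main__":
    agents = [
        {"service.namespace": "prod", "service.type": "io.opentelemetry.collector", "service.instance.id": "a1"},
        {"service.namespace": "prod", "service.type": "io.opentelemetry.collector", "service.instance.id": "a1"},
        {"service.namespace": "prod", "service.type": "io.opentelemetry.collector", "service.instance.id": "b2"},
    ]
    print(duplicate_identities(agents))
```

Any tuple reported by such a check would mark two agents that a backend could not tell apart, which is the double-counting risk described above.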

@jsuereth
Contributor

My quick reaction to one comment is that I think the use case of OpAMP implies the use of service.role over service.type.

Specifically, the OTEL collector being used as an agent vs. a gateway seems more like a "role" and not something an FQDN would address.

@tigrannajaryan
Member Author

> Specifically, the OTEL collector being used as an agent vs. a gateway seems more like a "role" and not something an FQDN would address.

@jsuereth I would expect the role to be recorded in service.name. It is defined as "Logical name of the service", which seems to be a fine location for what one would call a "role".

@jaronoff97

To make the OpAMP use case clear here, assume we have an OpAMP client that works on the collector and reports the collector's config under the effective_config section. It's then implied that an OpAMP server that is able to read and unmarshal this effective configuration will want to do so by using the collector's unmarshaling logic. This is relatively straightforward: a collector OpAMP client connects to a collector OpAMP server.

As OpAMP is adopted by more agents, however, this becomes more complicated. Because OpAMP allows for a partial implementation, any agent type that implements the protocol can connect to our server, not just an OpAMP collector client. The next OpAMP client implementation is the OpAMP bridge in the operator, which reports the Kubernetes configuration for a collector; this is decidedly different from our OpAMP collector client's effective configuration.

In the future, if an agent like Prometheus were to have an OpAMP client, our OpAMP server would want to be able to unmarshal its configuration using Prometheus' unmarshal logic.

All of this necessitates a type that an OpAMP server can switch on to know what type of client it's talking to. Without wading too far into the discussion of this, I would posit that any collector will have the same structure of configuration, the difference being who built the image. To that end, I would recommend that any images that are functionally equivalent and share a configuration structure should have an equivalent service.type but a differing service.distro, which (to me) says who exactly built the image you're using. This would result in service.role for the OpAMP use case having the values client and server, which would allow an end user to understand how their OpAMP agents and servers are being run.

I'll tie it up in this example:

sequenceDiagram
    participant P as OpAMP-Client (Prometheus, vanilla)
    participant OTC as OpAMP-Client (OTC, vanilla)
    participant SOTC as OpAMP-Client (OTC, Splunk)
    participant OB as OpAMP-Client (Bridge, vanilla)
    participant OPS as OpAMP Server
    P->>+OPS: sends effective config
    OTC->>+OPS: sends effective config
    SOTC->>+OPS: sends effective config
    OB->>+OPS: sends effective config

    OPS-->>-OTC: Responds OK
    OPS-->>-SOTC: Responds OK
    OPS-->>-P: Responds OK
    OPS-->>-OB: Responds OK

We can see that each OpAMP client is able to report its role, (type, distro) which allows a single server to understand each request. All of this would be supported by the above proposal. Thank you @tigrannajaryan
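The "switch on type" idea described above might be sketched as follows. Everything here is an assumption made for illustration: the parser functions, the `PARSERS` table, the `io.prometheus` value, and the use of JSON as a stand-in for each agent's real configuration format are all hypothetical, and none of this is part of OpAMP or of any agent's API.

```python
import json


def parse_collector_config(raw: bytes) -> dict:
    # Stand-in: a real server would call the Collector's own config loader.
    return {"kind": "collector", "config": json.loads(raw)}


def parse_prometheus_config(raw: bytes) -> dict:
    # Stand-in: a real server would call Prometheus' own config loader.
    return {"kind": "prometheus", "config": json.loads(raw)}


# Hypothetical mapping from the proposed service.type to a parser.
PARSERS = {
    "io.opentelemetry.collector": parse_collector_config,
    "io.prometheus": parse_prometheus_config,
}


def parse_effective_config(agent_attrs: dict, raw: bytes) -> dict:
    """Pick a parser from service.type; service.distro does not affect the choice."""
    service_type = agent_attrs.get("service.type")
    parser = PARSERS.get(service_type)
    if parser is None:
        raise ValueError(f"unsupported service.type: {service_type!r}")
    return parser(raw)


if __name__ == "__main__":
    attrs = {
        "service.type": "io.opentelemetry.collector",
        "service.distro": "com.splunk.opentelemetry-collector",
    }
    print(parse_effective_config(attrs, b'{"receivers": {}}'))
```

Keying the table on service.type alone reflects the suggestion that distributions sharing a configuration structure should share a type and differ only in service.distro.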

@tigrannajaryan
Member Author

@jaronoff97 I am not sure I understand why we need "role" in the use case you described. Wouldn't "role" be always "client" for all of the connecting agents in the diagram you show?

@jaronoff97

Yes, I don't think it's terribly useful for us outside of a reporting case where you could imagine the client and server emitting the same metric (connections for example) to an external telemetry vendor. Without the role, the vendor would receive

connections{service.type=OTC, service.distro=otel} 1 # from the client 
connections{service.type=OTC, service.distro=otel} 1 # from the server

however with a role it would be more descriptive:

connections{service.type=OTC, service.distro=otel, service.role=client} 1
connections{service.type=OTC, service.distro=otel, service.role=server} 1

This example is admittedly contrived, and there's a better use case for collectors IMO.

@tigrannajaryan
Member Author

The http connection metrics already have their own, distinct way of differentiating the client and server metrics. They have different metric names, there is no need for an attribute: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/http/http-metrics.md

@jaronoff97

Still, I think role would be useful here for the opamp supervisor case. The supervisor acts as a middleman and without the role the telemetry would look like this:

flowchart TD
    A[OpAMP Extension] -->|http| B(OpAMP Supervisor)
    B -->|http| C{OpAMP Server}
# from the opamp extension's client
http.client.request.duration{service.type=OTC, service.distro=otel} 1
# from the opamp supervisor's server
http.server.request.duration{service.type=OTC, service.distro=otel} 1 
# from the opamp supervisor's client
http.client.request.duration{service.type=OTC, service.distro=otel} 1 
# from the opamp server
http.server.request.duration{service.type=OTC, service.distro=otel} 10

Without role, the relationship between these related metrics is unclear. In this example we have high latency in the opamp server, but it's unclear whether that would affect the opamp extension or the opamp supervisor. Adding the role would make the relationship between these metrics visible in the telemetry:

# from the opamp extension's client
http.client.request.duration{service.type=OTC, service.distro=otel, service.role="extension"} 1
# from the opamp supervisor's server
http.server.request.duration{service.type=OTC, service.distro=otel, service.role="supervisor"} 1 
# from the opamp supervisor's client
http.client.request.duration{service.type=OTC, service.distro=otel, service.role="supervisor"} 1 
# from the opamp server
http.server.request.duration{service.type=OTC, service.distro=otel, service.role="server"} 10

@tigrannajaryan
Member Author

I think Supervisor should use a different service.type for its own metrics, not the same as Collector.

@cbandy

cbandy commented Nov 30, 2023

> Still, I think role would be useful here for the opamp supervisor case. The supervisor acts as a middleman and without the role the telemetry would look like this:

I don't know what I'm talking about, but this looks to me like maybe an example of how metrics have less fidelity than traces. There are three resources in the diagram, but no mention of their identities in the metric attributes/labels.

> To make the OpAMP use case clear here, assume we have an OpAMP client that works on the collector and reports the collectors config under the effective_config section.

> The next OpAMP client implementation is by the OpAMP bridge in the operator which reports on the Kubernetes configuration for a collector, this is decidedly different than our OpAMP collector client's effective configuration.

> If we look at Postgres for example there is a ton of Postgres-based databases, but I am not sure I would call them a "distro", perhaps they would be their own "service types".

I don't know what I'm talking about, but this sounds like OpAMP has a concept of what makes two things similar or different. Should the discussion here be about an attribute in an opamp namespace/domain?

@tigrannajaryan
Member Author

> Should the discussion here be about an attribute in an opamp namespace/domain?

That is the alternate approach if we decide that service.type is not generally useful for all services. I see some supporting voices that it can be useful so I am going to submit a PR that adds it and continue the discussion there.

tigrannajaryan added a commit to tigrannajaryan/semantic-conventions that referenced this issue Dec 1, 2023
Contributes to open-telemetry#554
Contributes to open-telemetry#396
Contributes to open-telemetry/opamp-spec#131

## Problem Description

`service.name` Resource attribute is [currently defined](https://github.com/open-telemetry/semantic-conventions/tree/main/docs/resource#service) as the "Logical name of the service". The expectation is that `service.name` will be set by the operator of the service to a value that describes the role of the service in the overall observable set of entities the operator has (within a `service.namespace`).

Otel Collector [sets](https://github.com/open-telemetry/opentelemetry-collector/blob/7e3e725a2952728560b9f5f71867ad6358b1977f/service/service.go#L276) `service.name` by default to be the name of the executable (e.g. otelcorecol or otelcontribcol).

Collector's `service.name` can be overridden by the operator using `service.telemetry.resource` setting of Collector's config file. This is typically expected in any non-trivial infrastructure where the same Collector executable can be used as a locally running agent on a host, as a standalone gateway that serves as an intermediary between agents and the backends, as part of Kubernetes operator, etc. The roles in these cases are sufficiently different to warrant different logical names.

However, there is currently no semantic convention for an attribute that specifies the type of a service that may have different logical roles when used in different places in the infrastructure, yet be identically produced, i.e. be the exact same executable. The executable file name to some extent can serve that purpose but nothing prevents different service types from having the same executable file name, it has poor uniqueness guarantees.

This [issue](open-telemetry#396) talks a bit more about why we would want to have the type of an agent (Otel Collector in our case) to be a well-defined semantic convention.

This [issue](open-telemetry/opamp-spec#131) shows how the agent type would be useful in the context of agent management. The issue talks about how it is important to tie agent's own telemetry's Resource to the attributes that identify that agent in the context of the OpAMP protocol.

## Proposed Change

This is a request for comments for adding the following Recommended, experimental Resource semantic conventions:

- `service.type` - an FQDN that uniquely identifies the type of the service, e.g. io.opentelemetry.collector, io.redis, etc. Unlike (service.namespace,service.name,service.instance.id) triplet the (service.namespace,service.type,service.instance.id) triplet is not guaranteed to be globally unique.

Note that having a separate `service.type` allows OpAMP if wanted by the operator to manage the same type of agents in a similar way even though their `service.name` values may be different due to different logical roles they have.

Another example with NGINX: `service.type` will be set to com.nginx by NGINX developers, while `service.name` is set to "api-gateway" by the operator, denoting the logical role that the particular NGINX deployment serves in this particular system.
tigrannajaryan added a commit to tigrannajaryan/semantic-conventions that referenced this issue Dec 1, 2023
Contributes to open-telemetry#554

Contributes to open-telemetry#396

Contributes to open-telemetry/opamp-spec#131

Problem Description
===================

`service.name` Resource attribute is [currently defined](https://github.com/open-telemetry/semantic-conventions/tree/main/docs/resource#service)
as the "Logical name of the service". The expectation is that `service.name` will be set
by the operator of the service to a value that describes the role of the service
in the overall observable set of entities the operator has (within a `service.namespace`).

Otel Collector [sets](https://github.com/open-telemetry/opentelemetry-collector/blob/7e3e725a2952728560b9f5f71867ad6358b1977f/service/service.go#L276)
`service.name` by default to be the name of the executable (e.g. otelcorecol or
otelcontribcol).

Collector's `service.name` can be overridden by the operator using
`service.telemetry.resource` setting of Collector's config file. This is typically
expected in any non-trivial infrastructure where the same Collector executable can
be used as a locally running agent on a host, as a standalone gateway that serves
as an intermediary between agents and the backends, as part of Kubernetes operator,
etc. The roles in these cases are sufficiently different to warrant different
logical names.

However, there is currently no semantic convention for an attribute that
specifies the type of a service that may have different logical roles when
used in different places in the infrastructure, yet be identically produced,
i.e. be the exact same executable. The executable file name to some extent
can serve that purpose but nothing prevents different service types from having
the same executable file name, it has poor uniqueness guarantees.

This [issue](open-telemetry#396)
talks a bit more about why we would want to have the type of an agent (Otel
Collector in our case) to be a well-defined semantic convention.

This [issue](open-telemetry/opamp-spec#131) shows
how the agent type would be useful in the context of agent management. The issue
talks about how it is important to tie agent's own telemetry's Resource to the
attributes that identify that agent in the context of the OpAMP protocol.

Changes
=======

This change adds `service.type` as a Recommended, experimental Resource
semantic convention.

The value is a string in reverse domain notation that uniquely identifies
the type of the service (the type of the product deployed as the service),
e.g. `io.opentelemetry.collector`, `io.redis`, etc.
Unlike the (`service.namespace`, `service.name`, `service.instance.id`) triplet,
the (`service.namespace`, `service.type`, `service.instance.id`) triplet is not
guaranteed to be globally unique.
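
A minimal sketch of how a consumer might sanity-check a `service.type` value for reverse domain notation. The proposal does not define an exact grammar, so the pattern and the helper name `looks_like_reverse_domain` are assumptions made for illustration only.

```python
import re

# Assumed grammar (not defined by the proposal): two or more dot-separated
# labels, each starting with a lowercase letter and containing only
# lowercase letters, digits, underscores, or hyphens.
_REVERSE_DOMAIN = re.compile(r"[a-z][a-z0-9_-]*(\.[a-z][a-z0-9_-]*)+")


def looks_like_reverse_domain(value: str) -> bool:
    """Return True if value matches the assumed reverse-domain pattern."""
    return _REVERSE_DOMAIN.fullmatch(value) is not None
```

Under this assumed pattern, `io.opentelemetry.collector` and `io.redis` are accepted, while a bare executable name such as `otelcontribcol` is rejected because it has no domain prefix.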

For OpAMP, a separate `service.type` allows the operator, if desired, to manage
agents of the same type in a similar way even though their `service.name` values
may differ because of the different logical roles they have.

As an example unrelated to OpAMP, consider NGINX: `service.type` would be set
to "com.nginx", while `service.name` is set to "api-gateway", denoting the
logical role that the particular NGINX deployment serves in this particular system.
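
The distinction between the two attributes can be sketched as follows. This is an illustration under stated assumptions, not part of the proposal: Resources are modelled as plain dictionaries, the helper `group_by_service_type` is hypothetical, and the names "host-agent" and "gateway" are invented example values.

```python
from collections import defaultdict


def group_by_service_type(resources):
    """Group logical service names by the proposed service.type attribute.

    Resources without a service.type are collected under the empty string.
    """
    groups = defaultdict(list)
    for attrs in resources:
        groups[attrs.get("service.type", "")].append(attrs["service.name"])
    return dict(groups)


# Hypothetical Resources: two Collectors in different logical roles and
# one NGINX deployment acting as an API gateway.
resources = [
    {"service.type": "io.opentelemetry.collector", "service.name": "host-agent"},
    {"service.type": "io.opentelemetry.collector", "service.name": "gateway"},
    {"service.type": "com.nginx", "service.name": "api-gateway"},
]

grouped = group_by_service_type(resources)
```

Here both Collectors land in the same group despite having different `service.name` values, which is the kind of per-type handling the proposal describes for agent management.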
tigrannajaryan added a commit to tigrannajaryan/semantic-conventions that referenced this issue Dec 1, 2023
Contributes to open-telemetry#554

Contributes to open-telemetry#396

Contributes to open-telemetry/opamp-spec#131

@tigrannajaryan
Member Author

PR for service.type: #575

tigrannajaryan added a commit to tigrannajaryan/semantic-conventions that referenced this issue Jan 26, 2024
@github-actions github-actions bot added the Stale label Feb 8, 2024
@joaopgrassi joaopgrassi removed the Stale label Feb 14, 2024
@tigrannajaryan
Member Author

All, the PR that adds `service.type` has been created, but I and others have doubts that this is the right way. Please comment on the PR with arguments in favour of or against it.

@tigrannajaryan
Member Author

Closing this as "won't do". We will likely instead propose agent.* semantic conventions.

This issue was closed.