Add database resource specification #2916

portertech · 2022-11-03T18:00:52Z

What are you trying to achieve?

Add a database resource specification for attributes that are consistent and identifying across all database receivers in the collector.

Additional context.

We (Sumo Logic) are looking to create contrib and spec issues proposing the addition of resource attribute(s) to the mysqlreceiver; please see resource_attributes. Specifically, we want a human-friendly identifier of the database that can be used for filtering and similar purposes. Tracing uses db.name, but it comes with a caveat: “In some SQL databases, the database name to be used is called ‘schema name’. In case there are multiple layers that could be considered for database name (e.g. Oracle instance name and schema name), the database name to be used is the more specific layer (e.g. Oracle schema name).” I suspect this is fine in the case of MySQL, since “In MySQL, physically, a schema is synonymous with a database”. This exploration led to the realization that a database resource specification is needed.
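To make the quoted db.name rule concrete, here is a minimal Python sketch, assuming the candidate names are available as a list ordered from least to most specific. `choose_db_name` and the sample names (`ORCL`, `HR`, `shop`) are hypothetical, and the fallback to a less specific layer is an assumption of the sketch rather than part of the convention.

```python
from typing import Optional, Sequence


def choose_db_name(layers: Sequence[Optional[str]]) -> Optional[str]:
    """Pick the value for db.name from candidate naming layers.

    `layers` is ordered from least to most specific, e.g.
    [oracle_instance_name, oracle_schema_name]. The most specific
    non-empty layer wins, per the tracing convention quoted above.
    Falling back to a less specific layer when the most specific one
    is missing is an assumption of this sketch, not part of the convention.
    """
    for layer in reversed(layers):
        if layer:
            return layer
    return None


# Oracle: both the instance name and the schema name are candidates; the schema wins.
print(choose_db_name(["ORCL", "HR"]))  # HR
# MySQL: a schema is synonymous with a database, so there is only one layer.
print(choose_db_name(["shop"]))  # shop
```

The MySQL case shows why the caveat is harmless there: with a single layer, there is nothing to choose between.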

Related mysqlreceiver metrics effort.

Possible Database Resource Attributes

Considering the following attributes at this time:

`db.cluster.name`

Name of the database cluster, configured manually by the user. It serves as a human-friendly database identifier.

`db.cluster.address`

Network address used by end users to connect to the database. There can be several addresses for a single database, and they may differ from the address used to collect metrics. Tracing has net.peer.addr, but this attribute doesn’t make much sense in the metrics context.

`db.cluster.port`

Network port used by end users to connect to the database. Tracing has net.peer.port, but this attribute doesn’t make much sense in the metrics context.
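As a sketch of how the three proposed attributes might be attached to metrics and then used for filtering, the following Python models a resource as a plain dictionary. It is not the Collector's pdata API: `ResourceMetrics` here is a simplified stand-in, `filter_by_cluster`, the cluster names, and the hostnames are hypothetical, and the attribute names are only the proposal above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

AttrValue = Union[str, int]


@dataclass
class ResourceMetrics:
    """A batch of metric values sharing one resource (simplified stand-in)."""
    resource: Dict[str, AttrValue]
    metrics: Dict[str, float] = field(default_factory=dict)


def filter_by_cluster(batches: List[ResourceMetrics], name: str) -> List[ResourceMetrics]:
    """Keep only batches whose proposed db.cluster.name matches `name`."""
    return [b for b in batches if b.resource.get("db.cluster.name") == name]


batches = [
    ResourceMetrics(
        # The receiver may scrape through a different endpoint; db.cluster.address
        # and db.cluster.port describe where end users connect.
        resource={
            "db.cluster.name": "orders-prod",
            "db.cluster.address": "orders.db.example.com",
            "db.cluster.port": 3306,
        },
        metrics={"database.commits": 120.0},
    ),
    ResourceMetrics(
        resource={
            "db.cluster.name": "billing-staging",
            "db.cluster.address": "billing.db.example.com",
            "db.cluster.port": 3306,
        },
        metrics={"database.commits": 7.0},
    ),
]

for batch in filter_by_cluster(batches, "orders-prod"):
    print(batch.resource["db.cluster.address"], batch.metrics)
# orders.db.example.com {'database.commits': 120.0}
```

Because the name is user-configured, it stays stable even when the set of connection addresses changes, which is what makes it a good filter key.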

Current Concerns

Prometheus Exporter

Raised by @jsuereth:

Specification of export of resource attributes: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md#resource-attributes-1

Implications:

A database metric such as database.commits would not carry the cluster name, cluster address, cluster port, etc.
Instead, those would exist on a separate target_info metric, and you would have to link the two metrics together.
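The linking step can be sketched in Python, assuming the exporter places the resource attributes on a `target_info` series keyed by `job` and `instance`, and that users enrich samples with the usual PromQL pattern `* on(job, instance) group_left(...) target_info`. The function names and label values are hypothetical, and copying every resource attribute onto `target_info` is a simplification; the exporter specification's exact rules are not reproduced.

```python
from typing import Dict, List

Labels = Dict[str, str]


def to_prom_label(key: str) -> str:
    """Sanitize an attribute key into a Prometheus label name (dots are not allowed)."""
    return key.replace(".", "_")


def make_target_info(resource: Dict[str, str], job: str, instance: str) -> Labels:
    """Labels for a target_info series: job/instance plus the resource attributes.

    Simplification: every attribute is copied; the specification's exact rules
    for which attributes land on target_info are not reproduced here.
    """
    labels = {"job": job, "instance": instance}
    for key, value in resource.items():
        labels[to_prom_label(key)] = value
    return labels


def join_target_info(sample: Labels, targets: List[Labels], wanted: List[str]) -> Labels:
    """Copy `wanted` labels from the matching target_info onto a sample.

    Roughly what `* on(job, instance) group_left(...) target_info` does in PromQL.
    """
    for target in targets:
        if target["job"] == sample["job"] and target["instance"] == sample["instance"]:
            return {**sample, **{k: target[k] for k in wanted if k in target}}
    return dict(sample)  # no matching target: the sample stays unenriched


targets = [make_target_info({"db.cluster.name": "orders-prod"}, job="mysql", instance="db-1:3306")]
commits = {"job": "mysql", "instance": "db-1:3306"}  # labels of a database.commits sample
print(join_target_info(commits, targets, ["db_cluster_name"]))
# {'job': 'mysql', 'instance': 'db-1:3306', 'db_cluster_name': 'orders-prod'}
```

The cost the concern points at is visible here: every query that wants to filter by cluster name has to perform this join.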


portertech · 2022-11-14T17:25:58Z

@djaglowski I'm curious to hear your thoughts on this and get your take on the possible attributes that I've proposed above.

djaglowski · 2022-11-14T19:21:27Z

I may be in the minority on this but I am very skeptical that any broad set of database technologies can share much of a metadata model without requiring an unreasonable number of caveats and disclaimers. The architectures don't quite line up, so the attributes won't line up either. There are enough similarities to make this attractive, but I would argue that the model will not generalize well, that the exercise of developing it will be unnecessarily problematic, and that the utility of the result will be low.

In my opinion, we'd be better off defining a set of attributes per technology, each highly accurate to that one specific technology.

> but it has a caveat: “In some SQL databases, the database name to be used is called ‘schema name’. In case there are multiple layers that could be considered for database name (e.g. Oracle instance name and schema name), the database name to be used is the more specific layer (e.g. Oracle schema name).”

This is a great example of the type of problem I would expect to see many times over with the "unified" approach. Seemingly similar architectures quickly diverge when you get into the details. The model becomes less useful as you add more technologies, because you end up having to add more and more caveats and special accommodations.

All that said, I don't want to stand in the way if others in the community want to pursue this approach. My suggestion then would be to work through a reasonable cross section of databases and the architectural elements we may wish to represent in a unified model.

What are the broad architectural categories that should be included or excluded from the model? (e.g. relational, columnar, nosql, cloud, graph, in-memory, etc.)
What are the cross-cutting elements that we expect to include in a unified data model? (e.g. clusters, nodes, instances, schemas, databases, tables, indices, etc.)

We don't have to pin everything down up front, but I think we should prove to ourselves that this approach is tractable. To me, that means we can do the following:

1. Define a list of representative database technologies which we believe could share meaningful parts of this unified model.
2. Identify a list of architectural elements that apply to at least a subset of these databases.
3. Articulate how each database in list 1 would map its architectural elements onto those defined in list 2.

In my opinion, what matters most is the degree to which we find ourselves trying to fit square pegs into round holes. This should help clarify the extent to which a unified model may be useful.

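| Database Type | Cluster | Node | Instance | Database | Schema | Table | Index |
|---|---|---|---|---|---|---|---|
| mysql | | | | | | | |
| postgresql | | | | | | | |
| oracle | | | | | | | |
| sqlserver | | | | | | | |
| cassandra | | | | | | | |
| couchbase | | | | | | | |
| mongodb | | | | | | | |
| elasticsearch | | | | | | | |
| hbase | | | | | | | |
| redis | | | | | | | |
| aerospike | | | | | | | |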
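One way to run the tractability check on the table above is to record it as data, then compute how often each element maps at all and where one native concept is stretched across several unified slots. This is a hypothetical Python sketch: the two rows are illustrative and use only facts quoted in this thread, and `coverage` and `overloaded` are made-up helpers, not part of any existing tool.

```python
from typing import Dict, List, Optional

ELEMENTS = ["cluster", "node", "instance", "database", "schema", "table", "index"]

# technology -> unified element -> native concept. Missing keys mean
# "not yet mapped". Only cells grounded in this thread are filled in.
Mapping = Dict[str, Dict[str, Optional[str]]]

mapping: Mapping = {
    # "In MySQL, physically, a schema is synonymous with a database".
    "mysql": {"database": "database", "schema": "database"},
    # Oracle has both an instance name and a schema name.
    "oracle": {"instance": "instance", "schema": "schema"},
}


def coverage(m: Mapping) -> Dict[str, float]:
    """Fraction of technologies that map each unified element."""
    return {e: sum(1 for row in m.values() if row.get(e)) / len(m) for e in ELEMENTS}


def overloaded(m: Mapping) -> Dict[str, Dict[str, List[str]]]:
    """Native concepts forced into more than one unified slot ("square pegs")."""
    flagged: Dict[str, Dict[str, List[str]]] = {}
    for tech, row in m.items():
        slots: Dict[str, List[str]] = {}
        for element, native in row.items():
            if native:
                slots.setdefault(native, []).append(element)
        dupes = {native: els for native, els in slots.items() if len(els) > 1}
        if dupes:
            flagged[tech] = dupes
    return flagged


print(coverage(mapping)["schema"])  # 1.0
print(overloaded(mapping))  # {'mysql': {'database': ['database', 'schema']}}
```

Low coverage for an element, or many overloaded concepts, would be the kind of evidence that a unified model is forcing square pegs into round holes.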

portertech added the spec:resource Related to the specification/resource directory label Nov 3, 2022

github-actions bot assigned jsuereth Nov 3, 2022

portertech mentioned this issue Nov 3, 2022

[receiver/mysql] Add database resource attributes open-telemetry/opentelemetry-collector-contrib#16063

Closed

arminru added the area:semantic-conventions Related to semantic conventions label Nov 8, 2022

niuxiaojuan-github mentioned this issue Sep 18, 2023

Project Proposal: Database server-side semantic conventions open-telemetry/community#1678

Open
