This repository has been archived by the owner on Nov 8, 2022. It is now read-only.

RFC: Metric schema metadata #871

Closed
jcooklin opened this issue Apr 22, 2016 · 5 comments

@jcooklin
Collaborator

jcooklin commented Apr 22, 2016

This spec proposes the following changes related to the metric type.

  • Add (long) description describing metrics
  • Add (short) description of namespace elements
  • Add the field 'unit'
  • Remove the field 'source'
  • Replace the field label with the same behavior on the namespace itself

Key terms and concepts

  • Metric identifier - The namespace provides the identity for the metric.
  • Namespace - Uniquely identifies a metric using a tree structure (e.g. /intel/foo/bar/baz).
    • Static - Given a collector that exposes '/intel/sys/load'
      • Query: /intel/sys/load
      • Result: a single metric, metric{/intel/sys/load, 1}
    • Dynamic - Given a collector that exposes '/intel/cpu/*/llc/misses'
      • Query: /intel/cpu/*/llc/misses
      • Result: many metrics, []metrics{metric{/intel/cpu/0/llc/misses, 123},metric{/intel/cpu/1/llc/misses, 456},...}
      • Query: /intel/cpu/0/llc/misses
      • Result: a single metric, metric{/intel/cpu/0/llc/misses, 123}
  • Tags - Tags are key-value pairs that can be added to the metric at any point along the collect → process → publish pipeline. They do not change or impact the metric's identity.
  • Labels - Labels assist in describing namespaces that contain data. They provide a means to decode the namespace by identifying the components of the namespace that specify grouping and aggregation.
    • Given the dynamic namespace "/intel/perf/pids/*/llc/misses", the associated label would be "label{name: "process id", index: 3}". The label is of particular interest when publishing to InfluxDB and other similar databases. In this example we recommend publishing to a single series named /intel/perf/pids/llc/misses, because metrics can be merged using InfluxDB tags within a series/measurement, but it is not possible to merge (or join) across measurements. The publisher could then create an InfluxDB tag (process_id: xyz) to disambiguate the data points. In summary, labels provide a means to decode data from the namespace, enabling the appropriate transformations to publish data to the given data source.
  • Source - The source field is the source for confusion (pun intended).
    This field does not contribute to the identity of the metric.
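
The query and label behavior described above can be sketched in Go. This is a minimal illustration, not the snap implementation: the `Label` type and the `Matches` and `Decode` functions are hypothetical names, and the label index is assumed to be zero-based, which is consistent with index 3 pointing at the `*` in /intel/perf/pids/*/llc/misses.

```go
package main

import (
	"fmt"
	"strings"
)

// Label is a hypothetical type mirroring label{name, index} from the text:
// it names the dynamic namespace element found at Index (assumed zero-based).
type Label struct {
	Name  string
	Index int
}

// split turns "/intel/cpu/0" into ["intel", "cpu", "0"].
func split(ns string) []string {
	return strings.Split(strings.Trim(ns, "/"), "/")
}

// Matches reports whether a concrete namespace satisfies a query,
// where "*" in the query matches any single element.
func Matches(query, ns string) bool {
	q, n := split(query), split(ns)
	if len(q) != len(n) {
		return false
	}
	for i := range q {
		if q[i] != "*" && q[i] != n[i] {
			return false
		}
	}
	return true
}

// Decode strips the labelled elements out of a concrete namespace and
// returns the remaining series name plus one tag per label.
func Decode(ns string, labels []Label) (string, map[string]string) {
	byIndex := make(map[int]string, len(labels))
	for _, l := range labels {
		byIndex[l.Index] = l.Name
	}
	tags := make(map[string]string)
	var static []string
	for i, part := range split(ns) {
		if name, ok := byIndex[i]; ok {
			// "process id" becomes the tag key "process_id".
			tags[strings.ReplaceAll(name, " ", "_")] = part
			continue
		}
		static = append(static, part)
	}
	return "/" + strings.Join(static, "/"), tags
}

func main() {
	fmt.Println(Matches("/intel/cpu/*/llc/misses", "/intel/cpu/0/llc/misses"))

	labels := []Label{{Name: "process id", Index: 3}}
	series, tags := Decode("/intel/perf/pids/1234/llc/misses", labels)
	fmt.Println(series, tags)
}
```

A publisher following this approach would write every process's data points to the one series returned by `Decode` and attach the returned tags, so the points remain distinguishable within a single measurement.
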

Use Cases

  • As a user I should be provided all available metadata for each metric in the catalog.
  • As a user I should be provided a "long" description of the metric
    • Given a metric in the catalog the user should be provided what it measures, how it was derived and any other relevant details.
  • As a plugin author I should have a means to identify if there is data encoded in the namespace.
  • As a plugin author I should be able to communicate details about the metric that are stored in the metric catalog and not sent along the collect → process → publish pipeline.
    • This includes the long description of the metric as well as the (short) description stored in the label.
  • As a user I would like to know the units for a given metric.
  • As a framework developer I would like to enforce that collectors provide a tag for units and that they are compliant with http://metrics20.org/spec/#units

Proposal

  • Add (long) description to the cataloged metric
  • Merge the behavior achieved through a label into the namespace
    • The namespace will be an array of NamespaceElements which can have fields for value, name and description.
  • Add the field unit to the plugin metric type
  • Remove the source field from the plugin metric type
    • What this field was communicating can be more effectively communicated through tags
  • Add standard tags
    • Currently the only standard tag is plugin_running_on
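
One way to picture the proposed shape is the Go sketch below. It is only an illustration of the bullets above under stated assumptions: the type names, field names, the `Dynamic` helper, and the sample values (including the unit string and host name) are hypothetical and are not taken from the snap code base.

```go
package main

import (
	"fmt"
	"strings"
)

// NamespaceElement is a hypothetical shape for one namespace element.
// Name and Description are assumed to be set only on dynamic elements,
// which replaces the separate label field.
type NamespaceElement struct {
	Value       string
	Name        string
	Description string
}

// Namespace is an ordered list of elements identifying a metric.
type Namespace []NamespaceElement

// String renders the namespace as a path such as /intel/cpu/0/llc/misses.
func (n Namespace) String() string {
	vals := make([]string, len(n))
	for i, e := range n {
		vals[i] = e.Value
	}
	return "/" + strings.Join(vals, "/")
}

// Dynamic returns name -> value for every element that carries a name,
// i.e. the data encoded in the namespace.
func (n Namespace) Dynamic() map[string]string {
	out := make(map[string]string)
	for _, e := range n {
		if e.Name != "" {
			out[e.Name] = e.Value
		}
	}
	return out
}

// Metric is a hypothetical plugin metric type: it gains Description and
// Unit, has no source field, and carries origin details in Tags.
type Metric struct {
	Namespace   Namespace
	Description string
	Unit        string
	Tags        map[string]string
	Data        interface{}
}

func main() {
	m := Metric{
		Namespace: Namespace{
			{Value: "intel"},
			{Value: "perf"},
			{Value: "pids"},
			{Value: "1234", Name: "process_id", Description: "ID of the observed process"},
			{Value: "llc"},
			{Value: "misses"},
		},
		Description: "Last-level cache misses per process",
		Unit:        "misses", // illustrative unit string only
		Tags:        map[string]string{"plugin_running_on": "host-a"},
		Data:        123,
	}
	fmt.Println(m.Namespace.String())
	fmt.Println(m.Namespace.Dynamic())
	fmt.Println(m.Unit, m.Tags)
}
```

With the name and description attached to the element itself, a publisher can find the encoded data by walking the namespace, without consulting a separate list of labels.
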
@thomastaylor312
Contributor

@jcooklin This looks great. The only comment I have on it is about the source. I agree that the source should go, but shouldn't there be a field that says where the metric came from? I feel like where the metric came from is part of the identity of a metric.

An example from my use cases: A CPU measurement from a file server can mean something entirely different for us than a CPU measurement from a server running jobs.

Thoughts?

@geauxvirtual
Contributor

The source field in the metric has become quite the discussion topic, as it has different meanings when a metric is collected on the host snapd is running on vs. a metric collected through a proxy plugin. Identifying the source through tags can provide far more meaningful metadata about where that metric was collected from than is easily possible with a single source string field on the metric.

@IRCody
Contributor

IRCody commented Apr 22, 2016

The source term is overloaded and I think it means different things to different people.

The data that is being collected should be uniquely identified by the namespace. In the example above you can imagine a namespace path like /path/goes/here/{hostname}/cpu. The dynamic parts of the namespace (in this example hostname) are always identified so that a plugin author can use that information. If a plugin author wanted to add meta-data (whatever that data is) they could do so via tags as @geauxvirtual suggests. It would be up to the plugin author to know what meta-data is relevant to each metric being collected.

@candysmurf
Contributor

👍 Agreed, using tags is a great way to add a lot of resources into a plugin. It's up to the plugin authors.

@thomastaylor312
Contributor

Yeah, I participated in that last discussion as well. I just think that where the metric actually came from (not the machine collecting it) is critical to identifying the metric. It sounds like it is better to leave that up to the plugin authors, though. We might want to call that out in the documentation when we get to that point.
