Skip to content
This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Support efficient "lastUpdate" endpoint #1570

Closed
shanson7 opened this issue Dec 10, 2019 · 12 comments · Fixed by #1580
Closed

Support efficient "lastUpdate" endpoint #1570

shanson7 opened this issue Dec 10, 2019 · 12 comments · Fixed by #1580
Assignees
Milestone

Comments

@shanson7
Copy link
Collaborator

We have a use case (detecting absence of metrics) where we want an efficient method of querying for a the lastUpdate value for series that match an expression. Specifically, we don't care what the last value was, just the most recent timestamp. This means that the query should be very efficient since it never needs to fetch any data.

Existing endpoint /tags/findSeries is close, but don't offer the lastUpdate value.

I am willing to implement the endpoint, but submitting the issue to kickoff design discussion / feedback.

@replay
Copy link
Contributor

replay commented Dec 10, 2019

how about we also add an optional meta=true parameter to the find endpoint (similar to the render endpoint) which would return additional meta data about the results? that way this doesn't require a new endpoint just to get meta data, and we also don't slow down the normal cluster internal find calls by adding unnecessary data to them because they wouldn't set the meta=true

@Dieterbe
Copy link
Contributor

or we support a "format" param similar to /metrics/find, and introduce a format for both of these that includes lastUpdate.

@shanson7
Copy link
Collaborator Author

I'm not sure meta=true as it it exists for the render endpoint would apply here. That returns internal stats/details at the top-level of the request. lastUpdate would apply to each series in the result individually.

I do agree that the implementation should not interfere with existing cluster operation. I'm also ok with it being a parameter to the existing endpoint.

@shanson7
Copy link
Collaborator Author

Currently, you can do:

tags/findSeries?expr=tag=value&format=json&from=1569913200

and get something like :

[
  "metric1;tag=value", "metric2;tag=value", ... , "metricN;tag=value"
]

If I understand @Dieterbe correctly, we might support something like:

tags/findSeries?expr=tag=value&format=json-full&from=1569913200

and get something similar to :

[
 { "name" : "metric1", "tags":["tag=value"], "interval": 30, "lastUpdate": 1569913260 },
...
]

@shanson7
Copy link
Collaborator Author

I we want something even more targeted (read: efficient JSON encoding):

tags/findSeries?expr=tag=value&format=json-lastUpdate&from=1569913200

and get something similar to :

{
  "series":  {
     "metric1;tag=value" : 1569913260,
     "metric2;tag=value" : 1569913230,
      ...
  }
}

@Dieterbe
Copy link
Contributor

i would be open to either or both (a full dump of each MetricDefinition or idx.Archive), or the id-lastUpdate pairs. Thoughts on messagepack? efficiency maybe not worth the extra hassle...

@shanson7
Copy link
Collaborator Author

shanson7 commented Dec 10, 2019

I'm ok with messagepack. It makes the integer encoding/decoding more efficient. Might only be a small portion of data in this case (compared to long series names). Maybe the format option (to choose msgp or json) should be left alone and a new option could be added for what fields to return?

Implementation note: The index/find_by_tag and index/find responses already includes tons of data including the lastUpdate, so that shouldn't need to change.

Edit: I was wrong. tags/findSeries does not currently support format

@Dieterbe
Copy link
Contributor

Dieterbe commented Dec 11, 2019

a new option could be added for what fields to return

this seems to verge on overkill.
I think the format is all we need. while ideally we wouldn't conflate serialization/encoding format with the concern of which data to return, IIRC historically the graphite has done so in some cases. e.g. 'completer' format assumes json IIRC, though it also has a format 'treejson' for example.
I would suggest to expand on this concept and provide formats of the form <datamodel><serialization> e.g. lastupdatejson and lastupdatemsgp, or definitionjson, definitionmsgp, etc.

@shanson7
Copy link
Collaborator Author

That's fair. I'll probably implement only the following initially:
series_json (this will be the default format) : Return the current format
definition_json : Full definition (more bloated, but good if we need all the series info)
lastupdate_json: Optimized for the particular use case of getting last update time

Names are definitely open to suggestions for improvement

@Dieterbe
Copy link
Contributor

i like the separator. i was thinking - originally but removed it for consistency with graphite, which is probably not that important.

@Dieterbe
Copy link
Contributor

Dieterbe commented Jan 6, 2020

The master implementation of *Server.graphiteTagFindSeries() seems weird.

	seriesNames := make([]string, 0, len(series))
	for _, serie := range series {
		seriesNames = append(seriesNames, serie.Pattern)
	}
	response.Write(ctx, response.NewJson(200, seriesNames, ""))

the strings it is returning is:

	Pattern string // pattern used for index lookup. typically user input like foo.{b,a}r.*

not the actual series name. heh :?

definition_json : Full definition (more bloated, but good if we need all the series info)

if this is something we don't need yet, i'm hesitant to commit to a particular API.
Note that findByTag and "/index/find_by_tag" return idx.Node which is richer than MetricDefinition. it contains extra info about the position in the tree, metatags, and information in idx.Archive about storage-schemas, aggregation-rules, index-rules and a lastSave timestamp. If you want an endpoint that contains "all the series info", this should be included as well. But we may want to refactor the index or at least these structures, so i'm hesitant to commit to their representation.
Why do you want "all the series info"?

I see that your PR doesn't add this, so I think we're good then.

@shanson7
Copy link
Collaborator Author

shanson7 commented Jan 6, 2020

Yeah, I punted on the full definition, since it's a non-trivial structure (given that there can be multiple definitions for a single series name). But, the format support is there.

@Dieterbe Dieterbe added this to the sprint-6 milestone Jan 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants