
add support for seriesByTag #774

Merged: 16 commits from shanson7:feature_seriesByTag into grafana:master, Dec 4, 2017

Conversation

shanson7
Collaborator

This PR adds support for seriesByTag natively into metrictank.

The change was a bit more intrusive than I would have hoped.

  1. Change idx to return nodes (similar to Find)
  2. Use either findSeries or clusterFindByTag, depending on whether seriesByTag or old-style pattern expressions are used.
  3. Add the new tags element to the response JSON

@Dieterbe Dieterbe self-requested a review November 29, 2017 19:47
Contributor

@Dieterbe Dieterbe left a comment


not done reviewing yet. some early notes.
also it's not clear to me why we need peerQueryWithPeer; the only caller of it is s.clusterFindByTag and when I trace the uses of that function I don't see what we use the node property for?

@@ -541,8 +541,8 @@ KEYS:
// resolveIDs resolves a list of ids (TagIDs) into a list of complete
// metric names, including tags. it assumes that at least a read lock
Contributor

out of date doc

@@ -139,7 +139,7 @@ type MetricIndex interface {
// where the LastUpdate time is >= from will be returned as results.
Contributor

out of date doc

	res[i] = idx.Node{
		Path:        def.NameWithTags(),
		Leaf:        true,
		HasChildren: false,
Contributor

this seems incorrect. why are we hardcoding these values?
You can see in Find() and find() how it's done: basically pull the memory.Node out of tree.Items via the path and then:

			idxNode := idx.Node{
				Path:        n.Path,
				Leaf:        n.Leaf(),
				HasChildren: n.HasChildren(),
			}
			if idxNode.Leaf {
				idxNode.Defs = make([]idx.Archive, 0, len(n.Defs))
				for _, id := range n.Defs {
(...)

@@ -48,11 +48,27 @@ func (s *FuncAggregate) Exec(cache map[Req][]models.Series) ([]models.Series, er
out := pointSlicePool.Get().([]schema.Point)
s.agg.function(series, &out)

// Build up the set of common tags to add to the tags map
Contributor

@Dieterbe Dieterbe Nov 30, 2017

the set of what to add to what? this needs some clarification.
something like: the tags for the aggregated series are only the tags that are common to all input series?
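
For clarity, a minimal sketch of that intersection idea, using the series/Tags names from the snippets in this thread (illustrative only, not the PR's actual code):

	// start from a copy of the first series' tags, then drop every key whose
	// value differs in any other input series, leaving only the common tags
	commonTags := make(map[string]string, len(series[0].Tags))
	for k, v := range series[0].Tags {
		commonTags[k] = v
	}
	for _, serie := range series[1:] {
		for k, v := range commonTags {
			if serie.Tags[k] != v {
				delete(commonTags, k) // deleting while ranging is safe in Go
			}
		}
	}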

@@ -33,6 +33,9 @@ func (s *FuncAliasByNode) Exec(cache map[Req][]models.Series) ([]models.Series,
	}
	for i, serie := range series {
		metric := extractMetric(serie.Target)
		if len(metric) == 0 {
Contributor

in which case do we expect this (meaning specifically: get a "" back and it being valid) to happen?

Collaborator Author

for something like aliasByNode(sumSeries(seriesByTag('host=~h.*9','name=~cpu.*')),0) there is no metric name / path expression to extract. In this case, the function expression is perfectly valid, but we can't pick out the name. Now, it's possible that there is one matching name (e.g. cpu.percent.idle) in which case we can pick it out of the name tag. This is a fairly contrived example, but it basically matches the behavior of this recent graphite change

Except now that I'm looking, I see that I forgot to split on ;
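
(For reference, a minimal illustration of the missing split, assuming the usual Graphite "name;tag=value;..." form of the target; strings is the stdlib package, the rest is illustrative:)

	// serie.Target like "cpu.percent.idle;dc=us-east;host=h19"
	name := strings.SplitN(serie.Target, ";", 2)[0] // -> "cpu.percent.idle"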

Contributor

makes sense, please add that as a comment in the code.

expr/func_get.go Outdated
		serie.Tags["name"] = tagSplits[0]
		out = append(out, serie)
	}

Contributor

why put all this logic in this function? seems like the series should already be in this shape when they are stored in the cache, before processing functions are run.

Collaborator Author

It seemed like the least intrusive place to put it. It could be done after the data fetch in graphite.go::executePlan, but I already added more explicit handling logic there than I liked. I could add a function similar to dataprocessor.go::mergeSeries that does the tag extraction.

What do you think?

Contributor

I think this should go before the call to plan.Run, not in the get function.
the expr functions' job is to pass series through from the input to the output, potentially performing operations on them as the data passes through, not to set up initial state for the series.
correct me if i'm wrong, but this looks like initial state for the series that should be set up once

BTW, sidenote: in executePlan, the call to mergeSeries, do we need to do anything to make it work with tags? the background of that function is that when you switch the interval setting of a series, you end up with two different metricdefs for the same series, so they are merged. but obviously we shouldn't merge all series together that have the same name but different tags.

Collaborator Author

Not sure. Many of the expr functions modify the Tags, so I don't think it's out of place for the get function to create them in the first place. Of course, it's definitely doable to just move it before plan.Run

Collaborator Author

As far as mergeSeries, the target is the NameWithTags(), so it wouldn't over merge.

Contributor

As far as mergeSeries, the target is the NameWithTags(), so it wouldn't over merge.

perfect :)

Not sure. Many of the expr functions modify the Tags, so I don't think it's out of place for the get function to create them in the first place. Of course, it's definitely doable to just move it before plan.Run

keep in mind that FuncGet is not always the "source of data". for certain unit tests it's FuncMock in func_mock_test.go (grep the tests for NewMock). should this function then also set the initial tags? what if the source is another function? I think there's a difference between modifying the tag set, as in FuncAggregate.Exec, and setting the initial tags.

Collaborator Author

FuncMock also bypasses the cache, so setting the proper value in the cache won't make a difference. In that case, the tags would need to be passed in.

We could add a function to something like dataprocessor.go (extractTags or something) and call that from either:

  1. server::executePlan and FuncMock::Exec
  2. FuncGet::Exec and FuncMock::Exec
  3. server::executePlan or FuncGet::Exec and leave the callers of FuncMock::Exec to specify test tags explicitly.

Contributor

@Dieterbe Dieterbe Nov 30, 2017

FuncMock also bypasses the cache, so setting the proper value in the cache won't make a difference. In that case, the tags would need to be passed in.

cache is nothing more than "all the series that have been loaded and are ready to go into the processing chain". same thing for FuncMock's data attribute: a series with some data ready to go into the processing chain. the implementation is a bit different because of the different situations (live requests in a running process vs unit tests) but conceptually very similar.

how about we add it as a method in api/models/series.go, like Series.SetTags()
I'm OK with option 2 as long as you add a note to the Tags field on the Series model saying that the property must be set by calling the SetTags method; then it should become obvious for the reader: if they search for SetTags they can see where it is set.

leave the callers of FuncMock::Exec to specify test tags explicitly.

i also thought about this, but i don't think we should add this concern into the tests for processing functions. they should be able to assume tags are correctly set.
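
A rough sketch of what such a Series.SetTags() could look like (hypothetical: the struct is reduced to the Target/Tags fields discussed above, and the PR's actual implementation may differ):

	package models

	import "strings"

	// Series is reduced here to the two fields relevant to the sketch.
	type Series struct {
		Target string
		Tags   map[string]string
	}

	// SetTags populates Tags from a Graphite-style "name;tag=value;..." Target.
	func (s *Series) SetTags() {
		parts := strings.Split(s.Target, ";")
		s.Tags = make(map[string]string, len(parts))
		for _, part := range parts[1:] {
			if kv := strings.SplitN(part, "=", 2); len(kv) == 2 {
				s.Tags[kv[0]] = kv[1]
			}
		}
		s.Tags["name"] = parts[0]
	}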

expr/parse.go Outdated
@@ -337,21 +337,33 @@ func parseString(s string) (string, string, error) {
return s[:i], s[i+1:], nil
}

// exctractMetric searches for a metric name in `m'
// extractMetric searches for a metric name in `m'
// metric name is defined to be a series of name characters terminated by a comma
Contributor

this function has never been well documented, but TestExtractMetric makes up for a lot. the changes you're doing warrant updates to TestExtractMetric though, just so it's clearer what the extra stuff is for
(and maybe it's time to just document this function)
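
A sketch of the kind of table-driven cases that could be added alongside the existing ones in the expr package's test file (the expected values are illustrative, not taken from the actual test):

	func TestExtractMetricSeriesByTag(t *testing.T) {
		cases := []struct {
			in   string
			want string
		}{
			// a plain path expression inside a function call
			{"sumSeries(foo.bar.baz)", "foo.bar.baz"},
			// seriesByTag expressions carry no path expression to extract
			{"sumSeries(seriesByTag('host=~h.*9','name=~cpu.*'))", ""},
		}
		for _, c := range cases {
			if got := extractMetric(c.in); got != c.want {
				t.Errorf("extractMetric(%q) = %q, want %q", c.in, got, c.want)
			}
		}
	}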

@@ -98,8 +98,20 @@ func newplan(e *expr, context Context, stable bool, reqs []Req) (GraphiteFunc, [
req := NewReq(e.str, context.from, context.to, context.consol)
reqs = append(reqs, req)
return NewGet(req), reqs, nil
} else if e.etype == etFunc && e.str == "seriesByTag" {
Contributor

this needs comments about why we need this special case. why can't we just use the etFunc handling below?

Collaborator Author

That's fair. I toyed around and half implemented a version that adds the seriesByTag function but I couldn't figure out how to properly integrate it.

The only solution I came up with was checking if plan.Funcs[0].(type) was a seriesByTag function.

I'm definitely open to suggestions here, as I do not like this particular section of code (nor the other side that parses it again).

@shanson7
Collaborator Author

I'll go ahead and update the docs and add some tests for extractMetric. I'll also dig a little into moving the seriesByTag logic into a func_seriesByTag file.

Re: peerQueryWithPeer: it is basically a refactor of the code used in findSeries, where we need to know which peer found which series. This info is used in Server::getTargets to target the nodes for their data. My plan was to submit a follow-up PR that modifies findSeries to use peerQueryWithPeer, as this PR is already large enough, IMO.

@Dieterbe
Contributor

i compared peerQueryWithPeer to peerQuery and saw that they are exactly the same, minus the tracking of the peer. If we need this peer-tracking, i would suggest to just add it to peerQuery (and ignore the peer/node info where we don't need it). but i'm still not clear on why we need this at all. I don't see it being used in Server::getTargets, can you point me to it, or will this be in the next PR?

@shanson7
Collaborator Author

In Server::clusterFindByTag r.Peer gets saved into series.Node and is used in Server::getTargets to build up the remoteReqs for each node.

I'm ok with making it just one function, but right now findSeries doesn't use peerQuery (I think only because it needs the node info). So, that would still need to be modified to use peerQuery.

@Dieterbe
Contributor

Dieterbe commented Nov 30, 2017

ah now i'm with you.

but right now findSeries doesn't use peerQuery (I think only because it needs the node info)

The reason is simply that findSeries predates peerQuery. the latter was introduced only a month ago when @replay used it for his tag api endpoint stuff (as part of #750). If porting other functions such as findSeries to use it results in better code, that's fine with me. but it's also low priority as far as i'm concerned. it's certainly sub-optimal that in master we have 2 different approaches of talking to peers and i'd like to smooth this out at some point. my suggestion was mainly to not introduce a 3rd different approach, and basically move peerQueryWithPeer's implementation into the current peerQuery function, and adjust callers of peerQuery where needed (not findSeries, as it doesn't use it).

@shanson7
Collaborator Author

The only part that I'm not fully satisfied with is how the seriesByTag expression is parsed from a string, encoded back to a string, and then parsed from a string again. I'm not sure how exactly to create the request base for the seriesByTag without making more significant changes to the Plan struct (which I'm also open to).

@Dieterbe
Contributor

Dieterbe commented Dec 1, 2017

do we have to update expr/func_aliassub.go where it calls extractMetric and the returned metric may be "" ?

@shanson7
Collaborator Author

shanson7 commented Dec 1, 2017

do we have to update expr/func_aliassub.go where it calls extractMetric and the returned metric may be "" ?

Ah, interesting. Actually a look at graphite-web and a quick A/B test indicate that MT should not be extracting the metric name.

@Dieterbe
Contributor

Dieterbe commented Dec 1, 2017

The only part that I'm not fully satisfied with is how the seriesByTag expression is parsed from a string, encoded back to a string, and then parsed from a string again. I'm not sure how exactly to create the request base for the seriesByTag without making more significant changes to the Plan struct (which I'm also open to).

i'm not sure if this helps your question at all, but personally i wonder if things wouldn't be more elegant if getting series by tag expressions was a more... "native" experience tied into the existing expression system.
e.g. instead of sumSeries(seriesByTag("name=cpu.*", "server=~a[1-9]+")) something like sumSeries(cpu.* server=~a[1-9]+), and instead of avg(seriesByTag("foo=bar")) something like avg(foo=bar). (clearly i left out a few details here about how to unambiguously determine whether an expression is a "classic" expression vs a tag based one, but i'm just talking about the general concept here.)
getting series via a function call instead of a "plain" expression seems inconsistent with the previous behavior where function calls were just for data processing. and our code base isn't well prepared for it, which seems to be the cause of all these weird exceptions and extra complications we have to add to the code. @DanCech any thoughts? surely you've spent some time thinking about this stuff.

@DanCech
Contributor

DanCech commented Dec 1, 2017

Once you start getting into the implications of handling spaces, and deciding whether a given string is a regular path expression or a set of tag expressions, etc., I suspect you'll find that it's a lot more complicated than handling seriesByTag.

@DanCech
Contributor

DanCech commented Dec 1, 2017

side note: in https://github.com/graphite-project/graphite-web/pull/2128/files#diff-7e4c8d4b570e5ede35fc26b9e4eec8e5 you can see how I'm actually changing graphite-web to handle seriesByTag specially rather than calling it as a function, which helps us to avoid the previous duplicated tagdb lookup, as well as allowing us to pass seriesByTag through to finders that support it (which we'll use when MT supports it natively instead of using the http tagdb to make a request to get the resolved series for each seriesByTag expression, then sending the resolved lists of series back in the cluster render request)

	if len(series[0].Tags) == 0 {
		commonTags = make(map[string]string, 0)
	} else {
		commonTags = series[0].Tags
Contributor

any change made to commonTags will be reflected in series[0].Tags
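
(To illustrate the concern: Go maps are reference types, so the assignment aliases the underlying map. Generic illustration, not code from the PR:)

	commonTags := series[0].Tags
	commonTags["aggregatedBy"] = "sum" // this write is also visible via series[0].Tags

	// copying first keeps the input series untouched
	copied := make(map[string]string, len(series[0].Tags))
	for k, v := range series[0].Tags {
		copied[k] = v
	}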

Collaborator Author

Is that ok? Other functions modify the input series directly as well.

Contributor

i looked through the code; turns out this is OK, because the map is set via Series.SetTags() and thus any changes to it are only visible in the scope of the http handler for that render request.

Collaborator Author

I changed it to copy the tags in f3da5d1 anyway

Contributor

can you change it back to your first approach (it will be more efficient and faster).
sorry that i set you on the wrong track :)

Collaborator Author

While likely true, the benchmarks show almost no difference between the approaches (since the other work greatly outweighs the cost of the map building). I like this approach, because it removes cognitive load (not having to worry about the lifetime of series tags or if other functions might need to reuse it, etc) as well as being a few lines of code smaller.

If you have strong feelings about it, I can change it back.

Contributor

ok np

@shanson7
Collaborator Author

shanson7 commented Dec 4, 2017

I feel like this PR is getting rather bloated. I do have follow-up changes that support aliasByTags and groupByTags as well. What's left for this PR?

			corruptIndex.Inc()
			log.Error(3, "memory-idx: node %s missing. Index is corrupt.", name)
			continue
		}
Contributor

note that because we went with the "set len up front, then fill in res[i] as we go" approach, in this case, as well as the above corruptIndex case, we will have an uninitialized slot in the slice for pos i, which seems like it would probably break stuff. so instead, let's do res := make([]idx.Node, 0, len(ids)) and then append the valid entries.
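
A minimal sketch of the suggested shape (the lookup step is abbreviated with a hypothetical resolve(); only the allocate-then-append pattern is the point here):

	// length 0, capacity len(ids): skipped corrupt entries never leave
	// zero-valued holes in the result slice
	res := make([]idx.Node, 0, len(ids))
	for _, id := range ids {
		node, ok := resolve(id) // however the entry is actually resolved
		if !ok {
			corruptIndex.Inc()
			log.Error(3, "memory-idx: node %s missing. Index is corrupt.", id)
			continue
		}
		res = append(res, node)
	}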


@Dieterbe Dieterbe merged commit 6050dd1 into grafana:master Dec 4, 2017
@shanson7 shanson7 deleted the feature_seriesByTag branch December 5, 2017 15:40