
add support for seriesByTag #774

Merged: 16 commits from shanson7:feature_seriesByTag into grafana:master, Dec 4, 2017

Conversation

shanson7
Collaborator

This PR adds support for seriesByTag natively into metrictank.

The change was a bit more intrusive than I would have hoped.

  1. Change idx to return nodes (similar to Find)
  2. Use either findSeries or clusterFindByTag, depending on whether seriesByTag or old-style pattern expressions are used.
  3. Add the new tags element to the response JSON

@Dieterbe Dieterbe self-requested a review November 29, 2017 19:47
Contributor

@Dieterbe Dieterbe left a comment


not done reviewing yet. some early notes.
also it's not clear to me why we need peerQueryWithPeer; the only caller of it is s.clusterFindByTag and when I trace the uses of that function I don't see what we use the node property for?

@@ -541,8 +541,8 @@ KEYS:
// resolveIDs resolves a list of ids (TagIDs) into a list of complete
// metric names, including tags. it assumes that at least a read lock
Contributor

out of date doc

@@ -139,7 +139,7 @@ type MetricIndex interface {
// where the LastUpdate time is >= from will be returned as results.
Contributor

out of date doc

	res[i] = idx.Node{
		Path:        def.NameWithTags(),
		Leaf:        true,
		HasChildren: false,
Contributor

this seems incorrect. why are we hardcoding these values?
You can see in Find() and find() how it's done: basically pull the memory.Node out of tree.Items via the path and then:

			idxNode := idx.Node{
				Path:        n.Path,
				Leaf:        n.Leaf(),
				HasChildren: n.HasChildren(),
			}
			if idxNode.Leaf {
				idxNode.Defs = make([]idx.Archive, 0, len(n.Defs))
				for _, id := range n.Defs {
(...)

@@ -48,11 +48,27 @@ func (s *FuncAggregate) Exec(cache map[Req][]models.Series) ([]models.Series, er
out := pointSlicePool.Get().([]schema.Point)
s.agg.function(series, &out)

// Build up the set of common tags to add to the tags map
Contributor

@Dieterbe Dieterbe Nov 30, 2017

the set of what to add to what? this needs some clarification.
something like: the tags for the aggregated series are only the tags that are common to all input series?
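
For clarity, a minimal sketch of that intersection idea, using the series/Tags names from the snippets in this thread (illustrative only, not the PR's actual code):

	// start from a copy of the first series' tags, then drop every key whose
	// value differs in any other input series, leaving only the common tags
	commonTags := make(map[string]string, len(series[0].Tags))
	for k, v := range series[0].Tags {
		commonTags[k] = v
	}
	for _, serie := range series[1:] {
		for k, v := range commonTags {
			if serie.Tags[k] != v {
				delete(commonTags, k) // deleting while ranging is safe in Go
			}
		}
	}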

@@ -33,6 +33,9 @@ func (s *FuncAliasByNode) Exec(cache map[Req][]models.Series) ([]models.Series,
	}
	for i, serie := range series {
		metric := extractMetric(serie.Target)
		if len(metric) == 0 {
Contributor

in which case do we expect this (meaning specifically: get a "" back and it being valid) to happen?

Collaborator Author

for something like aliasByNode(sumSeries(seriesByTag('host=~h.*9','name=~cpu.*')),0) there is no metric name / path expression to extract. In this case, the function expression is perfectly valid, but we can't pick out the name. Now, it's possible that there is one matching name (e.g. cpu.percent.idle) in which case we can pick it out of the name tag. This is a fairly contrived example, but it basically matches the behavior of this recent graphite change

Except now that I'm looking, I see that I forgot to split on ;
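
(For reference, a minimal illustration of the missing split, assuming the usual Graphite "name;tag=value;..." form of the target; strings is the stdlib package, the rest is illustrative:)

	// serie.Target like "cpu.percent.idle;dc=us-east;host=h19"
	name := strings.SplitN(serie.Target, ";", 2)[0] // -> "cpu.percent.idle"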

Contributor

makes sense, please add that as a comment in the code.

expr/func_get.go Outdated
		serie.Tags["name"] = tagSplits[0]
		out = append(out, serie)
	}

Contributor

why put all this logic in this function? seems like the series should already be in this shape when they are stored in the cache, before processing functions are run.

Collaborator Author

It seemed like the least intrusive place to put it. It could be done after the data fetch in graphite.go::executePlan, but I already added more explicit handling logic there than I liked. I could add a function similar to dataprocessor.go::mergeSeries that does the tag extraction.

What do you think?

Contributor

I think this should go before the call to plan.Run, not in the get function.
the expr functions' job is to pass series through from the input to the output, potentially performing operations on them as the data passes through, not to set up initial state for the series.
correct me if i'm wrong, but this looks like initial state for the series that should be set up once

BTW, sidenote: in executePlan, the call to mergeSeries, do we need to do anything to make it work with tags? the background of that function is that when you switch the interval setting of a series, you end up with two different metricdefs for the same series, so they are merged. but obviously we shouldn't merge all series together that have the same name but different tags.

Collaborator Author

Not sure. Many of the expr functions modify the Tags, so I don't think it's out of place for the get function to create them in the first place. Of course, it's definitely doable to just move it before plan.Run

Collaborator Author

As far as mergeSeries, the target is the NameWithTags(), so it wouldn't over merge.

Contributor

As far as mergeSeries, the target is the NameWithTags(), so it wouldn't over merge.

perfect :)

Not sure. Many of the expr functions modify the Tags, so I don't think it's out of place for the get function to create them in the first place. Of course, it's definitely doable to just move it before plan.Run

keep in mind that FuncGet is not always the "source of data". for certain unit tests it's FuncMock in func_mock_test.go (grep the tests for NewMock). should this function then also set the initial tags? what if the source is another function? I think there's a difference between modifying the tag set, as in FuncAggregate.Exec, and setting the initial tags.

Collaborator Author

FuncMock also bypasses the cache, so setting the proper value in the cache won't make a difference. In that case, the tags would need to be passed in.

We could add a function to something like dataprocessor.go (extractTags or something) and call that from either:

  1. server::executePlan and FuncMock::Exec
  2. FuncGet::Exec and FuncMock::Exec
  3. server::executePlan or FuncGet::Exec and leave the callers of FuncMock::Exec to specify test tags explicitly.

Contributor

@Dieterbe Dieterbe Nov 30, 2017

FuncMock also bypasses the cache, so setting the proper value in the cache won't make a difference. In that case, the tags would need to be passed in.

cache is nothing more than "all the series that have been loaded and are ready to go into the processing chain". same thing for FuncMock's data attribute: a series with some data ready to go into the processing chain. the implementation is a bit different because of the different situations (live requests in a running process vs unit tests) but conceptually very similar.

how about we add it as a method in api/models/series.go, like Series.SetTags()
I'm OK with option 2 as long as you add a note to the Tags field on the Series model saying that the property must be set by calling the SetTags method; then it should become obvious for the reader: if they search for SetTags they can see where it is set.

leave the callers of FuncMock::Exec to specify test tags explicitly.

i also thought about this, but i don't think we should add this concern into the tests for processing functions. they should be able to assume tags are correctly set.
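
A rough sketch of what such a Series.SetTags() could look like (hypothetical: the struct is reduced to the Target/Tags fields discussed above, and the PR's actual implementation may differ):

	package models

	import "strings"

	// Series is reduced here to the two fields relevant to the sketch.
	type Series struct {
		Target string
		Tags   map[string]string
	}

	// SetTags populates Tags from a Graphite-style "name;tag=value;..." Target.
	func (s *Series) SetTags() {
		parts := strings.Split(s.Target, ";")
		s.Tags = make(map[string]string, len(parts))
		for _, part := range parts[1:] {
			if kv := strings.SplitN(part, "=", 2); len(kv) == 2 {
				s.Tags[kv[0]] = kv[1]
			}
		}
		s.Tags["name"] = parts[0]
	}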

expr/parse.go Outdated
@@ -337,21 +337,33 @@ func parseString(s string) (string, string, error) {
return s[:i], s[i+1:], nil
}

// exctractMetric searches for a metric name in `m'
// extractMetric searches for a metric name in `m'
// metric name is defined to be a series of name characters terminated by a comma
Contributor

this function has never been well documented, but TestExtractMetric makes up for a lot. the changes you're doing warrant updates to TestExtractMetric though, just so it's clearer what the extra stuff is for
(and maybe it's time to just document this function)
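
A sketch of the kind of table-driven cases that could be added alongside the existing ones in the expr package's test file (the expected values are illustrative, not taken from the actual test):

	func TestExtractMetricSeriesByTag(t *testing.T) {
		cases := []struct {
			in   string
			want string
		}{
			// a plain path expression inside a function call
			{"sumSeries(foo.bar.baz)", "foo.bar.baz"},
			// seriesByTag expressions carry no path expression to extract
			{"sumSeries(seriesByTag('host=~h.*9','name=~cpu.*'))", ""},
		}
		for _, c := range cases {
			if got := extractMetric(c.in); got != c.want {
				t.Errorf("extractMetric(%q) = %q, want %q", c.in, got, c.want)
			}
		}
	}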

@@ -98,8 +98,20 @@ func newplan(e *expr, context Context, stable bool, reqs []Req) (GraphiteFunc, [
req := NewReq(e.str, context.from, context.to, context.consol)
reqs = append(reqs, req)
return NewGet(req), reqs, nil
} else if e.etype == etFunc && e.str == "seriesByTag" {
Contributor

this needs comments about why we need this special case. why can't we just use the etFunc handling below?

Collaborator Author

That's fair. I toyed around and half implemented a version that adds the seriesByTag function but I couldn't figure out how to properly integrate it.

The only solution I came up with was checking if plan.Funcs[0].(type) was a seriesByTag function.

I'm definitely open to suggestions here, as I do not like this particular section of code (nor the other side that parses it again).

@shanson7
Collaborator Author

I'll go ahead and update the docs and add some tests for extractMetric. I'll also dig a little into moving the seriesByTag logic into a func_seriesByTag file.

Re: peerQueryWithPeer: it is basically a refactor of the code used in findSeries, where we need to know which peer found which series. This info is used in Server::getTargets to target the nodes for their data. My plan was to submit a follow-up PR that modifies findSeries to use peerQueryWithPeer, as this PR is already large enough, IMO.

@Dieterbe
Contributor

i compared peerQueryWithPeer to peerQuery and saw that they are exactly the same, minus the tracking of the peer. If we need this peer-tracking, i would suggest to just add it to peerQuery (and ignore the peer/node info where we don't need it). but i'm still not clear on why we need this at all. I don't see it being used in Server::getTargets, can you point me to it, or will this be in the next PR?

@shanson7
Collaborator Author

In Server::clusterFindByTag r.Peer gets saved into series.Node and is used in Server::getTargets to build up the remoteReqs for each node.

I'm ok with making it just one function, but right now findSeries doesn't use peerQuery (I think only because it needs the node info). So, that would still need to be modified to use peerQuery.

@Dieterbe
Contributor

Dieterbe commented Nov 30, 2017

ah now i'm with you.

but right now findSeries doesn't use peerQuery (I think only because it needs the node info)

The reason is simply that findSeries predates peerQuery. the latter was introduced only a month ago when @replay used it for his tag api endpoint stuff (as part of #750). If porting other functions such as findSeries to use it results in better code, that's fine with me. but it's also low priority as far as i'm concerned. it's certainly sub-optimal that in master we have 2 different approaches of talking to peers and i'd like to smooth this out at some point. my suggestion was mainly to not introduce a 3rd different approach, and basically move peerQueryWithPeer's implementation into the current peerQuery function, and adjust callers of peerQuery where needed (not findSeries, as it doesn't use it).

@shanson7
Collaborator Author

The only part that I'm not fully satisfied with is how the seriesByTag expression is parsed from a string, encoded back to a string, and then parsed from a string again. I'm not sure how exactly to create the request base for the seriesByTag without making more significant changes to the Plan struct (which I'm also open to).

@Dieterbe
Contributor

Dieterbe commented Dec 1, 2017

do we have to update expr/func_aliassub.go where it calls extractMetric and the returned metric may be "" ?

@shanson7
Collaborator Author

shanson7 commented Dec 1, 2017

do we have to update expr/func_aliassub.go where it calls extractMetric and the returned metric may be "" ?

Ah, interesting. Actually a look at graphite-web and a quick A/B test indicate that MT should not be extracting the metric name.

@Dieterbe
Contributor

Dieterbe commented Dec 1, 2017

The only part that I'm not fully satisfied with is how the seriesByTag expression is parsed from a string, encoded back to a string, and then parsed from a string again. I'm not sure how exactly to create the request base for the seriesByTag without making more significant changes to the Plan struct (which I'm also open to).

i'm not sure if this helps your question at all, but personally i wonder if things wouldn't be more elegant if getting series by tag expressions was a more... "native" experience tied into the existing expression system.
e.g. instead of sumSeries(seriesByTag("name=cpu.*", "server=~a[1-9]+")) something like sumSeries(cpu.* server=~a[1-9]+), and instead of avg(seriesByTag("foo=bar")) something like avg(foo=bar). (clearly i left out a few details here about how to unambiguously determine whether an expression is a "classic" expression vs a tag based one, but i'm just talking about the general concept here.)
getting series via a function call instead of a "plain" expression seems inconsistent with the previous behavior where function calls were just for data processing. and our code base isn't well prepared for it, which seems to be the cause of all these weird exceptions and extra complications we have to add to the code. @DanCech any thoughts? surely you've spent some time thinking about this stuff.

@DanCech
Contributor

DanCech commented Dec 1, 2017

Once you start getting into the implications of handling spaces, and deciding whether a given string is a regular path expression or a set of tag expressions, etc., I suspect you'll find that it's a lot more complicated than handling seriesByTag.

@DanCech
Contributor

DanCech commented Dec 1, 2017

side note: in https://github.com/graphite-project/graphite-web/pull/2128/files#diff-7e4c8d4b570e5ede35fc26b9e4eec8e5 you can see how I'm actually changing graphite-web to handle seriesByTag specially rather than calling it as a function, which helps us to avoid the previous duplicated tagdb lookup, as well as allowing us to pass seriesByTag through to finders that support it (which we'll use when MT supports it natively instead of using the http tagdb to make a request to get the resolved series for each seriesByTag expression, then sending the resolved lists of series back in the cluster render request)

	if len(series[0].Tags) == 0 {
		commonTags = make(map[string]string, 0)
	} else {
		commonTags = series[0].Tags
Contributor

any change made to commonTags will be reflected in series[0].Tags
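
(To illustrate the concern: Go maps are reference types, so the assignment aliases the underlying map. Generic illustration, not code from the PR:)

	commonTags := series[0].Tags
	commonTags["aggregatedBy"] = "sum" // this write is also visible via series[0].Tags

	// copying first keeps the input series untouched
	copied := make(map[string]string, len(series[0].Tags))
	for k, v := range series[0].Tags {
		copied[k] = v
	}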

Collaborator Author

Is that ok? Other functions modify the input series directly as well.

Contributor

i looked through the code; turns out this is OK, because the map is set via Series.SetTags() and thus any changes to it are only visible in the scope of the http handler for that render request.

Collaborator Author

I changed it to copy the tags in f3da5d1 anyway

Contributor

can you change it back to your first approach (it will be more efficient and faster).
sorry that i set you on the wrong track :)

Collaborator Author

While likely true, the benchmarks show almost no difference between the approaches (since the other work greatly outweighs the cost of the map building). I like this approach, because it removes cognitive load (not having to worry about the lifetime of series tags or if other functions might need to reuse it, etc) as well as being a few lines of code smaller.

If you have strong feelings about it, I can change it back.

Contributor

ok np

@shanson7
Collaborator Author

shanson7 commented Dec 4, 2017

I feel like this PR is getting rather bloated. I do have follow-up changes that support aliasByTags and groupByTags as well. What's left for this PR?

			corruptIndex.Inc()
			log.Error(3, "memory-idx: node %s missing. Index is corrupt.", name)
			continue
		}
Contributor

note that because we went with the "set len up front, then fill in res[i] as we go" approach, in this case, as well as the above corruptIndex case, we will have an uninitialized slot in the slice for pos i, which seems like it would probably break stuff. so instead, let's do res := make([]idx.Node, 0, len(ids)) and then append the valid entries.
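
A minimal sketch of the suggested shape (the lookup step is abbreviated with a hypothetical resolve(); only the allocate-then-append pattern is the point here):

	// length 0, capacity len(ids): skipped corrupt entries never leave
	// zero-valued holes in the result slice
	res := make([]idx.Node, 0, len(ids))
	for _, id := range ids {
		node, ok := resolve(id) // however the entry is actually resolved
		if !ok {
			corruptIndex.Inc()
			log.Error(3, "memory-idx: node %s missing. Index is corrupt.", id)
			continue
		}
		res = append(res, node)
	}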


@Dieterbe Dieterbe merged commit 6050dd1 into grafana:master Dec 4, 2017
@shanson7 shanson7 deleted the feature_seriesByTag branch December 5, 2017 15:40