Autocomplete for tag keys and tag values #779

replay · 2017-12-07T16:48:29Z

This implements the autocomplete endpoints for tags and tag values.

In order to do this more efficiently the whole query execution has been changed from the previous "evaluate everything one-by-one and then consolidate" to something that's more like multi-threaded map/reduce, which can also be aborted in the case of tag autocomplete queries once a sufficient number of results has been collected.

Especially in the case of tag autocomplete this brings some pretty significant speed improvements.
This benchmark queries for the tag prefix di with expressions "metric=~.*_time$", "direction!=~re", "host=~host9[0-9]0" on an index with 1.68 million entries:

# Before abortable multi-threaded map/reduce
BenchmarkTagQueryKeysByPrefixExpressions-8   	      50	 200014598 ns/op	36032354 B/op	  602109 allocs/op
# After
BenchmarkTagQueryKeysByPrefixExpressions-8   	   10000	    923016 ns/op	  262924 B/op	    1960 allocs/op
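To illustrate the idea of an abortable, multi-threaded map/reduce described above, here is a minimal sketch. It is not the actual TagQuery code: `collectWithLimit`, its parameters and the sample tags are hypothetical names chosen for this example. Workers filter candidates concurrently, and the consumer aborts the remaining work as soon as `limit` results have been collected.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// collectWithLimit (hypothetical helper) fans candidates out to `workers`
// goroutines, keeps those accepted by match, and aborts all remaining work
// once `limit` results have been collected.
func collectWithLimit(candidates []string, workers, limit int, match func(string) bool) []string {
	if limit <= 0 {
		return nil
	}
	in := make(chan string)
	out := make(chan string)
	stop := make(chan struct{})
	var wg sync.WaitGroup

	// producer: feeds candidates until it runs out or the query is aborted
	go func() {
		defer close(in)
		for _, c := range candidates {
			select {
			case in <- c:
			case <-stop:
				return
			}
		}
	}()

	// map phase: workers evaluate candidates concurrently
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range in {
				if !match(c) {
					continue
				}
				select {
				case out <- c:
				case <-stop:
					return
				}
			}
		}()
	}

	// close out once every worker has returned
	go func() {
		wg.Wait()
		close(out)
	}()

	// reduce phase: collect results, abort as soon as the limit is reached
	var res []string
	for c := range out {
		res = append(res, c)
		if len(res) >= limit {
			close(stop)
			break
		}
	}
	return res
}

func main() {
	tags := []string{"direction", "disk", "dc", "host", "distro"}
	res := collectWithLimit(tags, 4, 2, func(t string) bool {
		return strings.HasPrefix(t, "di")
	})
	fmt.Println(len(res), "results:", res)
}
```

Which two of the three matching tags come back depends on goroutine scheduling, which is why the real implementation would still need to sort before applying a stable limit.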

DanCech · 2017-12-07T17:08:30Z

api/graphite.go

+func (s *Server) graphiteAutoCompleteTags(ctx *middleware.Context, request models.GraphiteAutoCompleteTags) {
+	tags, err := s.clusterAutoCompleteTags(ctx.Req.Context(), ctx.OrgId, request.TagPrefix, request.Expr, request.From, request.Limit)
+	if err != nil {
+		response.Write(ctx, response.WrapError(err))


This needs to use the same JSON response format as graphite: {"error":"<message>"}

shanson7 · 2017-12-08T19:53:29Z

api/models/node.go

+	Tag       string   `json:"tag"`
+	Expr      []string `json:"expressions"`
+	From      int64    `json:"from"`
+	Limit     uint16   `json:"limit"`


I believe these default to 100 in graphite.

Yeah, in graphite the default is 100 if no limit value is given. If the given limit value is 0 then this will be accepted as a valid value.
In Macaron's request binding I can't see an easy way to make it behave exactly the same way, because if no value is given the field is set to 0 and there is no None. But I cannot think of a case where somebody would actually want the limit to be 0, so I'd suggest that we just replace 0 with a configurable default value (like in graphite).

Could you use binding:"Default(100)"?

Ah right, thx, I missed that somehow. The problem with that, though, is that afaict we can't make the default value configurable anymore.
The solution I was looking at was via the Validate() method, but in that method the information whether a field was not set or if it was set to 0 is already lost.
Not sure what the preferred behavior is in that situation, I think making the default configurable is good and we want to keep that.
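The "treat 0 as unset and substitute a configurable default" option discussed above could look roughly like this. It is only a sketch: `autoCompleteRequest`, `applyDefaults` and `defaultLimit` are hypothetical names, and the value 100 simply mirrors the graphite default mentioned in the thread.

```go
package main

import "fmt"

// defaultLimit stands in for a value that would be loaded from the
// configuration file (assumption for this sketch).
var defaultLimit uint = 100

// autoCompleteRequest is a hypothetical, trimmed-down request model.
type autoCompleteRequest struct {
	Prefix string
	Limit  uint
}

// applyDefaults replaces a zero limit with the configured default. After
// binding, "not set" and "explicitly 0" look the same, so both get the default.
func (r *autoCompleteRequest) applyDefaults() {
	if r.Limit == 0 {
		r.Limit = defaultLimit
	}
}

func main() {
	unset := autoCompleteRequest{Prefix: "di"}
	unset.applyDefaults()

	explicit := autoCompleteRequest{Prefix: "di", Limit: 5}
	explicit.applyDefaults()

	fmt.Println(unset.Limit, explicit.Limit)
}
```

The trade-off is the one raised in the thread: a caller can no longer ask for a limit of 0, but the default stays configurable.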

shanson7 · 2017-12-08T20:36:01Z

idx/memory/memory.go

+						}
+					}
+					res = append(res, tag)
+					if uint16(len(res)) >= limit {


Seems like this can return a single result for a limit of 0 (which is the current default value for limit). I'm not sure that limit of 0 is really valid.

with the above changes a limit of 0 isn't valid anymore
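The edge case raised here can be reproduced in isolation. In this sketch (function names are hypothetical), checking the limit only after appending means a limit of 0 still lets one result through, while a guard up front avoids it.

```go
package main

import "fmt"

// collectAfterAppend checks the limit only after appending, like the loop in
// the diff above, so a limit of 0 still yields one result.
func collectAfterAppend(tags []string, limit uint) []string {
	var res []string
	for _, tag := range tags {
		res = append(res, tag)
		if uint(len(res)) >= limit {
			break
		}
	}
	return res
}

// collectGuarded returns nothing for a limit of 0 and otherwise behaves the same.
func collectGuarded(tags []string, limit uint) []string {
	if limit == 0 {
		return nil
	}
	return collectAfterAppend(tags, limit)
}

func main() {
	tags := []string{"dc", "direction", "host"}
	fmt.Println(len(collectAfterAppend(tags, 0))) // one result despite limit 0
	fmt.Println(len(collectGuarded(tags, 0)))     // no results
	fmt.Println(len(collectGuarded(tags, 2)))     // limit respected
}
```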

shanson7 · 2017-12-13T19:26:46Z

idx/memory/memory.go

+	// otherwise, the generation of the result set is much simpler
+	if len(expressions) > 0 {
+		if len(valPrefix) > 0 {
+			expressions = append(expressions, tag+"^="+valPrefix)


Should this be "=~"?

That's a special prefix operator that MT knows about, just used for this stuff right now but I might add it to graphite also

Ah, gotcha. Well, this doesn't work on my local build, but changing it to "=~" does. I'll figure that out real quick.

shanson7 · 2017-12-13T21:54:36Z

idx/memory/tag_query.go

+
+// testByPrefix filters a given metric by matching prefixes against the values
+// of a specific tag
+func (q *TagQuery) testByPrefix(def *idx.Archive, exprs []kv) bool {


This function doesn't handle the name special case like testbyMatch does.

right, thanks!

that should be better: 544a435

shanson7 · 2017-12-13T21:54:59Z

idx/memory/tag_query.go

+}
+
+// testByTagPrefix filters a given metric by matching prefixes against its tags
+func (q *TagQuery) testByTagPrefix(def *idx.Archive) bool {


name special case?

Dieterbe · 2017-12-14T21:02:44Z

idx/idx.go

@@ -147,6 +147,9 @@ type MetricIndex interface {
 	// LastUpdate time is >= the given value.
 	Tags(int, string, int64) ([]string, error)

+	AutoCompleteTags(int, string, []string, int64, uint) ([]string, error)


can you document these functions here? (I know you documented the memory implementations, but typically the docs people see when navigating the code is the interface methods)

sure 818e0b2

Dieterbe · 2017-12-15T10:21:05Z

docker/docker-cluster/metrictank.ini

@@ -148,6 +148,8 @@ log-min-dur = 5min
 time-zone = local
 # maximum number of concurrent threads for fetching data on the local node. Each thread handles a single series.
 get-targets-concurrency = 20
+# what the default limit for tagdb query results should be


"what the... should be" is redundant.
and if this setting is only a default, we should document what may override it

Dieterbe · 2017-12-15T10:23:16Z

idx/idx.go

@@ -147,6 +147,17 @@ type MetricIndex interface {
 	// LastUpdate time is >= the given value.
 	Tags(int, string, int64) ([]string, error)

+	// AutoCompleteTags is used to generate a list of possible values that could


s/is used to generate/generates/

Dieterbe · 2017-12-15T10:23:32Z

idx/idx.go

+	// narrow down the result set.
+	AutoCompleteTags(int, string, []string, int64, uint) ([]string, error)
+
+	// AutoCompleteTagValues is used to generate a list of possible values that could


Dieterbe · 2017-12-15T10:24:51Z

idx/idx.go

@@ -147,6 +147,17 @@ type MetricIndex interface {
 	// LastUpdate time is >= the given value.
 	Tags(int, string, int64) ([]string, error)

+	// AutoCompleteTags is used to generate a list of possible values that could
+	// complete a given prefix. It also accepts additional conditions to further


"values that could complete a given prefix" this is confusing because AFAICT we're not talking about tag values, but keys.

also, what kind of conditions? expressions such as those accepted by FindByTag ?

updated: 9d217cd

Dieterbe · 2017-12-15T10:31:49Z

api/models/node.go

+
+type IndexAutoCompleteTagValues struct {
+	OrgId     int      `json:"orgId" binding:"Required"`
+	ValPrefix string   `json:"valPrefix"`


the type already makes it explicit that we're looking for values, so this field can be simplified to "prefix". same remark for IndexAutoCompleteTags

Dieterbe · 2017-12-15T10:45:53Z

api/models/cluster.go

@@ -4,6 +4,9 @@ import (
 	"github.com/grafana/metrictank/idx"
 )

+//go:generate msgp
+type StringList []string


why do we need this custom type?

Because we need to be able to json marshal/unmarshal a slice of strings. Is there a better way to do that?

did you mean msgp? or json and also msgp? not sure, i thought it should just work

ah, sorry, yes msgp

Dieterbe · 2017-12-15T10:51:51Z

idx/memory/memory.go

+// tagPrefix:   the string to be completed
+// expressions: tagdb expressions in the same format as graphite uses
+// from:        only tags will be returned that have at least one metric
+//              with a LastUpdate above from


same for the other function

Dieterbe · 2017-12-15T10:52:52Z

idx/memory/memory.go

+//              consecutive queries and the limit is applied after sorting
+//
+func (m *MemoryIdx) AutoCompleteTags(orgId int, tagPrefix string, expressions []string, from int64, limit uint) ([]string, error) {
+	res := make([]string, 0)


var res []string is equivalent but saves needless allocation. same for the other function below.
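A small sketch of how the two declarations compare. For `len`, `range` and `append` they behave the same; one caveat worth checking, if the slice is ever JSON-encoded directly, is that a nil slice encodes as `null` while an empty non-nil slice encodes as `[]`. The `encode` helper is a hypothetical name for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encode returns the JSON encoding of a string slice.
func encode(s []string) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	var a []string         // nil slice
	b := make([]string, 0) // empty but non-nil slice

	fmt.Println(len(a), len(b))     // both are empty
	fmt.Println(a == nil, b == nil) // only a is nil

	a = append(a, "dc") // append works on a nil slice
	fmt.Println(a)

	fmt.Println(encode(nil), encode(b)) // null vs []
}
```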

Dieterbe · 2017-12-15T10:55:17Z

idx/memory/memory.go

+	if len(expressions) > 0 {
+		// incorporate the tag prefix into the tag query expressions
+		if len(tagPrefix) > 0 {
+			expressions = append(expressions, fmt.Sprintf("__tag^=%s", tagPrefix))


this can be simpler and more performant by just doing "__tag^=" + tagPrefix

Dieterbe · 2017-12-15T11:01:00Z

idx/memory/memory.go

+
+		tagsSorted := make([]string, 0, len(tags))
+		for tag := range tags {
+			if len(tagPrefix) > 0 && (len(tagPrefix) > len(tag) || tag[:len(tagPrefix)] != tagPrefix) {


I think we can replace this entire condition with if !strings.HasPrefix(tag, tagPrefix) {
tagPrefix being "" doesn't need anything special, it just works.

yep, that works 👍. I've used that in a ton of places now: d75149a
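The equivalence of the two conditions can be checked with a short sketch (function names are hypothetical). Both return true when the tag should be skipped, including for an empty prefix and for a tag shorter than the prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// skipManual reproduces the original hand-written condition.
func skipManual(tag, prefix string) bool {
	return len(prefix) > 0 && (len(prefix) > len(tag) || tag[:len(prefix)] != prefix)
}

// skipHasPrefix is the simplified form suggested in the review.
func skipHasPrefix(tag, prefix string) bool {
	return !strings.HasPrefix(tag, prefix)
}

func main() {
	cases := [][2]string{
		{"direction", "di"}, // matching prefix
		{"d", "di"},         // tag shorter than prefix
		{"host", "di"},      // non-matching prefix
		{"host", ""},        // empty prefix matches everything
		{"", ""},            // empty tag, empty prefix
	}
	for _, c := range cases {
		fmt.Printf("tag=%q prefix=%q manual=%v hasPrefix=%v\n",
			c[0], c[1], skipManual(c[0], c[1]), skipHasPrefix(c[0], c[1]))
	}
}
```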

Dieterbe · 2017-12-15T11:09:36Z

idx/memory/memory.go

+		if len(valPrefix) > 0 {
+			expressions = append(expressions, tag+"^="+valPrefix)
+		} else {
+			// if no value prefix has been specified we still require that at


should this be "at least" ?

Dieterbe · 2017-12-15T11:11:38Z

idx/memory/memory.go

+
+		res = make([]string, 0, len(vals))
+		for val := range vals {
+			if len(valPrefix) > 0 && (len(valPrefix) > len(val) || val[:len(valPrefix)] != valPrefix) {


can use strings.HasPrefix here I think

yep: d75149a

Dieterbe · 2017-12-15T11:16:42Z

idx/memory/tag_query.go

+	MATCH_TAG         // __tag=~   relies on special key __tag
+	NOT_MATCH         // !=~
+	PREFIX            // ^=        exact prefix, not regex
+	PREFIX_TAG        // __tag^=   exact prefix with tag


are all these exactly the same as in graphite? we should document somewhere which operators we support and what the various tricks are (e.g. tag!= doing a check that the tag is non zero afaik). this is probably not the right place. maybe docs/http-api.md is, or we can also just refer to the graphite docs if we're 100% compatible.
we also currently don't really document the kind of regular patterns (non-tag) a user can use. not sure if we want to maintain all that ourselves though, i would rather just point to graphite docs

i think this should be the documentation used, because it's the reference implementation as well: http://graphite.readthedocs.io/en/latest/tags.html

not all those operations are supported by graphite. some of them were only implemented because we need them internally to satisfy queries like for example by tag prefix (for tag autocomplete). if a user figures out how to use them that's fine, but they are not intended to be used by users and hence not documented in the reference documentation ^^

ok so in tag_query.go lets link to that URL and also clearly mark the non-standard ones.
looking at docs/http-api.md again, it's not the right place because it just talks about the different user accessible http endpoints, such as /metrics/find and /render; but doesn't go into much more detail regarding graphite processing functions

Dieterbe · 2017-12-15T11:21:40Z

idx/memory/tag_query.go

+}
+
+// getMaxTagCount calculates the maximum number of results a tag query could
+// possibly return


this is called cardinality

really? i thought cardinality is how many unique values each tag has. but this function returns the max number of tags

you can have cardinality of data and cardinality of a resultset, both simply convey the number of distinct values

Dieterbe · 2017-12-15T11:52:14Z

idx/memory/tag_query.go

+	tagPrefix string // only used for auto complete of tags to match exact prefix
+
+	index TagIndex
+	byId  map[string]*idx.Archive


when i see attributes for a type, i assume the attributes are scoped to the lifetime of the type. e.g. the fields are set when an instance is created, or very shortly after.
but index and byId are only initialized when Run() or RunGetTags() are called. this warrants a comment (or unless these fields can be set at creation time, that would be even a bit cleaner imho)

If we'd set the index during instantiation of the query we'd already need to acquire the read lock then for this bit: https://github.com/grafana/metrictank/blob/master/idx/memory/memory.go#L591
Currently when the query gets instantiated we verify that it is valid, if it is not valid we return an error without ever acquiring the index lock. That would not be possible then.

that's fine. but let's add a comment then explaining whose responsibility it is to set those fields (seems Run and RunGetTags set it)

Dieterbe · 2017-12-15T12:03:40Z

idx/memory/tag_query.go

-		resultSet = q.getInitialByEqual(index, q.equal[0])
-		q.equal = q.equal[1:]
+// Run executes the tag query on the given index and returns a list of ids
+func (q *TagQuery) Run(index TagIndex, byId map[string]*idx.Archive) TagIDs {


the TagIDs type is confusing. it's not a list but a set. and it's not id's of tags but id's of metrics.

does that mean i should change it? I guess i could rename it to something like IdSet, but it seems more consistent like this: https://github.com/grafana/metrictank/blob/master/idx/memory/memory.go#L63-L65

that, or MetricIDs I suppose. I like IdSet

k, renamed it to IdSet
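For readers unfamiliar with the set idiom being named here, a minimal sketch follows. The map-of-empty-struct representation, the `String` format and the sample ids are assumptions for illustration, not necessarily what the PR ended up with.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// IdSet is a set of metric ids. The map-based representation is an
// assumption for this sketch.
type IdSet map[string]struct{}

// String renders the ids in sorted order so the output is deterministic.
func (s IdSet) String() string {
	ids := make([]string, 0, len(s))
	for id := range s {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return "[" + strings.Join(ids, " ") + "]"
}

func main() {
	s := IdSet{"1.abc": {}, "1.def": {}}
	s["1.abc"] = struct{}{} // adding an existing id does not create a duplicate

	// %s picks up the String method automatically.
	fmt.Printf("%s has %d ids\n", s, len(s))
}
```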

Dieterbe · 2017-12-15T12:12:20Z

idx/memory/tag_query.go

+	// defined in the query expressions. then they will extract the tags of
+	// those that satisfy all conditions and push them into tagCh.
+	// when a worker completes it pushes an empty item into the completeCh to
+	// signal its completion


there's no need for the completeCh stuff. a more common pattern to do this is:

go func() {
	q.wg.Wait()
	close(tagCh)
}()

then just do a for range over tagCh to consume everything.

makes sense, will implement that

that is much better: 8661c47

yes, beautiful :)
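A self-contained sketch of the pattern suggested in this thread: workers send into a shared channel, a separate goroutine waits on the `sync.WaitGroup` and closes the channel, and the consumer simply ranges over it. `gatherTags` and the sample data are hypothetical; only the `wg.Wait()` / `close(tagCh)` structure is taken from the review comment.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// gatherTags (hypothetical) runs one worker per shard. Each worker pushes its
// tags into tagCh; the channel is closed once all workers are done.
func gatherTags(shards [][]string) []string {
	tagCh := make(chan string)
	var wg sync.WaitGroup

	for _, shard := range shards {
		wg.Add(1)
		go func(tags []string) {
			defer wg.Done()
			for _, t := range tags {
				tagCh <- t
			}
		}(shard)
	}

	// No separate completion channel is needed: closing tagCh after all
	// workers have returned ends the range loop below.
	go func() {
		wg.Wait()
		close(tagCh)
	}()

	var res []string
	for t := range tagCh {
		res = append(res, t)
	}
	sort.Strings(res) // arrival order is nondeterministic
	return res
}

func main() {
	fmt.Println(gatherTags([][]string{{"dc", "host"}, {"direction"}}))
}
```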

Dieterbe · 2017-12-15T12:27:24Z

idx/memory/tag_query.go

-			// always anchor all regular expressions at the beginning
-			if (e.operator == MATCH || e.operator == NOT_MATCH) && e.value[0] != byte('^') {
+			// always anchor all regular expressions at the beginning if they do not start with ^
+			if (e.operator == MATCH || e.operator == NOT_MATCH || e.operator == MATCH_TAG) && e.value[0] != 94 {


you can just use '^' instead of 94.
you can compile the below 2 programs with go build -gcflags=-S and see that the assembly is identical.

~/t/rune ❯❯❯ cat 1/main.go
package main

import "fmt"

func main() {
	a := "^abc"
	fmt.Println(a[0] == 94)
}
~/t/rune ❯❯❯ cat 2/main.go
package main

import "fmt"

func main() {
	a := "^abc"
	fmt.Println(a[0] == '^')
}

the same comment goes for a few other places in the code where we use numbers

nice, i can also change the switch / case in parseExpression() then

Dieterbe · 2017-12-15T22:15:18Z

I need to review further, but another thing i noticed is that getInitial* functions all do a chan send for every value that should be considered, which seems a bit expensive. especially since idCh is an unbuffered channel. buffering it could probably bring a perf gain.

replay · 2017-12-18T07:03:42Z

@Dieterbe seems reasonable. Is there a good way to determine what buffer size would make sense? I guess just benchmark and see what performs the best, right?

Dieterbe · 2017-12-18T09:04:41Z

yep, use your best judgment + benchmarks on realistic queries/cases
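One rough way to compare buffer sizes before writing a proper `testing` benchmark is a timing loop like the one below. This is only a sketch: `drain`, the buffer sizes and the item count are arbitrary choices for illustration, and realistic queries may behave differently.

```go
package main

import (
	"fmt"
	"time"
)

// drain (hypothetical) sends n ids through a channel with the given buffer
// size and returns how many the consumer received.
func drain(n, bufSize int) int {
	idCh := make(chan int, bufSize)
	go func() {
		defer close(idCh)
		for i := 0; i < n; i++ {
			idCh <- i
		}
	}()

	count := 0
	for range idCh {
		count++
	}
	return count
}

func main() {
	const n = 1000000
	// A buffer size of 0 is the unbuffered case discussed above.
	for _, size := range []int{0, 1, 16, 256} {
		start := time.Now()
		got := drain(n, size)
		fmt.Printf("buffer=%d received=%d elapsed=%v\n", size, got, time.Since(start))
	}
}
```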

Dieterbe · 2017-12-19T16:04:29Z

i'm just running some final benchmarks and if all looks good, will merge

DanCech reviewed Dec 7, 2017

replay force-pushed the autocomplete branch 3 times, most recently from 5d5030b to 29fce0c Compare December 8, 2017 12:34

replay requested review from Dieterbe and woodsaj December 8, 2017 13:17

shanson7 reviewed Dec 8, 2017

replay force-pushed the autocomplete branch from 23ac8c3 to 5062be5 Compare December 12, 2017 07:40

shanson7 reviewed Dec 13, 2017

replay force-pushed the autocomplete branch from a1902cf to 544a435 Compare December 14, 2017 15:44

Dieterbe reviewed Dec 14, 2017

replay force-pushed the autocomplete branch 2 times, most recently from 9d1f11f to 8b78830 Compare December 15, 2017 07:57

Dieterbe reviewed Dec 15, 2017

Dieterbe suggested changes Dec 15, 2017

Dieterbe reviewed Dec 15, 2017

replay and others added 25 commits December 20, 2017 00:22

simplify if

67e988d

add comment

791ad4e

fix order of methods

2cb2485

fix

c65f113

clarify

192c6e1

IdSet.String() method

4bcad4a

TagQuery matchers: make it clear what's what

cdd5de4

add more comment

6ce5724

better comment

57cd925

more comments

b1b00ba

comments

16b6cbd

save unnecessary operation

4b814bf

rename AutoComplete* -> Find*

51387d6

rename matchName to omitTagFilters

cb370f4

better comment

a2dbb23

clarify TagQuery stuff

928b005

add tagSupport clauses to new tag methods

eadbe48

tagMatch and tagPrefix can be used as filter OR for initial result set

b57be49

simplify sortByCost

6df7f42

bugfix: cost for prefix was often 0

7757d23

clarify

efdaa38

do not evaluate tag expressions twice

60e442f

fix memory-idx corruption log calls

6bc1c88

* anything that has a String method will be printed properly. see https://play.golang.org/p/nx2wkMGa8b * make sure we always print the id of the metric so we can diagnose

remove outdated/obvious comments

8d028df

fix benchmarks

ac85236

replay force-pushed the autocomplete branch from 1539673 to ac85236 Compare December 19, 2017 15:44

Dieterbe approved these changes Dec 19, 2017

Dieterbe merged commit 7b99d38 into master Dec 19, 2017

Dieterbe deleted the autocomplete branch September 18, 2018 08:59
