Tags in mainindex #759

replay · 2017-11-09T13:02:31Z

We need to make it possible to query tagged series. This PR makes the ;tag=value suffix part of the name of the leaf node, so metrics can be queried by their full name such as a.b.c;a=a;b=b;c=c.
This is necessary because we need a way to uniquely identify metrics when querying them. When graphite receives a seriesByTag() query it will now do a /find on the tag index of MT, which results in a list of full series names (including the tags), so they are uniquely identified. Then it queries all those returned series:

mst@mst-nb1:~/documents/code/go/src/github.com/grafana/metrictank$ curl -H 'X-Org-Id: 1' 'http://localhost:8000/render?format=json&target=seriesByTag("name=some.id.of.a.metric.27")&from=-3s'  | jq '.' | more
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   242    0   242    0     0  15614      0 --:--:-- --:--:-- --:--:-- 16133
[
  {
    "datapoints": [
      [
        null,
        1510250742
      ],
      [
        null,
        1510250743
      ],
      [
        null,
        1510250744
      ]
    ],
    "target": "some.id.of.a.metric.27;metric=some.id.of.a.metric;some=tag",
    "tags": {
      "metric": "some.id.of.a.metric",
      "some": "tag",
      "name": "some.id.of.a.metric.27"
    }
  }
]
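
To make the naming scheme concrete, here is a minimal Go sketch of how such a full name could be assembled. It is an illustration only: `fullNameWithTags` is a hypothetical helper, not the schema package's `FullNameWithTags()` method, and both the sorting of the tags and the skipping of the `name` tag are assumptions inferred from the `target` in the output above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// fullNameWithTags is a hypothetical helper (not the real schema code).
// It appends the sorted "key=value" tags to the metric name, separated
// by ';', and skips the implicit "name" tag.
func fullNameWithTags(name string, tags []string) string {
	sorted := make([]string, 0, len(tags))
	for _, t := range tags {
		if strings.HasPrefix(t, "name=") {
			continue // the name already forms the first part of the result
		}
		sorted = append(sorted, t)
	}
	// Assumption: tags are sorted so the same series always gets the same name.
	sort.Strings(sorted)

	var b strings.Builder
	b.WriteString(name)
	for _, t := range sorted {
		b.WriteByte(';')
		b.WriteString(t)
	}
	return b.String()
}

func main() {
	// prints "some.id.of.a.metric.27;metric=some.id.of.a.metric;some=tag"
	fmt.Println(fullNameWithTags("some.id.of.a.metric.27",
		[]string{"some=tag", "metric=some.id.of.a.metric"}))
}
```

Sorting keeps the resulting name stable regardless of the order in which tags arrive, which matters if the full name is meant to identify a series uniquely.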

Depends on raintank/schema#12

replay · 2017-11-09T13:38:55Z

Note that I can now also browse and select tagged metrics the same way as normal ones.

Dieterbe · 2017-11-09T17:15:18Z

"tags in main index" is quite vague. can you explain in detail what the intended goal is (afaik, change the paths of metrics in the memory index - but not the persistent index - to include the ;k=v;k=v pairs) and, more importantly, why we need this change?

shanson7 · 2017-11-09T23:32:29Z

idx/memory/memory.go

@@ -265,7 +265,8 @@ func (m *MemoryIdx) Load(defs []schema.MetricDefinition) int {
 }

 func (m *MemoryIdx) add(def *schema.MetricDefinition) idx.Archive {
-	path := def.Name
+	path := def.FullNameWithTags()


It seems wasteful to save an extra copy of everything when realistically the Name and Tags could all reference the one string.

replay · 2017-11-10

very good idea, going to try that

Dieterbe · 2017-11-10

that's a good point. i wonder if it makes sense to use 1 backing string and either parse out substrings on-demand, or maintain a list of index integers.

shanson7 · 2017-11-10

Hmm, that's true. Could trim off half the slice size because we don't need to store a pointer, just a start and end.

replay · 2017-11-13

The problem is that if we stored name and tags in only one string and then kept the positions where name & tags start/end as integers, the Name would no longer be accessible as a simple property, unless we make the tags part of its value. We could easily add methods Name() and Tags() to extract those substrings, but that would then change the interface.
So I think the best option is probably to use one string that contains the full name, including tags, and then slice it up into the Name and Tags properties, even though that results in more pointers.

replay · 2017-11-13

Something like that? or did i make it unnecessarily complicated?
raintank/schema@bc847ce
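
For readers following along, the single-backing-string idea can be sketched as below. This is an illustration under assumptions, not the code in the referenced commit: `def` is a hypothetical stand-in for `schema.MetricDefinition`, and `shareBacking` is a made-up name.

```go
package main

import (
	"fmt"
	"strings"
)

// def is a hypothetical, stripped-down stand-in for schema.MetricDefinition.
type def struct {
	Name string
	Tags []string
}

// shareBacking builds one "name;k=v;k=v" string and re-points Name and
// every tag at substrings of it, so they all share a single allocation.
func shareBacking(d *def) string {
	var b strings.Builder
	b.WriteString(d.Name)
	for _, t := range d.Tags {
		b.WriteByte(';')
		b.WriteString(t)
	}
	full := b.String()

	pos := len(d.Name)
	d.Name = full[:pos] // slicing a string does not copy its bytes
	for i, t := range d.Tags {
		pos++ // skip the ';' separator
		d.Tags[i] = full[pos : pos+len(t)]
		pos += len(t)
	}
	return full
}

func main() {
	d := &def{Name: "a.b.c", Tags: []string{"a=a", "b=b"}}
	full := shareBacking(d)
	fmt.Println(full, d.Name, d.Tags)
}
```

Name and Tags stay plain fields, so callers do not change. The cost is that every substring still carries its own string header (a pointer plus a length), which is the "more pointers" trade-off mentioned above.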

Dieterbe · 2017-11-14

> then the Name could not be accessible as a simple property anymore unless we make the tags part of its value. We could easily add methods Name() and Tags() to extract those substrings, but that would then change the interface.

This is not a problem if we're confident that it results in a significant performance improvement.
That said, i'm not convinced that at this point it's worth the time spent to figure out whether or not it would. We can always push that to later.
One thing I would add though: if a function/method implementation is simple enough (e.g. nothing more than returning a property value, or maybe a few lines more), then the compiler will just inline that function and there is no function call overhead. but it can only do this when the type is fixed at compile time (not an interface type).
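
The alternative raised earlier in the thread, one string plus integer offsets behind Name() and Tags() accessors, might look like the sketch below. `packedDef` and `pack` are hypothetical names, and the layout (one end offset per tag) is just one possible choice.

```go
package main

import "fmt"

// packedDef is hypothetical: it stores the full "name;k=v;k=v" string
// plus the end offset of the name and the end offset of each tag.
type packedDef struct {
	full    string
	nameEnd int
	tagEnds []int
}

// Name is a tiny accessor on a concrete type, the kind of method the
// compiler can usually inline when it is not called via an interface.
func (p *packedDef) Name() string { return p.full[:p.nameEnd] }

// Tags slices the tags out of the backing string on demand.
func (p *packedDef) Tags() []string {
	tags := make([]string, 0, len(p.tagEnds))
	start := p.nameEnd + 1 // first tag begins after the first ';'
	for _, end := range p.tagEnds {
		tags = append(tags, p.full[start:end])
		start = end + 1
	}
	return tags
}

// pack builds a packedDef from a name and its "key=value" tags.
func pack(name string, tags []string) *packedDef {
	p := &packedDef{full: name, nameEnd: len(name)}
	for _, t := range tags {
		p.full += ";" + t
		p.tagEnds = append(p.tagEnds, len(p.full))
	}
	return p
}

func main() {
	p := pack("a.b.c", []string{"a=a", "b=b"})
	fmt.Println(p.Name(), p.Tags())
}
```

Compared with keeping sliced substrings around, this stores one integer per tag instead of one string header per tag, at the price of rebuilding the tag slice on each Tags() call and of changing the interface from fields to methods.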

Dieterbe · 2017-11-14

> Something like that? or did i make it unnecessarily complicated?
> raintank/schema@bc847ce

it looks more complicated because a var was renamed from buffer to idBuffer. if it had kept the name, it would be easier to follow. otherwise it looks pretty good to me

replay · 2017-11-14

@Dieterbe I updated it again and moved the deduplication into a separate function. That way it can also be used in other places, for example after loading the definitions from the cassandra index, without side effects.

shanson7 · 2017-11-10T22:06:21Z

It seems like graphite.go::findTreeJson will need a little modifying as well. If the tags contain '.', it will mess up the result.
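
A rough sketch of the issue and one possible way around it, assuming the tag suffix always starts at the first ';'. `splitNodes` is a hypothetical helper for illustration, not the actual findTreeJson code.

```go
package main

import (
	"fmt"
	"strings"
)

// splitNodes is a hypothetical helper: it splits a series path into tree
// nodes on '.', but only within the name part before the first ';', so
// dots inside tag values stay attached to the leaf node.
func splitNodes(path string) []string {
	name, tags := path, ""
	if i := strings.IndexByte(path, ';'); i >= 0 {
		name, tags = path[:i], path[i:]
	}
	nodes := strings.Split(name, ".")
	nodes[len(nodes)-1] += tags // re-attach the tag suffix to the leaf
	return nodes
}

func main() {
	path := "a.b.c;metric=some.id"
	fmt.Println(len(strings.Split(path, "."))) // naive split: 4 nodes
	fmt.Println(len(splitNodes(path)))         // tag-aware split: 3 nodes
}
```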

shanson7 · 2017-11-14T19:22:55Z

So, who is responsible for the name tag?

Dieterbe · 2017-11-14T20:08:04Z

is there still a use for the name attribute without tags?
also i wonder if there are any noticeable differences in benchmark runs between old and new (we can answer this question once we all agree on functionality)

replay · 2017-11-14T20:25:59Z

@Dieterbe off the top of my head i can't think of a reason why we still need the Name attribute. But the migration away from it, towards a single string that includes name & tags, would certainly be quite complicated and is out of the scope of this PR.

replay · 2017-11-14T20:26:51Z

@shanson7 that's some copy-pastery from your commit: 89277b8

Dieterbe · 2017-11-14T20:29:52Z

> off the top of my head i can't think of a reason why we still need the Name attribute. But the migration away from it, towards a single string that includes name & tags, would certainly be quite complicated and is out of the scope of this PR.

i don't understand. what i'm basically asking is: why do we even need a new nameWithTags attribute, and then update the code to use that attribute instead of name? we can just start storing the tags in the name string, which is what you seem to want.

replay · 2017-11-14T20:42:58Z

@Dieterbe yeah true, could do that. then we'd have to update carbon-relay-ng again to include the tags inside the Name field when it generates the MetricDefinition (and other tools like fakemetrics and tsdb-gw), but that's no problem. Actually in the first version it was like that, i can't recall why we changed it: https://github.com/graphite-ng/carbon-relay-ng/pull/219/files

Dieterbe · 2017-11-14T22:25:20Z

> could do that. then we'd have to update carbon-relay-ng again to include the tags inside the Name field when it generates the MetricDefinition (and other tools like fakemetrics and tsdb-gw), but that's no problem.

yes. if the consensus is that in MT the tags should be in the name (which I don't have strong opinions about), then it should also be the case in other tools. at least that's my half-educated opinion. maybe @shanson7 or @DanCech have input.

shanson7 · 2017-11-15T19:41:50Z

@Dieterbe - Are you recommending that the schema be changed to remove Tags altogether? I guess I prefer a structured interface for input, with the internal storage being an implementation detail. Carbon's "structured" input is the formatted string, which is parsed and stored in various ways.

Dieterbe · 2017-11-15T20:10:06Z

my 3 comments above are all about "why do we need 2 name attributes, one with and one without tags encoded in it? it seems we only need the one that has the tags encoded in it"

I don't think we should remove the Tags attribute (the slice) at this point. we may refactor (optimize) our data structures down the road, and a more efficient way to encode tags could be part of that, but i'm not worried about that just yet. to your point, it seems sensible to distinguish the data structures for ingest and internal storage. currently we use the same one for both and get by fine, though our mdm input format is definitely neither memory-, nor disk-, nor cpu-friendly and is due for replacement. (see also #199)

shanson7 · 2017-11-15T20:17:05Z

Ah, now I'm picking up what you're laying down. I will say that my local workaround while waiting on this PR did involve just encoding the tags into the Name attribute in Add and storing them to cassandra like that.

Dieterbe · 2017-11-17T18:45:15Z

after some slack discussion:

the name tag and nameWithTags properties only live in MT's memory index; they don't need to be reflected in the MetricDefinition as it is transmitted or stored.
that's also why we shouldn't implement my suggestion.

see also raintank/schema#12

Dieterbe · 2017-11-17T19:33:52Z

idx/memory/memory.go

@@ -64,6 +64,30 @@ type TagIDs map[idx.MetricID]struct{} // set of ids
 type TagValue map[string]TagIDs       // value -> set of ids
 type TagIndex map[string]TagValue     // key -> list of values

+func (t *TagIndex) AddTagId(name, value string, id idx.MetricID) {
+	ti := *t


what happens if we don't do this?

replay · 2017-11-17

+ go build -ldflags '-X main.GitHash=0.7.4-313-g4aece03' -o /home/mst/documents/code/go/src/github.com/grafana/metrictank/scripts/../build/metrictank
# github.com/grafana/metrictank/idx/memory
idx/memory/memory.go:68:15: invalid operation: t[name] (type *TagIndex does not support indexing)
idx/memory/memory.go:69:4: invalid operation: t[name] (type *TagIndex does not support indexing)
idx/memory/memory.go:71:15: invalid operation: t[name] (type *TagIndex does not support indexing)
idx/memory/memory.go:72:4: invalid operation: t[name] (type *TagIndex does not support indexing)
idx/memory/memory.go:74:3: invalid operation: t[name] (type *TagIndex does not support indexing)
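
The compile error comes from indexing a pointer to a map. A minimal, self-contained sketch of the pattern follows. `MetricID` is simplified to a string here, the method is shown unexported as `addTagID`, and the lazy creation of the nested maps is an assumption about what the lines elided from the diff do.

```go
package main

import "fmt"

// MetricID is a simplified stand-in for idx.MetricID.
type MetricID string

type TagIDs map[MetricID]struct{} // set of ids
type TagValue map[string]TagIDs   // value -> set of ids
type TagIndex map[string]TagValue // key -> values

// addTagID has a pointer receiver, so t is a *TagIndex. A pointer to a
// map cannot be indexed, which is why t[name] fails to compile.
func (t *TagIndex) addTagID(name, value string, id MetricID) {
	ti := *t // copies only the map header; ti and *t share the same data
	if _, ok := ti[name]; !ok {
		ti[name] = make(TagValue)
	}
	if _, ok := ti[name][value]; !ok {
		ti[name][value] = make(TagIDs)
	}
	ti[name][value][id] = struct{}{}
}

func main() {
	index := make(TagIndex)
	index.addTagID("some", "tag", "id1")
	index.addTagID("some", "tag", "id2")
	fmt.Println(len(index["some"]["tag"])) // prints 2
}
```

Because a Go map value already refers to shared underlying data, a value receiver (`func (t TagIndex) ...`) would also work here and would make the dereference unnecessary.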

Dieterbe · 2017-11-17T19:36:10Z

idx/memory/memory.go

+	ti[name][value][id] = struct{}{}
+}
+
+func (t *TagIndex) DelTagId(name, value string, id idx.MetricID) {


this function (and the Add one) don't need to be exported I think?

replay · 2017-11-17

👍 107aef7

Dieterbe

few minor comments but looks good overall.

replay · 2017-11-17T20:03:48Z

@Dieterbe fixed

replay force-pushed the tags_in_mainindex branch 2 times, most recently from 3f6d830 to fbdb56e · November 9, 2017 13:33

shanson7 reviewed Nov 9, 2017

shanson7 pushed a commit to bloomberg/metrictank that referenced this pull request Nov 13, 2017

Tweaks of grafana/pull/759

857de83

replay changed the title from "[WIP] Tags in mainindex" to "Tags in mainindex" Nov 13, 2017

replay added 5 commits November 13, 2017 17:05

tags in main index

3a4e809

move full name generation to schema

fb502e7

comments and fixing case of untagged series

5033921

fix test

1b9965e

update schema

4711e3b

replay force-pushed the tags_in_mainindex branch from c3c9d35 to 4711e3b · November 13, 2017 20:08

replay added 4 commits November 14, 2017 10:59

fix output splitting by .

c1fe132

update to new schema

03e2447

remove test that is deprecated now

a2c7bcb

optimize findTreeJson

8514c3b

index/deindex name

89277b8

update to latest version of schema

99eeb1d

fix tests

4aece03

Dieterbe reviewed Nov 17, 2017

Dieterbe suggested changes Nov 17, 2017

unexport internal functions

107aef7

Dieterbe approved these changes Nov 17, 2017

Dieterbe merged commit b90bd2a into master Nov 17, 2017

Dieterbe deleted the tags_in_mainindex branch September 18, 2018 08:59
