This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Tags in mainindex #759

Merged: 13 commits merged on Nov 17, 2017

@replay (Contributor) commented Nov 9, 2017

We need to make it possible to query tagged series. This PR makes the ;tag=value suffix part of the name of the leaf node, so metrics can be queried by their full name such as a.b.c;a=a;b=b;c=c.
This is necessary because we need a way to uniquely identify metrics when querying them. When graphite receives a seriesByTag() query, it now does a /find on MT's tag index, which returns a list of full series names (including the tags) that uniquely identify each series. It then queries all of the returned series:

```
$ curl -H 'X-Org-Id: 1' 'http://localhost:8000/render?format=json&target=seriesByTag("name=some.id.of.a.metric.27")&from=-3s' | jq '.' | more
[
  {
    "datapoints": [
      [
        null,
        1510250742
      ],
      [
        null,
        1510250743
      ],
      [
        null,
        1510250744
      ]
    ],
    "target": "some.id.of.a.metric.27;metric=some.id.of.a.metric;some=tag",
    "tags": {
      "metric": "some.id.of.a.metric",
      "some": "tag",
      "name": "some.id.of.a.metric.27"
    }
  }
]
```
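Each returned target carries its tags after the name, separated by `;`. As a rough sketch (the helper `parseSeriesName` is hypothetical, not metrictank or graphite code), this is how such a full name can be turned back into the tag map shown in the response, including the implicit `name` tag:

```go
package main

import (
	"fmt"
	"strings"
)

// parseSeriesName splits a full series name of the form
// "name;key1=val1;key2=val2" into a tag map that also carries the implicit
// "name" tag, mirroring the "tags" object in the render output.
// Hypothetical helper for illustration, not metrictank or graphite code.
func parseSeriesName(full string) (map[string]string, error) {
	parts := strings.Split(full, ";")
	tags := map[string]string{"name": parts[0]}
	for _, kv := range parts[1:] {
		key, value, ok := strings.Cut(kv, "=")
		if !ok || key == "" {
			return nil, fmt.Errorf("malformed tag %q in %q", kv, full)
		}
		tags[key] = value
	}
	return tags, nil
}

func main() {
	tags, err := parseSeriesName("some.id.of.a.metric.27;metric=some.id.of.a.metric;some=tag")
	if err != nil {
		panic(err)
	}
	fmt.Println(tags["name"], tags["metric"], tags["some"])
	// prints: some.id.of.a.metric.27 some.id.of.a.metric tag
}
```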

Depends on raintank/schema#12

@replay force-pushed the tags_in_mainindex branch 2 times, most recently from 3f6d830 to fbdb56e on November 9, 2017 13:33
@replay (Contributor Author) commented Nov 9, 2017

Note that I can now also browse and select tagged metrics the same way as normal ones:
[Screenshot from 2017-11-09 10-37-52]

@Dieterbe (Contributor) commented Nov 9, 2017

"tags in main index" is quite vague. can you explain in detail what the intended goal is (afaik, change the paths of metrics in the memory index, but not the persistent index, to include the ;k=v;k=v pairs) and, more importantly, why we need this change?

```diff
@@ -265,7 +265,8 @@ func (m *MemoryIdx) Load(defs []schema.MetricDefinition) int {
 }
 
 func (m *MemoryIdx) add(def *schema.MetricDefinition) idx.Archive {
-	path := def.Name
+	path := def.FullNameWithTags()
```
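The diff switches the memory-index path from `def.Name` to `def.FullNameWithTags()`. The schema method's implementation is not shown here; the sketch below uses a hypothetical `fullNameWithTags` to produce the `a.b.c;a=a;b=b;c=c` form described at the top of the PR. Sorting the tags is an assumption, though it is consistent with the tag order in the render output's `target`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// fullNameWithTags is a hypothetical stand-in for schema's FullNameWithTags:
// it appends ";key=value" pairs to the metric name. Sorting is an assumption,
// made so the same tag set always maps to the same index path.
func fullNameWithTags(name string, tags []string) string {
	if len(tags) == 0 {
		return name
	}
	sorted := append([]string(nil), tags...) // copy: leave the caller's slice untouched
	sort.Strings(sorted)
	return name + ";" + strings.Join(sorted, ";")
}

func main() {
	fmt.Println(fullNameWithTags("a.b.c", []string{"c=c", "a=a", "b=b"}))
	// prints: a.b.c;a=a;b=b;c=c
}
```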
Collaborator:

It seems wasteful to save an extra copy of everything when realistically the Name and Tags could all reference the one string.

Contributor Author:

very good idea, going to try that

Contributor:

that's a good point, i wonder if it makes sense to use 1 backing string and either parse out substrings on-demand, or maintain a list of index integers.

Collaborator:

Hmm, that's true. We could cut the slice size in half, because we don't need to store a pointer, just a start and an end.

@replay (Contributor Author), Nov 13, 2017:

The problem is that if we stored name and tags in only one string and kept the positions where name and tags start/end as integers, then Name could no longer be accessed as a simple property unless we make the tags part of its value. We could easily add Name() and Tags() methods to extract those substrings, but that would change the interface.
So I think the best option would be to use one string that contains the full name, including tags, and then slice it up into the Name and Tags properties, even though that results in more pointers.

Contributor Author:

Something like that? or did i make it unnecessarily complicated?
raintank/schema@bc847ce

Contributor:

> then the Name could not be accessible as a simple property anymore unless we make the tags part of its value. We could easily add methods Name() and Tags() to extract those substrings, but that would then change the interface.

This is not a problem if we're confident that it results in a significant performance improvement.
That said, i'm not convinced that at this point it's worth the time spent to figure out whether or not it would result in a significant performance improvement. We can always revisit that later.
One thing I would add, though, is that if a function/method implementation is simple enough (e.g. nothing more than returning a property value, or maybe a few lines more), then the compiler will just inline that function and there is no function call overhead, but it can only do this when the type is fixed at compile time (not an interface type).
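A small illustration of the inlining point, with hypothetical names (`def`, `namer`). Building it with `go build -gcflags=-m` prints the compiler's inlining decisions, so this can be checked rather than assumed. The output depends on the Go version: newer releases can devirtualize some interface calls (for example with profile-guided optimization), so the interface case is the usual behavior rather than a guarantee.

```go
package main

import "fmt"

// def is a hypothetical type with a trivial accessor.
type def struct{ name string }

// Name is small enough to be inlined wherever the compiler knows the
// receiver is a *def.
func (d *def) Name() string { return d.name }

// namer is an interface; a call through it is normally an indirect call
// via the interface's method table, which prevents inlining.
type namer interface{ Name() string }

func viaConcrete(d *def) string  { return d.Name() }
func viaInterface(n namer) string { return n.Name() }

func main() {
	d := &def{name: "a.b.c"}
	fmt.Println(viaConcrete(d), viaInterface(d)) // prints: a.b.c a.b.c
}
```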

Contributor:

> Something like that? or did i make it unnecessarily complicated?
> raintank/schema@bc847ce

it looks more complicated because a var was renamed from buffer to idBuffer. if it had kept the name it would be easier to follow. otherwise it looks pretty good to me

Contributor Author:

@Dieterbe I updated it again and moved the deduplication into a separate function. That way it can also be used in other places, for example after loading the definitions from the cassandra index, without side effects.

@shanson7 (Collaborator):

It seems graphite.go::findTreeJson will need some modification as well: if the tags contain '.', the result will be broken.
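One way to handle this, sketched with a hypothetical `splitTaggedPath` helper rather than the actual `findTreeJson` code: split only the part before the first `;` on `.`, and keep the tag suffix attached to the leaf node.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTaggedPath splits a series path into tree nodes. Only the part
// before the first ';' is split on '.', and the tag suffix stays attached
// to the leaf, so dots inside tag values don't create spurious nodes.
// Hypothetical helper; not the actual findTreeJson code.
func splitTaggedPath(path string) []string {
	name, tags, hasTags := strings.Cut(path, ";")
	nodes := strings.Split(name, ".")
	if hasTags {
		nodes[len(nodes)-1] += ";" + tags
	}
	return nodes
}

func main() {
	fmt.Println(strings.Split("a.b.c;host=web1.example.com", ".")) // naive: 5 nodes
	fmt.Println(splitTaggedPath("a.b.c;host=web1.example.com"))   // 3 nodes
}
```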

shanson7 pushed a commit to bloomberg/metrictank that referenced this pull request Nov 13, 2017
@replay changed the title from "[WIP] Tags in mainindex" to "Tags in mainindex" on Nov 13, 2017
@shanson7 (Collaborator):

So, who is responsible for the name tag?

@Dieterbe (Contributor) commented Nov 14, 2017

is there still a use for the name attribute without tags?
also, i wonder whether there are any noticeable differences in benchmark runs between old and new (we can answer this question once we all agree on functionality)

@replay (Contributor Author) commented Nov 14, 2017

@Dieterbe from the top of my head i can't think of a reason why we still need the Name attribute. But the migration scenario away from it, and towards a single string that includes name & tags, would certainly be quite complicated and out of the scope of this PR.

@replay (Contributor Author) commented Nov 14, 2017

@shanson7 that's some copy-pastery from your commit: 89277b8

@Dieterbe (Contributor):

> from the top of my head i can't think of a reason why we still need the Name attribute. But the migration scenario away from it, and towards a single string that includes name & tags, would certainly be quite complicated and out of the scope of this PR.

i don't understand. what i'm trying to say is: why do we even need a new nameWithTags attribute, and then update code to use that attribute instead of name, when we can just start storing the tags in the name string, which is what you seem to want?

@replay (Contributor Author) commented Nov 14, 2017

@Dieterbe yeah true, could do that. then we'd have to update carbon-relay-ng again to include the tags inside the Name field when it generates the MetricDefinition (and other tools like fakemetrics and tsdb-gw), but that's no problem. Actually in the first version it was like that, i can't recall why we changed it: https://github.com/graphite-ng/carbon-relay-ng/pull/219/files

@Dieterbe (Contributor):

> could do that. then we'd have to update carbon-relay-ng again to include the tags inside the Name field when it generates the MetricDefinition (and other tools like fakemetrics and tsdb-gw), but that's no problem.

yes. if the consensus is that in MT the tags should be in the name (which I don't have strong opinions about), then it should also be the case in other tools. at least that's my half-educated opinion. maybe @shanson7 or @DanCech have input.

@shanson7 (Collaborator):

@Dieterbe, are you recommending that the schema be changed to remove Tags altogether? I guess I prefer a structured interface for input, with the internal storage being an implementation detail. Carbon's "structured" input is the formatted string, which is parsed and stored in various ways.

@Dieterbe (Contributor):

my 3 comments above are all about "why do we need 2 name attributes, one with and one without the tags encoded in it? it seems we only need the one that has the tags encoded in it"

I don't think at this point we should remove the Tags attribute (the slice). we may refactor (optimize) our data structures down the road, and a more efficient way to encode tags could be part of that, but i'm not worried about that just yet. to your point, it seems sensible to distinguish the data structures for ingest and internal storage. currently we use the same for both and get by fine, though our mdm input format is definitely neither memory, nor disk, nor cpu friendly and is due for replacement (see also #199).

@shanson7 (Collaborator):

Ah, now I'm picking up what you're laying down. I will say that my local workaround while waiting on this PR did involve just encoding the tags into the Name attribute in Add and storing them in cassandra like that.

@Dieterbe (Contributor):

after some slack discussion:

  • name tag and nameWithTags properties only live in MT's memory index, they don't need to be reflected in the MetricDefinition as it is transmitted, or stored.
  • that's also why we shouldn't implement my suggestion.
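A sketch of the first point, with hypothetical names (`metricDef`, `indexTags`): the implicit `name` tag is derived only when a definition is added to the in-memory tag index, so the definition that is transmitted or stored never carries it.

```go
package main

import "fmt"

// metricDef is a hypothetical stand-in for the MetricDefinition as it is
// transmitted and stored: no name tag, no combined name-with-tags field.
type metricDef struct {
	Name string
	Tags []string
}

// indexTags returns the tags to register in the in-memory tag index: the
// definition's own tags plus the implicit "name=<Name>" tag. def is passed
// by value and its Tags slice is not modified, so the name tag never leaks
// into what is transmitted or persisted.
func indexTags(def metricDef) []string {
	out := make([]string, 0, len(def.Tags)+1)
	out = append(out, "name="+def.Name)
	return append(out, def.Tags...)
}

func main() {
	def := metricDef{Name: "some.id.of.a.metric.27", Tags: []string{"metric=some.id.of.a.metric", "some=tag"}}
	fmt.Println(indexTags(def))
	// prints: [name=some.id.of.a.metric.27 metric=some.id.of.a.metric some=tag]
	fmt.Println(len(def.Tags)) // prints: 2
}
```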

see also raintank/schema#12

```
@@ -64,6 +64,30 @@ type TagIDs map[idx.MetricID]struct{} // set of ids
type TagValue map[string]TagIDs // value -> set of ids
type TagIndex map[string]TagValue // key -> list of values

func (t *TagIndex) AddTagId(name, value string, id idx.MetricID) {
	ti := *t
```
Contributor:

what happens if we don't do this?

Contributor Author:

```
+ go build -ldflags '-X main.GitHash=0.7.4-313-g4aece03' -o /home/mst/documents/code/go/src/github.com/grafana/metrictank/scripts/../build/metrictank
# github.com/grafana/metrictank/idx/memory
idx/memory/memory.go:68:15: invalid operation: t[name] (type *TagIndex does not support indexing)
idx/memory/memory.go:69:4: invalid operation: t[name] (type *TagIndex does not support indexing)
idx/memory/memory.go:71:15: invalid operation: t[name] (type *TagIndex does not support indexing)
idx/memory/memory.go:72:4: invalid operation: t[name] (type *TagIndex does not support indexing)
idx/memory/memory.go:74:3: invalid operation: t[name] (type *TagIndex does not support indexing)
```

```
	ti[name][value][id] = struct{}{}
}
```
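The compile errors show that Go defines indexing on the map type `TagIndex` but not on the pointer `*TagIndex`, hence the `ti := *t` dereference. Below is a runnable sketch of the add/delete pair, with a string-based stand-in for `idx.MetricID` and an assumed behavior of creating inner maps on demand and pruning empty ones on delete (the lines not shown in the review context may do this differently):

```go
package main

import "fmt"

// MetricID is a hypothetical stand-in for idx.MetricID.
type MetricID string

type TagIDs map[MetricID]struct{} // set of ids
type TagValue map[string]TagIDs   // value -> set of ids
type TagIndex map[string]TagValue // key -> list of values

// addTagId registers id under name=value. Indexing works on the map type,
// not on a pointer to it, so the receiver is dereferenced first.
func (t *TagIndex) addTagId(name, value string, id MetricID) {
	ti := *t
	if _, ok := ti[name]; !ok {
		ti[name] = make(TagValue)
	}
	if _, ok := ti[name][value]; !ok {
		ti[name][value] = make(TagIDs)
	}
	ti[name][value][id] = struct{}{}
}

// delTagId removes id and prunes sets that become empty. Reading from or
// deleting in a nil map is a no-op, so unknown names/values are safe.
func (t *TagIndex) delTagId(name, value string, id MetricID) {
	ti := *t
	delete(ti[name][value], id)
	if len(ti[name][value]) == 0 {
		delete(ti[name], value)
		if len(ti[name]) == 0 {
			delete(ti, name)
		}
	}
}

func main() {
	ti := TagIndex{}
	ti.addTagId("host", "web1", "id1")
	fmt.Println(len(ti["host"]["web1"])) // prints: 1
	ti.delTagId("host", "web1", "id1")
	fmt.Println(len(ti)) // prints: 0
}
```

Because maps are reference types, a value receiver (`func (t TagIndex) addTagId(...)`) would also work for these mutations; a pointer receiver is only needed if a method has to replace the map itself.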

```
func (t *TagIndex) DelTagId(name, value string, id idx.MetricID) {
```
Contributor:

this function (and the Add one) don't need to be exported I think?

Contributor Author:

👍 107aef7

@Dieterbe (Contributor) left a review:

few minor comments but looks good overall.

@replay (Contributor Author) commented Nov 17, 2017

@Dieterbe fixed

@Dieterbe merged commit b90bd2a into master on Nov 17, 2017
@Dieterbe deleted the tags_in_mainindex branch on September 18, 2018 08:59
3 participants