No tags in tree index #806

replay · 2017-12-29T14:24:52Z

As described in #798, tagged series should not be added into the tree index anymore.

This also means that we'll now need separate methods to delete by tag expressions, as described here: http://graphite.readthedocs.io/en/latest/tags.html#removing-series-from-the-tagdb

Furthermore, pruning needs to be fixed so it also takes the tag index into account.

Fixes #798
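A minimal sketch of the routing described above, assuming a heavily simplified index. `Def`, `Index`, `NewIndex`, `addTag` and `Add` are hypothetical and are not metrictank's actual types. Untagged series go only into the tree, tagged series only into the tag index (with the plain name indexed under the virtual `name` tag), and `DefById` records both. This separation is why `Delete()` can stay tree-only while tagged series need their own deletion path.

```go
package main

import "strings"

// Def is a hypothetical, simplified stand-in for schema.MetricDefinition.
type Def struct {
	Id   string
	Name string   // plain name, e.g. "a.b.c"
	Tags []string // "key=value" pairs; empty for untagged series
}

// Index is a hypothetical, simplified in-memory index.
type Index struct {
	DefById map[string]*Def                       // every series, tagged or not
	Tree    map[string]*Def                       // untagged series only, by name
	Tags    map[string]map[string]map[string]bool // key -> value -> def ids; tagged only
}

func NewIndex() *Index {
	return &Index{
		DefById: map[string]*Def{},
		Tree:    map[string]*Def{},
		Tags:    map[string]map[string]map[string]bool{},
	}
}

func (idx *Index) addTag(key, value, id string) {
	if idx.Tags[key] == nil {
		idx.Tags[key] = map[string]map[string]bool{}
	}
	if idx.Tags[key][value] == nil {
		idx.Tags[key][value] = map[string]bool{}
	}
	idx.Tags[key][value][id] = true
}

// Add puts a series into exactly one of the two indexes: the tree for
// untagged series, the tag index for tagged ones. DefById holds both.
func (idx *Index) Add(d *Def) {
	idx.DefById[d.Id] = d
	if len(d.Tags) == 0 {
		idx.Tree[d.Name] = d
		return
	}
	// the plain name is indexed like a tag under the virtual key "name"
	idx.addTag("name", d.Name, d.Id)
	for _, kv := range d.Tags {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) != 2 {
			continue // this sketch silently skips malformed tags
		}
		idx.addTag(parts[0], parts[1], d.Id)
	}
}
```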

woodsaj · 2018-01-02T03:20:19Z

Is it possible to make this behaviour optional via a config setting? I see some value in being able to have tagged series in the index tree.
e.g. this allows users to just start appending tags to existing series and then use the byTag functions in their queries. For example, users could just update their existing collectd config and add a "Postfix" setting to their write_graphite config:

Postfix ";tag1=val1;tag2=val2"
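With a postfix like the one above, collectd would emit paths such as `collectd.hostA.df.root.percent_bytes.free;tag1=val1;tag2=val2`. A sketch of splitting such a path into its plain name and tags follows; `ParsePath` and `SortedTags` are hypothetical helpers, not metrictank's parser, and the validation here is a simplified assumption (Graphite's real rules restrict the allowed characters further).

```go
package main

import (
	"errors"
	"sort"
	"strings"
)

// ParsePath splits a Graphite-style "name;k1=v1;k2=v2" path into the plain
// name and a map of tags. Untagged paths return an empty map.
func ParsePath(path string) (string, map[string]string, error) {
	parts := strings.Split(path, ";")
	name := parts[0]
	if name == "" {
		return "", nil, errors.New("empty metric name")
	}
	tags := make(map[string]string, len(parts)-1)
	for _, p := range parts[1:] {
		kv := strings.SplitN(p, "=", 2)
		if len(kv) != 2 || kv[0] == "" || kv[1] == "" {
			return "", nil, errors.New("malformed tag: " + p)
		}
		tags[kv[0]] = kv[1]
	}
	return name, tags, nil
}

// SortedTags renders tags as sorted "k=v" strings, a convenient canonical form.
func SortedTags(tags map[string]string) []string {
	out := make([]string, 0, len(tags))
	for k, v := range tags {
		out = append(out, k+"="+v)
	}
	sort.Strings(out)
	return out
}
```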

replay · 2018-01-02T06:15:30Z

@woodsaj I'm not sure I understand.
With this patch and also without, once a user starts sending metrics with tags, those metrics will be queryable via the byTag functions as long as tag-support is enabled. So the scenario you're describing should already work.
The difference this patch makes is that once the metrics have tags, they will no longer be queryable via <name>;<tags>... because they won't be in the tree index anymore.

woodsaj · 2018-01-02T07:45:44Z

The issue I see is that with this change, once you start sending tags with a series, you can no longer use the query editor to explore the data you have (metric names). If your series are mostly in the old format with only a couple of tags, that is a problem.

So you would go from being able to navigate the tree from collectd -> collectd.hostA -> collectd.hostA.df -> collectd.hostA.df.root -> etc... to now having to know that you are looking for a series with name=collectd.hostA.df.root.percent_bytes.free
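To make the navigation concern concrete, here is a sketch of browsing one level of the hierarchy by listing the children of a branch. `Children` is a hypothetical helper over a flat list of names held by the tree index. Since this PR keeps tagged series out of the tree, a series sent as `...percent_bytes.free;tag1=val1` would simply never appear in that list, so browsing cannot reach it.

```go
package main

import (
	"sort"
	"strings"
)

// Children returns the distinct next-level nodes under branch, given the
// names present in the tree index. An empty branch lists the root nodes.
func Children(names []string, branch string) []string {
	prefix := ""
	if branch != "" {
		prefix = branch + "."
	}
	seen := map[string]bool{}
	for _, n := range names {
		if !strings.HasPrefix(n, prefix) {
			continue
		}
		rest := strings.TrimPrefix(n, prefix)
		if rest == "" {
			continue
		}
		node := strings.SplitN(rest, ".", 2)[0]
		seen[prefix+node] = true
	}
	out := make([]string, 0, len(seen))
	for node := range seen {
		out = append(out, node)
	}
	sort.Strings(out)
	return out
}
```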

Dieterbe · 2018-01-02T11:39:19Z

I think awood's problem consists of 2 aspects:

1. metric navigation/browsing: there's nothing preventing us from providing the ability to navigate/browse the series tree when you use tags; it would just be in a different place than the current autocomplete editor. IOW, this aspect would be solved by adding an autocomplete/tree-thing to the tag value editor when the tag key is name.
2. keeping dashboards/panels healthy when people add tags in the way that AJ described: that would no longer work. People would have to update their metric queries from the traditional non-tagged query to a new tagged query, where they fill in the old pattern as the value of a tag with key name. This is essentially the concern I also raised in Remove tagged series from tree index #798, and I'm still not convinced this is the right way to go. It seems more useful to me to keep tagged series in the tree, along with non-tagged ones (but allow further narrowing down with the tag rule editor). That said, this "automagic transition" would result in the legends now getting the ;foo=bar suffixes where they previously didn't get them, which may break some series viz overrides, aliasSub rules, legend formatting, etc.

replay · 2018-01-02T11:49:57Z

Ok, I understand what you mean now. I think at least for graphite running in front of MT it should be no problem to just forward /find results that include tagged series, it would just be kind of inconsistent with how the tagdb implementation in graphite behaves, but if we make it optional we could disable it by default for consistency with graphite's implementation.

replay · 2018-01-02T13:29:19Z

I think if we do that, then DeleteByTag() might get quite expensive if somebody deletes by a tag&value combination that matches a large number of series. First we'll need to look up all the MetricDefinitions by tag query, then for each of their paths we'll need to do a find on the tree index and clean up all the branches with the delete() method. But I guess that's just to be expected then.
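The per-series tree cleanup mentioned above can be sketched as: remove the leaf, then walk up and delete each ancestor branch left without children. `Tree`, `Node`, `Add` and `Delete` are hypothetical and much simpler than metrictank's tree; the point is that each deleted series costs a walk up its path, which is what makes a broad tag-based delete expensive if tagged series also lived in the tree.

```go
package main

import "strings"

// Node is a hypothetical tree-index node; the Tree map keys it by full path.
type Node struct {
	Children map[string]bool // full paths of direct children
	Leaf     bool
}

// Tree maps full paths to nodes; "" is the root.
type Tree map[string]*Node

// Add inserts a leaf path and creates any missing ancestor branches.
func (t Tree) Add(path string) {
	parts := strings.Split(path, ".")
	parent := ""
	for i := range parts {
		cur := strings.Join(parts[:i+1], ".")
		if t[parent] == nil {
			t[parent] = &Node{Children: map[string]bool{}}
		}
		t[parent].Children[cur] = true
		if t[cur] == nil {
			t[cur] = &Node{Children: map[string]bool{}}
		}
		parent = cur
	}
	t[path].Leaf = true
}

// Delete removes a leaf and then prunes every ancestor branch that is left
// without children. It returns the number of nodes removed.
func (t Tree) Delete(path string) int {
	n, ok := t[path]
	if !ok || !n.Leaf {
		return 0
	}
	n.Leaf = false
	removed := 0
	cur := path
	for cur != "" {
		node := t[cur]
		if node.Leaf || len(node.Children) > 0 {
			break // node is still in use
		}
		delete(t, cur)
		removed++
		parent := ""
		if i := strings.LastIndex(cur, "."); i >= 0 {
			parent = cur[:i]
		}
		delete(t[parent].Children, cur)
		cur = parent
	}
	return removed
}
```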


In order to loop over all metric defs, we loop over m.DefById. This is better than looping over the tree and the tag index because it guarantees that we check every MetricDefinition exactly once, which wouldn't be possible with the tag index, because one MetricDefinition can be referred to by many tags.
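A small sketch of the difference, using hypothetical simplified types (`Def`, a tag index shaped as `"k=v" -> set of ids`, and the functions `countViaTagIndex` and `List`). Walking the tag index visits a tagged def once per tag and never sees untagged defs, which live only in the tree; walking the by-id map visits every def exactly once.

```go
package main

// Def is a hypothetical, simplified stand-in for schema.MetricDefinition.
type Def struct {
	Id   string
	Tags []string
}

// countViaTagIndex counts visits when iterating a tag index shaped as
// "k=v" -> set of def ids. A def with n tags is visited n times.
func countViaTagIndex(tagIndex map[string]map[string]bool) int {
	visits := 0
	for _, ids := range tagIndex {
		visits += len(ids)
	}
	return visits
}

// List returns every def exactly once by iterating the by-id map, which
// holds both tagged and untagged series.
func List(defById map[string]*Def) []*Def {
	out := make([]*Def, 0, len(defById))
	for _, d := range defById {
		out = append(out, d)
	}
	return out
}
```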


Dieterbe · 2018-01-19T15:20:31Z

idx/idx.go

+
+	// DeleteTagged deletes the specified series from the tag index and also the
+	// DefById index.
+	DeleteTagged(int, []string) ([]Archive, error)


now that things are separated out, it's due time for us to document in MemoryIdx struct, which attributes are for what use case. it seems some properties are solely for tree index, some are for just tagged index, and some (just DefById ?) are for both. this can be documented/organized better.

while doing that, I realized that defByTagSet needs to be keyed by orgId, so I did that, plus added comments to the properties of MemoryIdx: f319903

I think you misunderstood what i meant. I meant we need to comment the use case for each attribute (for tag index, for hierarchy index, or for both). there was no need to describe the internals of the structures again since you already described that where the structures are defined, which is the right thing to do.

internals of stuff should be documented where the stuff is defined, not where they are used.

I changed this via 20a4ba2 please let me know what you think or if you see any mistakes

yep, that works
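A sketch of the kind of per-use-case grouping discussed in this thread, including the org-keyed defByTagSet from f319903. All types are simplified and the field names only loosely follow the review discussion; this is not the real memory.go. `NumWithTagSet` is a hypothetical accessor that shows why keying by org first keeps identical tag sets from different orgs apart.

```go
package main

import "sync"

// Def is a hypothetical stand-in for schema.MetricDefinition.
type Def struct {
	Id    string
	OrgId int
	Name  string
	Tags  []string
}

// TagIndex: tag key -> tag value -> set of def ids (simplified).
type TagIndex map[string]map[string]map[string]struct{}

// Tree: path -> def for untagged series (simplified; no branch nodes).
type Tree map[string]*Def

// DefByTagSet: "name;k=v;..." -> set of defs sharing that tag set.
type DefByTagSet map[string]map[*Def]struct{}

// MemoryIdx groups its fields by the use case they serve.
type MemoryIdx struct {
	sync.RWMutex

	// used by both the hierarchy (tree) index and the tag index
	DefById map[string]*Def

	// hierarchy index only: untagged series, per org
	Tree map[int]Tree

	// tag index only: tagged series, per org
	tags        map[int]TagIndex
	defByTagSet map[int]DefByTagSet
}

// NumWithTagSet reports how many defs of orgId share the given tag set key.
func (m *MemoryIdx) NumWithTagSet(orgId int, key string) int {
	m.RLock()
	defer m.RUnlock()
	return len(m.defByTagSet[orgId][key])
}
```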

Dieterbe · 2018-01-19T15:33:52Z

api/models/node.go

@@ -122,6 +122,19 @@ func (t IndexAutoCompleteTagValues) Trace(span opentracing.Span) {
 func (i IndexAutoCompleteTagValues) TraceDebug(span opentracing.Span) {
 }

+type IndexTagDelSeries struct {
+	OrgId int      `json:"orgId" binding:"Required"`
+	Paths []string `json:"path" form:"path"`


interesting that it is called "paths". afaik the term "path" is not commonly used yet, but I see it's the same on http://graphite.readthedocs.io/en/latest/tags.html
it looks like these values are used for the values for name tags, so why not just call this "names" instead of "paths" ( @DanCech question for you ) :) anyway not worth holding back this PR for my nitpickery but it just caught my eye.

I think the term name already has an implied meaning of "name without any tags" in the context of MT, because we also have a virtual tag called name which has that content.
So we need an alternative word to describe name;<tags>... and because nameWithTags is long, path might be better
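A sketch of decoding a delete request into the struct from the diff above, where each path is a name plus its tags. It uses only `encoding/json`; the `binding`/`form` struct tags are kept as in the diff but are only meaningful to the HTTP framework. `DecodeDelSeries` is hypothetical, and rejecting paths without any `;` is an assumption made for illustration, not a documented rule of the API.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// IndexTagDelSeries mirrors the request struct from the diff above.
type IndexTagDelSeries struct {
	OrgId int      `json:"orgId" binding:"Required"`
	Paths []string `json:"path" form:"path"`
}

// DecodeDelSeries parses a JSON body and (as an assumption of this sketch)
// rejects paths that carry no tags, i.e. that are not name;key=value[;...].
func DecodeDelSeries(body []byte) (IndexTagDelSeries, error) {
	var req IndexTagDelSeries
	if err := json.Unmarshal(body, &req); err != nil {
		return req, err
	}
	for _, p := range req.Paths {
		if !strings.Contains(p, ";") {
			return req, errors.New("path has no tags: " + p)
		}
	}
	return req, nil
}
```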

Dieterbe · 2018-01-19T16:18:36Z

idx/memory/memory.go

@@ -103,6 +103,29 @@ func (t *TagIndex) delTagId(name, value string, id idx.MetricID) {
 	}
 }

+// <name>;<key>=<value> -> Set of references to schema.MetricDefinition
+type defByTagSet map[string]map[*schema.MetricDefinition]struct{}


why *schema.MetricDefinition and not schema.MetricDefinition ? would save a lot of pointers.

If we copy the MDs, then we'd need to make sure that all updates to MetricDefinitions are made to all instances of each. E.g. if we update LastUpdate, we'd need to make sure that the update is done to all instances of that MetricDefinition, which seems complicated and error-prone.

right. bummer.
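A sketch of why the set holds pointers, using hypothetical simplified types (`Def`, `defByTagSet`, `add`, `del`; the org-id level from f319903 is omitted). An update made through `DefById` is visible through `defByTagSet` because both refer to the same object. Assuming "tagset collisions" (see 352c61e) means several defs sharing one `name;tags` key, for example with different intervals, removal must delete only the given def and drop the key once its set is empty.

```go
package main

// Def is a hypothetical stand-in for schema.MetricDefinition.
type Def struct {
	Id         string
	Interval   int
	LastUpdate int64
}

// defByTagSet maps "name;k=v;..." to the set of defs sharing that tag set.
type defByTagSet map[string]map[*Def]struct{}

func (s defByTagSet) add(key string, d *Def) {
	if s[key] == nil {
		s[key] = map[*Def]struct{}{}
	}
	s[key][d] = struct{}{}
}

// del removes one def and drops the key only once no def uses it anymore.
func (s defByTagSet) del(key string, d *Def) {
	delete(s[key], d) // deleting from a nil map is a no-op
	if len(s[key]) == 0 {
		delete(s, key)
	}
}
```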

Dieterbe · 2018-01-19T16:21:09Z

idx/memory/memory.go

-		return
-	}
-
+func (m *MemoryIdx) deindexTags(tags TagIndex, def *schema.MetricDefinition) bool {


please document what the bool means

done: f3b644d

Dieterbe · 2018-01-19T16:28:45Z

idx/memory/tag_query_test.go

@@ -483,55 +487,6 @@ func TestGetByTag(t *testing.T) {
 	}
 }

-func TestDeleteTaggedSeries(t *testing.T) {


why is this removed? is it replaced by something else? if so, what?

The interface to delete tagged series has changed. Previously Delete() could be used to delete tagged series, while now those methods have been separated into Delete() and DeleteTagged(). There are other tests for DeleteTagged(), so this one is not necessary anymore.

Dieterbe · 2018-01-19T16:33:50Z

idx/memory/memory_test.go

+// testWithAndWithoutTagSupport calls a test with all combinations of
+// the settings TagSupport and tagsInTree. In some cases those settings can
+// affect the logic quite a lot, so we need to test all combinations
+func testWithAndWithoutTagSupport(t *testing.T, f func(*testing.T)) {


this is a cool idea, but the mentioned tagsInTree parameter does not exist

wouldn't it be more meaningful though, if the called functions actually did more with tags?
e.g. testWithAndWithoutTagSupport(t, testPrune) testPrune doesn't have any tagged data, so the tagsupport enabling won't do much. so the tag support aspect of this is a bit misleading. or should we just document that the purpose here is testing solely non-tag-related stuff in context of an enabled and disabled index?

(maybe out of scope for this PR?)

sorry, that comment is old, the tagsInTree parameter did exist but then we decided to remove it.

I think even if a function shouldn't do anything related to tagged series, it's still a good idea to test it with both settings, since doing so is basically free and it just gives a little more assurance that everything works as expected.


yep, completely agree
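A minimal standalone sketch of the "run with both settings" idea. In the diff the helper takes `*testing.T` and a `func(*testing.T)`; here the callback receives the setting instead, so the sketch runs outside `go test`. `TagSupport` below is a hypothetical package-level switch standing in for the real index setting; in an actual test the callback would typically wrap `t.Run`.

```go
package main

// TagSupport is a hypothetical package-level switch standing in for the
// real index setting.
var TagSupport = false

// withAndWithoutTagSupport runs f once per setting and restores the original
// value afterwards, even if f panics.
func withAndWithoutTagSupport(f func(tagSupport bool)) {
	orig := TagSupport
	defer func() { TagSupport = orig }()
	for _, enabled := range []bool{false, true} {
		TagSupport = enabled
		f(enabled)
	}
}
```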


Dieterbe · 2018-01-22T17:13:22Z

I pushed a few tweaks; otherwise I think this is pretty much good to go. @woodsaj does it look good to you?

woodsaj

LGTM

replay changed the title ~~No tags in tree~~ No tags in tree index Dec 29, 2017

replay force-pushed the no_tags_in_tree branch 7 times, most recently from eb04240 to 7a111e8, December 29, 2017 17:03

replay requested review from DanCech, Dieterbe and woodsaj December 29, 2017 17:08

replay force-pushed the no_tags_in_tree branch 13 times, most recently from 47a11c1 to 084c6d3, January 5, 2018 15:11

Dieterbe added this to the 0.8.1 milestone Jan 11, 2018

replay added 9 commits January 15, 2018 14:47

do not add tagged series into tree index

49f0594

this also means that they don't need to be deleted in the delete() method, because the delete() method is only for series without tags.

add methods to delete tagged series to memory index

87fd591

merge unnecessary method resolveIDs into FindByTag

3b2f4e3

List() should also include tagged series

9577b6e

In order to include all series from both indexes (tags & tree) we iterate over DefById now.

add api methods to delete tagged series

addc8d0

make cache clearing work via tag index queries as well

add9a65

add tests for cache clearing api with tags

78802a1

fix tag index pruning to deal with tagset collisions

352c61e

replay force-pushed the no_tags_in_tree branch from cc0c3d1 to 352c61e, January 15, 2018 05:49

Dieterbe reviewed Jan 19, 2018


replay added 2 commits January 22, 2018 16:15

better comment

ad7491a

add comment

f3b644d

replay force-pushed the no_tags_in_tree branch from 8021fac to f3b644d, January 22, 2018 07:38

replay and others added 3 commits January 22, 2018 17:10

defByTagSet needs to be keyed by org id

f319903

document use of different memoryIdx properties

20a4ba2

remove documentation that is redundant wrt declaration of the used types

no need to export index internals

51ea484

woodsaj approved these changes Jan 22, 2018


replay merged commit 65b75bd into master Jan 23, 2018

Dieterbe mentioned this pull request Feb 19, 2018

Support tag expressions in storage-{aggregations,schemas}.conf #845

Closed

Dieterbe deleted the no_tags_in_tree branch September 18, 2018 09:07

Dieterbe modified the milestones: 1.1, 0.8.1 Dec 12, 2018
