This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

No tags in tree index #806

Merged
merged 14 commits into from
Jan 23, 2018


replay
Contributor

@replay replay commented Dec 29, 2017

As described in #798, tagged series should not be added into the tree index anymore.

This also means that we'll now need separate methods to delete by tag expressions, as described here: http://graphite.readthedocs.io/en/latest/tags.html#removing-series-from-the-tagdb

Furthermore, pruning needs to be fixed so it also takes the tag index into account.

Fixes #798

@replay replay changed the title No tags in tree No tags in tree index Dec 29, 2017
@replay replay force-pushed the no_tags_in_tree branch 7 times, most recently from eb04240 to 7a111e8 Compare December 29, 2017 17:03
@woodsaj
Member

woodsaj commented Jan 2, 2018

Is it possible to make this behaviour optional via a config setting? I see some value in being able to have tagged series in the index tree.
E.g., this allows users to just start appending tags to existing series and then use the byTag functions in their queries. For example, users could just update their existing collectd config and add a "Postfix" setting to their write_graphite config:

Postfix ";tag1=val1;tag2=val2"
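For reference, a minimal Go sketch of how a path with appended tags splits into a bare name plus tags, assuming Graphite's name;key=value;... format. parseTaggedPath is a hypothetical helper, not Metrictank's actual parser, and its validation is deliberately minimal. Under this PR, a non-empty tag set is what keeps such a series out of the tree index.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTaggedPath (hypothetical) splits a Graphite-style "name;key=value;..."
// path into the bare metric name and a map of its tags. Validation is minimal:
// each tag only needs a non-empty key and value.
func parseTaggedPath(path string) (string, map[string]string, error) {
	parts := strings.Split(path, ";")
	tags := make(map[string]string, len(parts)-1)
	for _, p := range parts[1:] {
		kv := strings.SplitN(p, "=", 2)
		if len(kv) != 2 || kv[0] == "" || kv[1] == "" {
			return "", nil, fmt.Errorf("invalid tag %q in %q", p, path)
		}
		tags[kv[0]] = kv[1]
	}
	return parts[0], tags, nil
}

func main() {
	path := "collectd.hostA.df.root.percent_bytes.free;tag1=val1;tag2=val2"
	name, tags, err := parseTaggedPath(path)
	if err != nil {
		fmt.Println(err)
		return
	}
	// With this PR, only series without tags are added to the tree index.
	fmt.Println(name, tags, "in tree index:", len(tags) == 0)
}
```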

@replay
Contributor Author

replay commented Jan 2, 2018

@woodsaj I'm not sure I understand.
With and without this patch, once a user starts sending metrics with tags, those metrics will be queryable via the byTag functions as long as tag support is enabled. So the scenario you're describing should already work.
The difference this patch makes is that once the metrics have tags, they will no longer be queryable via <name>;<tags>..., because they will no longer be in the tree index.

@woodsaj
Member

woodsaj commented Jan 2, 2018

The issue I see is that with this change, once you start sending tags with a series, you can no longer use the query editor to explore the data you have (metric names). If your series are mostly in the old format with only a couple of tags, that is a problem.

So you would go from being able to navigate the tree from collectd -> collectd.hostA -> collectd.hostA.df -> collectd.hostA.df.root -> etc. to now having to know that you are looking for a series with name=collectd.hostA.df.root.percent_bytes.free
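A toy Go sketch of that navigation concern. children is a hypothetical helper, not the real tree index: it lists the next path segments below a prefix and, like the tree index after this PR, skips any path containing ';'. A host whose series all carry tags simply disappears from the browsable tree.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// children (hypothetical) returns the sorted, de-duplicated path segments
// directly below prefix ("" means the root). Paths containing ';' are tagged
// series and are skipped, mimicking a tree index without tagged series.
func children(paths []string, prefix string) []string {
	seen := map[string]struct{}{}
	for _, p := range paths {
		if strings.Contains(p, ";") {
			continue // tagged series: only reachable via the tag index
		}
		if prefix != "" {
			if !strings.HasPrefix(p, prefix+".") {
				continue
			}
			p = strings.TrimPrefix(p, prefix+".")
		}
		seen[strings.SplitN(p, ".", 2)[0]] = struct{}{}
	}
	out := make([]string, 0, len(seen))
	for s := range seen {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

func main() {
	paths := []string{
		"collectd.hostA.df.root.percent_bytes.free",
		"collectd.hostA.cpu.idle",
		"collectd.hostB.df.root.percent_bytes.free;dc=eu",
	}
	fmt.Println(children(paths, "collectd"))
	fmt.Println(children(paths, "collectd.hostA"))
}
```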

@Dieterbe
Contributor

Dieterbe commented Jan 2, 2018

I think awood's problem consists of 2 aspects:

  1. metric navigation/browsing: there's nothing preventing us from providing the ability to navigate/browse the series tree when you use tags; it would just live in a different place than the current autocomplete editor. In other words, this aspect could be solved by adding autocomplete/tree navigation to the tag value editor when the tag key is name.
  2. keeping dashboards/panels healthy when people add tags in the way that AJ described. That would no longer work: people would have to update their metric queries from the traditional non-tagged query to a new tagged query, filling in the old pattern as the value of the name tag. This is essentially the concern I also raised in Remove tagged series from tree index #798, and I'm still not convinced this is the right way to go. It seems more useful to me to keep tagged series in the tree alongside non-tagged ones (while allowing further narrowing down with the tag rule editor). That said, this "automagic transition" would cause legends to get ;foo=bar suffixes where they previously didn't, which may break some series viz overrides, aliasSub rules, legend formatting, etc.

@replay
Contributor Author

replay commented Jan 2, 2018

OK, I understand what you mean now. I think, at least for graphite running in front of MT, it should be no problem to just forward /find results that include tagged series. It would just be somewhat inconsistent with how the tagdb implementation in graphite behaves, but if we make it optional we could disable it by default for consistency with graphite's implementation.

@replay
Contributor Author

replay commented Jan 2, 2018

I think if we do that, then DeleteByTag() could get quite expensive if somebody deletes by a tag & value combination that matches a large number of series. First we'll need to look up all the MetricDefinitions by tag query, then for each of their paths we'll need to do a find on the tree index and clean up all the branches with the delete() method. But I guess that's to be expected then.
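A rough Go sketch of that two-step cost, assuming tagged series were also kept in the tree. Def, tree and deleteByTag are hypothetical simplifications: the tree is modeled as a map from node path to the number of series below it, and the tag lookup is reduced to a linear scan for one exact key=value pair, where a real index would consult the tag index. The point is the second step: each matched series costs one walk over every node of its name.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Def is a hypothetical, simplified stand-in for schema.MetricDefinition.
type Def struct {
	Id   string
	Name string   // name without tags, e.g. "a.b.c"
	Tags []string // "key=value" pairs
}

// tree maps every node path ("a", "a.b", "a.b.c") to the number of series at
// or below it, so a node can be dropped once its count reaches zero.
type tree map[string]int

func (t tree) add(name string) {
	parts := strings.Split(name, ".")
	for i := range parts {
		t[strings.Join(parts[:i+1], ".")]++
	}
}

func (t tree) remove(name string) {
	parts := strings.Split(name, ".")
	for i := range parts {
		node := strings.Join(parts[:i+1], ".")
		t[node]--
		if t[node] <= 0 {
			delete(t, node) // branch no longer has any series below it
		}
	}
}

// deleteByTag resolves the tag (here: one exact "key=value" match) to defs,
// then walks the tree once per match to clean up leaves and empty branches.
func deleteByTag(defs map[string]*Def, t tree, tag string) []string {
	var deleted []string
	for id, d := range defs {
		for _, kv := range d.Tags {
			if kv == tag {
				t.remove(d.Name)
				delete(defs, id) // deleting during range is safe in Go
				deleted = append(deleted, id)
				break
			}
		}
	}
	sort.Strings(deleted)
	return deleted
}

func main() {
	defs := map[string]*Def{
		"1": {Id: "1", Name: "a.b.c", Tags: []string{"dc=x"}},
		"2": {Id: "2", Name: "a.b.d", Tags: []string{"dc=y"}},
	}
	t := tree{}
	for _, d := range defs {
		t.add(d.Name)
	}
	fmt.Println(deleteByTag(defs, t, "dc=x"), t)
}
```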

@replay replay force-pushed the no_tags_in_tree branch 13 times, most recently from 47a11c1 to 084c6d3 Compare January 5, 2018 15:11
@Dieterbe Dieterbe added this to the 0.8.1 milestone Jan 11, 2018
this also means that they don't need to be deleted in the delete()
method, because the delete() method is only for series without tags.
in order to loop over all metric defs we loop over m.DefById. This
is better than looping over the tree and the tag index because it
guarantees that we check every MetricDefinition exactly once, while
this wouldn't be possible on the tag index because one MetricDefinition
can be referred to by many tags.
In order to include all series from both indexes (tags & tree) we
iterate over DefById now.
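A minimal Go sketch of the DefById-based iteration described in that commit message. Def and staleSeries are hypothetical; the split into tagged and untagged ids only shows which delete path each stale series would need, and the real pruning code does more.

```go
package main

import (
	"fmt"
	"sort"
)

// Def is a hypothetical stand-in for schema.MetricDefinition.
type Def struct {
	Id         string
	Tags       []string
	LastUpdate int64
}

// staleSeries walks defById, which holds every series exactly once whether it
// is tagged or not, and splits the stale ones by the index they live in.
// Walking the tag index instead would visit a series once per tag.
func staleSeries(defById map[string]*Def, cutoff int64) (tagged, untagged []string) {
	for id, d := range defById {
		if d.LastUpdate >= cutoff {
			continue
		}
		if len(d.Tags) > 0 {
			tagged = append(tagged, id) // would go through the tag-index delete
		} else {
			untagged = append(untagged, id) // would go through the tree delete
		}
	}
	sort.Strings(tagged)
	sort.Strings(untagged)
	return tagged, untagged
}

func main() {
	defs := map[string]*Def{
		"a": {Id: "a", LastUpdate: 10},
		"b": {Id: "b", Tags: []string{"dc=x", "host=y"}, LastUpdate: 10},
		"c": {Id: "c", LastUpdate: 100},
	}
	tagged, untagged := staleSeries(defs, 50)
	fmt.Println(tagged, untagged)
}
```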

// DeleteTagged deletes the specified series from the tag index and also the
// DefById index.
DeleteTagged(int, []string) ([]Archive, error)
Contributor

Now that things are separated out, it's time for us to document in the MemoryIdx struct which attributes serve which use case. It seems some properties are solely for the tree index, some are just for the tag index, and some (just DefById?) are for both. This can be documented/organized better.

Contributor Author

While doing that, I realized that defByTagSet needs to be keyed by orgId, so I did that, plus added comments to the properties of MemoryIdx: f319903

Contributor

I think you misunderstood what I meant. I meant we need to comment the use case for each attribute (for the tag index, for the hierarchy index, or for both). There was no need to describe the internals of the structures again, since you already described them where the structures are defined, which is the right thing to do.

Internals should be documented where things are defined, not where they are used.

I changed this via 20a4ba2 please let me know what you think or if you see any mistakes

Contributor Author

yep, that works
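To illustrate the kind of use-case comments being asked for, a simplified Go sketch. Only DefById is a name from this discussion; tree, tags and the add method are hypothetical stand-ins (the real index uses richer types and also keeps defByTagSet). The invariant it shows: every series lands in DefById, plus exactly one of the tree index or the tag index.

```go
package main

import "fmt"

// Def is a hypothetical stand-in for schema.MetricDefinition.
type Def struct {
	Id    string
	OrgId int
	Name  string
	Tags  []string // "key=value"
}

// MemoryIdx is a simplified sketch; only DefById mirrors a real field name.
type MemoryIdx struct {
	// used by both indexes: every series, tagged or not, keyed by id
	DefById map[string]*Def
	// tree index only: org -> untagged names (stand-in for the real tree)
	tree map[int]map[string]struct{}
	// tag index only: org -> "key=value" -> set of ids
	tags map[int]map[string]map[string]struct{}
}

func NewMemoryIdx() *MemoryIdx {
	return &MemoryIdx{
		DefById: map[string]*Def{},
		tree:    map[int]map[string]struct{}{},
		tags:    map[int]map[string]map[string]struct{}{},
	}
}

// add stores every def in DefById, then in exactly one of the two indexes
// depending on whether it has tags.
func (m *MemoryIdx) add(d *Def) {
	m.DefById[d.Id] = d
	if len(d.Tags) == 0 {
		if m.tree[d.OrgId] == nil {
			m.tree[d.OrgId] = map[string]struct{}{}
		}
		m.tree[d.OrgId][d.Name] = struct{}{}
		return
	}
	if m.tags[d.OrgId] == nil {
		m.tags[d.OrgId] = map[string]map[string]struct{}{}
	}
	for _, kv := range d.Tags {
		if m.tags[d.OrgId][kv] == nil {
			m.tags[d.OrgId][kv] = map[string]struct{}{}
		}
		m.tags[d.OrgId][kv][d.Id] = struct{}{}
	}
}

func main() {
	m := NewMemoryIdx()
	m.add(&Def{Id: "1", OrgId: 1, Name: "a.b"})
	m.add(&Def{Id: "2", OrgId: 1, Name: "a.c", Tags: []string{"dc=x"}})
	fmt.Println(len(m.DefById), len(m.tree[1]), len(m.tags[1]))
}
```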

@@ -122,6 +122,19 @@ func (t IndexAutoCompleteTagValues) Trace(span opentracing.Span) {
func (i IndexAutoCompleteTagValues) TraceDebug(span opentracing.Span) {
}

type IndexTagDelSeries struct {
OrgId int `json:"orgId" binding:"Required"`
Paths []string `json:"path" form:"path"`
Contributor

@Dieterbe Dieterbe Jan 19, 2018

Interesting that this is called "paths". AFAIK the term "path" is not commonly used yet, but I see it's the same on http://graphite.readthedocs.io/en/latest/tags.html
It looks like these values are used for the values of name tags, so why not just call this "names" instead of "paths"? (@DanCech, question for you :) ) Anyway, it's not worth holding back this PR for my nitpickery, but it just caught my eye.

Contributor Author

@replay replay Jan 22, 2018

I think the term name already has an implied meaning of "name without any tags" in the context of MT, because we also have a virtual tag called name which has that content.
So we need an alternative word to describe name;<tags>..., and because nameWithTags is long, path might be better.

@@ -103,6 +103,29 @@ func (t *TagIndex) delTagId(name, value string, id idx.MetricID) {
}
}

// <name>;<key>=<value> -> Set of references to schema.MetricDefinition
type defByTagSet map[string]map[*schema.MetricDefinition]struct{}
Contributor

Why *schema.MetricDefinition and not schema.MetricDefinition? It would save a lot of pointers.

Contributor Author

If we copy the MDs, then we'd need to make sure that every update to a MetricDefinition is applied to all of its copies. E.g., if we update LastUpdate, we'd need to make sure the update is made to every copy of that MetricDefinition, which seems complicated and error-prone.

Contributor

right. bummer.
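A Go sketch of why the pointers pay off here, assuming a simplified Def type and an orgId-keyed defByTagSet like the one mentioned earlier in this review. Because DefById and the set hold the same *Def, a LastUpdate change made through one is visible through the other without fanning out updates to copies. The add/del helpers are hypothetical.

```go
package main

import "fmt"

// Def is a hypothetical stand-in for schema.MetricDefinition.
type Def struct {
	Id         string
	LastUpdate int64
}

// defByTagSet sketches the orgId-keyed variant: org -> name;tags -> set of
// pointers. All indexes share one Def per series.
type defByTagSet map[int]map[string]map[*Def]struct{}

func (s defByTagSet) add(orgId int, key string, d *Def) {
	if s[orgId] == nil {
		s[orgId] = map[string]map[*Def]struct{}{}
	}
	if s[orgId][key] == nil {
		s[orgId][key] = map[*Def]struct{}{}
	}
	s[orgId][key][d] = struct{}{}
}

// del removes d and prunes empty inner maps so they don't accumulate.
// delete on a nil map is a no-op, so missing keys are fine.
func (s defByTagSet) del(orgId int, key string, d *Def) {
	delete(s[orgId][key], d)
	if len(s[orgId][key]) == 0 {
		delete(s[orgId], key)
	}
	if len(s[orgId]) == 0 {
		delete(s, orgId)
	}
}

func main() {
	d := &Def{Id: "1"}
	defById := map[string]*Def{d.Id: d}
	set := defByTagSet{}
	set.add(1, "a.b;dc=x", d)

	// One update through DefById is visible through defByTagSet too.
	defById["1"].LastUpdate = 1516700000
	for p := range set[1]["a.b;dc=x"] {
		fmt.Println(p.LastUpdate)
	}
	set.del(1, "a.b;dc=x", d)
	fmt.Println(len(set))
}
```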

return
}

func (m *MemoryIdx) deindexTags(tags TagIndex, def *schema.MetricDefinition) bool {
Contributor

please document what the bool means

Contributor Author

@replay replay Jan 22, 2018

done: f3b644d
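A Go sketch of a deindexTags-style function with its bool documented. The tag index layout (key -> value -> ids) and the exact meaning of the return value (false if any tag was malformed and skipped) are assumptions for illustration, not necessarily what f3b644d documents.

```go
package main

import (
	"fmt"
	"strings"
)

// tagIndex is a simplified stand-in: key -> value -> set of series ids.
type tagIndex map[string]map[string]map[string]struct{}

// deindexTags removes id from tags under each of its "key=value" pairs and
// prunes maps that become empty. It returns false if any tag was malformed
// (such tags are skipped), true if every tag was processed cleanly.
func deindexTags(tags tagIndex, id string, defTags []string) bool {
	ok := true
	for _, kv := range defTags {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) != 2 {
			ok = false
			continue
		}
		key, value := parts[0], parts[1]
		delete(tags[key][value], id) // no-op if the key or value is missing
		if len(tags[key][value]) == 0 {
			delete(tags[key], value)
		}
		if len(tags[key]) == 0 {
			delete(tags, key)
		}
	}
	return ok
}

func main() {
	idx := tagIndex{"dc": {"x": {"1": {}}}}
	fmt.Println(deindexTags(idx, "1", []string{"dc=x", "broken"}), len(idx))
}
```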

@@ -483,55 +487,6 @@ func TestGetByTag(t *testing.T) {
}
}

func TestDeleteTaggedSeries(t *testing.T) {
Contributor

why is this removed? is it replaced by something else? if so, what?

Contributor Author

The interface to delete tagged series has changed. Previously Delete() could be used to delete tagged series; now that functionality has been split into Delete() (series without tags) and DeleteTagged() (tagged series). There are other tests for DeleteTagged(), so this one is no longer necessary.

// testWithAndWithoutTagSupport calls a test with all combinations of
// the settings TagSupport and tagsInTree. In some cases those settings can
// affect the logic quite a lot, so we need to test all combinations
func testWithAndWithoutTagSupport(t *testing.T, f func(*testing.T)) {
Contributor

this is a cool idea, but the mentioned tagsInTree parameter does not exist

Contributor

Wouldn't it be more meaningful, though, if the called functions actually did more with tags?
E.g., in testWithAndWithoutTagSupport(t, testPrune), testPrune doesn't have any tagged data, so enabling tag support won't do much, which makes the tag-support aspect of this a bit misleading. Or should we just document that the purpose here is to test solely non-tag-related functionality with the tag index both enabled and disabled?

(maybe out of scope for this PR?)

Contributor Author

Sorry, that comment is outdated: the tagsInTree parameter did exist, but then we decided to remove it.

Contributor Author

I think even if a function shouldn't do anything related to tagged series, it's still a good idea to test it with both settings, since doing so is basically free and it just gives a little more assurance that everything works as expected.

Contributor

I think even if a function shouldn't do anything related to tagged series, it's still a good idea to test it with both settings, since doing so is basically free and it just gives a little more assurance that everything works as expected.

yep, completely agree
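A minimal Go sketch of the helper pattern being discussed, assuming a hypothetical package-level TagSupport switch. It runs the same function with tag support off and then on, and restores the original value afterwards, which is what makes running it for non-tag tests essentially free.

```go
package main

import "fmt"

// TagSupport is a hypothetical package-level switch standing in for the
// index's tag-support setting.
var TagSupport bool

// withAndWithoutTagSupport runs f with tag support disabled, then enabled,
// and restores the original setting. In a _test.go file the loop body would
// typically call t.Run(fmt.Sprintf("tagSupport=%t", enabled), f) instead, so
// each setting shows up as its own subtest.
func withAndWithoutTagSupport(f func()) {
	orig := TagSupport
	defer func() { TagSupport = orig }()
	for _, enabled := range []bool{false, true} {
		TagSupport = enabled
		f()
	}
}

func main() {
	withAndWithoutTagSupport(func() {
		fmt.Println("running with TagSupport =", TagSupport)
	})
}
```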

@Dieterbe
Contributor

I pushed a few tweaks; otherwise I think this is pretty much good to go. @woodsaj, does it look good to you?

Member

@woodsaj woodsaj left a comment

LGTM

@replay replay merged commit 65b75bd into master Jan 23, 2018
@Dieterbe Dieterbe deleted the no_tags_in_tree branch September 18, 2018 09:07
@Dieterbe Dieterbe modified the milestones: 1.1, 0.8.1 Dec 12, 2018
Development

Successfully merging this pull request may close these issues.

Remove tagged series from tree index
3 participants