-
Notifications
You must be signed in to change notification settings - Fork 104
Tag index #729
Conversation
idx/memory/memory.go
Outdated
tags[tagSplits[0]] = make(map[string][]string) | ||
} | ||
|
||
tags[tagSplits[0]][tagSplits[1]] = append(tags[tagSplits[0]][tagSplits[1]], def.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to init tags[tagSplits[0]][tagSplits[1]]
first? this code will be much more readable if you say tagName := tagSplits[0]
tagValue := tagSplits[1]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't be necessary to init slices i think, only maps. Ok, i'll name them
idx/memory/memory.go
Outdated
return m.idsByTagQuery(tree, query) | ||
} | ||
|
||
func (m *MemoryIdx) idsByTagQuery(tags map[string]map[string][]string, query tag.TagQuery) []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this function supposed to be returning the ids that match ANY expression or ALL expressions? Seems like it's ANY right now. I suppose this function is designed to be composable to support ALLs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@replay was confused, it should be ALL. He's sorting that out now.
From an end-user perspective, ANY matches can be done via a combination of using regular expression matches and/or using multiple calls to seriesByTag
and combining the results with group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
d7dcf96
to
d375fb9
Compare
idx/memory/tag_query.go
Outdated
value := exprSplits[0][3] | ||
|
||
// always anchor all regular expressions at the beginning | ||
if operator[len(operator)-1] == byte('~') && len(value) > 0 && value[len(value)-1] != byte('^') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value[0] != byte('^')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
idx/memory/tag_query.go
Outdated
|
||
// special case of empty value | ||
if len(value) == 0 { | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a no-op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, still going to add that. the idea is that i'll just reverse the operation.
so key!=~
becomes key=~.*
and key=~
becomes key!=~.*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then, further down where the matching is done, i could also shortcut it to not even do the matching if the pattern is .*
or .+
. (for .+
this works assuming that no value that's in the index can be ""
)
idx/memory/tag_query.go
Outdated
return ids, nil | ||
} | ||
return make(map[string]struct{}), nil | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this just be return idx[key][value], nil
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
idx/memory/tag_query.go
Outdated
return nil, err | ||
} | ||
|
||
for v, ids := range values { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question, it seems like this could just be for v, ids := range idx[key] {
idx/memory/tag_query.go
Outdated
return nil, errInvalidQuery | ||
} | ||
|
||
exprSplits := expressionRe.FindAllStringSubmatch(expr, -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idx/memory/tag_query.go
Outdated
value = ".+" | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do better in handling terms that match the empty string, since we have a requirement that there must be a tag expression that doesn't match the empty string. Because of that requirement, we should be able to make a list of candidate series from the expression that requires a match, then use those as the list of "all" series that should be considered to match if they don't have the tags that we're matching the empty string for. I didn't implement that concept in the graphite tagdbs, but I'm going to go back and see whether I can use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so i'm not completely sure i understand you right. but actually i think this already does what you describe:
- further down queries are divided into "selects" and "filters". any query that's excluding something is a "filter", any query that's selecting by a match is a "select".
- a query like
key=
is translated intokey!=.+
, this is a filter - the "select" queries are always executed first, building that list of candidates you mention
- after the list of candidates is built, the "filters" are applied
we can further speed this up by not even matching the regex .+
because we can't have empty tag values in the index. so every series that has the tag key
will match the value .+
, so we can shortcut it and skip the regex match.
ebc48cf
to
e30aee4
Compare
FTR: prior discussion: #532 |
7402db6
to
a4ee0e2
Compare
idx/memory/memory_test.go
Outdated
{"key1=~value[0-9]", "key2=~", "key3!=value3"}, | ||
{"key2=", "key1=value1"}, | ||
} | ||
expecting := []int{3, 1, 4, 1, 0, 1, 2, 0, 1, 2} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's make 1 slice of cases, where each case shows its expression and which metrics are expected to match.
just checking the number of matches can hide bugs, is hard for readers, and having 2 structures is a bit confusing
idx/cassandra/cassandra.go
Outdated
|
||
func (c *CasIdx) IdsByTagExpressions(orgId int, expressions []string) ([]string, error) { | ||
return c.MemoryIdx.IdsByTagExpressions(orgId, expressions) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these additions should not be necessary. go compiler should automatically wire these through since MemoryIdx is embedded into the CasIdx struct (that's why Find, List, etc work also on this index)
idx/memory/memory.go
Outdated
@@ -42,19 +42,24 @@ var ( | |||
// metric idx.metrics_active is the number of currently known metrics in the index | |||
statMetricsActive = stats.NewGauge32("idx.metrics_active") | |||
|
|||
Enabled bool | |||
Enabled bool | |||
TagSupport bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't need to export this i think
idx/memory/memory.go
Outdated
func (m *MemoryIdx) addTags(def *schema.MetricDefinition) { | ||
tags, ok := m.Tags[def.OrgId] | ||
if !ok { | ||
tags = make(map[string]map[string]map[string]struct{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can make(TagIndex)
here
idx/memory/memory.go
Outdated
tagValue := tagSplits[1] | ||
|
||
if _, ok = tags[tagName]; !ok { | ||
tags[tagName] = make(map[string]map[string]struct{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should give these substructures a type, so that it's easier/clearer to create them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be good, but I can't come up with a good name :)
The top level structure is currently named TagIndex
. I suggest the next one could be TagKey
and it's nested values would be called TagValue
. So each TagValue
would then contain a set of IDs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about:
TagIndex -> TagValues -> MetricDefinitions
idx/memory/memory.go
Outdated
@@ -130,6 +144,69 @@ func (m *MemoryIdx) Update(entry idx.Archive) { | |||
m.Unlock() | |||
} | |||
|
|||
func (m *MemoryIdx) addTags(def *schema.MetricDefinition) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func name is a bit confusing. we're indexing tags (or "adding tags to the index") for a metricdefinition that already has tags. so maybe call this indexTags
or at least at a function comment explaining what this does. thanks :)
idx/memory/memory.go
Outdated
} | ||
} | ||
|
||
func (m *MemoryIdx) delTags(def *schema.MetricDefinition) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here as above. clarify a bit via function comment, thanks (could be confused for "delete tags from a metric")
my main concern here is the amount of strings used. every string has a pointer internally, and GC workload is directly proportional to amount of live pointers on the heap. and we already have a too-high number of strings and pointers due to legacy reasons (eg all the AggMetric stuff, and other stuff in the current index), which results in a high GC workload, which combined with golang/go#14812 results occassionally in poor latencies. so, I think we need a way to verify if this concern holds any ground, with minimal time investment. should be pretty trivial to spin up a docker stack, use fakemetrics for ingest workload, and vegeta for query workload, and compare both cases. I can help with the specifics, i've done this a bunch of times before. see also #440 and https://github.com/grafana/metrictank/blob/7a508e43e9b48af9ecdf1723c91dd75e1b4ca212/docker/docker-cluster/load.sh |
idx/idx.go
Outdated
@@ -112,4 +112,7 @@ type MetricIndex interface { | |||
Find(int, string, int64) ([]Node, error) | |||
List(int) []Archive | |||
Prune(int, time.Time) ([]Archive, error) | |||
TagList(int, uint32) []string | |||
Tag(int, string) map[string]uint32 | |||
IdsByTagExpressions(int, []string) ([]string, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these 3 functions are not called by anything, it's not really clear what they are for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding comments
idx/memory/memory.go
Outdated
) | ||
|
||
func ConfigSetup() { | ||
memoryIdx := flag.NewFlagSet("memory-idx", flag.ExitOnError) | ||
memoryIdx.BoolVar(&Enabled, "enabled", false, "") | ||
memoryIdx.BoolVar(&tagSupport, "tag-support", false, "") | ||
memoryIdx.IntVar(&matchCacheSize, "match-cache-size", 1000, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please fill in help message, and add these settings, along with the same help message and default value to metrictank-sample.ini, then run the sync-configs script to apply it to the other files as well. and config-to-doc.sh
no need to loop twice no need to do map lookups if key doesn't even match only need to track values, not entire k=v pair
* add a corruption case metric+log * standardize the format of all recoverable internal errors makes a simpler, cleaner namespace, on which we can easily alert. * standardize terms: - corrupt for internal datastructures - invalid for user data * no need to export vars
idx/memory/tag_query.go
Outdated
@@ -328,11 +318,19 @@ func (q *TagQuery) filterByMatch(resultSet TagIDs, byId map[string]*idx.Archive, | |||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this just break
? There shouldn't be another tag value for this tag key, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm github is not showing anything besides the continue
line. are you talking about the case where we found an entry in notMatchingTags
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shanson7 you're right. each def
should only have one value per key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i gave the inner loop a name for consistency with the outer one: 6fb73bf
@shanson7 this will be merged very soon now, unless you have any more feedback :) |
Adds a tag index, together with the functions to query it.
It does not expose any API endpoints yet, those will follow in a separate PR (from branch https://github.com/grafana/metrictank/compare/tag_api_endpoints).