Improve the performance of memory index find #655
idx/memory/memory.go
Outdated
}
}
} else if (matchAll) {
log.Debug("Matching all children")
prefix the log message with "memory-idx: "
idx/memory/memory.go
Outdated
patterns = expandQueries(pattern)
} else {
patterns = []string{pattern}
log.Debug("memory-idx: reached pattern length. %d nodes matched", pos, len(children))
pos is not used in the format string.
Good catch.
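To illustrate the mismatch flagged above, here is a minimal sketch using the standard fmt package. It assumes the logger formats its arguments the way fmt.Sprintf does, which may not hold for the log package metrictank actually uses. The helper names formatFlagged and formatFixed are hypothetical.

```go
package main

import "fmt"

// flaggedFormat has a single %d verb. It is kept in a variable (not a
// constant) only so that "go vet" does not reject this illustration.
var flaggedFormat = "memory-idx: reached pattern length. %d nodes matched"

// formatFlagged mimics the reviewed call: one verb, two operands.
// fmt consumes pos for %d and reports the second operand as extra, e.g.
// "... 2 nodes matched%!(EXTRA int=5)" for pos=2, matched=5.
func formatFlagged(pos, matched int) string {
	return fmt.Sprintf(flaggedFormat, pos, matched)
}

// formatFixed gives every operand its own verb. (Dropping pos from the
// argument list would be an equally valid fix.)
func formatFixed(pos, matched int) string {
	return fmt.Sprintf("memory-idx: reached pattern length at pos %d. %d nodes matched", pos, matched)
}
```

Note that in the flagged form the number printed as "nodes matched" is actually pos, so the message is misleading as well as noisy.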
b43d906 to c7495cb
Rebased against current master, changed logic to use closures to encapsulate matching functionality. This cut out a few more allocs, and removes the clutter of matching logic from the BFS logic.
the major thing here is the reuse of the regex instead of compiling it over and over again. that's a great find and the results look promising. that said, i found a few issues which will need to be addressed.
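The idea of compiling the regex once and reusing it through a closure can be sketched as follows. This is not the PR's code: the matcher type and the newRegexMatcher constructor are hypothetical names, and anchoring the expression with ^ and $ is an assumption about how whole child names are meant to match.

```go
package main

import "regexp"

// matcher filters the children of one tree node. Hypothetical type.
type matcher func(children []string) []string

// newRegexMatcher compiles expr exactly once and returns a closure that
// reuses the compiled regex for every node visited during the search.
func newRegexMatcher(expr string) (matcher, error) {
	re, err := regexp.Compile("^" + expr + "$")
	if err != nil {
		return nil, err
	}
	return func(children []string) []string {
		// nil slice: nothing is allocated unless a child matches.
		var matches []string
		for _, c := range children {
			if re.MatchString(c) {
				matches = append(matches, c)
			}
		}
		return matches
	}, nil
}
```

A caller would build the matcher once per pattern node before the breadth-first walk and then invoke it on each node's children, so the compile cost is paid once rather than once per node.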
idx/memory/memory.go
Outdated
// filepath.Match doesn't support {} because that's not posix, it's a bashism
// the easiest way of implementing this extra feature is just expanding single queries
// that contain these queries into multiple queries, who will be checked separately
// and whose results will be ORed.
why is this comment being removed? it provides helpful documentation
As a first reader of this code, this comment confused me. I read it as "We use filepath.Match, but need to work around its lack of support for {}". I could change it to:
"We don't use filepath.Match because it..."
so your first interpretation was correct. But I only realized now that we've stopped using filepath.Match (via 6dc3776), so you're right that it needs to be rephrased at least.
idx/memory/memory.go
Outdated
return nil, err
}
return func(children []string) []string {
matches := make([]string, 0)
no need to allocate. can just use var matches []string here. will speed up cases without matches.
will do
idx/memory/memory.go
Outdated
return []string{c}
}
}
return []string{}
no need to allocate. can just use return nil here (which is a valid value for a zero-length slice; see https://dave.cheney.net/2013/01/19/what-is-the-zero-value-and-why-is-it-useful for example). will speed up cases without matches.
Nifty. 👍
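A small sketch of why returning nil is safe here: a nil slice has length 0, can be ranged over, and can be passed to append, so callers need no special case. The function names exactMatch and collect are hypothetical and only loosely modeled on the snippet under review.

```go
package main

// exactMatch returns the one child equal to name, or nil when none matches.
// No slice is allocated on the no-match path.
func exactMatch(children []string, name string) []string {
	for _, c := range children {
		if c == name {
			return []string{c}
		}
	}
	return nil
}

// collect gathers the matches for several names. Appending the nil result
// of a failed lookup is a no-op, so no nil check is needed.
func collect(children []string, names ...string) []string {
	var all []string
	for _, n := range names {
		all = append(all, exactMatch(children, n)...)
	}
	return all
}
```

One caveat worth keeping in mind: a nil slice and an empty non-nil slice behave the same for len, range and append, but they can differ when encoded (for example, encoding/json emits null for a nil slice and [] for an empty one).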
idx/memory/memory.go
Outdated
}

// Convert to regex and match
if strings.ContainsAny(path, "*{}[]?") {
It's quite common to have patterns with a char from {} but not any from *[]?. in those cases we don't need to generate regexes, and that's also why we had the expandQueries function, which turns patterns from this case into a set of strings that can be equality checked. So would it make sense to just put expandQueries back and work with that? It would be helpful anyway to only have this PR change the things it needs to change.
That's fair. This was mostly to simplify the search code to run a single iteration through each node. I could instead make this optimization by having the exact match closure operate on multiple strings.
The logic would then be:
If there are special characters that are not {}, form the regex (which may include conversion of {} to regex)
else match one or more exact strings.
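The decision described above can be sketched roughly as follows. This is an illustration only: expandBraces is a simplified stand-in for metrictank's expandQueries (it handles several {a,b} groups but not nested ones), and needsRegex and matchExactAny are hypothetical helper names.

```go
package main

import "strings"

// expandBraces turns "srv{1,2}" into ["srv1", "srv2"]. Multiple groups are
// expanded recursively; nested braces are not supported, and an unbalanced
// "{" is left as a literal.
func expandBraces(p string) []string {
	open := strings.Index(p, "{")
	if open < 0 {
		return []string{p}
	}
	rel := strings.Index(p[open:], "}")
	if rel < 0 {
		return []string{p}
	}
	end := open + rel
	var out []string
	for _, alt := range strings.Split(p[open+1:end], ",") {
		out = append(out, expandBraces(p[:open]+alt+p[end+1:])...)
	}
	return out
}

// needsRegex reports whether p contains wildcards other than braces.
func needsRegex(p string) bool {
	return strings.ContainsAny(p, "*[]?")
}

// matchExactAny returns the children equal to any of the expanded names,
// using plain string comparison instead of a regex.
func matchExactAny(children, names []string) []string {
	var matches []string
	for _, c := range children {
		for _, n := range names {
			if c == n {
				matches = append(matches, c)
				break
			}
		}
	}
	return matches
}
```

With these pieces, a pattern node such as srv{1,2} would take the matchExactAny(children, expandBraces(p)) path, while srv* or srv[12] would fall through to the regex path because needsRegex returns true.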
idx/memory/memory.go
Outdated
@@ -344,21 +344,21 @@ func (m *MemoryIdx) find(orgId int, pattern string) ([]*Node, error) {
// for a query like foo.bar.baz, pos is 2
// for a query like foo.bar.* or foo.bar, pos is 1
// for a query like foo.b*.baz, pos is 0
- pos := len(nodes) - 1
+ pos := len(nodes)
this change makes the comments above incorrect, so comments should be updated also.
Will do.
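For reference, the convention spelled out in the quoted code comments (the one based on len(nodes) - 1) can be reproduced with a short sketch. This is inferred from those three comment lines only, not taken from the real find implementation, and lastLiteralPos is a hypothetical name.

```go
package main

import "strings"

// lastLiteralPos returns the index of the last dot-separated node that
// precedes the first node containing a wildcard character. If no node has
// a wildcard, it returns the index of the final node. A pattern whose
// first node is already a wildcard yields -1 in this sketch.
func lastLiteralPos(pattern string) int {
	nodes := strings.Split(pattern, ".")
	pos := len(nodes) - 1
	for i, n := range nodes {
		if strings.ContainsAny(n, "*{}[]?") {
			pos = i - 1
			break
		}
	}
	return pos
}
```

Under this convention foo.bar.baz gives 2, foo.bar.* and foo.bar give 1, and foo.b*.baz gives 0, matching the comments. Initializing pos to len(nodes) instead shifts the no-wildcard result by one, which is why the comments need updating.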
idx/memory/memory.go
Outdated
}
return queries

p = strings.Replace(p, "*", ".*", -1)
one thing i just realized, is that * should actually be translated to [^\.]* because in graphite world it's only a wildcard within a "node" (where nodes are dot separated). so we can tell the regex engine to stop looking when it encounters a dot. that will probably give a little performance increase as well.
The children are individual node strings, so they shouldn't contain dots.
oops you're right of course.
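The point of this exchange can be checked with a small sketch: the two translations of * only behave differently on strings that contain a dot, and a single child name never does. The starToRegex helper is hypothetical and handles only the * wildcard; it is not the conversion used in the PR.

```go
package main

import (
	"regexp"
	"strings"
)

// starToRegex escapes glob, replaces each "*" with the given regex
// fragment, and anchors the result so it must match the whole string.
func starToRegex(glob, star string) (*regexp.Regexp, error) {
	// QuoteMeta turns "*" into `\*`, which is then swapped for the fragment.
	expr := strings.Replace(regexp.QuoteMeta(glob), `\*`, star, -1)
	return regexp.Compile("^" + expr + "$")
}
```

Building one regex with ".*" and one with "[^.]*" from the glob foo*, both accept the child name foobar. Only the ".*" variant accepts foo.bar, which cannot occur when matching is done per node.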
I commented out the I tested 3 basic schemes:
I tested using only these 4 patterns (so that non expanding patterns didn't wash out the results):
Results:
So, option (2) seems to be the way to go for these queries.
The downside of option (2) in the above case is that we have 2 different methods of expanding
With this in mind, and @Dieterbe's suggestion to reduce the scope of the changes, I'll go with option (3) to expand the queries and then operate on the multiple expansions.
Ok, this last round of changes implemented scheme 3. Final benchmarks are: With debug disabled (what prod would be in ideal case):
With debug enabled (what prod is in current case):
Was there anything else to be done on this PR?
hey @shanson7 this looks good now. i rebased them (so history will look cleaner) and merged into master. thanks!
I am testing out metrictank for usage with some timeseries with high cardinality nodes. I noticed that this caused some find operations to really lag.
I decided to look through the code to see if there was a way to speed up the find behavior. I noticed a couple of things that looked like they could be adjusted:
WARNING: I have never written Go before, so this PR might need some extra attention. For instance, I initially created an interface to simplify the code (Matcher, with RegexMatcher, ExactMatcher and AllMatcher). It turns out that whatever I did just made it WAAAY slower. I still liked the code that way, but I think it needs a Go expert's eye to help adjust that.
Benchmarks on a 2-core 12GB RAM ubuntu-16.04 VM (dedicated):
And on my 8-core mac (running bunches of other things)