This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Improve the performance of memory index find #655

Closed
wants to merge 7 commits

Conversation

@shanson7 (Collaborator):

I am testing metrictank with some timeseries that have high-cardinality nodes, and I noticed that this caused some find operations to lag badly.

I decided to look through the code to see if there was a way to speed up the find behavior. I noticed a couple of things that looked like they could be adjusted:

  1. Unify the code path for {a,b,c} lists with the other special chars by turning them into (a|b|c). This simplified the code flow but did not really change the performance of the benchmarks.
  2. Break out of the loop early when doing an exact string match. There is no need to continue looking for matches and adding nodes at that point.
  3. Compile the regex once for all queued children. This means that even if we have 100k children queued up in the BFS, we compile the regex once instead of once per child node.
  4. Special-case '*' and avoid regex altogether. This ended up being a pretty huge improvement.
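The compile-once and special-case ideas in (3) and (4) can be sketched as a matcher closure in Go. This is an illustration only, with hypothetical names like getMatcher, not the PR's exact code, and the {} handling is simplified:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// getMatcher builds one closure per pattern node, so a BFS with many
// queued children reuses the same compiled state instead of
// recompiling the regex for every child.
func getMatcher(pattern string) (func(children []string) []string, error) {
	// Special case: "*" matches every child, no regex needed.
	if pattern == "*" {
		return func(children []string) []string {
			return children
		}, nil
	}
	// Patterns with special characters are compiled to a regex once.
	if strings.ContainsAny(pattern, "*{}[]?") {
		p := strings.Replace(pattern, "*", ".*", -1)
		// Simplified {a,b,c} -> (a|b|c) translation; assumes commas
		// only appear inside braces.
		p = strings.Replace(p, "{", "(", -1)
		p = strings.Replace(p, "}", ")", -1)
		p = strings.Replace(p, ",", "|", -1)
		re, err := regexp.Compile("^" + p + "$")
		if err != nil {
			return nil, err
		}
		return func(children []string) []string {
			var matches []string
			for _, c := range children {
				if re.MatchString(c) {
					matches = append(matches, c)
				}
			}
			return matches
		}, nil
	}
	// Exact match: stop at the first hit.
	return func(children []string) []string {
		for _, c := range children {
			if c == pattern {
				return []string{c}
			}
		}
		return nil
	}, nil
}

func main() {
	m, _ := getMatcher("host{1,2}")
	fmt.Println(m([]string{"host1", "host3", "host2"}))
}
```

The closure is built once per tree level, then applied to every queued child at that level.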

WARNING: I have never written Go before, so this PR might need some extra attention. For instance, I initially created an interface to simplify the code (Matcher, with RegexMatcher, ExactMatcher and AllMatcher). Whatever I did turned out to make it WAAAY slower. I still liked the code that way, but I think it needs a Go expert's eye to help adjust it.

Benchmarks on a 2-core 12GB RAM ubuntu-16.04 VM (dedicated):

benchmark                      old ns/op     new ns/op     delta
BenchmarkFind-2                257933        148660        -42.36%
BenchmarkConcurrent4Find-2     144912        87049         -39.93%
BenchmarkConcurrent8Find-2     128865        83874         -34.91%

benchmark                      old allocs     new allocs     delta
BenchmarkFind-2                1110           658            -40.72%
BenchmarkConcurrent4Find-2     1111           658            -40.77%
BenchmarkConcurrent8Find-2     1111           658            -40.77%

benchmark                      old bytes     new bytes     delta
BenchmarkFind-2                55552         30501         -45.09%
BenchmarkConcurrent4Find-2     55601         30500         -45.14%
BenchmarkConcurrent8Find-2     55618         30493         -45.17%

And on my 8-core Mac (running bunches of other things):

benchmark                      old ns/op     new ns/op     delta
BenchmarkFind-8                181536        108696        -40.12%
BenchmarkConcurrent4Find-8     73666         45098         -38.78%
BenchmarkConcurrent8Find-8     66373         51848         -21.88%

benchmark                      old allocs     new allocs     delta
BenchmarkFind-8                1111           658            -40.77%
BenchmarkConcurrent4Find-8     1111           658            -40.77%
BenchmarkConcurrent8Find-8     1111           658            -40.77%

benchmark                      old bytes     new bytes     delta
BenchmarkFind-8                55620         30502         -45.16%
BenchmarkConcurrent4Find-8     55621         30497         -45.17%
BenchmarkConcurrent8Find-8     55628         30495         -45.18%

}
}
} else if (matchAll) {
log.Debug("Matching all children")
Member:
prefix the log message with "memory-idx: "

patterns = expandQueries(pattern)
} else {
patterns = []string{pattern}
log.Debug("memory-idx: reached pattern length. %d nodes matched", pos, len(children))
Member:
pos is not used in the string.

Collaborator (Author):
Good catch.

@Dieterbe Dieterbe added the perf label Jun 28, 2017
@shanson7 (Collaborator, Author):

Rebased against current master, changed logic to use closures to encapsulate matching functionality. This cut out a few more allocs, and removes the clutter of matching logic from the BFS logic.

@woodsaj woodsaj requested a review from Dieterbe July 17, 2017 07:14
@Dieterbe (Contributor) left a review:

The major thing here is the reuse of the regex instead of compiling it over and over again. That's a great find and the results look promising. That said, I found a few issues which will need to be addressed.

// filepath.Match doesn't support {} because that's not posix, it's a bashism
// the easiest way of implementing this extra feature is just expanding single queries
// that contain these queries into multiple queries, who will be checked separately
// and whose results will be ORed.
Contributor:

Why is this comment being removed? It provides helpful documentation.

Collaborator (Author):

As a first reader of this code, this comment confused me. I read it as "We use filepath.Match, but need to work around its lack of support for {}". I could change it to:
"We don't use filepath.Match because it..."

Contributor:

So your first interpretation was correct. But I only realized now that we've stopped using filepath.Match (via 6dc3776), so you're right that it needs to be rephrased at least.

return nil, err
}
return func(children []string) []string {
matches := make([]string, 0)
Contributor:

No need to allocate; you can just use var matches []string here. It will speed up cases without matches.

Collaborator (Author):

will do

return []string{c}
}
}
return []string{}
Contributor:

No need to allocate; you can just use return nil here, which is a valid value for a zero-sized slice (see https://dave.cheney.net/2013/01/19/what-is-the-zero-value-and-why-is-it-useful for example). It will speed up cases without matches.
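For illustration, a nil slice behaves as a valid zero-length slice in Go, so returning nil from a matcher avoids an allocation in the no-match case (matchNone is a hypothetical name for this sketch):

```go
package main

import "fmt"

// A nil slice is a valid zero-length slice: it can be ranged over,
// appended to, and measured with len(), all without allocating.
func matchNone() []string {
	return nil // preferred over []string{}: no allocation
}

func main() {
	m := matchNone()
	fmt.Println(len(m), m == nil) // a nil slice has length 0
	m = append(m, "a")            // append works fine on a nil slice
	fmt.Println(m)
}
```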

Collaborator (Author):

Nifty. 👍

}

// Convert to regex and match
if strings.ContainsAny(path, "*{}[]?") {
@Dieterbe (Contributor), Jul 17, 2017:

It's quite common to have patterns with a char from {} but none from *[]?; in those cases we don't need to generate regexes. That's also why we had the expandQueries function, which turns such patterns into a set of strings that can be equality-checked. So would it make sense to just put expandQueries back and work with that? It would be helpful anyway to have this PR only change the things it needs to change.
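A minimal sketch of what an expandQueries-style helper could look like. This illustrates the described behavior (expanding each {a,b,c} group into one query per alternative), not the actual metrictank implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// expandQueries expands the first {a,b,c} group in a query into one
// query per alternative, then recurses until no groups remain, so
// "foo.{a,b}.bar" becomes "foo.a.bar" and "foo.b.bar". Each expanded
// query can then be equality-checked instead of regex-matched.
func expandQueries(query string) []string {
	open := strings.Index(query, "{")
	end := strings.Index(query, "}")
	if open < 0 || end < open {
		return []string{query} // no (well-formed) group left
	}
	var out []string
	for _, alt := range strings.Split(query[open+1:end], ",") {
		out = append(out, expandQueries(query[:open]+alt+query[end+1:])...)
	}
	return out
}

func main() {
	fmt.Println(expandQueries("foo.{a,b}.bar"))
}
```

The results of matching the expanded queries are then ORed together, as the original code comment describes.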

Collaborator (Author):

That's fair. This was mostly to simplify the search code to run a single iteration through each node. I could instead make this optimization by having the exact match closure operate on multiple strings.

The logic would then be: if there are special characters other than {}, form the regex (which may include converting {} to regex syntax); otherwise, match against one or more exact strings.
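That dispatch can be sketched as follows (needsRegex is a hypothetical name for this illustration, not the PR's code):

```go
package main

import (
	"fmt"
	"strings"
)

// needsRegex reports whether a pattern requires a regex: any special
// character other than braces. A pattern whose only special
// characters are {} (and the commas inside them) can instead be
// expanded into plain strings and equality-checked.
func needsRegex(pattern string) bool {
	return strings.ContainsAny(pattern, "*[]?")
}

func main() {
	fmt.Println(needsRegex("host{1,3}")) // false: expand and equality-check
	fmt.Println(needsRegex("host*"))     // true: compile a regex
}
```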

@@ -344,21 +344,21 @@ func (m *MemoryIdx) find(orgId int, pattern string) ([]*Node, error) {
 	// for a query like foo.bar.baz, pos is 2
 	// for a query like foo.bar.* or foo.bar, pos is 1
 	// for a query like foo.b*.baz, pos is 0
-	pos := len(nodes) - 1
+	pos := len(nodes)
Contributor:

This change makes the comments above incorrect, so the comments should be updated as well.

Collaborator (Author):

Will do.

}
return queries

p = strings.Replace(p, "*", ".*", -1)
Contributor:

One thing I just realized is that * should actually be translated to [^\.]*, because in the graphite world it's only a wildcard within a "node" (where nodes are dot-separated). That way we can tell the regex engine to stop looking when it encounters a dot, which will probably give a little performance increase as well.

Collaborator (Author):

The children are individual node strings, so they shouldn't contain dots.

Contributor:

Oops, you're right of course.

@shanson7 (Collaborator, Author):

I commented out the log.Debug statements (as @Dieterbe noted, they pollute the benchmark results, and will hopefully not be evaluated in production once the logging library is replaced).

I tested 3 basic schemes:

  1. This PR as-is
  2. Expanding just non-regex queries that contain {} and exact matching them
  3. Expanding all {}, then either compiling them if needed, or exact matching

I tested using only these 4 patterns (so that non-expanding patterns didn't wash out the results):

{Pattern: "collectd.{dc1,dc50}.host960.disk.disk1.disk_ops.*", ExpectedResults: 2},
{Pattern: "*.dc3.host96{1,3}.cpu.1.*", ExpectedResults: 16},
{Pattern: "*.dc3.{host,server}96{1,3}.cpu.1.*", ExpectedResults: 16},
{Pattern: "*.dc3.{host,server}*{1,3}.cpu.1.*", ExpectedResults: 160},

Results:

name \ time/op     run1.noexpand.txt  run1.expandexact.txt  run1.fullexpand.txt
Find-8                    115µs ± 5%             68µs ± 1%           138µs ±32%
Concurrent4Find-8        65.2µs ± 6%           47.1µs ±10%          77.4µs ±12%
Concurrent8Find-8        64.2µs ± 4%           51.7µs ±10%          69.1µs ±22%

name \ alloc/op    run1.noexpand.txt  run1.expandexact.txt  run1.fullexpand.txt
Find-8                   42.5kB ± 0%           36.4kB ± 0%          69.3kB ± 0%
Concurrent4Find-8        42.5kB ± 0%           36.4kB ± 0%          69.3kB ± 0%
Concurrent8Find-8        42.5kB ± 0%           36.4kB ± 0%          69.3kB ± 0%

name \ allocs/op   run1.noexpand.txt  run1.expandexact.txt  run1.fullexpand.txt
Find-8                      215 ± 0%              160 ± 0%             210 ± 0%
Concurrent4Find-8           215 ± 0%              160 ± 0%             210 ± 0%
Concurrent8Find-8           215 ± 0%              160 ± 0%             210 ± 0%

So, option (2) seems to be the way to go for these queries.

@shanson7 (Collaborator, Author):

The downside of option (2) in the above case is that we have two different methods of expanding {}, depending on whether there is a regex char in the string. I can't think of a clean and efficient way to deduplicate the parsing code (since one creates multiple strings and the other builds a regex).

With this in mind, and given @Dieterbe's suggestion to reduce the scope of the changes, I'll go with option (3) to expand the queries and then operate on the multiple expansions.
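Option (3) can be sketched as follows, with expandQueries and buildMatchers as hypothetical stand-ins for the PR's actual code: expand every {} group first, then build one matcher per expanded query, using a regex only where still needed.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// expandQueries expands each {a,b,c} group into one query per
// alternative (simplified sketch of the helper discussed above).
func expandQueries(q string) []string {
	open, end := strings.Index(q, "{"), strings.Index(q, "}")
	if open < 0 || end < open {
		return []string{q}
	}
	var out []string
	for _, alt := range strings.Split(q[open+1:end], ",") {
		out = append(out, expandQueries(q[:open]+alt+q[end+1:])...)
	}
	return out
}

// buildMatchers expands the pattern, then builds one matcher per
// expanded query: a compiled regex where special chars remain, a
// plain equality check otherwise. Note * becomes [^.]* since a
// wildcard only spans a single dot-separated node.
func buildMatchers(pattern string) ([]func(string) bool, error) {
	var matchers []func(string) bool
	for _, q := range expandQueries(pattern) {
		if strings.ContainsAny(q, "*[]?") {
			re, err := regexp.Compile("^" + strings.Replace(q, "*", "[^.]*", -1) + "$")
			if err != nil {
				return nil, err
			}
			matchers = append(matchers, re.MatchString)
		} else {
			q := q // capture a per-iteration copy for the closure
			matchers = append(matchers, func(s string) bool { return s == q })
		}
	}
	return matchers, nil
}

func main() {
	ms, _ := buildMatchers("host{1,2}*")
	fmt.Println(len(ms)) // one matcher per expanded query
}
```

A child matches if any of the matchers accepts it, which reproduces the ORed semantics of the expanded queries.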

@shanson7 (Collaborator, Author):

Ok, this last round of changes implemented scheme 3. Final benchmarks are:

With debug disabled (what prod would be in ideal case):

$ benchstat  master.txt fullExpand.txt
name               old time/op    new time/op    delta
Find-8                168µs ± 3%      76µs ±12%  -55.05%  (p=0.000 n=20+20)
Concurrent4Find-8    77.8µs ± 8%    39.8µs ± 9%  -48.82%  (p=0.000 n=19+19)
Concurrent8Find-8    68.4µs ±14%    37.5µs ±23%  -45.14%  (p=0.000 n=20+19)

name               old alloc/op   new alloc/op   delta
Find-8               51.2kB ± 0%    23.0kB ± 0%  -55.15%  (p=0.000 n=17+18)
Concurrent4Find-8    51.2kB ± 0%    23.0kB ± 0%  -55.15%  (p=0.000 n=17+20)
Concurrent8Find-8    51.2kB ± 0%    23.0kB ± 0%  -55.15%  (p=0.000 n=19+19)

name               old allocs/op  new allocs/op  delta
Find-8                  626 ± 0%       156 ± 0%  -75.08%  (p=0.000 n=20+20)
Concurrent4Find-8       626 ± 0%       156 ± 0%  -75.08%  (p=0.000 n=20+20)
Concurrent8Find-8       626 ± 0%       156 ± 0%  -75.08%  (p=0.000 n=20+20)

With debug enabled (what prod is in current case):

$ benchstat  master.dbg.txt fullExpand.dbg.txt
name               old time/op    new time/op    delta
Find-8                163µs ± 9%      93µs ± 8%  -43.08%  (p=0.000 n=19+20)
Concurrent4Find-8    70.4µs ± 9%    46.6µs ± 7%  -33.75%  (p=0.000 n=19+20)
Concurrent8Find-8    72.0µs ± 4%    40.0µs ± 5%  -44.49%  (p=0.000 n=19+19)

name               old alloc/op   new alloc/op   delta
Find-8               58.1kB ± 0%    28.7kB ± 0%  -50.68%  (p=0.000 n=20+19)
Concurrent4Find-8    58.1kB ± 0%    28.7kB ± 0%  -50.68%  (p=0.000 n=16+19)
Concurrent8Find-8    58.1kB ± 0%    28.7kB ± 0%  -50.68%  (p=0.000 n=20+20)

name               old allocs/op  new allocs/op  delta
Find-8                1.19k ± 0%     0.65k ± 0%  -45.85%  (p=0.000 n=20+20)
Concurrent4Find-8     1.19k ± 0%     0.65k ± 0%  -45.85%  (p=0.000 n=20+20)
Concurrent8Find-8     1.19k ± 0%     0.65k ± 0%  -45.85%  (p=0.000 n=20+20)

@shanson7 (Collaborator, Author):

Was there anything else to be done on this PR?

Dieterbe added a commit that referenced this pull request Aug 11, 2017
@Dieterbe (Contributor):

Hey @shanson7, this looks good now. I rebased the commits (so the history will look cleaner) and merged into master. Thanks!

@Dieterbe Dieterbe closed this Aug 11, 2017