This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Improve the performance of memory index find #655

Closed
wants to merge 7 commits

Conversation

@shanson7 (Collaborator):

I am testing metrictank with some timeseries that have high-cardinality nodes, and I noticed that this caused some find operations to lag badly.

I decided to look through the code to see if there was a way to speed up the find behavior. I noticed a couple of things that looked like they could be adjusted:

  1. Unify the code path for {a,b,c} lists with the other special chars by turning them into (a|b|c). This simplified the code flow but did not really change the performance of the benchmarks.
  2. Break out of the loop early when doing an exact string match. There is no need to continue looking for matches and adding nodes at that point.
  3. Compile the regex once for all queued children. This means that even if we have 100k children queued up in the BFS, we compile the regex once instead of once per child node.
  4. Special-case '*' and avoid regex altogether. This ended up being a pretty huge improvement.
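The compile-once and special-case ideas in (3) and (4) can be sketched as a matcher closure in Go. This is an illustration only, with hypothetical names like getMatcher, not the PR's exact code, and the {} handling is simplified:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// getMatcher builds one closure per pattern node, so a BFS with many
// queued children reuses the same compiled state instead of
// recompiling the regex for every child.
func getMatcher(pattern string) (func(children []string) []string, error) {
	// Special case: "*" matches every child, no regex needed.
	if pattern == "*" {
		return func(children []string) []string {
			return children
		}, nil
	}
	// Patterns with special characters are compiled to a regex once.
	if strings.ContainsAny(pattern, "*{}[]?") {
		p := strings.Replace(pattern, "*", ".*", -1)
		// Simplified {a,b,c} -> (a|b|c) translation; assumes commas
		// only appear inside braces.
		p = strings.Replace(p, "{", "(", -1)
		p = strings.Replace(p, "}", ")", -1)
		p = strings.Replace(p, ",", "|", -1)
		re, err := regexp.Compile("^" + p + "$")
		if err != nil {
			return nil, err
		}
		return func(children []string) []string {
			var matches []string
			for _, c := range children {
				if re.MatchString(c) {
					matches = append(matches, c)
				}
			}
			return matches
		}, nil
	}
	// Exact match: stop at the first hit.
	return func(children []string) []string {
		for _, c := range children {
			if c == pattern {
				return []string{c}
			}
		}
		return nil
	}, nil
}

func main() {
	m, _ := getMatcher("host{1,2}")
	fmt.Println(m([]string{"host1", "host3", "host2"}))
}
```

The closure is built once per tree level, then applied to every queued child at that level.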

WARNING: I have never written Go before, so this PR might need some extra attention. For instance, I initially created an interface to simplify the code (Matcher, with RegexMatcher, ExactMatcher and AllMatcher). Whatever I did turned out to make it WAAAY slower. I still liked the code that way, but I think it needs a Go expert's eye to help adjust it.

Benchmarks on a 2-core 12GB RAM ubuntu-16.04 VM (dedicated):

benchmark                      old ns/op     new ns/op     delta
BenchmarkFind-2                257933        148660        -42.36%
BenchmarkConcurrent4Find-2     144912        87049         -39.93%
BenchmarkConcurrent8Find-2     128865        83874         -34.91%

benchmark                      old allocs     new allocs     delta
BenchmarkFind-2                1110           658            -40.72%
BenchmarkConcurrent4Find-2     1111           658            -40.77%
BenchmarkConcurrent8Find-2     1111           658            -40.77%

benchmark                      old bytes     new bytes     delta
BenchmarkFind-2                55552         30501         -45.09%
BenchmarkConcurrent4Find-2     55601         30500         -45.14%
BenchmarkConcurrent8Find-2     55618         30493         -45.17%

And on my 8-core Mac (running bunches of other things):

benchmark                      old ns/op     new ns/op     delta
BenchmarkFind-8                181536        108696        -40.12%
BenchmarkConcurrent4Find-8     73666         45098         -38.78%
BenchmarkConcurrent8Find-8     66373         51848         -21.88%

benchmark                      old allocs     new allocs     delta
BenchmarkFind-8                1111           658            -40.77%
BenchmarkConcurrent4Find-8     1111           658            -40.77%
BenchmarkConcurrent8Find-8     1111           658            -40.77%

benchmark                      old bytes     new bytes     delta
BenchmarkFind-8                55620         30502         -45.16%
BenchmarkConcurrent4Find-8     55621         30497         -45.17%
BenchmarkConcurrent8Find-8     55628         30495         -45.18%

}
}
} else if (matchAll) {
log.Debug("Matching all children")
Member:
prefix the log message with "memory-idx: "

patterns = expandQueries(pattern)
} else {
patterns = []string{pattern}
log.Debug("memory-idx: reached pattern length. %d nodes matched", pos, len(children))
Member:
pos is not used in the string.

Collaborator (Author):
Good catch.

@Dieterbe Dieterbe added the perf label Jun 28, 2017
@shanson7 (Collaborator, Author):

Rebased against current master, changed logic to use closures to encapsulate matching functionality. This cut out a few more allocs, and removes the clutter of matching logic from the BFS logic.

@woodsaj woodsaj requested a review from Dieterbe July 17, 2017 07:14
@Dieterbe (Contributor) left a review:

The major thing here is the reuse of the regex instead of compiling it over and over again. That's a great find and the results look promising. That said, I found a few issues which will need to be addressed.

// filepath.Match doesn't support {} because that's not posix, it's a bashism
// the easiest way of implementing this extra feature is just expanding single queries
// that contain these queries into multiple queries, who will be checked separately
// and whose results will be ORed.
Contributor:

Why is this comment being removed? It provides helpful documentation.

Collaborator (Author):

As a first reader of this code, this comment confused me. I read it as "We use filepath.Match, but need to work around its lack of support for {}". I could change it to:
"We don't use filepath.Match because it..."

Contributor:

So your first interpretation was correct. But I only realized now that we've stopped using filepath.Match (via 6dc3776), so you're right that it needs to be rephrased at least.

return nil, err
}
return func(children []string) []string {
matches := make([]string, 0)
Contributor:

No need to allocate; you can just use var matches []string here. It will speed up cases without matches.

Collaborator (Author):

will do

return []string{c}
}
}
return []string{}
Contributor:

No need to allocate; you can just use return nil here, which is a valid value for a zero-sized slice (see https://dave.cheney.net/2013/01/19/what-is-the-zero-value-and-why-is-it-useful for example). It will speed up cases without matches.
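For illustration, a nil slice behaves as a valid zero-length slice in Go, so returning nil from a matcher avoids an allocation in the no-match case (matchNone is a hypothetical name for this sketch):

```go
package main

import "fmt"

// A nil slice is a valid zero-length slice: it can be ranged over,
// appended to, and measured with len(), all without allocating.
func matchNone() []string {
	return nil // preferred over []string{}: no allocation
}

func main() {
	m := matchNone()
	fmt.Println(len(m), m == nil) // a nil slice has length 0
	m = append(m, "a")            // append works fine on a nil slice
	fmt.Println(m)
}
```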

Collaborator (Author):

Nifty. 👍

}

// Convert to regex and match
if strings.ContainsAny(path, "*{}[]?") {
@Dieterbe (Contributor), Jul 17, 2017:

It's quite common to have patterns with a char from {} but none from *[]?; in those cases we don't need to generate regexes. That's also why we had the expandQueries function, which turns such patterns into a set of strings that can be equality-checked. So would it make sense to just put expandQueries back and work with that? It would be helpful anyway to have this PR only change the things it needs to change.
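A minimal sketch of what an expandQueries-style helper could look like. This illustrates the described behavior (expanding each {a,b,c} group into one query per alternative), not the actual metrictank implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// expandQueries expands the first {a,b,c} group in a query into one
// query per alternative, then recurses until no groups remain, so
// "foo.{a,b}.bar" becomes "foo.a.bar" and "foo.b.bar". Each expanded
// query can then be equality-checked instead of regex-matched.
func expandQueries(query string) []string {
	open := strings.Index(query, "{")
	end := strings.Index(query, "}")
	if open < 0 || end < open {
		return []string{query} // no (well-formed) group left
	}
	var out []string
	for _, alt := range strings.Split(query[open+1:end], ",") {
		out = append(out, expandQueries(query[:open]+alt+query[end+1:])...)
	}
	return out
}

func main() {
	fmt.Println(expandQueries("foo.{a,b}.bar"))
}
```

The results of matching the expanded queries are then ORed together, as the original code comment describes.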

Collaborator (Author):

That's fair. This was mostly to simplify the search code to run a single iteration through each node. I could instead make this optimization by having the exact match closure operate on multiple strings.

The logic would then be: if there are special characters other than {}, form the regex (which may include converting {} to regex syntax); otherwise, match against one or more exact strings.
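That dispatch can be sketched as follows (needsRegex is a hypothetical name for this illustration, not the PR's code):

```go
package main

import (
	"fmt"
	"strings"
)

// needsRegex reports whether a pattern requires a regex: any special
// character other than braces. A pattern whose only special
// characters are {} (and the commas inside them) can instead be
// expanded into plain strings and equality-checked.
func needsRegex(pattern string) bool {
	return strings.ContainsAny(pattern, "*[]?")
}

func main() {
	fmt.Println(needsRegex("host{1,3}")) // false: expand and equality-check
	fmt.Println(needsRegex("host*"))     // true: compile a regex
}
```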

@@ -344,21 +344,21 @@ func (m *MemoryIdx) find(orgId int, pattern string) ([]*Node, error) {
 	// for a query like foo.bar.baz, pos is 2
 	// for a query like foo.bar.* or foo.bar, pos is 1
 	// for a query like foo.b*.baz, pos is 0
-	pos := len(nodes) - 1
+	pos := len(nodes)
Contributor:

This change makes the comments above incorrect, so the comments should be updated as well.

Collaborator (Author):

Will do.

}
return queries

p = strings.Replace(p, "*", ".*", -1)
Contributor:

One thing I just realized is that * should actually be translated to [^\.]*, because in the graphite world it's only a wildcard within a "node" (where nodes are dot-separated). That way we can tell the regex engine to stop looking when it encounters a dot, which will probably give a little performance increase as well.

Collaborator (Author):

The children are individual node strings, so they shouldn't contain dots.

Contributor:

Oops, you're right of course.

@shanson7 (Collaborator, Author):

I commented out the log.Debug statements (as @Dieterbe noted, they pollute the benchmark results, and will hopefully not be evaluated in production once the logging library is replaced).

I tested 3 basic schemes:

  1. This PR as-is
  2. Expanding just non-regex queries that contain {} and exact matching them
  3. Expanding all {}, then either compiling them if needed, or exact matching

I tested using only these 4 patterns (so that non-expanding patterns didn't wash out the results):

{Pattern: "collectd.{dc1,dc50}.host960.disk.disk1.disk_ops.*", ExpectedResults: 2},
{Pattern: "*.dc3.host96{1,3}.cpu.1.*", ExpectedResults: 16},
{Pattern: "*.dc3.{host,server}96{1,3}.cpu.1.*", ExpectedResults: 16},
{Pattern: "*.dc3.{host,server}*{1,3}.cpu.1.*", ExpectedResults: 160},

Results:

name \ time/op     run1.noexpand.txt  run1.expandexact.txt  run1.fullexpand.txt
Find-8                    115µs ± 5%             68µs ± 1%           138µs ±32%
Concurrent4Find-8        65.2µs ± 6%           47.1µs ±10%          77.4µs ±12%
Concurrent8Find-8        64.2µs ± 4%           51.7µs ±10%          69.1µs ±22%

name \ alloc/op    run1.noexpand.txt  run1.expandexact.txt  run1.fullexpand.txt
Find-8                   42.5kB ± 0%           36.4kB ± 0%          69.3kB ± 0%
Concurrent4Find-8        42.5kB ± 0%           36.4kB ± 0%          69.3kB ± 0%
Concurrent8Find-8        42.5kB ± 0%           36.4kB ± 0%          69.3kB ± 0%

name \ allocs/op   run1.noexpand.txt  run1.expandexact.txt  run1.fullexpand.txt
Find-8                      215 ± 0%              160 ± 0%             210 ± 0%
Concurrent4Find-8           215 ± 0%              160 ± 0%             210 ± 0%
Concurrent8Find-8           215 ± 0%              160 ± 0%             210 ± 0%

So, option (2) seems to be the way to go for these queries.

@shanson7 (Collaborator, Author):

The downside of option (2) in the above case is that we have two different methods of expanding {}, depending on whether there is a regex char in the string. I can't think of a clean and efficient way to deduplicate the parsing code (since one creates multiple strings and the other builds a regex).

With this in mind, and given @Dieterbe's suggestion to reduce the scope of the changes, I'll go with option (3) to expand the queries and then operate on the multiple expansions.
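Option (3) can be sketched as follows, with expandQueries and buildMatchers as hypothetical stand-ins for the PR's actual code: expand every {} group first, then build one matcher per expanded query, using a regex only where still needed.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// expandQueries expands each {a,b,c} group into one query per
// alternative (simplified sketch of the helper discussed above).
func expandQueries(q string) []string {
	open, end := strings.Index(q, "{"), strings.Index(q, "}")
	if open < 0 || end < open {
		return []string{q}
	}
	var out []string
	for _, alt := range strings.Split(q[open+1:end], ",") {
		out = append(out, expandQueries(q[:open]+alt+q[end+1:])...)
	}
	return out
}

// buildMatchers expands the pattern, then builds one matcher per
// expanded query: a compiled regex where special chars remain, a
// plain equality check otherwise. Note * becomes [^.]* since a
// wildcard only spans a single dot-separated node.
func buildMatchers(pattern string) ([]func(string) bool, error) {
	var matchers []func(string) bool
	for _, q := range expandQueries(pattern) {
		if strings.ContainsAny(q, "*[]?") {
			re, err := regexp.Compile("^" + strings.Replace(q, "*", "[^.]*", -1) + "$")
			if err != nil {
				return nil, err
			}
			matchers = append(matchers, re.MatchString)
		} else {
			q := q // capture a per-iteration copy for the closure
			matchers = append(matchers, func(s string) bool { return s == q })
		}
	}
	return matchers, nil
}

func main() {
	ms, _ := buildMatchers("host{1,2}*")
	fmt.Println(len(ms)) // one matcher per expanded query
}
```

A child matches if any of the matchers accepts it, which reproduces the ORed semantics of the expanded queries.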

@shanson7 (Collaborator, Author):

Ok, this last round of changes implemented scheme 3. Final benchmarks are:

With debug disabled (what prod would be in ideal case):

$ benchstat  master.txt fullExpand.txt
name               old time/op    new time/op    delta
Find-8                168µs ± 3%      76µs ±12%  -55.05%  (p=0.000 n=20+20)
Concurrent4Find-8    77.8µs ± 8%    39.8µs ± 9%  -48.82%  (p=0.000 n=19+19)
Concurrent8Find-8    68.4µs ±14%    37.5µs ±23%  -45.14%  (p=0.000 n=20+19)

name               old alloc/op   new alloc/op   delta
Find-8               51.2kB ± 0%    23.0kB ± 0%  -55.15%  (p=0.000 n=17+18)
Concurrent4Find-8    51.2kB ± 0%    23.0kB ± 0%  -55.15%  (p=0.000 n=17+20)
Concurrent8Find-8    51.2kB ± 0%    23.0kB ± 0%  -55.15%  (p=0.000 n=19+19)

name               old allocs/op  new allocs/op  delta
Find-8                  626 ± 0%       156 ± 0%  -75.08%  (p=0.000 n=20+20)
Concurrent4Find-8       626 ± 0%       156 ± 0%  -75.08%  (p=0.000 n=20+20)
Concurrent8Find-8       626 ± 0%       156 ± 0%  -75.08%  (p=0.000 n=20+20)

With debug enabled (what prod is in current case):

$ benchstat  master.dbg.txt fullExpand.dbg.txt
name               old time/op    new time/op    delta
Find-8                163µs ± 9%      93µs ± 8%  -43.08%  (p=0.000 n=19+20)
Concurrent4Find-8    70.4µs ± 9%    46.6µs ± 7%  -33.75%  (p=0.000 n=19+20)
Concurrent8Find-8    72.0µs ± 4%    40.0µs ± 5%  -44.49%  (p=0.000 n=19+19)

name               old alloc/op   new alloc/op   delta
Find-8               58.1kB ± 0%    28.7kB ± 0%  -50.68%  (p=0.000 n=20+19)
Concurrent4Find-8    58.1kB ± 0%    28.7kB ± 0%  -50.68%  (p=0.000 n=16+19)
Concurrent8Find-8    58.1kB ± 0%    28.7kB ± 0%  -50.68%  (p=0.000 n=20+20)

name               old allocs/op  new allocs/op  delta
Find-8                1.19k ± 0%     0.65k ± 0%  -45.85%  (p=0.000 n=20+20)
Concurrent4Find-8     1.19k ± 0%     0.65k ± 0%  -45.85%  (p=0.000 n=20+20)
Concurrent8Find-8     1.19k ± 0%     0.65k ± 0%  -45.85%  (p=0.000 n=20+20)

@shanson7 (Collaborator, Author):

Was there anything else to be done on this PR?

Dieterbe added a commit that referenced this pull request Aug 11, 2017
@Dieterbe (Contributor):

Hey @shanson7, this looks good now. I rebased the commits (so the history will look cleaner) and merged into master. Thanks!

@Dieterbe Dieterbe closed this Aug 11, 2017