Skip to content
This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Avoid unnecessary string allocations during memory-idx.Add #687

Merged
merged 1 commit into from
Jul 17, 2017

Conversation

shanson7
Copy link
Collaborator

This is a small optimization (shaved about 3% off the heap size in our POC cluster, not major).

Instead of splitting and joining, the code iterates over the string in place and just saves references to the pieces of the string.

Benchmark:

benchmark               old ns/op     new ns/op     delta
BenchmarkIndexing-8     7607          5287          -30.50%

benchmark               old allocs     new allocs     delta
BenchmarkIndexing-8     22             18             -18.18%

benchmark               old bytes     new bytes     delta
BenchmarkIndexing-8     1246          1150          -7.70%

@shanson7 shanson7 changed the title Avoid string allocations Avoid string allocations during memory-idx.Add Jul 13, 2017
@shanson7 shanson7 changed the title Avoid string allocations during memory-idx.Add Avoid unnecessary string allocations during memory-idx.Add Jul 13, 2017
@woodsaj woodsaj requested a review from Dieterbe July 17, 2017 07:09
Copy link
Contributor

@Dieterbe Dieterbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the lower allocs, and the code seems a bit simpler too (2 loops instead of 1)
However I was surprised to find that on my system the code runs slower.
I ran the benchmarks 20 times master vs your branch, (with a warmup) like so:

date; for i in {1..20}; do go test -bench=. > /dev/null; done; for i in {1..20}; do go test -bench=. >> master.txt; done; git checkout shanson/FewerAllocs; for i in {1..20}; do go test -bench=. >> new.txt; done; date

then I compared with https://godoc.org/golang.org/x/perf/cmd/benchstat:

benchstat master.txt new.txt
name               old time/op    new time/op    delta
Find-8                108µs ± 3%     109µs ± 2%     ~     (p=0.247 n=20+19)
Concurrent4Find-8    63.2µs ± 2%    61.5µs ± 3%   -2.56%  (p=0.000 n=20+19)
Concurrent8Find-8    52.9µs ± 1%    51.4µs ± 2%   -2.76%  (p=0.000 n=18+19)
Indexing-8           2.81µs ± 2%    3.56µs ±33%  +26.83%  (p=0.000 n=18+20)
Deletes-8            3.45µs ±46%    3.21µs ±15%     ~     (p=0.092 n=20+15)

name               old alloc/op   new alloc/op   delta
Find-8               50.3kB ± 0%    50.3kB ± 0%   -0.00%  (p=0.048 n=20+20)
Concurrent4Find-8    50.3kB ± 0%    50.3kB ± 0%     ~     (p=0.351 n=20+20)
Concurrent8Find-8    50.3kB ± 0%    50.3kB ± 0%     ~     (p=0.982 n=19+20)
Indexing-8           1.27kB ± 0%    1.18kB ± 0%   -7.55%  (p=0.000 n=18+20)
Deletes-8            1.39kB ± 0%    1.39kB ± 0%     ~     (all equal)

name               old allocs/op  new allocs/op  delta
Find-8                  967 ± 0%       967 ± 0%     ~     (all equal)
Concurrent4Find-8       967 ± 0%       967 ± 0%     ~     (all equal)
Concurrent8Find-8       966 ± 0%       966 ± 0%     ~     (all equal)
Indexing-8             21.0 ± 0%      17.0 ± 0%  -19.05%  (p=0.000 n=20+20)
Deletes-8              8.00 ± 0%      8.00 ± 0%     ~     (all equal)

I wonder how you got that decrease in runtime for this benchmark.
could be a fluke. these sort of benchmarks can be sensitive to background noise (e.g. browsers/deskstop stuff running in the background, cpu throttling (frequency adjustments) kicking in, etc)

branch := strings.Join(nodes[0:i], ".")
pos := strings.LastIndex(path, ".")
prevPos := len(path)
for true {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can just for { with go :)

tree.Items[branch] = &Node{
Path: branch,
Children: []string{nodes[nodePos]},
Children: []string{prevNode},
Defs: make([]string, 0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we could probably save ourselves another allocation here.
Because Arrays do not need to be initialized explicitly; the zero value of an array is a ready-to-use array whose elements are themselves zeroed: https://blog.golang.org/go-slices-usage-and-internals

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to be clear though, this is a slice. but yeah, could use nil here and save an allocation.
from that same page:

The zero value of a slice is nil. The len and cap functions will both return 0 for a nil slice.
Since the zero value of a slice (nil) acts like a zero-length slice, you can declare a slice variable and then append to it

note that this is not @shanson7's doing , but ours. we have a bunch of these cases littered around that we need to clean up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i'm going to sed search & replace them and add another PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a bunch of more cases like this through the file. we can adress in an other PR.

@Dieterbe
Copy link
Contributor

I suspect the log.Debug calls are somehow interfering. So I did a new bench run with all log.Debug calls commented out in both master branch and your branch. I also found a cleaner way to run only the benchmark we care about, like so:

$ go test -run='^$' -bench=BenchmarkIndexing -count 20 >/dev/null; go test -run='^$' -bench=BenchmarkIndexing -count 20 > new2.txt; git checkout master; go test -run='^$' -bench=BenchmarkIndexing -count 20 > master2.txt


$ benchstat master2.txt new2.txt
name        old time/op    new time/op    delta
Indexing-8    3.74µs ±10%    3.61µs ± 8%   -3.61%  (p=0.040 n=19+20)

name        old alloc/op   new alloc/op   delta
Indexing-8    1.19kB ± 0%    1.13kB ± 0%   -5.20%  (p=0.000 n=19+17)

name        old allocs/op  new allocs/op  delta
Indexing-8      16.0 ± 0%      14.0 ± 0%  -12.50%  (p=0.000 n=20+20)

so this looks a lot better.
optimizng the log.Debug calls (eg to not execute any logic when Debug is not enabled) is on the roadmap for the future, so we don't have to worry about it here.

@Dieterbe Dieterbe merged commit 2bf672d into grafana:master Jul 17, 2017
@shanson7 shanson7 deleted the FewerAllocs branch July 18, 2017 14:31
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants