Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile, bytes: bootstrap array causes bytes.Buffer to always be heap-allocated #7921

Open
lukescott opened this issue May 1, 2014 · 26 comments
Labels
help wanted NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@lukescott
Copy link

What steps reproduce the problem?

1. Download playground.zip
2. Run go build -gcflags=-m bytes.go
3. Run go build -gcflags=-m bytes2.go

playground/bytes: Resembles a simplified version of Go's bytes.Buffer, with the
exception that NewBuffer returns Buffer{} instead of &Buffer{}.
playground/bytes2: Different implementation that avoids reslicing

What happened?

$ go build -gcflags=-m bytes.go
# playground/bytes
bytes/buffer.go:12: can inline NewBuffer
bytes/buffer.go:57: can inline (*Buffer).Read
bytes/buffer.go:69: can inline (*Buffer).Bytes
bytes/buffer.go:12: leaking param: b to result ~anon1
bytes/buffer.go:16: leaking param: b
bytes/buffer.go:16: leaking param: b
bytes/buffer.go:34: make([]byte, 2 * cap(b.buf) + n) escapes to heap
bytes/buffer.go:16: leaking param: b
bytes/buffer.go:44: leaking param: b
bytes/buffer.go:44: leaking param: b
bytes/buffer.go:52: leaking param: b
bytes/buffer.go:52: (*Buffer).Write p does not escape
bytes/buffer.go:57: (*Buffer).Read b does not escape
bytes/buffer.go:57: (*Buffer).Read p does not escape
bytes/buffer.go:69: (*Buffer).Bytes b does not escape
# command-line-arguments
./bytes.go:9: inlining call to bytes.NewBuffer
./bytes.go:13: inlining call to Read
./bytes.go:18: inlining call to Bytes
./bytes.go:9: moved to heap: myBuf
./bytes.go:11: myBuf escapes to heap
./bytes.go:8: make([]byte, 4) escapes to heap
./bytes.go:14: myBuf escapes to heap
./bytes.go:8: make([]byte, 4) escapes to heap
./bytes.go:11: main []byte literal does not escape
./bytes.go:12: main make([]byte, 2) does not escape
./bytes.go:13: main myBuf does not escape
./bytes.go:14: main []byte literal does not escape
./bytes.go:18: main myBuf does not escape


$ go build -gcflags=-m bytes2.go
# playground/bytes2
bytes2/buffer.go:13: can inline NewBuffer
bytes2/buffer.go:61: can inline (*Buffer).Read
bytes2/buffer.go:73: can inline (*Buffer).Bytes
bytes2/buffer.go:13: leaking param: b to result ~anon1
bytes2/buffer.go:38: make([]byte, size) escapes to heap
bytes2/buffer.go:17: (*Buffer).grow b does not escape
bytes2/buffer.go:47: (*Buffer).Grow b does not escape
bytes2/buffer.go:54: (*Buffer).Write b does not escape
bytes2/buffer.go:54: (*Buffer).Write p does not escape
bytes2/buffer.go:61: (*Buffer).Read b does not escape
bytes2/buffer.go:61: (*Buffer).Read p does not escape
bytes2/buffer.go:73: (*Buffer).Bytes b does not escape
# command-line-arguments
./bytes2.go:9: inlining call to bytes2.NewBuffer
./bytes2.go:13: inlining call to Read
./bytes2.go:18: inlining call to Bytes
./bytes2.go:8: main make([]byte, 4) does not escape
./bytes2.go:11: main myBuf does not escape
./bytes2.go:11: main []byte literal does not escape
./bytes2.go:12: main make([]byte, 2) does not escape
./bytes2.go:13: main myBuf does not escape
./bytes2.go:14: main myBuf does not escape
./bytes2.go:14: main []byte literal does not escape
./bytes2.go:18: main myBuf does not escape
# command-line-arguments


What should have happened instead?

This shouldn't be happening with playground/bytes:

./bytes.go:9: moved to heap: myBuf
./bytes.go:11: myBuf escapes to heap
./bytes.go:8: make([]byte, 4) escapes to heap
./bytes.go:14: myBuf escapes to heap
./bytes.go:8: make([]byte, 4) escapes to heap

A re-slicing shouldn't cause playground/bytes's Buffer to always be heap allocated.

It should be like playground/bytes2, which avoids heap allocation until a resize happens:

bytes2/buffer.go:38: make([]byte, size) escapes to heap

Please provide any additional information below.

If this is working as intended, the implementation of bytes.Buffer should change to
allow it to be completely stack allocated. The playground/bytes2 implementation is
completely stack allocated until the initial buffer needs to be resized. This allows it
to operate on the stack, avoiding a heap allocation if possible.

Attachments:

  1. playground.zip (5470 bytes)
@lukescott
Copy link
Author

Comment 1:

Another thing to note: If I change "NewBuffer" in playground/bytes2 to return &Buffer{}
then buf (in main()) escapes to the heap even though NewBuffer is inlined:
./bytes2.go:9: inlining call to bytes2.NewBuffer
./bytes2.go:13: inlining call to Read
./bytes2.go:18: inlining call to Bytes
./bytes2.go:8: make([]byte, 4) escapes to heap
./bytes2.go:9: <S> &bytes2.Buffer literal does not escape
./bytes2.go:11: main []byte literal does not escape
./bytes2.go:12: main make([]byte, 2) does not escape
./bytes2.go:14: main []byte literal does not escape
Otherwise if I return Buffer{} directly nothing gets allocated to the heap, unless a
resize happens.

@ianlancetaylor
Copy link
Contributor

Comment 2:

Labels changed: added repo-main, release-go1.4.

@rsc
Copy link
Contributor

rsc commented Sep 15, 2014

Comment 3:

Labels changed: added release-none, removed release-go1.4.

Status changed to Accepted.

dvyukov added a commit that referenced this issue Jan 28, 2015
Escape analysis treats everything assigned to OIND/ODOTPTR as escaping.
As the result b escapes in the following code:

	func (b *Buffer) Foo() {
		n, m := ...
		b.buf = b.buf[n:m]
	}

This change recognizes such assignments and ignores them.

Update issue #9043.
Update issue #7921.

There are two similar cases in std lib that benefit from this optimization.
First is in archive/zip:

type readBuf []byte
func (b *readBuf) uint32() uint32 {
	v := binary.LittleEndian.Uint32(*b)
	*b = (*b)[4:]
	return v
}

Second is in time:

type data struct {
	p     []byte
	error bool
}

func (d *data) read(n int) []byte {
	if len(d.p) < n {
		d.p = nil
		d.error = true
		return nil
	}
	p := d.p[0:n]
	d.p = d.p[n:]
	return p
}

benchmark                         old ns/op     new ns/op     delta
BenchmarkCompressedZipGarbage     32431724      32217851      -0.66%

benchmark                         old allocs     new allocs     delta
BenchmarkCompressedZipGarbage     153            143            -6.54%

Change-Id: Ia6cd32744e02e36d6d8c19f402f8451101711626
Reviewed-on: https://go-review.googlesource.com/3162
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@cespare
Copy link
Contributor

cespare commented Mar 11, 2017

With Go 1.8, the provided sample code (the playground/bytes package) no longer escapes:

GOPATH=/tmp/gopath go build -gcflags -m bytes.go
# playground/bytes
/tmp/gopath/src/playground/bytes/buffer.go:12: can inline NewBuffer
/tmp/gopath/src/playground/bytes/buffer.go:57: can inline (*Buffer).Read
/tmp/gopath/src/playground/bytes/buffer.go:69: can inline (*Buffer).Bytes
/tmp/gopath/src/playground/bytes/buffer.go:12: leaking param: b to result ~r1 level=0
/tmp/gopath/src/playground/bytes/buffer.go:21: (*Buffer).grow ignoring self-assignment to b.buf
/tmp/gopath/src/playground/bytes/buffer.go:40: (*Buffer).grow ignoring self-assignment to b.buf
/tmp/gopath/src/playground/bytes/buffer.go:16: leaking param content: b
/tmp/gopath/src/playground/bytes/buffer.go:34: make([]byte, 2 * cap(b.buf) + n) escapes to heap
/tmp/gopath/src/playground/bytes/buffer.go:49: (*Buffer).Grow ignoring self-assignment to b.buf
/tmp/gopath/src/playground/bytes/buffer.go:44: leaking param content: b
/tmp/gopath/src/playground/bytes/buffer.go:52: leaking param content: b
/tmp/gopath/src/playground/bytes/buffer.go:52: leaking param content: p
/tmp/gopath/src/playground/bytes/buffer.go:57: leaking param content: b
/tmp/gopath/src/playground/bytes/buffer.go:57: (*Buffer).Read p does not escape
/tmp/gopath/src/playground/bytes/buffer.go:69: leaking param: b to result ~r0 level=1
# command-line-arguments
./bytes.go:9: inlining call to bytes.NewBuffer
./bytes.go:13: inlining call to (*bytes.Buffer).Read
./bytes.go:18: inlining call to (*bytes.Buffer).Bytes
./bytes.go:8: make([]byte, 4) escapes to heap
./bytes.go:11: main myBuf does not escape
./bytes.go:11: main ([]byte)("test") does not escape
./bytes.go:12: main make([]byte, 2) does not escape
./bytes.go:13: main myBuf does not escape
./bytes.go:14: main myBuf does not escape
./bytes.go:14: main ([]byte)("00") does not escape
./bytes.go:16: main string(buf) does not escape
./bytes.go:18: main myBuf does not escape
./bytes.go:18: main string(([]byte)(~r0)) does not escape

However, a bytes.Buffer still seems to always escape if you call its WriteXxx methods, so the root issue persists.

@cespare cespare changed the title cmd/gc, bytes: re-slicing causes bytes.Buffer to always allocate on heap cmd/compile, bytes: bootstrap array causes bytes.Buffer to always be heap-allocated Mar 11, 2017
@cespare
Copy link
Contributor

cespare commented Mar 11, 2017

It's the use of the bootstrap array that's forcing bytes.Buffer to always escape now, from what I can tell. Here's a very simple repro:

package main

type B struct {
	buf     []byte
	storage [64]byte
}

func (b *B) X() {
	b.buf = b.storage[:10]
}

func main() {
	var b B
	b.X()
}
./test.go:8: can inline (*B).X
./test.go:12: can inline main
./test.go:14: inlining call to (*B).X
./test.go:9: b.storage escapes to heap
./test.go:8: leaking param: b
./test.go:14: b.storage escapes to heap
./test.go:14: b escapes to heap
./test.go:13: moved to heap: b

Is it the case that any self-referencing pointers foil escape analysis? For instance, it also escapes if you do this:

type B struct {
	p *int
	n int
}

func (b *B) X() {
	b.p = &b.n
}

cc @randall77 for escape analysis thoughts

@lukescott I took the liberty of retitling the issue given that the re-slicing thing was fixed but the underlying problem of bytes.Buffer always being heap-allocated was not (you also discussed this in the now-closed #7661).

@randall77
Copy link
Contributor

I don't know why this would cause an escape. Probably a tricky corner case? Cycles in the escapes-to graph?
@drchase might know.

@dr2chase
Copy link
Contributor

I don't "know", but I suspect. Pretty sure that forms a cycle in the graph, which can look like an escape for various reasons.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/86976 mentions this issue: strings: prevent copyCheck from forcing Builder to escape and allocate

gopherbot pushed a commit that referenced this issue Jan 9, 2018
All credit and blame goes to Ian for this suggestion, copied from the
runtime.

Fixes #23382
Updates #7921

Change-Id: I3d5a9ee4ab730c87e0f3feff3e7fceff9bcf9e18
Reviewed-on: https://go-review.googlesource.com/86976
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/133375 mentions this issue: cmd/compile/internal/gc: handle array slice self-assign in esc.go

@quasilyte
Copy link
Contributor

It's hard to solve this issue without any API changes just with improvements to the escape analysis.
Although https://golang.org/cl/133375 does make it possible to implement "proper" bytes.Buffer outside of golang stdlib.

See https://github.com/intel-go/bytebuf.

The only change to buffer.go:

type Buffer struct {
	buf       []byte   // contents are the bytes buf[off : len(buf)]
	off       int      // read at &buf[off], write at &buf[len(buf)]
- 	bootstrap [64]byte // memory to hold first slice; helps small buffers avoid allocation.
+ 	bootstrap *[64]byte // memory to hold first slice; helps small buffers avoid allocation.
	lastRead  readOp   // last read operation, so that Unread* can work correctly.
}

And we get these numbers:

name            old time/op    new time/op    delta
String/empty-8     138ns ±13%      24ns ± 0%   -82.94%  (p=0.000 n=10+8)
String/5-8         186ns ±11%      60ns ± 1%   -67.82%  (p=0.000 n=10+10)
String/64-8        225ns ±10%     108ns ± 6%   -52.26%  (p=0.000 n=10+10)
String/128-8       474ns ±17%     338ns ±13%   -28.57%  (p=0.000 n=10+10)
String/1024-8      889ns ± 0%     740ns ± 1%   -16.78%  (p=0.000 n=9+10)

name            old alloc/op   new alloc/op   delta
String/empty-8      112B ± 0%        0B       -100.00%  (p=0.000 n=10+10)
String/5-8          117B ± 0%        5B ± 0%   -95.73%  (p=0.000 n=10+10)
String/64-8         176B ± 0%       64B ± 0%   -63.64%  (p=0.000 n=10+10)
String/128-8        368B ± 0%      256B ± 0%   -30.43%  (p=0.000 n=10+10)
String/1024-8     2.16kB ± 0%    2.05kB ± 0%    -5.19%  (p=0.000 n=10+10)

name            old allocs/op  new allocs/op  delta
String/empty-8      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
String/5-8          2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
String/64-8         2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
String/128-8        3.00 ± 0%      2.00 ± 0%   -33.33%  (p=0.000 n=10+10)
String/1024-8       3.00 ± 0%      2.00 ± 0%   -33.33%  (p=0.000 n=10+10)

@rasky
Copy link
Member

rasky commented Sep 5, 2018

@quasilyte there's something I don't understand. The theory behind the bootstrap array was to avoid an allocation, by having the initial slice point to it. Unfortunately, our escape analysis is unable to prove that the buffer does not escape (would you be able to explain why, what is the limit?), so we always got 1 allocation (the bytes.Buffer instance).

Your change makes the buffer to be a pointer to an array, so that (bytes.Buffer) does not escape, but it forces an allocation for the bootstrap buffer.

So shouldn't it be the same?

@quasilyte
Copy link
Contributor

@rasky trick is that bootstrap array allocated with new may be proven as non-escaping with the current escape analysis. So, it does achieve the initial goal of avoiding heap allocations for local buffers that may benefit from small buffer optimization.

@ghost
Copy link

ghost commented Sep 6, 2018

Without applying CL133375 and with this patch to buffer.go (keeping zero Buffer value usable), I get these numbers in the bytebuf benchmarks:

name                         old time/op    new time/op    delta
String/bytes.Buffer/empty-4    64.1ns ± 0%    10.9ns ± 1%   -83.07%  (p=0.000 n=8+9)
String/bytes.Buffer/5-4        89.3ns ± 2%    66.5ns ± 1%   -25.52%  (p=0.000 n=10+10)
String/bytes.Buffer/64-4        103ns ± 0%      83ns ± 1%   -19.78%  (p=0.000 n=10+10)
String/bytes.Buffer/128-4       235ns ± 1%     175ns ± 2%   -25.57%  (p=0.000 n=9+10)
String/bytes.Buffer/1024-4      591ns ± 1%     547ns ± 2%    -7.56%  (p=0.000 n=8+9)

name                         old alloc/op   new alloc/op   delta
String/bytes.Buffer/empty-4      112B ± 0%        0B       -100.00%  (p=0.000 n=10+10)
String/bytes.Buffer/5-4          117B ± 0%       69B ± 0%   -41.03%  (p=0.000 n=10+10)
String/bytes.Buffer/64-4         176B ± 0%      128B ± 0%   -27.27%  (p=0.000 n=10+10)
String/bytes.Buffer/128-4        368B ± 0%      256B ± 0%   -30.43%  (p=0.000 n=10+10)
String/bytes.Buffer/1024-4     2.16kB ± 0%    2.05kB ± 0%    -5.19%  (p=0.000 n=10+10)

name                         old allocs/op  new allocs/op  delta
String/bytes.Buffer/empty-4      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
String/bytes.Buffer/5-4          2.00 ± 0%      2.00 ± 0%      ~     (all equal)
String/bytes.Buffer/64-4         2.00 ± 0%      2.00 ± 0%      ~     (all equal)
String/bytes.Buffer/128-4        3.00 ± 0%      2.00 ± 0%   -33.33%  (p=0.000 n=10+10)
String/bytes.Buffer/1024-4       3.00 ± 0%      2.00 ± 0%   -33.33%  (p=0.000 n=10+10)

The bytes package benchmarks aren't so rosy, though.

@reezer
Copy link

reezer commented Sep 6, 2018

Since this seems to be a big improvement for certain scenarios, but requires an API change, maybe this change would be a candidate for Go 2.

@quasilyte
Copy link
Contributor

@reezer, please, have some more patience.

What @opennota described is almost an optimal thing for go1.
I'll be back soon with explanations and, probably, a CL.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/133715 mentions this issue: bytes: remove bootstrap array from Buffer

@quasilyte
Copy link
Contributor

@opennota what you're doing is essentially a make([]byte), but in a slightly more convoluted way.
We can remove bootstrap field completely and replace it with make.

The main benefit from not having a bootstrap (or having it as a pointer) is that b receiver
stops leaking from Buffer.grow method. Since Buffer.grow can't be inlined, it's important
to not have receiver leaking in order to make stack allocated buffer possible.

It also opens some new optimization possibilities described below.


Results for benchmarks from bytebuf package (comparing only old and new bytes.Buffer):

name            old time/op    new time/op    delta
String/empty-8     132ns ±14%      21ns ± 0%   -84.30%  (p=0.000 n=10+10)
String/5-8         190ns ±12%     130ns ± 5%   -31.62%  (p=0.000 n=10+9)
String/64-8        205ns ±15%     166ns ±13%   -18.79%  (p=0.000 n=10+10)
String/1024-8      831ns ± 1%     780ns ± 0%    -6.09%  (p=0.000 n=8+7)

name            old alloc/op   new alloc/op   delta
String/empty-8      112B ± 0%        0B       -100.00%  (p=0.000 n=10+10)
String/5-8          117B ± 0%       69B ± 0%   -41.03%  (p=0.000 n=10+10)
String/64-8         176B ± 0%      128B ± 0%   -27.27%  (p=0.000 n=10+10)
String/1024-8     2.16kB ± 0%    2.05kB ± 0%    -5.19%  (p=0.000 n=10+10)

name            old allocs/op  new allocs/op  delta
String/empty-8      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
String/5-8          2.00 ± 0%      2.00 ± 0%      ~     (all equal)
String/64-8         2.00 ± 0%      2.00 ± 0%      ~     (all equal)
String/1024-8       3.00 ± 0%      2.00 ± 0%   -33.33%  (p=0.000 n=10+10)

With array self-assignment removed, we don't have a leaking param in bytes.Buffer.grow.
This makes it possible to do a fancy pre-allocation which will reside on stack for buffers that do not escape:

- var buf bytes.Buffer
+ buf := bytes.NewBuffer(make([]byte, 0, 64))

Note that one can use different capacity, so we can effectively have bootstrap "array" of more than 64 bytes.
And given that we're not leaking anymore, it really does improve things:

name            old time/op    new time/op    delta
String/empty-8     132ns ±14%      26ns ± 0%   -80.47%  (p=0.000 n=10+8)
String/5-8         190ns ±12%      53ns ± 1%   -71.95%  (p=0.000 n=10+10)
String/64-8        205ns ±15%      99ns ± 9%   -51.62%  (p=0.000 n=10+10)
String/1024-8      831ns ± 1%     931ns ±20%      ~     (p=0.500 n=8+10)

name            old alloc/op   new alloc/op   delta
String/empty-8      112B ± 0%        0B       -100.00%  (p=0.000 n=10+10)
String/5-8          117B ± 0%        5B ± 0%   -95.73%  (p=0.000 n=10+10)
String/64-8         176B ± 0%       64B ± 0%   -63.64%  (p=0.000 n=10+10)
String/1024-8     2.16kB ± 0%    2.18kB ± 0%    +0.74%  (p=0.000 n=10+10)

name            old allocs/op  new allocs/op  delta
String/empty-8      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
String/5-8          2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
String/64-8         2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
String/1024-8       3.00 ± 0%      2.00 ± 0%   -33.33%  (p=0.000 n=10+10)

Note that we do less allocations for workloads that fit the slice capacity.

Key points of the new implementation:

  • Zero value bytes.Buffer performs better than before
  • You can have trully stack-allocated buffer, and it's not even limited to 64 bytes
  • The unsafe.Sizeof(bytes.Buffer{}) is reduced significantly
  • Empty writes don't cause allocations

There were no big reasons to use bytes.NewBuffer before. Now there are.


bytes.Buffer benchmark results (go test -v -count=10 -bench='ReadString|WriteByte|WriteRune|Buffer' -benchmem bytes):

name                       old time/op    new time/op    delta
ReadString-8                 9.20µs ± 1%    9.22µs ± 1%     ~     (p=0.148 n=10+10)
WriteByte-8                  28.1µs ± 0%    26.2µs ± 0%   -6.78%  (p=0.000 n=10+10)
WriteRune-8                  64.9µs ± 0%    65.0µs ± 0%   +0.16%  (p=0.000 n=10+10)
BufferNotEmptyWriteRead-8     469µs ± 0%     461µs ± 0%   -1.76%  (p=0.000 n=9+10)
BufferFullSmallReads-8        108µs ± 0%     108µs ± 0%   -0.21%  (p=0.000 n=10+10)

name                       old speed      new speed      delta
ReadString-8               3.56GB/s ± 1%  3.55GB/s ± 1%     ~     (p=0.165 n=10+10)
WriteByte-8                 146MB/s ± 0%   156MB/s ± 0%   +7.26%  (p=0.000 n=9+10)
WriteRune-8                 189MB/s ± 0%   189MB/s ± 0%   -0.16%  (p=0.000 n=10+10)

name                       old alloc/op   new alloc/op   delta
ReadString-8                 32.8kB ± 0%    32.8kB ± 0%     ~     (all equal)
WriteByte-8                   0.00B          0.00B          ~     (all equal)
WriteRune-8                   0.00B          0.00B          ~     (all equal)
BufferNotEmptyWriteRead-8    4.72kB ± 0%    4.67kB ± 0%   -1.02%  (p=0.000 n=10+10)
BufferFullSmallReads-8       3.44kB ± 0%    3.33kB ± 0%   -3.26%  (p=0.000 n=10+10)

name                       old allocs/op  new allocs/op  delta
ReadString-8                   1.00 ± 0%      1.00 ± 0%     ~     (all equal)
WriteByte-8                    0.00           0.00          ~     (all equal)
WriteRune-8                    0.00           0.00          ~     (all equal)
BufferNotEmptyWriteRead-8      3.00 ± 0%      3.00 ± 0%     ~     (all equal)
BufferFullSmallReads-8         3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.000 n=10+10)

gopherbot pushed a commit that referenced this issue Sep 6, 2018
Rationale: small buffer optimization does not work and it has
made things slower since 2014. Until we can make it work,
we should prefer simpler code that also turns out to be more
efficient.

With this change, it's possible to use
NewBuffer(make([]byte, 0, bootstrapSize)) to get the desired
stack-allocated initial buffer since escape analysis can
prove the created slice to be non-escaping.

New implementation key points:

    - Zero value bytes.Buffer performs better than before
    - You can have a truly stack-allocated buffer, and it's not even limited to 64 bytes
    - The unsafe.Sizeof(bytes.Buffer{}) is reduced significantly
    - Empty writes don't cause allocations

Buffer benchmarks from bytes package:

    name                       old time/op    new time/op    delta
    ReadString-8                 9.20µs ± 1%    9.22µs ± 1%     ~     (p=0.148 n=10+10)
    WriteByte-8                  28.1µs ± 0%    26.2µs ± 0%   -6.78%  (p=0.000 n=10+10)
    WriteRune-8                  64.9µs ± 0%    65.0µs ± 0%   +0.16%  (p=0.000 n=10+10)
    BufferNotEmptyWriteRead-8     469µs ± 0%     461µs ± 0%   -1.76%  (p=0.000 n=9+10)
    BufferFullSmallReads-8        108µs ± 0%     108µs ± 0%   -0.21%  (p=0.000 n=10+10)

    name                       old speed      new speed      delta
    ReadString-8               3.56GB/s ± 1%  3.55GB/s ± 1%     ~     (p=0.165 n=10+10)
    WriteByte-8                 146MB/s ± 0%   156MB/s ± 0%   +7.26%  (p=0.000 n=9+10)
    WriteRune-8                 189MB/s ± 0%   189MB/s ± 0%   -0.16%  (p=0.000 n=10+10)

    name                       old alloc/op   new alloc/op   delta
    ReadString-8                 32.8kB ± 0%    32.8kB ± 0%     ~     (all equal)
    WriteByte-8                   0.00B          0.00B          ~     (all equal)
    WriteRune-8                   0.00B          0.00B          ~     (all equal)
    BufferNotEmptyWriteRead-8    4.72kB ± 0%    4.67kB ± 0%   -1.02%  (p=0.000 n=10+10)
    BufferFullSmallReads-8       3.44kB ± 0%    3.33kB ± 0%   -3.26%  (p=0.000 n=10+10)

    name                       old allocs/op  new allocs/op  delta
    ReadString-8                   1.00 ± 0%      1.00 ± 0%     ~     (all equal)
    WriteByte-8                    0.00           0.00          ~     (all equal)
    WriteRune-8                    0.00           0.00          ~     (all equal)
    BufferNotEmptyWriteRead-8      3.00 ± 0%      3.00 ± 0%     ~     (all equal)
    BufferFullSmallReads-8         3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.000 n=10+10)

The most notable thing in go1 benchmarks is reduced allocs in HTTPClientServer (-1 alloc):

    HTTPClientServer-8           64.0 ± 0%      63.0 ± 0%  -1.56%  (p=0.000 n=10+10)

For more explanations and benchmarks see the referenced issue.

Updates #7921

Change-Id: Ica0bf85e1b70fb4f5dc4f6a61045e2cf4ef72aa3
Reviewed-on: https://go-review.googlesource.com/133715
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@cespare
Copy link
Contributor

cespare commented Sep 6, 2018

I had retitled this bug to be about bytes.Buffer's bootstrap array, so now that we got rid of that I suppose we can close this bug.

The underlying issue of self-referential structures tricking escape analysis (as demonstrated in #7921 (comment)) remains, however.

gopherbot pushed a commit that referenced this issue Sep 17, 2018
Instead of skipping all OSLICEARR, skip only ones with non-pointer
array type. For pointers to arrays, it's safe to apply the
self-assignment slicing optimizations.

Refactored the matching code into separate function for readability.

This is an extension to already existing optimization.

On its own, it does not improve any code under std, but
it opens some new optimization opportunities. One
of them is described in the referenced issue.

Updates #7921

Change-Id: I08ac660d3ef80eb15fd7933fb73cf53ded9333ad
Reviewed-on: https://go-review.googlesource.com/133375
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Nov 15, 2018
The iterator returned by `tree.MakeIter` was escaping to the heap
for two reasons:
1. it was capturing a reference to the tree itself.
2. it was pointing a slice into its own array.

This change addresses both of these problems and prevents the iterator
from escaping when used. The fixes were:
1. copy the tree's root pointer reference instead of a reference
   to the tree itself.
2. avoid creating the self-referential slice reference. This mistakenly
   escapes because of golang/go#7921, which
   also caused issues with `bytes.Buffer` (https://golang.org/cl/133715).

This change also adds a new benchmark which demonstrates whether `MakeIter`
escapes or not:

```
name             old time/op    new time/op    delta
BTreeMakeIter-4     131ns ±14%      25ns ± 1%   -81.23%  (p=0.000 n=9+9)

name             old alloc/op   new alloc/op   delta
BTreeMakeIter-4      144B ± 0%        0B       -100.00%  (p=0.000 n=10+10)

name             old allocs/op  new allocs/op  delta
BTreeMakeIter-4      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
```

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Nov 15, 2018
The iterator returned by `tree.MakeIter` was escaping to the heap
for two reasons:
1. it was capturing a reference to the tree itself.
2. it was pointing a slice into its own array.

This change addresses both of these problems and prevents the iterator
from escaping when used. The fixes were:
1. copy the tree's root pointer reference instead of a reference
   to the tree itself.
2. avoid creating the self-referential slice reference. This mistakenly
   escapes because of golang/go#7921, which
   also caused issues with `bytes.Buffer` (https://golang.org/cl/133715).

This change also adds a new benchmark which demonstrates whether `MakeIter`
escapes or not:

```
name             old time/op    new time/op    delta
BTreeMakeIter-4     131ns ±14%      25ns ± 1%   -81.23%  (p=0.000 n=9+9)

name             old alloc/op   new alloc/op   delta
BTreeMakeIter-4      144B ± 0%        0B       -100.00%  (p=0.000 n=10+10)

name             old allocs/op  new allocs/op  delta
BTreeMakeIter-4      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
```

Release note: None
@mdempsky
Copy link
Contributor

mdempsky commented Apr 16, 2019

There's a lot of history on the thread, but skimming briefly I understand the issue boils down to @cespare's example:

type B struct {
	p *int
	n int
}

func (b *B) X() {
	b.p = &b.n
}

This function is analyzed as "leaking param: b", which means calling b.X() causes b to always be heap allocated.

The issue here is that we're assigning &b.n through a (implicit) pointer dereference, so escape analysis pessimistically assumes the pointer might point to the heap.

There's an optimization esc.go:isSelfAssign that tries to look for things like p.f = p.g to eliminate unnecessary leaks, but it doesn't recognize p.f = &p.g. It probably could though.

(The above explanation applies to both esc.go and escape.go; they use the same approach here and even both use isSelfAssign for this optimization.)

Edit: This is wrong. Simply recognizing p.f = &p.g in isSelfAssign is insufficient, because we need to still record that this assignment makes p self-referential.

@cuonglm
Copy link
Member

cuonglm commented Apr 30, 2019

@mdempsky If we eliminate unnecessary leaks in p.f = &p.g, then this code:

package escape

type S struct {
	i  int
	pi *int
}

var sink S

func f(p *S) {
	p.pi = &p.i
	sink = *p
}

func g(p *S) {
	p.pi = &p.i
}

func h() {
	var s S
	g(&s)
	sink = s
}

will only has leaking param content in func f(p *S) { right? Will var s S moves to heap?

@mdempsky
Copy link
Contributor

@cuonglm Unfortunately, my earlier suggestion to ignore p.f = &p.g like how we ignore p.f = p.g is wrong.

In the example you pasted, it's important that var s S be heap allocated: after the assignment sink = s, we have sink.pi pointing to s.i. If we naively treated g as a non-escaping function though, s would be stack allocated.

To fix this, we need a way to tag functions with something like "*p becomes self-referential". The current scheme used by esc.go (and inherited by escape.go) doesn't support this though.

Once we remove esc.go, I expect we'll be able to cleanup/simplify the tagging scheme, and there will probably be opportunities for representing more fine-grained semantics like this.

@zhangyoufu zhangyoufu mentioned this issue May 21, 2019
@szyhf
Copy link

szyhf commented Sep 29, 2019

I think I may case the same problem:

#34463 (comment)

#34463 (comment)

The memory is leaking

@josharian
Copy link
Contributor

josharian commented Feb 2, 2020

@mdempsky now that esc.go is gone, is this ripe for revisiting?

@rogpeppe
Copy link
Contributor

rogpeppe commented Mar 5, 2021

I just ran up against this issue again and wondered if it's likely that there might be some work done on this in the next cycle or two.
@mdempsky ?

@ysmolski
Copy link
Member

latest go1.17-b8a7c33ec9:

package main

type B struct {
	buf     []byte
	storage [64]byte
}

func (b *B) X() {
	b.buf = b.storage[:10]
}

func main() {
	var b B
	b.X()
}

Builds:

~ % go build -gcflags -m escaping.go
# command-line-arguments
./escaping.go:8:6: can inline (*B).X
./escaping.go:12:6: can inline main
./escaping.go:14:5: inlining call to (*B).X
./escaping.go:8:7: leaking param: b
./escaping.go:13:6: moved to heap: b

@josharian
Copy link
Contributor

I believe that if this were fixed, we could use a bootstrap array to fix #2320.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests