runtime: append growth sort of inconsistent and surprising #41239

Closed
seebs opened this issue Sep 6, 2020 · 32 comments
Labels: FrozenDueToAge, NeedsFix (the path to resolution is known, but the work has not been done), Performance
Milestone: Backlog

Comments

@seebs (Contributor) commented Sep 6, 2020

What version of Go are you using (go version)?

1.15

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

Any, but also playground.

What did you do?

This actually grew out of a different question. Consider the fairly common slice-insert idiom:

s = append(s, 0)
copy(s[i+1:], s[i:])
s[i] = inserted

Assume i is less than len(s). You could also do:

s = append(s[:i+1], s[i:]...)
s[i] = inserted

I thought this might be faster because it sort of skips one copy of the s[i+1:] part of the slice -- it can in theory copy each half only once. In fact, it was dramatically slower, and allocated more memory, which surprised me. Further investigation led to:

https://play.golang.org/p/cR51ks72emv

What did you expect to see?

I don't know. I think I expected the behavior of slice growth on append to be the same regardless of how I achieved the outcome of "increase total length by 1".

What did you see instead?

Two different things, both somewhat surprising to me. The first is that, starting at N=1024, appending two slices produces wildly different resulting capacities than appending a couple of items to a slice. (Curiously, appending two items explicitly, like "append(s, 0, 1)", doesn't produce the different behavior, but appending b[size-1:]... does produce the single-item-append behavior.) Specifically, appending a single item grows the slice by 25%, but appending "a slice" does something much fancier. It will at least double the capacity of the first parameter to append(), but if you specify a cap on that parameter, it will then pick a size somewhat larger than the total size it needs, though not necessarily as large as that doubled capacity. It might even be smaller than the size picked when appending a single item to the existing slice.

The second is that, for N=1023 exactly, the single-item append always goes straight to 2048. So, appending a single item to a 1023-item slice produces a 2048 capacity, but appending a single item to a 1024-or-larger-item slice almost always produces a smaller capacity. This seems like a weird special case, and might be a bug in the sense of "a check that is supposed to do something specific is doing something else specific". (Or it might just be "grow by doubling until 1024, then grow by 25%", I guess.) But batch-append doesn't have that behavior. So, for N from 850 to 1023, batch-append grows the slice more slowly (and thus allocates less memory and has to do less initialization in this particular use case), but for N from 1024 on up, batch-append tends to grow the slice faster (and thus has to do more extra work). It feels like they should be roughly comparable, probably?
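For the single-item case, here is a minimal reproduction (a sketch; the capacities in the comment are those observed on Go 1.15 and may differ in other versions):

package main

func main() {
	// Appending one item to a 1023-element slice doubles the capacity
	// (rounded up to a size class), while a 1024-element slice grows
	// by only 25%.
	for _, n := range []int{1023, 1024} {
		s := make([]int, n)
		s = append(s, 0)
		println(n, "->", cap(s)) // 1023 -> 2048, 1024 -> 1280
	}
}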

I am not sure what the right outcome is. I think what I'd ideally like would be for the short-cut append to have the same semantics (when it's valid at all) including the amount by which capacity grows.

The "append instead of appending and copying" behavior actually seems moderately rewarding (a few percent faster, sometimes), but it becomes more valuable as N increases, but currently, it's only actually better when N is large and the index is small so the array isn't accidentally doubling in size. If they used the same growth sizing logic, I think it would be a potentially significant win for some use cases.

(Thanks to josharian for pointing out that the capacity of the first slice would be a significant factor.)

@agnivade agnivade added the NeedsInvestigation and Performance labels Sep 6, 2020
@agnivade (Contributor) commented Sep 6, 2020

@randall77 @martisch

@ianlancetaylor ianlancetaylor changed the title append() growth sort of inconsistent and surprising. runtime: append growth sort of inconsistent and surprising Sep 6, 2020
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Sep 6, 2020
@ianlancetaylor (Member) commented:

Consistency of append growth behavior is not a goal. The goal is that simply appending elements to a slice will have amortized linear performance. I'm not opposed to consistency, but the bare fact of not having it is not a bug.

Above you wrote this, but did you really mean it?

    s := append(s[i+1:], s[i:]...)
    s[i] = inserted

Maybe I'm misreading it, but that seems to drop s[:i] from the slice.

@seebs (Contributor, Author) commented Sep 6, 2020

... That is at least the second time today I've put the colon on the wrong side of that expression. So, no, I didn't mean that. Fixed it.

I see your point about the amortized linear performance, and after all, we can always manually make() things if we know the target size. Although I occasionally end up in a circumstance where I actually sort of care about the marginal cost of zeroing the memory out before copying the desired contents in.

I also periodically want an append(...) that can take multiple parameters, some of which are slice... parameters, but I'm pretty sure that would be a huge pain to implement. It would certainly make for a much clearer insert:

s = append(s[:i], newValue, s[i:]...)
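(That exact form is not valid Go today, so the insert has to be spelled out. Here is a sketch as a helper for a concrete element type; the name insert is hypothetical:)

func insert(s []int, i int, v int) []int {
	s = append(s, v)     // grow by one; may reallocate and copy once
	copy(s[i+1:], s[i:]) // shift the tail right by one
	s[i] = v             // overwrite the slot opened at index i
	return s
}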

@martisch (Contributor) commented Sep 6, 2020

Similar to @ianlancetaylor I think the slice capacities that append produces should not be relied on to be consistent between call sites or versions.

I think the difference in result capacities is due to runtime growslice taking the starting capacity and length into account, which differ between the two versions:

newcap := old.cap

So, given different starting parameters, I don't think these two versions need to come up with the same resulting capacities.

@seebs (Contributor, Author) commented Sep 6, 2020

I think what surprised me was that both the length and the capacity appeared to be relevant, and now I think I know why:

if old.len < 1024 {
	newcap = doublecap
}

My vague intuition is that there's a mismatch here -- we're otherwise looking exclusively at the old and new capacity to determine what to do, but in this one case, we look at the previous length instead of the previous capacity. And that is why, when the total slice length is fairly close to 1024, there's a good chance that a random index to insert at produces the doubling behavior instead of the *1.25 behavior.
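For context, here is the surrounding logic, paraphrased from Go 1.15's runtime.growslice (not verbatim: cap here is the requested total capacity, and the overflow checks and the later size-class rounding are omitted):

newcap := old.cap
doublecap := newcap + newcap
if cap > doublecap {
	// More than double was requested: take exactly what is needed.
	newcap = cap
} else {
	if old.len < 1024 { // the length (not capacity) check discussed above
		newcap = doublecap
	} else {
		for 0 < newcap && newcap < cap {
			newcap += newcap / 4 // grow by 25% until large enough
		}
	}
}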

@zigo101 commented Sep 6, 2020

A little off topic: I have always expected a better (IMO) behavior from append. It would be great to get a slice with the same capacity as the source slice from append(src[:0:0], src...), so that the append call would be a perfect clone operation. Currently, it is not.

Edit: sorry, here I mean "to get a result slice whose capacity is the same as its length", a.k.a. without wasting.
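For example (a sketch; the exact capacities depend on the Go version, the architecture, and the element size):

package main

func clone(src []int) []int {
	return append(src[:0:0], src...)
}

func main() {
	src := []int{1, 2, 3}
	dst := clone(src)
	// On 64-bit Go 1.15 this prints "3 4": the 24 bytes needed are
	// rounded up to the 32-byte size class, so one element of spare
	// capacity comes along for the ride.
	println(len(dst), cap(dst))
}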

@martisch (Contributor) commented Sep 6, 2020

If a "perfect" clone is wanted a make with an explicit cap and copy seems the right fit. Using append for that starting from a 0 cap slice seems like a "misuse" of append to me to avoid writing more than a one liner.

@zigo101 commented Sep 6, 2020

It is not just one more line; it is more than 5 lines when using copy to clone:

if a == nil {
	b = nil
} else {
	b = make([]T, len(a))
	copy(b, a)
}

Edit: sorry, in my last comment I meant "to get a result slice whose capacity is the same as its length".

@martisch (Contributor) commented Sep 6, 2020

If nil vs non-nil is important:

var b []T
if a != nil {
	b = make([]T, len(a))
	copy(b, a)
}

This nicely conveys explicitly that the case of a == nil is special.
For append(src[:0:0], src...), the fact that nil might be special is easily lost, as it's implicit.

For the more common case I have seen, where only length matters (it's often an API smell if slice == nil is handled differently than len(slice) == 0), these two lines are nicely explicit in their meaning and are optimised to avoid the memclear in make:

b := make([]T, len(a))
copy(b, a)

They can also be more easily stack-allocated if len(a) is a known constant, whereas with append they will not be.

@zigo101 commented Sep 6, 2020

Yes, it is still 4 lines more, in which a appears 3 times, b appears 3 times, and []T appears 2 times; that is much more verbose than the one-line solution.

@martisch (Contributor) commented Sep 6, 2020

I do not think append's focus should be to be a slice-clone mechanism. I do not agree with the idea that, because it is currently possible to write a slice clone with append in one line and that line isn't perfect, append needs to change to make it perfect in one line. If a "perfect" and concise slice-clone mechanism is needed, let's explore how often the nil-ness of the slice matters for the copy. I have not seen it come up very often, and make+copy seems concise enough to me. With generics it will be easy to implement this once, in a library, in a generic way, without changing append.

Every additional constraint on the behaviour of append could make it more complex or more inefficient in the future, when the allocation infrastructure changes underneath it.

For example, it is possible that Go's compiler may in the future be smart enough to understand that additional capacity is not needed, or that more capacity is needed because the append is used in a loop. The compiler should be free to optimize the capacity to benefit overall program performance. If the programmer wants to control the exact length and capacity, then the builtin make can be used.

Returning the same length and capacity can be "wasteful" if it means not rounding up to the next allocation size class in the current implementation, so I do not think requiring cap == len when appending slices to themselves is a good constraint for general usage of append.

append is currently the only (safe) way I know of to let the runtime choose what it thinks is best in terms of allocation size classes for slice allocation. Taking that away by restricting append would mean there is no mechanism left for the programmer to get an allocation-size-class-fitting capacity. Currently the programmer is free to choose between a precise capacity with make and looser guarantees about capacity with append. Taking that choice away just to write fewer lines of code is not, I think, a good tradeoff.

If slice cloning is an important enough topic it could be discussed in its own proposal.

Getting back to the original topic: I'm not sure it's worth changing the current behaviour. If it is changed, for the reason of making append's behaviour not depend on the source slice's length, it would be good to also understand more about how that would impact the performance of existing programs.
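For illustration, a sketch of such a generic clone (type parameters were not yet available at the time of writing, so the names here are hypothetical; the standard library later gained a similar slices.Clone):

func Clone[T any](s []T) []T {
	if s == nil {
		return nil // preserve nil-ness explicitly
	}
	b := make([]T, len(s))
	copy(b, s)
	return b
}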

@networkimprov commented Sep 6, 2020

I also just assumed that these would return a slice with cap == len(src):

append([]T{}, src...)
append([]T(nil), src...)

EDIT: just tried []int on play.golang.org and found it rounds cap up to an even number, so it wastes one element half the time.
EDIT2: but with a struct, it never wastes any, except for very small structs.

@zigo101 commented Sep 6, 2020

@martisch
I preferred using append over make+copy to clone slices because the former is often MUCH more efficient than the latter. But the situation has changed since Go 1.15 (although the Go 1.15 release notes don't mention that make+copy was specially optimized). Now make+copy is always (a little) more efficient than the append way.

However, I still enjoy the simplicity of append(s[:0:0], s...).

As you have mentioned, it is possible that gc could also optimize append calls by not allocating unnecessary memory.

This is drifting too far off-topic; I really should create a new issue.

@seebs (Contributor, Author) commented Sep 6, 2020

I would definitely not expect append like that to be a "perfect" clone -- I'd expect it to have append's normal behavior of possibly rounding up to a better size class or something.

I would sort of like a way to express things like "make-and-copy" that hints that it's unnecessary to zero the memory out if it's going to be copied over (except when it's pointer-containing, in which case you need to zero it on alloc anyway for gc reasons), but it seems pretty minor.

@zigo101 commented Sep 6, 2020

except when it's pointer-containing, in which case you need to zero it on alloc anyway for gc reasons

Why this exception? I don't see any difference between pointer-holding and non-pointer elements here. After all, all elements will be overwritten by the copy call.

@seebs (Contributor, Author) commented Sep 6, 2020

If you have an interval at all between the allocation and the copying, which you always do, it's vital that random uninitialized memory not be used for things with pointer-containing types, or the garbage collector will scan them and be scanning random values. This is true even within a single append operation.

Getting back to the original topic: I'm not sure whether it's worth changing the current behavior, but (1) I think it's probably not intentional behavior, given how weird it is that there's exactly one check of the old slice's current-length instead of its capacity involved, (2) it seems really weird to me that the behavior of append(s[:i+1], s[i:]...), which always grows s by one item, varies so much depending on where the insert is.

How it would affect real programs.... That's a fascinating question.

One indirect effect of the length dependency is that, for a given previous and desired capacity, the chances of doubling the previous capacity increase with the number of items you're appending, because the length is more likely to be under the threshold. But I don't think that's a deliberate goal, because in the cases where the new cap is more than twice the current cap, none of the slice-growth algorithms are involved at all; we just round up to the next size class.

Reasoning about this is additionally complicated because it's dependent on slice length (element count), not byte size.

In the code base where I most commonly see insert operations performed, the impact is probably that right now a lot of inserts into slices are doubling in size when they would only be increasing by about 25% if the test were against previous cap rather than previous len. If they continue growing rapidly, that's possibly a performance advantage, but I think it also implies a lot of excess memory usage.

It might be reasonable for someone with access to reasonably stable build farms (that would be not-me, sorry) to just try it and run the usual variety of benchmarks against the compiler; it's a trivial one-line change (s/old.len/old.cap/) anyway, so it's not a large development effort.
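Concretely, the one-line change would look something like this against the Go 1.15 code quoted earlier (a sketch; the actual CL may differ in details):

if old.cap < 1024 { // was: old.len < 1024
	newcap = doublecap
}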

@randall77 (Contributor) commented:

I would sort of like a way to express things like "make-and-copy" that hints that it's unnecessary to zero the memory out if it's going to be copied over (except when it's pointer-containing, in which case you need to zero it on alloc anyway for gc reasons), but it seems pretty minor.

That's CL 146719 which went in for 1.15.

@seebs (Contributor, Author) commented Sep 6, 2020

Oh, heh. I suppose it's easy enough for the compiler to just detect that.

@martisch (Contributor) commented Sep 6, 2020

EDIT: just tried []int on play.golang.org and found it rounds cap up to an even number, so it wastes one element half the time.

I'm not sure why the additional elements in the cap are considered wasted. Those bytes would be unusable otherwise, as the Go gc allocator doesn't support e.g. reserving just three 64-bit ints. The next size class that can be allocated is four 64-bit ints. So if append were to cut that down to three 64-bit ints, it would actually waste one 64-bit int that nothing could ever use until the slice is garbage collected. But by making it four 64-bit ints, that one int can actually be used if one more element is appended. The context needed to understand append's behaviour is that the current Go gc allocator does not support allocating arbitrary sizes of memory for small objects:
https://github.com/golang/go/blob/master/src/runtime/sizeclasses.go
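A small demonstration (a sketch; the output is as on 64-bit Go 1.15, before a 24-byte size class existed):

package main

func main() {
	var s []int64
	s = append(s, 1, 2, 3)
	// 24 bytes are needed, but the next allocation size class is
	// 32 bytes, so the runtime hands back capacity for 4 elements.
	println(len(s), cap(s)) // 3 4
}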

@networkimprov commented:

Martin, thanks for the drill-down.

@go101 you wrote "to get a result slice which capacity is the same as its length" -- my quick experiment showed that's what it does, but padded for allocation policy. Do you have a counter-example?

@zigo101 commented Sep 6, 2020

@networkimprov

EDIT: just tried []int on play.golang.org and found it rounds cap up to an even number, so it wastes one element half the time.

Sometimes, it might waste 10%.

package main

func clone(s []int) []int {
	return append(s[:0:0], s...)
}

func main() {
	for i := range [130]struct{}{} {
		x := make([]int, i)
		println(i, cap(clone(x)) - i)
	}
}

@zigo101 commented Sep 6, 2020

BTW, I wrote a simpler example for the OP's topic.

package main

import "fmt"

func insertA(s []int, k int, v int) []int {
	s = append(s, 0)
	copy(s[k+1:], s[k:])
	s[k] = v
	return s
}

func insertB(s []int, k int, v int) []int {
	s = append(s[:k+1], s[k:]...)
	s[k] = v
	return s
}

// https://github.com/golang/go/issues/41239
func main() {
	const n = 1024
	var x, y [n]int
	
	for _, k := range []int{0, 512, 1022, 1023} {
		a := insertA(x[:], k, 9)
		b := insertB(y[:], k, 9)
		fmt.Printf("%5d: %d - %d\n", k, cap(a), cap(b))
	}
	// if k == 1023, cap(a) == cap(b) == 1280; otherwise cap(b) == 2048
}

@networkimprov commented:

Ah, the minimum cap doubles as len(src) does. That's disappointing, but this works:

append([]int{}, src...)[:len(src):len(src)]

@zigo101 commented Sep 7, 2020

Looking at the OP's code from another angle: maybe we should use the three-index slice form in the middle-insertion version.
For example, if the corresponding line in the OP's code is changed to

b = append(b[:64+1:64+1], b[64:]...)

then the resulting capacities of the middle-insertion version become much smaller than those of the end-insertion version.

This is another reason why it would be great if simplified three-index slice forms were supported.
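A complete variant of insertB from the earlier comment, using the three-index form (the name insertC is hypothetical; note that clipping the capacity forces a fresh allocation on every call, even when spare capacity exists):

func insertC(s []int, k int, v int) []int {
	s = append(s[:k+1:k+1], s[k:]...)
	s[k] = v
	return s
}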

@zigo101 commented Sep 7, 2020

Maybe it is a mistake to use append to do the insertion job. Since Go 1.15, append should only be used when the number of elements to be inserted is hard to predict.

@seebs (Contributor, Author) commented Sep 7, 2020

I don't think that makes any sense. It's usually the case that there may well someday be future insertions, so we don't have any reason to want to prevent the normal slice growth from happening. What seems weird to me is mostly just that the slice growth behavior for an insert in the middle of a slice varies so much with the index of the insertion.

@zigo101 commented Sep 7, 2020

varies so much with the index of the insertion.

It doesn't vary that much: length 1024 is the boundary point. As you mentioned above, appending many elements to a small-length-but-large-capacity slice will get a revised, roughly doubled capacity.

On the contrary, appending a few elements to a large-length slice will get a revised, only slightly larger capacity.

@randall77 (Contributor) commented:

I think there is something to fix here. I would expect that both of these appends should allocate the same amount:

func main() {
	const N = 1024
	var a [N]int
	println(cap(append(a[:N-1:N], 9, 9)))
	println(cap(append(a[:N:N], 9)))
}

Both overflow the same amount and need a growslice, and both started with the same capacity slice. The fact that one needed to fill in one available slot in the cap-len space first seems like it shouldn't matter.

This isn't to say that callers should depend on this behavior. But growth should ideally not depend on how elements were added (one at a time, two at a time, whatever) but just on how many were added. (With the exception that if you add so many in a single append that you would overflow what we would do for a single grow, then of course we have to grow more.)

@martisch martisch self-assigned this Sep 10, 2020
@martisch martisch added the NeedsFix label and removed the NeedsInvestigation label Sep 10, 2020
@martisch (Contributor) commented:

If we are going to change it, I think Go 1.16 is a good time, since adding a 24-byte size class is already about to change the capacities append allocates. Batching these changes together will also batch up the Go code that needs to be corrected to not assume the capacities grown into by append are consistent between Go versions, and we can communicate the changes and the potential differences seen between Go versions for multiple CLs together.

@gopherbot (Contributor) commented:

Change https://golang.org/cl/257338 mentions this issue: runtime: use old capacity to decide on append growth regime

@zigo101 commented Oct 6, 2020
zigo101 commented Oct 6, 2020

Although it is not a must, I think it would be great to make reflect.Append consistent with the non-reflect way.

(The following output is the same for 1.15 and tip):

package main

import "reflect"

const N = 1024

func main() {
	b := make([]int, N)
	b = reflect.Append(
		reflect.ValueOf(b[:N-1:N]),
		reflect.ValueOf(0),
		reflect.ValueOf(1),
	).Interface().([]int)
	println(len(b), cap(b)) // 1025 2048
	c := make([]int, N)
	c = reflect.AppendSlice(
		reflect.ValueOf(c[:N:N]),
		reflect.ValueOf([]int{1}),
	).Interface().([]int)
	println(len(c), cap(c)) // 1025 1280
}

@golang golang locked and limited conversation to collaborators Jun 23, 2023