runtime: append growth sort of inconsistent and surprising #41239
Comments
Consistency of
Above you wrote this, but did you really mean it?
s := append(s[i+1:], s[i:])
s[i] = inserted
Maybe I'm misreading it, but that seems to drop |
... That is at least the second time today I've put the colon on the wrong side of that expression. So, no, I didn't mean that. Fixed it. I see your point about the amortized linear performance, and after all, we can always manually make() things if we know the target size. Although I occasionally end up in a circumstance where I actually sort of care about the marginal cost of zeroing the memory out before copying the desired contents in. I also periodically want an append(...) which can take multiple parameters some of which are slice... parameters, but I'm pretty sure that would be a huge pain to implement. But it would certainly make for a much clearer insert:
|
Similar to @ianlancetaylor, I think the slice capacities that append produces should not be relied on to be consistent between call sites or versions. I think the difference in result capacities is due to runtime growslice taking the starting capacity and length into account, which differ between the two versions: Line 144 in c489330
So given different starting parameters I don't think these two versions need to come up with the same resulting capacities. |
I think what surprised me was that both the length and the capacity appeared to be relevant, and now I think I know why: Line 149 in c489330
My vague intuition is that there's a mismatch here -- we're otherwise looking exclusively at the old and new capacity to determine what to do, but in this one case, we look at the previous length instead of the previous capacity. And that is why, when the total slice length is fairly close to 1024, there's a good chance that a random index to insert at produces the doubling behavior instead of the *1.25 behavior. |
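For reference, here is a paraphrase of the Go 1.15-era growth logic around the cited lines (simplified; the real code calls the needed capacity cap, handles overflow, and then rounds the result up to an allocation size class):

// needed is the minimum capacity the append requires; old is the slice being grown.
newcap := old.cap
doublecap := newcap + newcap
if needed > doublecap {
	newcap = needed // more than double was requested: take exactly what is needed
} else {
	if old.len < 1024 { // note: this checks the old *length*, not the old capacity
		newcap = doublecap
	} else {
		for 0 < newcap && newcap < needed {
			newcap += newcap / 4 // grow by ~25% until large enough
		}
	}
}
// newcap is then rounded up to the next allocation size class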
A little off topic: I always expect a better (IMO) behavior of the
Edit: sorry, here I mean "to get a result slice which capacity is the same as its length", i.e., without wasting. |
If a "perfect" clone is wanted a make with an explicit cap and copy seems the right fit. Using append for that starting from a 0 cap slice seems like a "misuse" of append to me to avoid writing more than a one liner. |
It is not more than one line, it is more than 5 lines, when using make+copy:
if a == nil {
	b = nil
} else {
	b = make([]T, len(a))
	copy(b, a)
}
Edit: sorry, in my last comment, I mean "to get a result slice which capacity is the same as its length". |
If nil vs non-nil is important:
Which nicely conveys explicitly that the case of a == nil is special. For the more common case I have seen where only length matters (it's often an API smell if
It can also be more easily stack allocated if len(a) is a known constant, whereas with append it will not be. |
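For the length-only case where nil-ness doesn't matter, the make+copy version being discussed presumably amounts to this (the helper name is mine):

// A minimal sketch for a concrete element type; unlike the append-based
// one-liner, a nil input produces a non-nil empty slice here.
func cloneInts(a []int) []int {
	b := make([]int, len(a))
	copy(b, a)
	return b
}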
Yes, it is still 4 lines more, in which |
I do not think append's focus should be to be a slice clone mechanism. I do not agree with the idea that, because it is currently possible to write a slice clone with append in one line and it isn't perfect, append needs to change to make it perfect in one line. If a "perfect" and concise slice clone mechanism is needed, let's explore how often the nil-ness of the slice matters for the copy. I have not seen it come up very often, and make+copy seems concise enough to me. With generics it will be easy to implement it once in a library in a generic way without changing append.

Every additional constraint on the behaviour of append could make it more complex or more inefficient in the future when the allocation infrastructure changes underneath. For example, it is possible that Go's compiler in the future may be smart enough to understand that additional capacity might not be needed, or that more capacity might be needed because the append is used in a loop. The compiler should be free to optimize the capacity to benefit overall program performance. If the programmer wants to control the exact length and capacity then the builtin make can be used.

Returning the same length and capacity can be "wasteful" when not rounding up to the next allocation size class in the current implementation, so I do not think requiring cap==len for appending slices to themselves is a good constraint to have for general usage of append. append is currently the only (safe) way I know of to let the runtime choose what it thinks is best in terms of allocation size classes for slice allocation; taking that away by restricting append would mean there is no mechanism left that lets the programmer choose an allocation-size-class-fitting capacity. Currently the programmer is free to choose between precise capacity with make or looser guarantees about capacity with append. Taking that choice away by restricting append, just to write fewer lines of code, does not seem like a good tradeoff to me. If slice cloning is an important enough topic it could be discussed in its own proposal.

Getting back to the original topic: I'm not sure it's worth changing the current behaviour. If it is changed, for the reason of making append's behaviour not depend on the source slice length, it would be good to also understand more about how it would impact the performance of existing programs. |
I also just assumed that these would return a slice with cap =
EDIT: just tried []int on play.golang.org and found it rounds cap up to an even number, so it wastes one element half the time. |
@martisch However, I still enjoy the simplicity of
As you have mentioned, it is possible that gc could also optimize the
This is really too far off-topic and I really should create a new issue. |
I would definitely not expect append like that to be a "perfect" clone -- I'd expect it to have append's normal behavior of possibly rounding up to a better size class or something. I would sort of like a way to express things like "make-and-copy" that hints that it's unnecessary to zero the memory out if it's going to be copied over (except when it's pointer-containing, in which case you need to zero it on alloc anyway for gc reasons), but it seems pretty minor. |
Why this exception? I don't see any difference between pointer-holding and non-pointer elements here. After all, all elements will be overwritten in the |
If you have an interval at all between the allocation and the copying, which you always do, it's vital that random uninitialized memory not be used for things with pointer-containing types, or the garbage collector will scan them and be scanning random values. This is true even within a single append operation.

Getting back to the original topic: I'm not sure whether it's worth changing the current behavior, but (1) I think it's probably not intentional behavior, given how weird it is that there's exactly one check of the old slice's current length instead of its capacity involved, and (2) it seems really weird to me that the behavior differs between the two forms.

How it would affect real programs... That's a fascinating question. One indirect effect of the length dependency is that, for a given previous and desired capacity, the chances of doubling the previous capacity increase the more items you're appending, because the length is more likely to be under the threshold. But I don't think that's a general goal, because in the cases where the new cap is more than twice the current cap, none of the slice-growth algorithms are involved at all; we just round up to the next size class. Reasoning about this is additionally complicated because it's dependent on slice length, not size.

In the code base where I most commonly see insert operations performed, the impact is probably that right now a lot of inserts into slices are doubling in size when they would only be increasing by about 25% if the test were against the previous cap rather than the previous len. If they continue growing rapidly, that's possibly a performance advantage, but I think it also implies a lot of excess memory usage.

It might be reasonable for someone with access to reasonably stable build farms (that would be not-me, sorry) to just try it and run the usual variety of benchmarks against the compiler; it's a trivial one-line change, anyway (s/old.len/old.cap/), so it's not a large development effort. |
That's CL 146719 which went in for 1.15. |
Oh, heh. I suppose that's easy enough for the compiler to just detect it. |
I'm not sure why the additional fields for the cap are considered wasted. Those bytes would be unusable otherwise, as the Go gc allocator doesn't, e.g., support reserving just three 64-bit ints. The next size class that can be allocated is four 64-bit ints. So if append cut that down to three 64-bit ints, it would actually waste one 64-bit int that nothing can ever use until the slice is garbage collected. But by making it four 64-bit ints, that one int can actually be used if one more element is appended. The context needed to understand the append behaviour is that the current Go gc allocator does not support allocating arbitrary sizes of memory for small sizes: |
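To illustrate the size-class rounding (capacities observed on Go 1.15 for 64-bit platforms, where the small size classes jump from 16 to 32 bytes; Go 1.16's new 24-byte class changes the first result):

package main

func main() {
	s := append([]int(nil), 1, 2, 3) // needs room for 3 ints = 24 bytes
	println(cap(s))                  // 4 on Go 1.15: 24 bytes rounds up to the 32-byte size class

	t := make([]int, 3) // make gives exactly the requested length and capacity
	println(cap(t))     // 3
}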
Martin, thanks for the drill-down. @go101 you wrote "to get a result slice which capacity is the same as its length" -- my quick experiment showed that's what it does, but padded for allocation policy. Do you have a counter-example? |
Sometimes, it might waste 10%.

package main

func clone(s []int) []int {
	return append(s[:0:0], s...)
}

func main() {
	for i := range [130]struct{}{} {
		x := make([]int, i)
		println(i, cap(clone(x))-i)
	}
} |
BTW, I wrote a simpler example for OP's topic.

package main

import "fmt"

func insertA(s []int, k int, v int) []int {
	s = append(s, 0)
	copy(s[k+1:], s[k:]) // shift the tail right to open a slot at k
	s[k] = v
	return s
}

func insertB(s []int, k int, v int) []int {
	s = append(s[:k+1], s[k:]...)
	s[k] = v
	return s
}

// https://github.com/golang/go/issues/41239
func main() {
	const n = 1024
	var x, y [n]int
	for _, k := range []int{0, 512, 1022, 1023} {
		a := insertA(x[:], k, 9)
		b := insertB(y[:], k, 9)
		fmt.Printf("%5d: %d - %d\n", k, cap(a), cap(b))
	}
	// iff k = 1023, cap(a) == cap(b) == 1280, otherwise, cap(b) == 2048
} |
Ah, the minimum cap doubles as
|
Looking at the OP's code from another angle: maybe we should use the three-index slice form for the middle-insertion way.
b = append(b[:64+1:64+1], b[64:]...)
Then the result capacities of the middle-insertion way become much smaller than those of the end-insertion way. This is another reason why it would be great if simplified three-index slice forms were supported. |
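A rough comparison of the two forms, assuming Go 1.15's growth behavior and 64-bit size classes (the exact capacities are not guaranteed and may differ between versions):

package main

func main() {
	b1 := make([]int, 1024)
	b1 = append(b1[:64+1], b1[64:]...) // two-index form: old length 65 < 1024, so the 1024 capacity doubles
	println(cap(b1))                   // 2048

	b2 := make([]int, 1024)
	b2 = append(b2[:64+1:64+1], b2[64:]...) // three-index form: forces a fresh allocation close to the needed size
	println(cap(b2))                        // roughly 1184: 1025 ints rounded up to a size class
}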
Maybe it is a mistake to use |
I don't think that makes any sense. It's usually the case that there may well someday be future insertions, so we don't have any reason to want to prevent the normal slice growth from happening. What seems weird to me is mostly just that the slice growth behavior for an insert in the middle of a slice varies so much with the index of the insertion. |
It doesn't vary that much; length 1024 is the boundary point. As you have mentioned above, appending a lot to a small-length-but-large-capacity slice will get a roughly doubled-capacity slice. On the contrary, appending a few elements to a large-length slice will get a slightly-larger-capacity slice. |
I think there is something to fix here. I would expect that both of these appends should allocate the same amount:
Both overflow the same amount and need a growslice, and both started with the same capacity slice. The fact that one needed to fill in one available slot in the cap-len space first seems like it shouldn't matter. This isn't to say that callers should depend on this behavior. But growth should ideally not depend on how elements were added (one at a time, two at a time, whatever) but just on how many were added. (With the exception that if you add so many in a single append that you would overflow what we would do for a single grow, then of course we have to grow more.) |
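A hypothetical pair along those lines (not the exact snippet from the comment, which did not survive extraction; the capacities are what Go 1.15 produces):

package main

func main() {
	a := make([]int, 1024, 1024) // len == cap
	a = append(a, make([]int, 2)...)
	println(len(a), cap(a)) // 1026 1280: old length >= 1024, so it grows by ~25%

	b := make([]int, 1023, 1024) // one free slot left in cap
	b = append(b, make([]int, 3)...)
	println(len(b), cap(b)) // 1026 2048: old length 1023 < 1024, so the capacity doubles
}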
If we are going to change it, I think Go 1.16 is a good time, as adding a 24-byte size class is already about to change the capacities append allocates. Batching these changes up will also batch up the Go code that needs to be corrected to not assume capacities grown into by append are consistent between Go versions, and we can communicate the changes and potential differences seen between Go versions for multiple CLs together. |
Change https://golang.org/cl/257338 mentions this issue: |
Although it is not a must, I think it would be great to make the reflect-based appends consistent as well. (The following outputs are the same for 1.15 and tip):

package main

import "reflect"

const N = 1024

func main() {
	b := make([]int, N)
	b = reflect.Append(
		reflect.ValueOf(b[:N-1:N]),
		reflect.ValueOf(0),
		reflect.ValueOf(1),
	).Interface().([]int)
	println(len(b), cap(b)) // 1025 2048

	c := make([]int, N)
	c = reflect.AppendSlice(
		reflect.ValueOf(c[:N:N]),
		reflect.ValueOf([]int{1}),
	).Interface().([]int)
	println(len(c), cap(c)) // 1025 1280
} |
What version of Go are you using (go version)?
1.15

Does this issue reproduce with the latest release?
y

What operating system and processor architecture are you using (go env)?
Any, but also playground.
What did you do?
This actually grew out of a different question. Consider the fairly common slice-insert idiom:
Assume i is less than len(s). You could also do:
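The playground snippets did not survive extraction; the two idioms are presumably along the lines of the insertA/insertB comparison shown earlier in the thread:

package main

func insertCopy(s []int, i, v int) []int {
	s = append(s, 0)     // grow by one element
	copy(s[i+1:], s[i:]) // shift the tail right
	s[i] = v
	return s
}

func insertAppend(s []int, i, v int) []int {
	s = append(s[:i+1], s[i:]...) // append the tail onto the truncated head
	s[i] = v
	return s
}

func main() {
	println(cap(insertCopy(make([]int, 1024), 512, 9)))   // 1280 on Go 1.15: single-element append growth
	println(cap(insertAppend(make([]int, 1024), 512, 9))) // 2048 on Go 1.15: slice-append growth
}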
I thought this might be faster because it sort of skips one copy of the s[i+1:] part of the slice -- it can in theory copy each half only once. In fact, it was dramatically slower, and allocated more memory, which surprised me. Further investigation led to:
https://play.golang.org/p/cR51ks72emv
What did you expect to see?
I don't know. I think I expected the behavior of slice growth on append to be the same regardless of how I achieved the outcome of "increase total length by 1".
What did you see instead?
Two different things, both somewhat surprising to me.

The first is that, starting at N=1024, appending two slices produces wildly different resulting capacities than appending a couple of items to a slice. (Curiously, appending two items explicitly, like append(s, 0, 1), doesn't produce the different behavior, but appending b[size-1:]... does produce the single-item-append behavior.) Specifically, appending a single item grows the slice by 25%, but appending "a slice" does something much fancier. It will at least double the capacity of the first parameter to append(), but if you specify a cap on that parameter, it will then pick a size that's some amount larger than the total size it needs, but not necessarily as large. It might even be smaller than the size picked when appending a single item to the existing slice.

The second is that, for N=1023 exactly, the single-item append always goes straight to 2048. So, appending a single item to a 1023-item slice produces a 2048 capacity, but appending a single item to a 1024-or-larger item slice almost always produces a smaller capacity. This seems like a weird special case, and might be a bug in the sense of "a check that is supposed to do something specific is doing something else specific". (Or it might just be "grow by doubling until 1024, then grow by 25%", I guess?) But batch-append doesn't have that behavior. So, for N from 850 to 1023, batch-append grows the slice more slowly (and thus allocates less memory and has to do less initialization in this particular use case), but for N from 1024 on up, batch-append tends to grow the slice faster (and thus has to do more extra work). It feels like they should be roughly comparable, probably?
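A minimal illustration of that boundary (capacities observed on Go 1.15; other versions may differ):

package main

func main() {
	a := make([]int, 1023)
	a = append(a, 0)
	println(cap(a)) // 2048: old length 1023 < 1024, so the capacity doubles (then size-class rounding)

	b := make([]int, 1024)
	b = append(b, 0)
	println(cap(b)) // 1280: old length >= 1024, so the capacity grows by ~25%
}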
I am not sure what the right outcome is. I think what I'd ideally like would be for the short-cut append to have the same semantics (when it's valid at all) including the amount by which capacity grows.
The "append instead of appending and copying" behavior actually seems moderately rewarding (a few percent faster, sometimes), but it becomes more valuable as N increases, but currently, it's only actually better when N is large and the index is small so the array isn't accidentally doubling in size. If they used the same growth sizing logic, I think it would be a potentially significant win for some use cases.
(Thanks to josharian for pointing out that the capacity of the first slice would be a significant factor.)