Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Go 2: allow 3-index slices to elide the second (length) term #38081

Closed
bcmills opened this issue Mar 26, 2020 · 25 comments
Closed

proposal: Go 2: allow 3-index slices to elide the second (length) term #38081

bcmills opened this issue Mar 26, 2020 · 25 comments
Labels
FrozenDueToAge LanguageChange Suggested changes to the Go language Proposal Proposal-FinalCommentPeriod v2 An incompatible library change
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Mar 26, 2020

I propose to allow the second index of 3-index slice expressions to be elided, with the default value “min(original length, new capacity)”.

This is based on the observation that the 3-index slice form is almost always used with the new length set to either 0 or the same expression as the new capacity, and the new capacity set to either exactly the original length or value known not to exceed it (such as in numerous fixes for #34972).

The specific default is chosen for the following properties:

  • Unlike defaulting to the the original length alone, it is guaranteed not to panic for the expression a[::k] for any kcap(a).
  • Unlike defaulting to the new capacity alone, it does not preclude a future change from establishing the intuitive identity x[::] ≡ x.

Template answers

  • Would you consider yourself a novice, intermediate, or experienced Go programmer?

Experienced.

  • What other languages do you have experience with?

C, C++, Standard ML, OCaml, Rust, JavaScript, Python, Bash, π-calculus, probably a few others I've missed.

  • Would this change make Go easier or harder to learn, and why?

Could go either way: 3-index slices are already relatively obscure, but this proposal arguably makes them more closely match the intuitive behavior.

  • Has this idea, or one like it, been proposed before?

This form was discussed in the original design doc, but explicitly deferred for later discussion. Now that we have released 12 intervening Go versions, I believe we have enough experience to continue that discussion.

It is not clear to me whether this point of the design has been revisited since Go 1.2. (If so, the original issue #1642 has no cross-references to that discussion.)

  • Who does this proposal help, and why?

Anyone who uses append and doesn't enjoy debugging aliasing bugs.

This proposal makes it easier (less verbose) to append to a slice that is never otherwise mutated, without risking overlapping mutations to the portion of the slice beyond its length.

  • What is the proposed change?

Allow the second (high) index of a 3-index “full slice expression” to be elided.

Specifically, for an expression of the form a[low : high : max], if the high term is elided, it should default to min(len(a), max).

This corresponds to cases 3 and 6 in the original design doc. However, this proposal does not allow the third element to be elided: we do not need to choose between those two cases at this time.

- Only the first index may be omitted; it defaults to 0.
+ The first and/or second index may be omitted.
+ The first index defaults to 0.
+ The second index defaults to either `max` or the length of the sliced operand, whichever is smaller.
  • Please also describe the change informally, as in a class teaching Go.

If the middle index of a 3-index slice expression is omitted, it defaults to either the length of the original slice or the new capacity, whichever is smaller.

  • Is this change backward compatible?

Yes. It is explicitly called out as a possibility in the Go 1.2 design doc, and I don't believe any change in the interim has rendered it impossible.

  • Show example code before and after the change.

This example is from real code (in #38077), fixing an aliasing bug.

Unfixed:

cmd.Env = append(cfg.OrigEnv, g.env...)

Fixed with the language as defined in Go 1.14:

	cmd.Env = append(cfg.OrigEnv[:len(cfg.OrigEnv):len(cfg.OrigEnv)], g.env...)

Under this proposal:

	cmd.Env = append(cfg.OrigEnv[::len(cfg.OrigEnv)], g.env...)
  • What is the cost of this proposal? (Every language change has a cost).
    • How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected?

The compiler and static analysis tools would need to be upgraded to accept the new form.
I believe the change to the compiler would be minimal.

go/ast.SliceExpr already allows a nil Expr in the High position, so other syntax-based tools would be unaffected unless they assume SliceExpr invariants beyond what is promised in the documentation.

  • What is the compile time cost?

Negligible.

  • What is the run time cost?

None: an equivalent 3-index slice can already be written today.

  • Can you describe a possible implementation?

Yes: assign the 3rd index to a temporary, then use that temporary (and a trivial min calculation) for both the 2nd and 3rd indices.

  • Do you have a prototype? (This is not required.)

No.

  • How would the language spec change?

Objection: asked and answered.

  • Orthogonality: how does this change interact or overlap with existing features?

It makes 3-index slices more consistent with 2-index slices, for which the high term may already be elided.

  • Is the goal of this change a performance improvement?

No.

  • Does this affect error handling?

No.

  • Is this about generics?

No.

@bcmills bcmills added LanguageChange Suggested changes to the Go language v2 An incompatible library change labels Mar 26, 2020
@gopherbot gopherbot added this to the Proposal milestone Mar 26, 2020
@ianlancetaylor
Copy link
Member

I think question here is: what do people intuitively expect when the second operand is omitted? I think a decent argument could be made that people intuitively expect that if the second operand is omitted that the length remains unchanged. I think it's a little harder to argue for the min expression. But maybe that is OK.

(Another possible default would be that omitting the second expression means that the length is unchanged, and if the capacity becomes smaller than the length then the expression panics at run time. The most important goal is not "omit the second expression as often as possible." I think the most important goal is "don't surprise the reader.")

@bcmills
Copy link
Contributor Author

bcmills commented Mar 26, 2020

I think a decent argument could be made that people intuitively expect that if the second operand is omitted that the length remains unchanged.

Indeed. You can think of the “min” part as meaning: “leave the length unchanged unless it would exceed the new capacity, in which case keep it as close to the original as possible”.

The alternative is to panic if the original length exceeds the new capacity, but panicking is arguably never the expected behavior — that's why it is a panic instead of an error return or a , ok — whereas slicing down to the new capacity is the natural and expected behavior for the (*[veryLarge]T)(p)[:actualLength:actualLength] expressions in #34972.

@zigo101
Copy link

zigo101 commented Mar 26, 2020

A similar proposal by defaulting the max index as the high index: #25638

@bcmills
Copy link
Contributor Author

bcmills commented Mar 26, 2020

In a cursory search of the x repos, I found only three examples for which the second element was not identical to the third.

One such example supports the “min” behavior over “default to the new cap”:

func (s *asmState) clone() *asmState {
	c := *s
	c.buf = c.storage[:len(s.buf):cap(s.buf)]
	return &c
}

The other two use values that are not plausible defaults:

	{a[:10], a[:10:20], true, false},
	{a[:10], a[5:10:20], true, true},
// resetBuf points buf at storage, sets the length to 0 and sets cap to be a
// multiple of the rate.
func (s *asmState) resetBuf() {
	max := (cap(s.storage) / s.rate) * s.rate
	s.buf = s.storage[:0:max]
}

The vast majority of usage sites don't distinguish between “min” and “new cap” at all, because they use the same expression for both terms. However, they clearly support that “old length” alone is not a broadly-useful default.

There are a few that slice down to a capacity (and equal size) obtained elsewhere:

			n, err := req.Body.Read(buf)
			if n > 0 {
				s.buf.put(&recvMsg{data: buf[:n:n]})
				buf = buf[n:]
			}
	// Allocate only once. Note that both dst and src escape when passed to
	// Transform.
	buf := [2 * initialBufSize]byte{}
	dst := buf[:initialBufSize:initialBufSize]
	src := buf[initialBufSize : 2*initialBufSize]
	args := (*[(math.MaxInt32 - 1) / unsafe.Sizeof((*C.sqlite3_value)(nil))]*C.sqlite3_value)(unsafe.Pointer(argv))[:int(argc):int(argc)]

A few slice both the size and cap down to the original length, in which case the distinction between “new cap” and “original length” is irrelevant:

// pad pads the code sequence with pops.
func pad(enc []byte) []byte {
	if len(enc) < 4 {
		enc = append(enc[:len(enc):len(enc)], zeros[:4-len(enc)]...)
	}
	return enc
}
	ckey := append(key[:len(key):len(key)], 0)

And many slice an arbitrarily large unsafe slice down to a specific capacity, although #19367 (comment) is arguably even better for these cases:

// AllGroups returns a slice that can be used to iterate over the groups in g.
func (g *Tokengroups) AllGroups() []SIDAndAttributes {
	return (*[(1 << 28) - 1]SIDAndAttributes)(unsafe.Pointer(&g.Groups[0]))[:g.GroupCount:g.GroupCount]
}
	u := (*[1 << 29]uint16)(unsafe.Pointer(&data[0]))[: len(data)/2 : len(data)/2]
	buf := (*[1 << 29]byte)(unsafe.Pointer(&v[0]))[: len(v)*2 : len(v)*2]
	n := len(buf)
	bw := (*[maxRate / 8]uint64)(unsafe.Pointer(&buf[0]))[: n/8 : n/8]

@ianlancetaylor
Copy link
Member

To me the issue here is not what is the most useful or most commonly used default. It is: what will people intuitively assume if the length is missing?

@bcmills
Copy link
Contributor Author

bcmills commented Mar 26, 2020

I think those two judgments are closely connected: people will tend to intuitively assume whatever behavior is most convenient to their use-case, because they are primed to think about that use-case and not other possible behaviors.

@ianlancetaylor
Copy link
Member

I think that to some extent you may be talking about the writer of the code, but I'm trying to talk about the reader of the code.

@bcmills
Copy link
Contributor Author

bcmills commented Mar 27, 2020

I'm talking about both: neither the reader nor the writer of the code will expect a slice expression to panic when all of the non-elided indices are valid.

They will not expect a panic in part because the corresponding 2-index slice does not panic when either index is elided, provided that the other index is valid.

And, they will not expect a panic in part because they will observe that the code does not panic when trivial tests are run. (For the “cut cap down to length” use-cases, the distinction between the old length and new cap is meaningless, and for the “cut a buffer down to a computed size” use-cases, the code would panic under even the most trivial usage if the “min” operation were not applied.)

@bradfitz
Copy link
Contributor

What is the run time cost?
None: an equivalent 3-index slice can already be written today.

There's some, right? We can't always statically prove which is min so that would need to happen at runtime? But perhaps that no different than the panicindex checks today.

@ianlancetaylor
Copy link
Member

If in most examples the second operand of the slice expression is identical to the third operand of the slice expression, then perhaps the appropriate default for a missing second operand is simply the third operand. That is slightly simpler than the suggested min expression.

A separate consideration is how easy it would be to miss the extra colon.

    // These mean different things:
    a = s[:10]
    s = s[::10]

@alanfo
Copy link

alanfo commented Apr 1, 2020

I agree with @ianlancetaylor that, if the second operand of the 3-index slice were omitted, then the most intuitive default would be the third operand rather than what is being proposed here.

However, I have my doubts about whether the proposal is worth doing at all since in my experience 3-index slices are not very common and, when encountering them, people may have to look up what they mean anyway.

@jimmyfrasche
Copy link
Member

Instead of defining s[::k], I think it would be more useful and less surprising to define s[:n:] as shorthand for s[:n:n] which defines s[::] naturally.

@bcmills
Copy link
Contributor Author

bcmills commented Apr 1, 2020

@ianlancetaylor

If in most examples the second operand of the slice expression is identical to the third operand of the slice expression, then perhaps the appropriate default for a missing second operand is simply the third operand. That is slightly simpler than the suggested min expression.

That is indeed slightly simpler if we consider eliding only the second index but requiring the third index in perpetuity.

However, if we also consider the potential future meanings for an elided third index, that default becomes more confusing.

Arguably

	a = s[:]

and

	a = s[::]

should mean the same thing (“s as a slice”, even if s is a pointer to an array). If the second argument instead defaults to the third, then s[::] would mean “s as a slice with capacity truncated to len(s)” instead of the arguably more intuitive “s as a slice”.

@jimmyfrasche, defaulting the third argument to the second would have a similar problem: if the third index defaults to the second, then s[::] would presumably mean s[:len(s):len(s)], which may have a smaller capacity that s.

@bcmills
Copy link
Contributor Author

bcmills commented Apr 1, 2020

@bradfitz

There's some, right? We can't always statically prove which is min so that would need to happen at runtime? But perhaps that no different than the panicindex checks today.

The static analysis is trivial for:

  • a slice of a pointer-to-array of constant length
  • a subslice of a slice whose original length is known to be equal to its original capacity
  • a slice whose new capacity is equal to the length of the original slice.

Those three cases cover all of the examples I have observed that use any plausible default for the second index. So, I suppose there is a theoretical run-time cost, but it is nearly always zero in practice.

@ianlancetaylor
Copy link
Member

I don't understand why anyone would ever use a three index slice expression and leave the capacity unspecified. The whole point of a three index slice expression is to change the capacity. In a two index slice expressions there are good reasons to omit the first operand, and good reasons to omit the second operands, and so s[:] happens to fall out as identical to s. But in the absence of a good reason to omit the third operand of a three index slice expression, it's not clear to me that we need s[::] to be identical to s, because it's not clear to me that we need to permit s[::] at all.

@jimmyfrasche
Copy link
Member

@bcmills Yes, s[::] would be the same as s[:len(s):len(s)] and s[::] would mean "s as a slice, with no additional capacity". That seems straightforward to me and it's what most of the code wants to do.

@bcmills
Copy link
Contributor Author

bcmills commented Apr 1, 2020

Ok, I think I buy that argument:

  • (from @ianlancetaylor) the three-argument slice expression is only useful at all when the user wants to change the capacity of a slice, and
  • (from @jimmyfrasche) the only logical default that they would want to change it to is a length, either that the original slice or that of the new one. And then if we default the new length to the original length, then those two alternatives collapse into one, and we have a natural default.

That would indeed make my motivating example even more concise. With only the trailing index elided:

	cmd.Env = append(cfg.OrigEnv[:len(cfg.OrigEnv):], g.env...)

and with both elided:

	cmd.Env = append(cfg.OrigEnv[::], g.env...)

So, should we have @jimmyfrasche file a new proposal, or repurpose this one?

@jimmyfrasche
Copy link
Member

The currently closed #25638 was posted earlier and proposes the same elision. We could re-open that thread.

@ianlancetaylor
Copy link
Member

I'm still concerned by the comment that closed #25638: I don't think that it's obvious what it means when the third operand of a three-index slice expression is omitted.

We can reason our way to a couple of different possibilities. We can say that an omitted capacity gives us an unchanged capacity. Or we can say that an omitted capacity gives us the length (if specified). We can say that an omitted length, with a specified capacity, gives us the maximal value, which is the new capacity. Or we can say that an omitted length gives us the current length (which might panic if larger than the specified capacity).

With the two index slice expression there really isn't any confusion: omitting the value leaves it unchanged. We could use that rule for a three index slice expression but it isn't that useful, since it's not what people actually want from a three index slice expression.

So it's not clear that there is any answer here that is both intuitive and useful. Which is why we didn't specify a default in the first place.

If we have generics, we could write a function that does what you want.

func SetCap(type Elem)(s []Elem, newcap int) []Elem {
    return s[:newcap:newcap]
}

@jimmyfrasche
Copy link
Member

I think s[::] meaning s[0:len(s):cap(s)] is the most intuitive but the least useful (to the point where it almost wraps back around to counterintuitive). It makes sense but doesn't add anything.

s[::] meaning s[0:len(s):len(s)] is the second most intuitive, imo, and the demonstrably most useful. It makes sense if you think of it as s[:] without any extra capacity. The 3-index slice is for controlling the cap and in the vast majority of cases the only reason you would do that is to get rid of it, so that tracks. It doesn't add anything other than convenience, but the current form is awkward in the common case and the common case is by far the most common. If anything, it should be more common since the most common bug from capacity I've seen is appending to s instead of s[:len(s):len(s)].

Both would require explanation because the third index in Go is distinct from other languages, but both of the length-dependent definitions are reasonable.

All of the capacity-dependent definitions, including the SetCap func you wrote above, are unintuitive to me and all come with strange caveats that seem like they'd require more code to guard against unintended behavior than is saved from being less explicit with the indexes. They don't seem especially useful since the majority of the cases are to set the cap to the len and the few that aren't are handled well enough by the current syntax and are all cases where being explicit is better than being concise.

Maybe there's nothing to do here, but, if not and there were generics, what I'd write is

func Decap(type Elem)(s []Elem) []Elem {
  return s[:len(s):len(s)]
}

@IanTayler
Copy link

IanTayler commented Apr 14, 2020

If I were reading a three-index slice in other people's code (especially in a code review) I would much rather see both the second and third indices as explicit. It draws attention to the fact that an important memory-aliasing-related operation is taking place, and it makes it clear what the exact intention is.

Who does this proposal help, and why?

Anyone who uses append and doesn't enjoy debugging aliasing bugs.

This proposal makes it easier (less verbose) to append to a slice that is never otherwise mutated, without risking overlapping mutations to the portion of the slice beyond its length.

To me it's clear that if I'm debugging an aliasing bug, I'd rather have cap and len explicitly written out in every three-index slice expression. I'm not sure making the decap operation slightly more concise to write is going to save me from debugging aliasing bugs in the future. On the contrary, making the code-writer explicitly write len and cap might give them a chance to realize they're making a mistake.

I see nothing wrong with a generic Decap function for slices, though.

@bcmills
Copy link
Contributor Author

bcmills commented Apr 14, 2020

@IanTayler, when you take human factors into account, the choice is not just between explicit indices and elided indices. The choice today is among all of:

  • 3-index slices with correct explicit lengths and capacities
  • 3-index slices with incorrect explicit lengths and capacities
  • 2-index slices with the third index needed but forgotten
  • 2-index slices for which the third index provides no added value
  • 2-index slices with the third index intentionally omitted because author deems the increase in safety for the 3-index slice less important than the increase in verbosity

The point of this proposal is to reduce the rate of incidence of

  • “3-index with incorrect explicit lengths and capacities” (by providing defaults that are usually correct),
  • “2-index slices intentionally chosen due to the verbosity of 3-index slice expressions” (by making 3-index slices less verbose), and
  • “third index needed but forgotten” (by making 3-index slices more common by virtue of being less verbose).

The rate of incidence of correct explicit 3-index slices vs. correct elided 3-index slices seems like — at best — a secondary concern. (The verbosity of 3-index slices today provides a strong incentive for incorrect 2-index slices over correct-but-explicit 3-index slices, and 2-index slices can already elide both of the indices.)

@ianlancetaylor
Copy link
Member

Per the discussion above, we just can't convince ourselves that there is any intuitive meaning for an omitted second operand in a three-element slice expression. It seems reasonable that it means "the length is unchanged," and it seems reasonable that it means "set the length to the (specified) capacity."

Let's see if generic functions like slices.SetCap or slices.DeCap can address this issue.

Therefore, this is a likely decline. Leaving open for four weeks for final comments.

-- for @golang/proposal-review

@ianlancetaylor
Copy link
Member

No further comments.

@zigo101
Copy link

zigo101 commented Oct 18, 2020

Just FYI, I have made two surveys on this topic:

The results are in line with anticipations.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge LanguageChange Suggested changes to the Go language Proposal Proposal-FinalCommentPeriod v2 An incompatible library change
Projects
None yet
Development

No branches or pull requests

8 participants