Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contextutil: add a Context with richer cancellation #66884

Closed
wants to merge 1 commit into from

Conversation

tbg
Copy link
Member

@tbg tbg commented Jun 25, 2021

When a context gets canceled, by contract of context.Context we can
only ever return the opaque errors context.{Canceled,DeadlineExceeded}
from ctx.Err(). This is not helpful and we often struggle with
locating the source of a context cancellation.

The proximate motivation at hand is exposing the caller of cancel(),
which this commit allows:

func TestWithCallerCancel(t *testing.T) {
	ctx, cancel := WithCallerCancel(context.Background())
	cancel()
	err := ExtErr(ctx)
	require.NotNil(t, err)
	_, _, fn, ok := errors.GetOneLineSource(err)
	require.True(t, ok)
	require.Equal(t, "TestWithCallerCancel", fn, "%+v", err)
}

Release note: None

@tbg tbg requested a review from andreimatei June 25, 2021 15:43
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/util/contextutil/cancel.go, line 26 at r1 (raw file):

// is exposed via the ExtCanceled and ExtErr methods.
func WithExtCancel(
	parent context.Context, hook func(...interface{}) interface{},

I'm a bit confused about the arguments that hook takes; nobody seems to be using them. Can you add a func WithCancelReason() (_ ctx, cancel func(reason string)) to the patch? I think seeing what that looks like would help me form an opinion about whether they're needed, or whether we're better off


pkg/util/contextutil/cancel.go, line 46 at r1 (raw file):

// `errors.WithStack` from ExtErr, where the stack identifies the caller
// of the cancel function.
func WithCallerCancel(parent context.Context) (context.Context, func(...interface{})) {

I'm confused about the return type. Why isn't it func()?


pkg/util/contextutil/cancel.go, line 56 at r1 (raw file):

// extension of `(Context).Err` that is allowed to return arbitrary values.
// Will fall back to `ctx.Err()` on nil returns.
func ExtCanceled(ctx context.Context) interface{} {

What did you have in mind for this function? It doesn't seem useful to me yet with only what you have in this patch; ExtErr seems sufficient.


pkg/util/contextutil/cancel.go, line 79 at r1 (raw file):

	ext struct {
		sync.Mutex
		canceled bool

I think some comments around the possible states of this context would be really good. I think a crucial element (if I understand correctly how this works) is that extCancelCtx.Err() can return non-nil even if canceled is not set. And this is good, because it lets ExtErr inherit the error from a parent ctx.
But I think this brings me to a short-coming, if I'm not mistaking: if I have ctx1<-ctx2, and I do cancel1(); cancel2(), then ExtErr(ctx2) will refer to the cancel2. But I think we want it to refer to the cancellation of ctx1, because that's the original one. Like, if I write ``cancel1(); ExtErr(ctx2); cancel2(); ExtErr(ctx2), I think we don't want the results of ExtErr(ctx2)` to change in between the two calls. Given how you've written it, I think it does change? To change this, I think you want to expand the check in `if !ctx.ext.canceled {`, in the returned cancel func for `WithExtCancel`.


pkg/util/contextutil/cancel_test.go, line 53 at r1 (raw file):

		name   string
		do     func(*testing.T, ctxs)
		h1, h2 func(...interface{}) interface{}

no test sets h1. Can it be taken out of this struct?


pkg/util/contextutil/cancel_test.go, line 106 at r1 (raw file):

		}, h2: func(args ...interface{}) interface{} { return fmt.Sprint(args...) }},
	}
	for _, tt := range tests {

what's tt? I think tc (for TestCase) is usual

@tbg tbg force-pushed the exterr branch 3 times, most recently from 05b6ce1 to 6fb1f4f Compare July 1, 2021 07:30
@tbg tbg marked this pull request as ready for review July 1, 2021 07:54
@tbg tbg requested a review from andreimatei July 1, 2021 07:55
Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFAL @andreimatei. This PR should also introduce linting & switch all 77 relevant context.WithCancel calls over, but I want you to like what's here first.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/util/contextutil/cancel.go, line 26 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'm a bit confused about the arguments that hook takes; nobody seems to be using them. Can you add a func WithCancelReason() (_ ctx, cancel func(reason string)) to the patch? I think seeing what that looks like would help me form an opinion about whether they're needed, or whether we're better off

The public API of this package should now be clearer.


pkg/util/contextutil/cancel.go, line 46 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'm confused about the return type. Why isn't it func()?

Discussed offline: because this allows the caller to provide information about the reason of the cancellation.


pkg/util/contextutil/cancel.go, line 56 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

What did you have in mind for this function? It doesn't seem useful to me yet with only what you have in this patch; ExtErr seems sufficient.

Over-engineered generality - unexported now.


pkg/util/contextutil/cancel.go, line 79 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think some comments around the possible states of this context would be really good. I think a crucial element (if I understand correctly how this works) is that extCancelCtx.Err() can return non-nil even if canceled is not set. And this is good, because it lets ExtErr inherit the error from a parent ctx.
But I think this brings me to a short-coming, if I'm not mistaking: if I have ctx1<-ctx2, and I do cancel1(); cancel2(), then ExtErr(ctx2) will refer to the cancel2. But I think we want it to refer to the cancellation of ctx1, because that's the original one. Like, if I write ``cancel1(); ExtErr(ctx2); cancel2(); ExtErr(ctx2), I think we don't want the results of ExtErr(ctx2)` to change in between the two calls. Given how you've written it, I think it does change? To change this, I think you want to expand the check in `if !ctx.ext.canceled {`, in the returned cancel func for `WithExtCancel`.

Done. (By means of simplifying a lot, but also we're now looking for the top-most canceled context to generate the rich error)


pkg/util/contextutil/cancel_test.go, line 53 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

no test sets h1. Can it be taken out of this struct?

Done.


pkg/util/contextutil/cancel_test.go, line 106 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

what's tt? I think tc (for TestCase) is usual

This is what Goland's "generate test" gives you, so I'm inclined to leave it.

When a context gets canceled, by contract of `context.Context` we can
only ever return the opaque errors `context.{Canceled,DeadlineExceeded}`
from `ctx.Err()`. This is not helpful and we often struggle with
locating the source of a context cancellation.

With this commit, you can do the following:

```go
ctx, cancel := contextutil.WithErrCancel(ctx)
go func() {
  cancel(errors.New("something bad happened in a random place")) // line X
}()
<-ctx.Done()
err := contextutil.ExtErr(ctx) // something bad happened in a random place
_, line, _, ok := errors.GetOneLineSource(err) // line X, true
}
```

This doesn't help if the context is canceled by
`context.With{Deadline,Timeout}` but we already have
`contextutil.RunWithTimeout` for that case.

Release note: None
@tbg
Copy link
Member Author

tbg commented Dec 9, 2021

@erikgrinaker not for now or necessarily anytime soon, but I wonder if want to chat about this sometime and maybe help me get this through review at some point.

@tbg tbg marked this pull request as draft December 9, 2021 17:24
@erikgrinaker
Copy link
Contributor

@erikgrinaker not for now or necessarily anytime soon, but I wonder if want to chat about this sometime and maybe help me get this through review at some point.

Yeah sure, seems like there's a bunch of issues around context handling and stoppers that seem ripe for a rethink.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remind me please how this is different from contextutil.WithCancelReason/GetCancelReason()? Is it just that this one is a better implementation because its context chain walking is smarter, whereas GetCancelReason() just stops at the first context on the way up?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)


pkg/util/contextutil/cancel_test.go, line 56 at r3 (raw file):

	// explody
	// explody
	// explody

how come "explody" overwrote "boom"? By my reading of the code, Err(ctx3) should not change once it was set to "boom".

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remind me please how this is different from contextutil.WithCancelReason/GetCancelReason()?

FWIW, nobody seems to be using that guy.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)


pkg/util/contextutil/cancel.go, line 79 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Done. (By means of simplifying a lot, but also we're now looking for the top-most canceled context to generate the rich error)

Well but blindly looking at the top-most canceled ctx is not exactly what I wanted. Let's discuss in the new thread just above.


pkg/util/contextutil/cancel.go, line 1 at r3 (raw file):

// Copyright 2021 The Cockroach Authors.

There's an ongoing discussion on the golang repo about propagating cancellation messages, culminating in a promising promise of a prototype: golang/go#26356 (comment)

So maybe let's see where that goes.

In parallel, I've pushed a commit on top of this here, for your consideration.
It changes the semantics of Err(ctx) as I describe below, and it also makes a bold move by returning our custom errors from WithErrCancel.Err(), as also described below.


pkg/util/contextutil/cancel.go, line 23 at r3 (raw file):

// WithErrCancel returns a cancelable context that whose cancellation function
// takes an error. While that error will *not* be returned from `ctx.Err`, the

I think we should be brave and return the nice error from errCancelCtx.Err(). I know that the docs on Context.Err() say that context.Canceled will be returned, but I say we go rogue. I've done some looking around to what this may do to our vendored code, and I think it'd all work out. Some 3rd party code already uses errors.Is(). The only one that doesn't which may affect us is gRPC. gRPC already uses errors.Is() in some places thanks to grpc/grpc-go#4977. I've opened grpc/grpc-go#5112 to see if they want to make all code use it.
Another place that uses the context.Canceled sentinel is in the standard library, here. But, as far I see, nothing would go wrong if that function would fail to match the error, and also the function also has a TODO for cleaning up this reference.

So, I think we'd be in good shape. If you think about it, it doesn't seem likely for someone to write code that matches exclusively on the two sentinels that Context.Err() is documented to return. If we go rogue, I can commit to a grep every 6 months to see if our deps situation has changed.


pkg/util/contextutil/cancel.go, line 26 at r3 (raw file):

// package-level method `Err` will return the error (annotated with
// errors.WithStackDepth) for the returned context and its descendants.
func WithErrCancel(parent context.Context) (context.Context, func(error)) {

what do you think of taking in a name, which is made part of the error, identifying the operation that's being canceled?


pkg/util/contextutil/cancel.go, line 35 at r3 (raw file):

			err = context.Canceled
		}
		err = errors.WithStackDepth(err, 1 /* depth */)

I think we should differentiate between a cancel() call that's actually intended to cancel an operation, and a cancel() call that's expected to be inconsequential, being done only for releasing the context resources. I think we should make the latter case as cheap as possible, because we don't expect anyone to see that error. In particular, let's avoid any error allocations. My prototype uses a sentinel error to signal that case.


pkg/util/contextutil/cancel.go, line 52 at r3 (raw file):

// `cancel(err)` call. When the Context passed to `Err` has multiple such
// parents, the "most distant" one is returned, under the assumption that
// it provides the original reason for the context chain's cancellation.

do you want to make a note that this function relies on nobody overriding the Done() channel in such a way that the parent's cancelation doesn't cancel the child? The standard library protects against such implementers. I don't think we can generally protect against it, but we can mention it.


pkg/util/contextutil/cancel.go, line 69 at r3 (raw file):

		ctx = c.Context

		// If it's canceled, remember the error.

You've said "done" to my past comment below about what error gets returned, but I don't think you've arrived at what I was pushing for. In fact, the example you've written shows that this implementation can behave surprisingly (at least to me). In your example, Err(ctx3) changes over time. I think that's quite weird. Once ctx3 has been canceled, it's error should be frozen. I think we should try to make Err(ctx) reflect the cause of the first cancelation of that context. My commit does that, by making the cancel() function smarter.


pkg/util/contextutil/cancel_test.go, line 56 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

how come "explody" overwrote "boom"? By my reading of the code, Err(ctx3) should not change once it was set to "boom".

I've reread the code and I understand what happens now. But I think this shows that we'd prefer the opposite. Opened a thread on the Err() code, let's continue there.

@tbg
Copy link
Member Author

tbg commented May 17, 2022

golang/go#51365 (comment), if accepted and implemented, will solve this.

@tbg tbg closed this May 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants