tracing: propagate non-recording spans across rpc boundaries #59992

Merged on Feb 17, 2021 (15 commits)

Conversation

tbg
Member

@tbg tbg commented Feb 8, 2021

We were previously propagating RPCs across boundaries only when they
were verbose. We now propagate any span (except the noop span)
regardless of current verbosity.
This ensures that SQL (which always creates real, non-verbose spans by
default) can trace the entirety of its operations.

This enables the background collection of contention metadata in 21.1
and more generally, the idea of always-on tracing. Note that we had
previously considered always-on tracing to mean avoiding the noop span
altogether; due to memory retention issues encountered, we are
taking a step back from that idea for the 21.1 release.

Release note: None
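
To illustrate the new rule, here is a small, self-contained sketch; the types, helper name, and carrier keys are stand-ins for illustration, not the PR's actual API:

package main

import (
	"fmt"
	"strconv"
)

// span is a stand-in for a tracing span; only what the sketch needs.
type span struct {
	traceID, spanID uint64
	noop            bool
}

// injectSpanMeta sketches the new rule: metadata is attached for any real
// span, verbose or not; only the noop span is skipped. Before this PR the
// check was effectively "only when verbose".
func injectSpanMeta(sp *span, carrier map[string]string) {
	if sp == nil || sp.noop {
		return // nothing to propagate; the remote side won't create a child
	}
	carrier["trace-id"] = strconv.FormatUint(sp.traceID, 16)
	carrier["span-id"] = strconv.FormatUint(sp.spanID, 16)
}

func main() {
	carrier := map[string]string{}
	// A real but non-verbose span (the common SQL case) now propagates.
	injectSpanMeta(&span{traceID: 42, spanID: 7}, carrier)
	fmt.Println(carrier) // map[span-id:7 trace-id:2a]
}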

@tbg tbg requested review from irfansharif, angelapwen and knz February 8, 2021 09:46
@cockroach-teamcity
Member

This change is Reviewable

@tbg tbg force-pushed the tracing-real-spans-across-rpc branch 2 times, most recently from c782a70 to 2f8a54a on February 8, 2021 10:00
@tbg tbg force-pushed the tracing-real-spans-across-rpc branch from 2f8a54a to 49ff09c on February 8, 2021 14:27
Contributor

@irfansharif irfansharif left a comment

Stray "iasd" in the first commit message, typo with "Mult" in second commit.

Reviewed 6 of 6 files at r1, 1 of 1 files at r2, 2 of 2 files at r3, 1 of 1 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen, @knz, and @tbg)


pkg/sql/logictest/testdata/logic_test/distsql_agg, line 1 at r3 (raw file):

# LogicTest: 5node-default-configs

Why did this change?


pkg/util/tracing/span_test.go, line 187 at r1 (raw file):

func TestSpan_LogStructured(t *testing.T) {
	tr := NewTracer()
	tr._mode = int32(modeBackground)

Why did this change?


pkg/util/tracing/tracer.go, line 297 at r1 (raw file):

	// a real span they need to get one. In all other cases, a noop span will do.
	if !t.AlwaysTrace() &&
		opts.parentTraceID() == 0 &&

I like this simplification. The semantics around recording being switched on/off, and what that means for log messages/structured events, still feel a bit confusing to me (I was trying something elsewhere), and this makes it a bit easier to reason about.

@tbg tbg requested a review from irfansharif February 9, 2021 09:24
Member Author

@tbg tbg left a comment

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen, @irfansharif, and @knz)


pkg/sql/logictest/testdata/logic_test/distsql_agg, line 1 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Why did this change?

Thanks for catching that, this is accidental, fixed.


pkg/util/tracing/span_test.go, line 187 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Why did this change?

To validate the changes in GetRecording. Without the fixes to it, this test would have started failing as we short-circuited GetRecording() to nil.

@tbg
Member Author

tbg commented Feb 11, 2021

I210209 11:49:00.327146 1 util/log/flags.go:106  stderr capture started
panic: interface conversion: opentracing.Span is *tracing.noopSpan, not *tracing.span [recovered]
	panic: interface conversion: opentracing.Span is *tracing.noopSpan, not *tracing.span

goroutine 628 [running]:
panic(0x4bb5760, 0xc0017c3b30)
	/usr/local/go/src/runtime/panic.go:722 +0x2c2 fp=0xc001737f80 sp=0xc001737ef0 pc=0x7b1a62
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc0004dad80, 0x63133c0, 0xc0017e2ae0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:207 +0x11f fp=0xc001737fe0 sp=0xc001737f80 pc=0x169df4f
runtime.call32(0x0, 0x5808980, 0xc00173bcf8, 0x1800000018)
	/usr/local/go/src/runtime/asm_amd64.s:539 +0x3b fp=0xc001738010 sp=0xc001737fe0 pc=0x7e051b
panic(0x4bb5760, 0xc0017c3b30)
	/usr/local/go/src/runtime/panic.go:679 +0x1b2 fp=0xc0017380a0 sp=0xc001738010 pc=0x7b1952
runtime.panicdottypeE(...)
	/usr/local/go/src/runtime/iface.go:255
runtime.panicdottypeI(0x647e040, 0x510f8e0, 0x4ed8d80)
	/usr/local/go/src/runtime/iface.go:265 +0xf4 fp=0xc0017380c8 sp=0xc0017380a0 pc=0x78c3e4
github.com/cockroachdb/cockroach/pkg/util/tracing.ImportRemoteSpans(0x647e040, 0xc000b6a000, 0xc000171000, 0x1, 0x1, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/tracing/tracer_span.go:638 +0x290 fp=0xc001738138 sp=0xc0017380c8 pc=0x111d820
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*grpcTransport).sendBatch(0xc0015f4840, 0x63133c0, 0xc0017e2bd0, 0x3, 0x633f100, 0xc0008940b0, 0x166211f05167de20, 0x0, 0x300000003, 0x3, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/transport.go:173 +0x390 fp=0xc0017381f0 sp=0xc001738138 pc=0x1bd2100
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*grpcTransport).SendNext(0xc0015f4840, 0x63133c0, 0xc0017e2bd0, 0x166211f05167de20, 0x0, 0x300000003, 0x3, 0x0, 0x1, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/transport.go:139 +0x186 fp=0xc0017382e8 sp=0xc0017381f0 pc=0x1bd1d16
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).sendToReplicas(0xc000865000, 0x63133c0, 0xc0017e2bd0, 0x166211f05167de20, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1870 +0x7d0 fp=0xc001738d80 sp=0xc0017382e8 pc=0x1bc43f0
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).sendPartialBatch(0xc000865000, 0x63133c0, 0xc0017e2bd0, 0x166211f05167de20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1495 +0x346 fp=0xc00173a1a8 sp=0xc001738d80 pc=0x1bc0946
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).divideAndSendBatchToRanges(0xc000865000, 0x63133c0, 0xc0017e2bd0, 0x166211f05167de20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1132 +0x16f4 fp=0xc00173a808 sp=0xc00173a1a8 pc=0x1bbfe14
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).Send(0xc000865000, 0x63133c0, 0xc0017e2bd0, 0x166211f05167de20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:771 +0x7a6 fp=0xc00173ab68 sp=0xc00173a808 pc=0x1bbc8b6
github.com/cockroachdb/cockroach/pkg/kv.lookupRangeFwdScan(0x63133c0, 0xc0017e2bd0, 0x626a800, 0xc000865000, 0xc0017c4ef0, 0xe, 0xe, 0x1, 0x8, 0x1dcd6500, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/range_lookup.go:312 +0x42b fp=0xc00173af30 sp=0xc00173ab68 pc=0x180baab
github.com/cockroachdb/cockroach/pkg/kv.RangeLookup(0x63133c0, 0xc0017e2bd0, 0x626a800, 0xc000865000, 0xc0017c4ef0, 0xe, 0xe, 0x1, 0x8, 0xab7400, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/range_lookup.go:197 +0x7d2 fp=0xc00173b2e0 sp=0xc00173af30 pc=0x180b1e2
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).RangeLookup(0xc000865000, 0x63133c0, 0xc0017e2bd0, 0xc0017c4ef0, 0xe, 0xe, 0x6313300, 0xc0017e2bd0, 0x977c24, 0xc000508d90, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:447 +0xd4 fp=0xc00173b380 sp=0xc00173b2e0 pc=0x1bbace4
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*RangeDescriptorCache).performRangeLookup(0xc000f4e3c0, 0x63133c0, 0xc0017e2bd0, 0xc0017c4ef0, 0xe, 0xe, 0x0, 0xc0005834d8, 0x97825b, 0x63133c0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/range_cache.go:818 +0x285 fp=0xc00173b450 sp=0xc00173b380 pc=0x1bcc7b5
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*RangeDescriptorCache).tryLookup.func1.1.1(0x6313380, 0xc000508d80, 0x2540be400, 0x6313380)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/range_cache.go:672 +0x8e fp=0xc00173b4e8 sp=0xc00173b450 pc=0x1be6a7e
github.com/cockroachdb/cockroach/pkg/util/contextutil.RunWithTimeout(0x6313380, 0xc000508d80, 0x521601f, 0xc, 0x2540be400, 0xc000583730, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/contextutil/context.go:135 +0xbf fp=0xc00173b570 sp=0xc00173b4e8 pc=0x16a12cf
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*RangeDescriptorCache).tryLookup.func1.1(0x63133c0, 0xc0017e2ae0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/range_cache.go:669 +0x313 fp=0xc00173bc60 sp=0xc00173b570 pc=0x1be6e53
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr(0xc0004dad80, 0x63133c0, 0xc0017e2ae0, 0x524387e, 0x18, 0xc00152fd60, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:326 +0x140 fp=0xc00173bd20 sp=0xc00173bc60 pc=0x169e8f0
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*RangeDescriptorCache).tryLookup.func1(0x16a03b4, 0x580c5e8, 0x0, 0xc00152ff70)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/range_cache.go:657 +0xe9 fp=0xc00173bf48 sp=0xc00173bd20 pc=0x1be7859
github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight.(*Group).doCall(0xc000f4e400, 0xc000b00e10, 0xc000076920, 0x15, 0xc0015f43c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:128 +0x2e fp=0xc00173bfb8 sp=0xc00173bf48 pc=0x1bb14ae
runtime.goexit()

Oh wonderful, I love the smell of mixed-version problems in the afternoon. This will take a bit longer to get in, I suppose; it's the old version crashing here, not the new one.

@tbg
Member Author

tbg commented Feb 12, 2021

Wow, I'm surprised we haven't seen more crashes here (maybe this crash isn't even caused by this PR; maybe it's only crashing now because I just added a step to turn tracing on in this test). The code on 20.2.4 is pretty sensitive to crashes because it has assertions that make no sense, in the methods called from this area:

if reply != nil && !rpc.IsLocal(iface) {
	for i := range reply.Responses {
		if err := reply.Responses[i].GetInner().Verify(ba.Requests[i].GetInner()); err != nil {
			log.Errorf(ctx, "%v", err)
		}
	}
	// Import the remotely collected spans, if any.
	if len(reply.CollectedSpans) != 0 {
		span := opentracing.SpanFromContext(ctx)
		if span == nil {
			return nil, errors.Errorf(
				"trying to ingest remote spans but there is no recording span set up")
		}
		if err := tracing.ImportRemoteSpans(span, reply.CollectedSpans); err != nil {
			return nil, errors.Wrap(err, "error ingesting remote spans")
		}
	}
}
return reply, err
}

Concretely, here what we see is this:

  • we crash because we call ImportRemoteSpans on the noop span:
  • we call ImportRemoteSpans indiscriminately, whenever a batch response contains a collected trace (see the snippet above)

I assume that the way this particular crash happens is this:

  • test sets tracing to on, setting gets pushed out via gossip
  • n1 sends an RPC to n2. n1 isn't tracing the request yet, so the span is a noop span.
  • when the request arrives at n2, n2 inspects it and sees that there is no trace attached. Since tracing is globally on, it creates a new trace root span and returns the collected recording with the response
  • n1 explodes trying to shove that response into the noop span

But there could also have been a different scenario (pre this PR), though not one that you could hit in practice, since we don't use spans in that way at that part of the stack:

  • n1 has a real span, but it's not recording
  • consequently, it attaches no trace metadata to the RPC
  • n2 is tracing, so it makes a new trace (different traceid) and returns recording
  • n1 hits this error:
    if !s.isRecording() {
        return errors.New("adding Raw Spans to a span that isn't recording")
    }

Now in 20.2.4, if you don't mess with the global tracing (which you shouldn't anyway, as it might well OOM you), you don't see any of this because nobody ever sends any recordings back. But with this PR, since SQL on the new nodes will always propagate a real trace, oof, we're exercising this code path all of the time.

At this point, and seeing how we've made more changes to the tracer than ever before including moving off the OT interfaces, I think we'll play it safe - I'll add a cluster version and make sure that new nodes don't create any spans, period (unless the verbose cluster setting is set, in which case, good luck). I'll also give the code around the interfaces a do-over to make sure they are not exceedingly... assertive. Something that should happen in that context is to gracefully handle importing a recording for trace B into trace A. This should silently ignore the mismatched data. Of course we don't expect that to happen (after this PR), but resilience here will pay off and is cheap.
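
For illustration, a self-contained sketch of the "resilient import" idea described above, using stand-in types (the real ImportRemoteSpans in pkg/util/tracing looks different):

package main

import "fmt"

// recordedSpan is a minimal stand-in for tracingpb.RecordedSpan.
type recordedSpan struct {
	traceID, spanID uint64
	operation       string
}

// span is a minimal stand-in for the local span importing a recording.
type span struct {
	traceID uint64
	noop    bool
	remote  []recordedSpan
}

// importRemoteSpans sketches the defensive behavior described above: rather
// than panicking on a noop span or erroring on a trace-ID mismatch, it
// silently drops data that cannot be attached to this trace.
func (s *span) importRemoteSpans(remote []recordedSpan) {
	if s == nil || s.noop {
		return // a noop span has nowhere to put the recording
	}
	for _, rs := range remote {
		if rs.traceID != s.traceID {
			continue // recording from a different trace; ignore it
		}
		s.remote = append(s.remote, rs)
	}
}

func main() {
	s := &span{traceID: 1}
	s.importRemoteSpans([]recordedSpan{
		{traceID: 1, spanID: 2, operation: "kept"},
		{traceID: 9, spanID: 3, operation: "dropped"},
	})
	fmt.Println(len(s.remote)) // 1
}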

@tbg
Member Author

tbg commented Feb 15, 2021

acceptance: enable tracing in acceptance/version-upgrade

With only this commit, acceptance/version-upgrade passed 6/6 runs. This makes sense, since the test enables tracing before going to upload a binary, so there usually isn't anything going on on the cluster that would have a high chance of triggering #59992 (comment). It's only with the main change here that recordings not expected by v20.2 come back and cause the panic.

@tbg tbg force-pushed the tracing-real-spans-across-rpc branch 3 times, most recently from f9f2103 to 849643a on February 15, 2021 13:27
Contributor

@knz knz left a comment

Regarding the first commit: I am not convinced by the unit test -- if we want to check that traces get collected across RPC boundaries, I'd expect a test that sets up an RPC server, sends an actual RPC over a network connection, and asserts that the expected trace events are communicated.

In the 6th commit I dislike the approach you've taken, with more explanations about my view below.

(In contrast the 7th commit is super elegant and it was a relief to see that the way forward wasn't too painful)

The rest LGTM

Reviewed 4 of 10 files at r5, 2 of 2 files at r6, 1 of 1 files at r7, 1 of 1 files at r8, 1 of 1 files at r9, 2 of 2 files at r10, 10 of 10 files at r11, 8 of 8 files at r12, 1 of 1 files at r13.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen, @irfansharif, and @tbg)


pkg/sql/crdb_internal.go, line 1181 at r11 (raw file):

				var numStructured int
				rec.Structured(func(any *ptypes.Any) {

Ouch. I thought the entire point of this vtable, which leaves out the details of traces, was to avoid iterating over all events while populating the table, so that the SQL client can apply a selection before we pay the cost of looking into a recording.

Can you not have a new built-in function which takes a trace/span ID as argument and returns a JSON blob with its properties, including whether it contains any structured event as a boolean (and make the rec.Structured logic stop the iteration as soon as one is encountered)?

For context, I think you are underestimating the run-time and memory costs of crdb_internal tables generally and how much (ab)use they take from end-users and our support staff. This stuff needs to be programmed defensively, with adequate explanatory comments about the costs in the table's comment.
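
For what it's worth, a self-contained sketch of the "stop at the first structured event" idea; the visitor here is a stand-in, and whether the real Recording.Structured callback supports early exit is an assumption:

package main

import "fmt"

// event is a stand-in for a structured payload (*types.Any in crdb).
type event struct{ name string }

// visitStructured is a stand-in visitor: it calls fn for each structured
// event and stops as soon as fn returns false.
func visitStructured(events []event, fn func(event) bool) {
	for _, ev := range events {
		if !fn(ev) {
			return
		}
	}
}

// hasStructured answers "does this recording contain any structured event"
// without walking the rest of the recording once one is found.
func hasStructured(events []event) bool {
	found := false
	visitStructured(events, func(event) bool {
		found = true
		return false // stop at the first hit
	})
	return found
}

func main() {
	fmt.Println(hasStructured([]event{{name: "contention"}})) // true
	fmt.Println(hasStructured(nil))                           // false
}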


pkg/sql/txn_state.go, line 162 at r11 (raw file):

	var traceOpts []tracing.SpanOption
	if !tranCtx.execTestingKnobs.Pretend59315IsFixed {
		traceOpts = append(traceOpts, tracing.WithBypassRegistry())

Please add a comment to indicate that this conditional and the traceOpts can be removed entirely once the linked issue has been addressed.


pkg/sql/sem/builtins/builtins.go, line 3590 at r11 (raw file):

				sp := tracing.SpanFromContext(ctx.Context)
				if sp == nil {
					return nil, errors.New("no active trace span")

The proper way in SQL to indicate there's no value is to return NULL. I don't see a good reason to trash the SQL transaction in this case.
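
For reference, a sketch of what the NULL-returning variant of the quoted snippet could look like; only the overload body is shown, and the exact surrounding code and accessor names are assumed rather than taken from the PR:

	sp := tracing.SpanFromContext(ctx.Context)
	if sp == nil {
		// No active span: yield SQL NULL instead of returning an error,
		// which would needlessly fail the statement.
		return tree.DNull, nil
	}
	return tree.NewDInt(tree.DInt(sp.TraceID())), nil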


pkg/util/tracing/span.go, line 159 at r9 (raw file):

// recorded traces from other nodes.
func (s *Span) ImportRemoteSpans(remoteSpans []tracingpb.RecordedSpan) error {
	if s.tracer.mode() == modeLegacy && s.crdb.recordingType() == RecordingOff {

Isn't this change needed as soon as the first commit? Especially if, as I suggested above, the first commit was extended to assert that trace events are in fact collected across remote RPCs?


pkg/util/tracing/tracer.go, line 299 at r6 (raw file):

	// also create a real child. Additionally, if the caller explicitly asked for
	// a real span they need to get one. In all other cases, a noop span will do.
	if !t.AlwaysTrace() &&

As a reviewer, I think that !(t.AlwaysTrace() || opts.parentTraceID() != 0 || opts.ForceRealSpan) gives me a more intuitive idea of what this condition is about (i.e. "if I don't have any reason to trace")
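
The two spellings are equivalent by De Morgan's law; a quick exhaustive check with stand-in booleans for the three conditions:

package main

import "fmt"

func main() {
	// Stand-ins for t.AlwaysTrace(), opts.parentTraceID() != 0, and
	// opts.ForceRealSpan. Exhaustively verify that the existing spelling
	// and the suggested rewrite agree for every combination.
	for _, alwaysTrace := range []bool{false, true} {
		for _, hasParent := range []bool{false, true} {
			for _, forceRealSpan := range []bool{false, true} {
				a := !alwaysTrace && !hasParent && !forceRealSpan
				b := !(alwaysTrace || hasParent || forceRealSpan)
				if a != b {
					panic("the two forms disagree")
				}
			}
		}
	}
	fmt.Println("the two forms are equivalent")
}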

tbg added a commit to tbg/cockroach that referenced this pull request Feb 16, 2021
As was rightly pointed out by @knz during review in cockroachdb#59992, the tracing
interceptors were only tested indirectly.

In this commit, we add explicit tests that set up a gRPC server and
verify that for all combinations of unary/streaming the correct trace
spans get created.

Release note: None
@tbg tbg requested a review from knz February 16, 2021 11:25
Member Author

@tbg tbg left a comment

TFTR!

Regarding the first commit: I am not convinced by the unit test -- if we want to check that traces get collected across RPC boundaries, I'd expect a test that sets up an RPC server, sends an actual RPC over a network connection, and asserts that the expected trace events are communicated.

Good point - I added a comprehensive test for this in an additional commit.

In the 6th commit I dislike the approach you've taken, with more explanations about my view below.

Fair point, I assume this is more a failure to communicate, see the inline discussion.

I added a few more commits - haven't done any squashing or rearranging to keep the review easier, PTAL.

Dismissed @irfansharif from 3 discussions.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen, @irfansharif, and @knz)


pkg/sql/crdb_internal.go, line 1181 at r11 (raw file):

Previously, knz (kena) wrote…

Ouch. I thought the entire point of this vtable, which leaves out the details of traces, was to avoid iterating over all events while populating the table, so that the SQL client can apply a selection before we pay the cost of looking into a recording.

Can you not have a new built-in function which takes a trace/span ID as argument and returns a JSON blob with its properties, including whether it contains any structured event as a boolean (and make the rec.Structured logic stop the iteration as soon as one is encountered)?

For context, I think you are underestimating the run-time and memory costs of crdb_internal tables generally and how much (ab)use they take from end-users and our support staff. This stuff needs to be programmed defensively, with adequate explanatory comments about the costs in the table's comment.

I agree with everything you're saying, and should have clarified how this code came to be (my bad). Essentially @angelapwen had trouble getting the structured data to show up in her attempt to add a built-in to render the payloads from a given span, and while investigating I found the random bits of breakage that caused this. I want this PR to give her a solid foundation to work from, and that meant adding at least a rudimentary test confirming that the payloads do indeed flow from where they originate to where they need to be. Long story short - I would like Angela (with your help) to tackle the shortcomings you're describing. I don't think num_structured should remain in the table once we have a way to ask for the payloads of a specified trace. This was just the easiest thing I could do.


pkg/sql/txn_state.go, line 162 at r11 (raw file):

Previously, knz (kena) wrote…

Please add a comment to indicate that this conditional and the traceOpts can be removed entirely once the linked issue has been addressed.

Done.


pkg/sql/sem/builtins/builtins.go, line 3590 at r11 (raw file):

Previously, knz (kena) wrote…

The proper way in SQL to indicate there's no value is to return NULL. I don't see a good reason to trash the SQL transaction in this case.

Done.


pkg/util/tracing/span.go, line 159 at r9 (raw file):

Previously, knz (kena) wrote…

Isn't this change needed at soon as the first commit? Especially if, as I suggested above, the first commit was extended to assert that trace events are in fact collected across remote RPCs?

Yes, I will rearrange before the merge (when I squash the fixup commits).


pkg/util/tracing/tracer.go, line 299 at r6 (raw file):

Previously, knz (kena) wrote…

As a reviewer, I think that !(t.AlwaysTrace() || opts.parentTraceID() != 0 || opts.ForceRealSpan) gives me a more intuitive idea of what this condition is about (i.e. "if I don't have any reason to trace")

Done.

@angelapwen angelapwen left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)


pkg/sql/crdb_internal.go, line 1181 at r11 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I agree with everything you're saying, and should have clarified how this code came to be (my bad). Essentially @angelapwen had trouble getting the structured data to show up in her attempt to add a built-in to render the payloads from a given span, and while investigating I found the random bits of breakage that caused this. I want this PR to give her a solid foundation to work from, and that meant adding at least a rudimentary test confirming that the payloads do indeed flow from where they originate to where they need to be. Long story short - I would like Angela (with your help) to tackle the shortcomings you're describing. I don't think num_structured should remain in the table once we have a way to ask for the payloads of a specified trace. This was just the easiest thing I could do.

Acknowledged — this can be removed now that #60616 successfully finds a contention event from a trace via payload.


pkg/sql/crdb_internal.go, line 1187 at r17 (raw file):

				if err := addRow(
					// TODO(angelapwen): we're casting uint64s to int64 here,
					// is that ok?

Hm, I see what you mean - there could be some casting problems if the unsigned int were large enough. I think the DInt needs to be an int64, but we could potentially return an error if the ints are too large?

@angelapwen

pkg/sql/sem/builtins/builtins.go, line 3595 at r17 (raw file):

			},
			Info: "Returns the current trace ID or an error if no trace is open.",
			// NB: possibly this is or could be made stable, but it's not worth it.

Nit: did you mean "possibly this is not"? Not sure what this means.

Contributor

@knz knz left a comment

:lgtm:

Reviewed 1 of 1 files at r14, 1 of 1 files at r15, 1 of 1 files at r16, 7 of 7 files at r17.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


pkg/sql/crdb_internal.go, line 1187 at r17 (raw file):

Previously, angelapwen (Angela P Wen) wrote…

Hm, I see what you mean - there could be some casting problems if the unsigned int was large enough. I think the DInt needs to be a int64 but we could potentially return an error if the ints are too large?

I think the conversion between uint and int in Go is lossless, i.e. it does not lose information. A large uint value will convert to a negative signed value, but the conversion remains bijective.

Since these are identifiers (and not quantities), we only care about the "identifying" nature of the values.

However, I do agree it may be confusing for UX. @tbg was there any reason to make the trace/span IDs unsigned?
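
A quick runnable check of the "lossless cast" claim:

package main

import (
	"fmt"
	"math"
)

func main() {
	// A trace ID near the top of the uint64 range.
	id := uint64(math.MaxUint64 - 41)

	// Casting to int64 (as the vtable does for its DInt columns) yields a
	// negative number, but the cast is bijective: casting back recovers
	// the original identifier exactly.
	signed := int64(id)
	fmt.Println(signed)               // -42
	fmt.Println(uint64(signed) == id) // true
}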


pkg/sql/sem/builtins/builtins.go, line 3595 at r17 (raw file):

Previously, angelapwen (Angela P Wen) wrote…

Nit: did you mean "possibly this is not"? Not sure what this means.

I think the comment is correct, although Tobias could have clarified "stable (i.e. non-volatile)".

(Angela: The volatility attribute indicates whether the result of a built-in function is going to change throughout the execution of the statement. Since our trace IDs are fixed for an entire SQL txn, or at least for an entire SQL stmt, I believe, like Tobias, that they are stable and thus this built-in can be marked as non-volatile, but I also agree it's a hassle to test / validate and so to be conservative we can keep this as is.)

@tbg I think also it's worth filing an issue for the optimizer team to look at. The queries we'd like to advertise to consume trace data will require SQL joins. If this builtin is marked volatile, the query plans will be more complex. We need some research to investigate the penalty and set the volatility properly if the penalty is high.

@tbg tbg requested a review from angelapwen February 17, 2021 11:12
Member Author

@tbg tbg left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @angelapwen)


pkg/sql/crdb_internal.go, line 1187 at r17 (raw file):

Previously, knz (kena) wrote…

I think the conversion between uint and int in Go is lossless, i.e. it does not lose information. A large uint value will convert to a negative signed value, but the conversion remains bijective.

Since these are identifiers (and not quantities), we only care about the "identifying" nature of the values.

However, I do agree it may be confusing for UX. @tbg was there any reason to make the trace/span IDs unsigned?

They have historically been unsigned, and I don't think there's a good reason. However, we can't just change it due to the necessary migration. Concretely, I think where the uintness matters is here:

carrier.Set(fieldNameTraceID, strconv.FormatUint(sm.traceID, 16))
carrier.Set(fieldNameSpanID, strconv.FormatUint(sm.spanID, 16))

and here:

case fieldNameTraceID:
	var err error
	traceID, err = strconv.ParseUint(v, 16, 64)
	if err != nil {
		return opentracing.ErrSpanContextCorrupted
	}
case fieldNameSpanID:
	var err error
	spanID, err = strconv.ParseUint(v, 16, 64)
	if err != nil {
		return opentracing.ErrSpanContextCorrupted
	}

So for 20.2 support, we have to keep it as a uint on the wire. Which should be fine! But honestly I'm not sure if there is enough of a reason here to make the change. uint64 is a fine choice and nobody likes IDs that have signs. Can we make the ID columns BIGINTs instead? I do agree that we want the representation of IDs to match between internal and SQL, or we're going to be very confused once we walk further down the path in which trace messages are "stored" via log messages.

Filed #60668
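
Putting the two quoted snippets together, a small self-contained round-trip check (the ID below is made up):

package main

import (
	"fmt"
	"strconv"
)

func main() {
	// On the wire the IDs stay uint64, formatted in base 16, matching the
	// quoted Format/Parse code.
	traceID := uint64(0xdeadbeefcafef00d)

	wire := strconv.FormatUint(traceID, 16)
	parsed, err := strconv.ParseUint(wire, 16, 64)
	if err != nil || parsed != traceID {
		panic("wire round trip failed")
	}

	// Only at the SQL surface would the value be cast to a signed BIGINT;
	// the cast loses no information, since casting back yields the same ID.
	fmt.Println(wire, int64(parsed), uint64(int64(parsed)) == traceID)
}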


pkg/sql/sem/builtins/builtins.go, line 3595 at r17 (raw file):

Previously, knz (kena) wrote…

I think the comment is correct, although Tobias could have clarified "stable (i.e. non-volatile)".

(Angela: The volatility attribute indicates whether the result of a built-in function is going to change throughout the execution of the statement. Since our trace IDs are fixed for an entire SQL txn, or at least for an entire SQL stmt, I believe, like Tobias, that they are stable and thus this built-in can be marked as non-volatile, but I also agree it's a hassle to test / validate and so to be conservative we can keep this as is.)

@tbg I think also it's worth filing an issue for the optimizer team to look at. The queries we'd like to advertise to consume trace data will require SQL joins. If this builtin is marked volatile, the query plans will be more complex. We need some research to investigate the penalty and set the volatility properly if the penalty is high.

I think we should reach out when we have some actual queries, otherwise we're asking them in a vacuum.

@tbg tbg requested a review from knz February 17, 2021 22:47
Member Author

@tbg tbg left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @angelapwen, @irfansharif, and @knz)


pkg/sql/logictest/logic.go, line 567 at r23 (raw file):

Previously, RaduBerinde wrote…

[nit] 5node-pretend-59315-is-fixed

I didn't do this, was too scared to eat another cycle

@RaduBerinde
Member


pkg/sql/logictest/logic.go, line 567 at r23 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I didn't do this, was too scared to eat another cycle

Heh, no problem. Hopefully it will be fixed soon so we won't have to pretend anymore :)

craig bot pushed a commit that referenced this pull request Feb 18, 2021
59588: streamingest: add job level test for stream ingestion r=pbardea a=adityamaru

This change adds a test that exercises all the components of the stream
ingestion flow. It fixes some missing pieces that were discovered while
writing the test.

Informs: #59175

Release note: None

60424: sql: include sampled stats in TestSampledStatsCollection r=yuzefovich a=asubiotto

Depends on #59992, which is required for this new regression test to pass.

TestSampledStatsCollection would previously only check that stats that are
collected regardless of the sample rate are returned. These types of stats
(rows/bytes read) are propagated using metadata, rather than the trace.

This resulted in us silently failing to collect any stats when sampling was
enabled once the tracing mode was reverted back to legacy. To avoid this kind
of thing happening again, this commit adds a check that max memory usage is
reported to be non-zero.

Release note: None (this is a new feature that has no user impact yet)

60626: kvserver: initialize propBuf LAI tracking r=andreimatei a=andreimatei

The initialization of the LAI tracking in the proposal buffer seems
pretty lacking (see #60625). This patch adds initialization of
propBuf.liBase at propBuf.Init() time, which is irrelevant for
production, but will help future tests which will surely want
a propBuf's first assigned LAIs to have some relationship to the replica
state.

Release note: None

Co-authored-by: Aditya Maru <adityamaru@gmail.com>
Co-authored-by: Alfonso Subiotto Marques <alfonso@cockroachlabs.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Mar 1, 2021
Fixes cockroachdb#60677, removing a stop-gap introduced in cockroachdb#59992.

We were previously leaking  "async consensus" spans, which was possible
when a given proposal was never flushed out of the replica's proposal
buffer. On server shut down, this buffered proposal was never finished,
and thus the embedded span never closed. We now add a closer to clean up
after ourselves.

Release justification: bug fixes and low-risk updates to new
functionality.

Release note: None
tbg added a commit that referenced this pull request Mar 2, 2021
We no longer return errors from ImportRemoteSpans, and we saw in the
[past] that it is not a good idea to do so - it's easy to mess up
cross-version behavior.

[past]: #59992 (comment)

Release justification: low-risk code clarification
Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Mar 2, 2021
craig bot pushed a commit that referenced this pull request Mar 3, 2021
61279: kvserver: plug a tracing span leak r=irfansharif a=irfansharif

Fixes #60677, removing a stop-gap introduced in #59992.

We were previously leaking  "async consensus" spans, which was possible
when a given proposal was never flushed out of the replica's proposal
buffer. On server shut down, this buffered proposal was never finished,
and thus the embedded span never closed. We now add a closer to clean up
after ourselves.

Release justification: bug fixes and low-risk updates to new
functionality.

Release note: None

---

+cc @angelapwen / @knz / @erikgrinaker for pod-visibility.

61344: sql: add estimated row count to all operators in EXPLAIN r=RaduBerinde a=RaduBerinde

This change adds the estimated row count to all operators in the
vanilla EXPLAIN, as long as the tables involved had statistics.

Fixes #61310.

Release justification: bug fixes and low-risk updates to new functionality

Release note (sql change): EXPLAIN now shows estimated row counts for
all operators even without VERBOSE (except when we don't have
statistics for the tables).

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
tbg added a commit that referenced this pull request Mar 8, 2021
craig bot pushed a commit that referenced this pull request Mar 8, 2021
61327: tracing: don't return an error from ImportRemoteSpans r=erikgrinaker,angelapwen a=tbg

We no longer return errors from ImportRemoteSpans, and we saw in the
[past] that it is not a good idea to do so - it's easy to mess up
cross-version behavior.

[past]: #59992 (comment)

Release justification: low-risk code clarification
Release note: None


Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Apr 1, 2021
We arrived at the previous default rate of 10% back in cockroachdb#59379. This was
back when we were creating real tracing spans for all statements, and
for sampled statements, we were propagating additional stats payloads.
Consequently, what cockroachdb#59379 ended up measuring (and finding
acceptable) was the performance hit we would incur for propagating
stats payloads for statements already using real tracing spans.

Since then, the landscape has changed. Notably we introduced cockroachdb#61777,
which made it so that we were only using real tracing spans for sampled
statements. This was done after performance analysis in cockroachdb#59424 showed
that the use of real tracing spans in all statements resulted in
tremendous overhead, for no real benefit.

What this now leaves us with is a sampling rate that was tuned by only
considering the stats payload overhead. What we want now is to also
consider the overhead of using real tracing spans for sampled
statements, vs. not. Doing this analysis gives us a very different
picture for what the default rate should be.

---

To find out what the overhead for sampled statements currently is, we
experimented with kv95/enc=false/nodes=1/cpu=32. It's a simple
benchmark that does little more than one-off statements, so should give
us a concise picture of the sampling overhead. We ran six experiments
in total (each corresponding to a pair of read+write rows), done in
groups of three (each group corresponding to a table below). Each
run in turn is comprised of 10 iterations of kv95, and what's varied
between each run is the default sampling rate. We pin a sampling rate of
0.0 as the baseline that effectively switches off sampling entirely (and
tracing), and measure the throughput degradation as we vary the sampling
rate.

                          ops/sec            ops/sec
    --------------------|------------------|------------------
    rate   op      grp  | median    diff   | mean      diff
    --------------------|------------------|------------------
    0.00 / read  / #1   | 69817.90         | 69406.37
    0.01 / read  / #1   | 69300.35  -0.74% | 68717.23  -0.99%
    0.10 / read  / #1   | 67743.35  -2.97% | 67601.81  -2.60%
    0.00 / write / #1   |  3672.55         |  3653.63
    0.01 / write / #1   |  3647.65  -0.68% |  3615.90  -1.03%
    0.10 / write / #1   |  3567.20  -2.87% |  3558.90  -2.59%

                          ops/sec            ops/sec
    --------------------|------------------|------------------
    rate   op      grp  | median    diff   | mean      diff
    --------------------|------------------|------------------
    0.00 / read  / #2   | 69440.80         | 68893.24
    0.01 / read  / #2   | 69481.55  +0.06% | 69463.13  +0.82% (probably in the noise margin)
    0.10 / read  / #2   | 67841.80  -2.30% | 66992.55  -2.76%
    0.00 / write / #2   |  3652.45         |  3625.24
    0.01 / write / #2   |  3657.55  -0.14% |  3654.34  +0.80%
    0.10 / write / #2   |  3570.75  -2.24% |  3526.04  -2.74%

The results above suggest that the current default rate of 10% is too
high, and a 1% rate is much more acceptable.
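
For reference, the diff columns appear to be plain relative changes against the 0.00 baseline; a quick check against the first table's read medians (assuming diff = (value - baseline) / baseline):

package main

import "fmt"

func main() {
	baseline := 69817.90 // 0.00 / read / #1, median ops/sec
	for _, v := range []float64{69300.35, 67743.35} {
		fmt.Printf("%.2f%%\n", (v-baseline)/baseline*100) // -0.74%, -2.97%
	}
}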

---

The fact that the cost of sampling is largely dominated by tracing is
extremely unfortunate. We have ideas for how that can be improved
(prototyped in cockroachdb#62227), but they're much too invasive to backport to
21.1. It's unfortunate that we only discovered the overhead this late in
the development cycle. It was due to two major reasons:
- cockroachdb#59992 landed late in the cycle, and enabled tracing for realsies (by
  propagating real tracing spans across rpc boundaries). We had done
  sanity checking for the tracing overhead before this point, but failed
  to realize that cockroachdb#59992 would merit re-analysis.
- The test that alerted us to the degradation (tpccbench) had been
  persistently failing for a myriad of other reasons, so we didn't learn
  until too late that tracing was the latest offender. tpccbench also
  doesn't deal with VM overload well (something cockroachdb#62361 hopes to
  address), and after tracing was enabled for realsies, this was the
  dominant failure mode. This resulted in perf data not making its way
  to roachperf, which further hid possible indicators we had a major
  regression on our hands. We also didn't have a healthy process looking
  at roachperf on a continual basis, something we're looking to rectify
  going forward. We would've picked up on this regression had we been
  closely monitoring the kv95 charts.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Apr 2, 2021