tracing: propagate non-recording spans across rpc boundaries #59992
Conversation
Force-pushed from c782a70 to 2f8a54a.
Force-pushed from 2f8a54a to 49ff09c.
Stray "iasd" in the first commit message, typo with "Mult" in second commit.
Reviewed 6 of 6 files at r1, 1 of 1 files at r2, 2 of 2 files at r3, 1 of 1 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen, @knz, and @tbg)
pkg/sql/logictest/testdata/logic_test/distsql_agg, line 1 at r3 (raw file):
# LogicTest: 5node-default-configs
Why did this change?
pkg/util/tracing/span_test.go, line 187 at r1 (raw file):
    func TestSpan_LogStructured(t *testing.T) {
        tr := NewTracer()
        tr._mode = int32(modeBackground)
Why did this change?
pkg/util/tracing/tracer.go, line 297 at r1 (raw file):
    // a real span they need to get one. In all other cases, a noop span will do.
    if !t.AlwaysTrace() && opts.parentTraceID() == 0 &&
I like this simplification. The semantics around recording being switched on/off and what that means for log messages/structured events still feel a bit confusing to me (I was trying something elsewhere), and this makes it a bit easier to reason about.
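As a hedged illustration of one way to read those semantics (purely hypothetical types, not the actual pkg/util/tracing API): unstructured log messages only survive on verbose spans, while structured events survive on any real recording.

```go
package main

import "fmt"

type recordingMode int

const (
	recordingOff        recordingMode = iota
	recordingBackground               // structured events only (assumed)
	recordingVerbose                  // structured events + log messages (assumed)
)

type span struct {
	mode       recordingMode
	logs       []string
	structured []string
}

// Record keeps an unstructured log message only on verbose spans (assumed rule).
func (s *span) Record(msg string) {
	if s.mode == recordingVerbose {
		s.logs = append(s.logs, msg)
	}
}

// RecordStructured keeps a structured event on any recording span (assumed rule).
func (s *span) RecordStructured(ev string) {
	if s.mode != recordingOff {
		s.structured = append(s.structured, ev)
	}
}

func main() {
	s := &span{mode: recordingBackground}
	s.Record("dropped: span is not verbose")
	s.RecordStructured("kept: structured event")
	fmt.Println(len(s.logs), len(s.structured)) // 0 1
}
```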
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen, @irfansharif, and @knz)
pkg/sql/logictest/testdata/logic_test/distsql_agg, line 1 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Why did this change?
Thanks for catching that, this is accidental, fixed.
pkg/util/tracing/span_test.go, line 187 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Why did this change?
To validate the changes in GetRecording. Without the fixes to it, this test would have started failing, as we short-circuited GetRecording() to nil.
Force-pushed from 49ff09c to 2e713e2.
Oh wonderful, I love the smell of mixed version problems in the afternoon. This will take a bit longer to get in, I suppose; it is crashing the old version, not the new.
Wow, I'm surprised we haven't seen more crashes here (maybe the crash here isn't even this PR; maybe it's only crashing now because I just added a step to turn tracing on in this test). The code on 20.2.4 is pretty sensitive to crashes because it has these assertions that make no sense, in the methods called from this area: cockroach/pkg/kv/kvclient/kvcoord/transport.go, lines 160 to 179 in 9c7ef62.
Concretely, here's what we see:
I assume that the way this particular crash happens is this:
But there could also have been a different scenario (pre this PR), though not one that you could hit in practice, since we don't use spans in that way at that part of the stack:
Now in 20.2.4, if you don't mess with the global tracing (which you shouldn't anyway, as it might well OOM you), you don't see any of this because nobody ever sends any recordings back. But with this PR, since SQL on the new nodes will always propagate a real trace, oof, we're exercising this code path all of the time.

At this point, and seeing how we've made more changes to the tracer than ever before, including moving off the OT interfaces, I think we'll play it safe - I'll add a cluster version and make sure that new nodes don't create any spans, period (unless the verbose cluster setting is set, in which case, good luck). I'll also give the code around the interfaces a do-over to make sure they are not exceedingly... assertive. Something that should happen in that context is to gracefully handle importing a recording for trace B into trace A. This should silently ignore the mismatched data. Of course we don't expect that to happen (after this PR), but resilience here will pay off and is cheap.
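To make the "silently ignore the mismatched data" idea concrete, here is a minimal sketch with hypothetical stand-in types (not the real Span/tracingpb.RecordedSpan plumbing):

```go
package main

import "fmt"

// recordedSpan stands in for tracingpb.RecordedSpan; only TraceID matters here.
type recordedSpan struct {
	TraceID uint64
	Op      string
}

type span struct {
	traceID uint64
	remote  []recordedSpan
}

// importRemoteSpans keeps spans belonging to this trace and silently drops the
// rest, instead of asserting (which is what made the 20.2.4 code path fragile).
func (s *span) importRemoteSpans(spans []recordedSpan) {
	for _, rs := range spans {
		if rs.TraceID != s.traceID {
			continue // foreign trace: ignore rather than crash
		}
		s.remote = append(s.remote, rs)
	}
}

func main() {
	s := &span{traceID: 1}
	s.importRemoteSpans([]recordedSpan{
		{TraceID: 1, Op: "kept"},
		{TraceID: 2, Op: "dropped"},
	})
	fmt.Println(len(s.remote)) // 1
}
```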
With only this commit,
Force-pushed from f9f2103 to 849643a.
Regarding the first commit: I am not convinced by the unit test -- if we want to check that traces get collected across RPC boundaries, I'd expect a test that sets up an RPC server, sends an actual RPC over a network connection, and asserts that the expected trace events are communicated.
In the 6th commit I dislike the approach you've taken, with more explanations about my view below.
(In contrast the 7th commit is super elegant and it was a relief to see that the way forward wasn't too painful)
The rest LGTM
Reviewed 4 of 10 files at r5, 2 of 2 files at r6, 1 of 1 files at r7, 1 of 1 files at r8, 1 of 1 files at r9, 2 of 2 files at r10, 10 of 10 files at r11, 8 of 8 files at r12, 1 of 1 files at r13.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen, @irfansharif, and @tbg)
pkg/sql/crdb_internal.go, line 1181 at r11 (raw file):
    var numStructured int
    rec.Structured(func(any *ptypes.Any) {
Ouch. I thought the entire point of this vtable, which leaves out the details of traces, was to avoid iterating over all events while populating the table, so that the SQL client can apply a selection before we pay the cost of looking into a recording.
Can you not have a new built-in function which takes a trace/span ID as an argument and returns a JSON blob with its properties, including whether it contains any structured event as a boolean (and make the rec.Structured logic stop the iteration as soon as 1 is encountered)?
For context, I think you are under-estimating the run-time and memory costs of crdb_internal tables generally and how much (ab)use they take by end-users and our support staff. This stuff needs to be programmed defensively, with adequate explanatory comments about the costs in the table's comment.
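A hedged sketch of the early-exit idea (hypothetical helper and types, not the actual Recording API): instead of counting every structured payload while populating the vtable, stop at the first one and expose a boolean.

```go
package main

import "fmt"

// event stands in for *ptypes.Any; the payload contents don't matter here.
type event struct{ name string }

type recording struct{ events []event }

// structured mimics an iteration callback over structured events; returning
// false from the visitor stops the walk (assumed contract).
func (r recording) structured(visit func(event) bool) {
	for _, ev := range r.events {
		if !visit(ev) {
			return
		}
	}
}

// hasStructured stops as soon as one structured event is seen, so populating a
// row costs O(1) in the common case instead of walking the whole recording.
func hasStructured(r recording) bool {
	found := false
	r.structured(func(event) bool {
		found = true
		return false // stop iterating
	})
	return found
}

func main() {
	fmt.Println(hasStructured(recording{events: []event{{"contention"}}})) // true
	fmt.Println(hasStructured(recording{}))                                // false
}
```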
pkg/sql/txn_state.go, line 162 at r11 (raw file):
    var traceOpts []tracing.SpanOption
    if !tranCtx.execTestingKnobs.Pretend59315IsFixed {
        traceOpts = append(traceOpts, tracing.WithBypassRegistry())
Please add a comment to indicate that this conditional and the traceOpts can be removed entirely once the linked issue has been addressed.
pkg/sql/sem/builtins/builtins.go, line 3590 at r11 (raw file):
    sp := tracing.SpanFromContext(ctx.Context)
    if sp == nil {
        return nil, errors.New("no active trace span")
The proper way in SQL to indicate there's no value is to return NULL. I don't see a good reason to trash the SQL transaction in this case.
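For illustration, one plausible shape of that fix, reusing the names from the quoted snippet (a sketch, not necessarily the exact diff that landed):

```go
sp := tracing.SpanFromContext(ctx.Context)
if sp == nil {
	// No active span: return SQL NULL instead of erroring out and
	// poisoning the transaction.
	return tree.DNull, nil
}
```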
pkg/util/tracing/span.go, line 159 at r9 (raw file):
    // recorded traces from other nodes.
    func (s *Span) ImportRemoteSpans(remoteSpans []tracingpb.RecordedSpan) error {
        if s.tracer.mode() == modeLegacy && s.crdb.recordingType() == RecordingOff {
Isn't this change needed as soon as the first commit? Especially if, as I suggested above, the first commit were extended to assert that trace events are in fact collected across remote RPCs?
pkg/util/tracing/tracer.go, line 299 at r6 (raw file):
    // also create a real child. Additionally, if the caller explicitly asked for
    // a real span they need to get one. In all other cases, a noop span will do.
    if !t.AlwaysTrace() &&
As a reviewer, I think that `!(t.AlwaysTrace() || opts.parentTraceID() != 0 || opts.ForceRealSpan)` gives me a more intuitive idea of what this condition is about (i.e. "if I don't have any reason to trace").
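For reference, the suggested early-out would look roughly like this (a sketch reusing names from the quoted snippet; the body of the branch is elided):

```go
if !(t.AlwaysTrace() || opts.parentTraceID() != 0 || opts.ForceRealSpan) {
	// No reason to trace: hand back the tracer's shared noop span.
}
```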
As was rightly pointed out by @knz during review in cockroachdb#59992, the tracing interceptors were only tested indirectly. In this commit, we add explicit tests that set up a gRPC server and verify that for all combinations of unary/streaming the correct trace spans get created. Release note: None
TFTR!
Regarding the first commit: I am not convinced by the unit test -- if we want to check that traces get collected across RPC boundaries, I'd expect a test that sets up an RPC server, sends an actual RPC over a network connection, and asserts that the expected trace events are communicated.
Good point - I added a comprehensive test for this in an additional commit.
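Roughly, the shape of such a test, sketched here with gRPC's bufconn and the standard health service standing in for a real CockroachDB RPC; the interceptor wiring and the tracing assertions are placeholders, since they depend on the actual pkg/util/tracing API:

```go
package tracingtest

import (
	"context"
	"net"
	"testing"

	"google.golang.org/grpc"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
	"google.golang.org/grpc/test/bufconn"
)

func TestInterceptorsPropagateSpans(t *testing.T) {
	lis := bufconn.Listen(1 << 20)

	// Hypothetical tracing interceptors would be installed here, e.g. via
	// grpc.UnaryInterceptor / grpc.StreamInterceptor on the server and
	// grpc.WithUnaryInterceptor / grpc.WithStreamInterceptor on the client.
	srv := grpc.NewServer()
	healthpb.RegisterHealthServer(srv, health.NewServer())
	go func() { _ = srv.Serve(lis) }()
	defer srv.Stop()

	ctx := context.Background()
	conn, err := grpc.DialContext(ctx, "bufnet",
		grpc.WithContextDialer(func(ctx context.Context, _ string) (net.Conn, error) {
			return lis.DialContext(ctx)
		}),
		grpc.WithInsecure(),
	)
	if err != nil {
		t.Fatal(err)
	}
	defer conn.Close()

	// Send an actual RPC over the (in-memory) connection.
	if _, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{}); err != nil {
		t.Fatal(err)
	}

	// A real test would now assert that the client span's recording contains
	// the events logged on the server side (elided; depends on the tracing API).
}
```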
In the 6th commit I dislike the approach you've taken, with more explanations about my view below.
Fair point, I assume this is more a failure to communicate, see the inline discussion.
I added a few more commits - haven't done any squashing or rearranging to keep the review easier, PTAL.
Dismissed @irfansharif from 3 discussions.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelapwen, @irfansharif, and @knz)
pkg/sql/crdb_internal.go, line 1181 at r11 (raw file):
Previously, knz (kena) wrote…
Ouch. I thought the entire point of this vtable, which leaves out the details of traces, was to avoid iterating over all events while populating the table, so that the SQL client can apply a selection before we pay the cost of looking into a recording.
Can you not have a new built-in function which takes a trace/span ID as an argument and returns a JSON blob with its properties, including whether it contains any structured event as a boolean (and make the rec.Structured logic stop the iteration as soon as 1 is encountered)?
For context, I think you are under-estimating the run-time and memory costs of crdb_internal tables generally and how much (ab)use they take by end-users and our support staff. This stuff needs to be programmed defensively, with adequate explanatory comments about the costs in the table's comment.
I agree with everything you're saying, and should have clarified how this code came to be (my bad). Essentially @angelapwen had trouble getting the structured data to show up in her attempt to add a built-in to render the payloads from a given span, and while investigating I found the random bits of breakage that caused this. I want this PR to give her a solid foundation to work from, and that meant adding at least a rudimentary test confirming that the payloads do indeed flow from where they originate to where they need to be. Long story short - I would like Angela (with your help) to tackle the shortcomings you're describing. I don't think num_structured should remain in the table once we have a way to ask for the payloads of a specified trace. This was just the easiest thing I could do.
pkg/sql/txn_state.go, line 162 at r11 (raw file):
Previously, knz (kena) wrote…
Please add a comment to indicate that this conditional and the traceOpts can be removed entirely once the linked issue has been addressed.
Done.
pkg/sql/sem/builtins/builtins.go, line 3590 at r11 (raw file):
Previously, knz (kena) wrote…
The proper way in SQL to indicate there's no value is to return NULL. I don't see a good reason to trash the SQL transaction in this case.
Done.
pkg/util/tracing/span.go, line 159 at r9 (raw file):
Previously, knz (kena) wrote…
Isn't this change needed at soon as the first commit? Especially if, as I suggested above, the first commit was extended to assert that trace events are in fact collected across remote RPCs?
Yes, I will rearrange before the merge (when I squash the fixup commits).
pkg/util/tracing/tracer.go, line 299 at r6 (raw file):
Previously, knz (kena) wrote…
As a reviewer, I think that
!(t.AlwaysTrace() || opts.parentTraceID() != 0 || opts.ForceRealSpan)
gives me a more intuitive idea of what this condition is about (i.e. "if I don't have any reason to trace")
Done.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)
pkg/sql/crdb_internal.go, line 1181 at r11 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I agree with everything you're saying, and should have clarified how this code came to be (my bad). Essentially @angelapwen had trouble getting the structured data to show up in her attempt to add a built-in to render the payloads from a given span, and while investigating I found the random bits of breakage that caused this. I want this PR to give her a solid foundation to work from, and that meant adding at least a rudimentary test confirming that the payloads do indeed flow from where they originate to where they need to be. Long story short - I would like Angela (with your help) to tackle the shortcomings you're describing. I don't think num_structured should remain in the table once we have a way to ask for the payloads of a specified trace. This was just the easiest thing I could do.
Acknowledged — this can be removed now that #60616 successfully finds a contention event from a trace via payload.
pkg/sql/crdb_internal.go, line 1187 at r17 (raw file):
    if err := addRow(
        // TODO(angelapwen): we're casting uint64s to int64 here,
        // is that ok?
Hm, I see what you mean - there could be some casting problems if the unsigned int was large enough. I think the DInt needs to be an int64, but we could potentially return an error if the ints are too large?
pkg/sql/sem/builtins/builtins.go, line 3595 at r17 (raw file):
Nit: did you mean "possibly this is not"? Not sure what this means.
Reviewed 1 of 1 files at r14, 1 of 1 files at r15, 1 of 1 files at r16, 7 of 7 files at r17.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @tbg)
pkg/sql/crdb_internal.go, line 1187 at r17 (raw file):
Previously, angelapwen (Angela P Wen) wrote…
Hm, I see what you mean - there could be some casting problems if the unsigned int was large enough. I think the DInt needs to be an int64, but we could potentially return an error if the ints are too large?
I think the conversion between uint and int in Go is lossless, i.e. it does not lose information. A large uint value will convert to a negative signed value, but the conversion remains bijective.
Since these are identifiers (and not quantities), we only care about the "identifying" nature of the values.
However, I do agree it may be confusing for UX. @tbg was there any reason to make the trace/span IDs unsigned?
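A quick self-contained illustration of the "lossless but possibly negative" point:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// A trace ID with the top bit set converts to a negative int64, but the
	// round trip back to uint64 recovers the original value exactly.
	id := uint64(math.MaxUint64 - 41)
	asInt := int64(id)  // negative, but no information lost
	back := uint64(asInt) // identical to id
	fmt.Println(asInt < 0, back == id) // true true
}
```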
pkg/sql/sem/builtins/builtins.go, line 3595 at r17 (raw file):
Previously, angelapwen (Angela P Wen) wrote…
Nit: did you mean "possibly this is not"? Not sure what this means.
I think the comment is correct, although Tobias could have clarified "stable (i.e. non-volatile)".
(Angela: The volatility attribute indicates whether the result of a built-in function is going to change throughout the execution of the statement. Since our trace IDs are fixed for an entire SQL txn, or at least for an entire SQL stmt, I believe like Tobias that they are stable and thus this built-in can be marked as non-volatile, but I also agree it's a hassle to test / validate and so to be conservative we can keep this as is.)
@tbg I think also it's worth filing an issue for the optimizer team to look at. The queries we'd like to advertise to consume trace data will require SQL joins. If this builtin is marked volatile, the query plans will be more complex. We need some research to investigate the penalty and set the volatility properly if the penalty is high.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @angelapwen)
pkg/sql/crdb_internal.go, line 1187 at r17 (raw file):
Previously, knz (kena) wrote…
I think the conversion between uint and int in Go is lossless, i.e. it does not lose information. A large uint value will convert to a negative signed value, but the conversion remains bijective.
Since these are identifiers (and not quantities), we only care about the "identifying" nature of the values.
However, I do agree it may be confusing for UX. @tbg was there any reason to make the trace/span IDs unsigned?
They have historically been unsigned, and I don't think there's a good reason. However, we can't just change it due to the necessary migration. Concretely, I think where the uintness matters is here:
cockroach/pkg/util/tracing/tracer.go, lines 550 to 551 in a1c3912:

    carrier.Set(fieldNameTraceID, strconv.FormatUint(sm.traceID, 16))
    carrier.Set(fieldNameSpanID, strconv.FormatUint(sm.spanID, 16))
and here:
cockroach/pkg/util/tracing/tracer.go, lines 604 to 615 in a1c3912:

    case fieldNameTraceID:
        var err error
        traceID, err = strconv.ParseUint(v, 16, 64)
        if err != nil {
            return opentracing.ErrSpanContextCorrupted
        }
    case fieldNameSpanID:
        var err error
        spanID, err = strconv.ParseUint(v, 16, 64)
        if err != nil {
            return opentracing.ErrSpanContextCorrupted
        }
So for 20.2 support, we have to keep it as a uint on the wire. Which should be fine! But honestly I'm not sure there is enough of a reason here to make the change. uint64 is a fine choice and nobody likes IDs that have signs. Can we make the ID columns BIGINTs instead? I do agree that we want the representation of IDs to match between internal and SQL, or we're going to be very confused once we walk further down the path in which trace messages are "stored" via log messages.
Filed #60668
pkg/sql/sem/builtins/builtins.go, line 3595 at r17 (raw file):
Previously, knz (kena) wrote…
I think the comment is correct, although Tobias could have clarified "stable (i.e. non-volatile)".
(Angela: The volatility attribute indicates whether the result of a built-in function is going to change throughout the execution of the statement. Since our trace IDs are fixed for an entire SQL txn, or at least for an entire SQL stmt, I believe like Tobias that they are stable and thus this built-in can be marked as non-volatile, but I also agree it's a hassle to test / validate and so to be conservative we can keep this as is.)
@tbg I think also it's worth filing an issue for the optimizer team to look at. The queries we'd like to advertise to consume trace data will require SQL joins. If this builtin is marked volatile, the query plans will be more complex. We need some research to investigate the penalty and set the volatility properly if the penalty is high.
I think we should reach out when we have some actual queries, otherwise we're asking them in a vacuum.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @angelapwen, @irfansharif, and @knz)
pkg/sql/logictest/logic.go, line 567 at r23 (raw file):
Previously, RaduBerinde wrote…
[nit] `5node-pretend-59315-is-fixed`
I didn't do this, was too scared to eat another cycle
pkg/sql/logictest/logic.go, line 567 at r23 (raw file): Previously, tbg (Tobias Grieger) wrote…
Heh, no problem. Hopefully it will be fixed soon so we won't have to pretend anymore :)
59588: streamingest: add job level test for stream ingestion r=pbardea a=adityamaru

This change adds a test that exercises all the components of the stream ingestion flow. It fixes some missing pieces that were discovered while writing the test. Informs: #59175

Release note: None

60424: sql: include sampled stats in TestSampledStatsCollection r=yuzefovich a=asubiotto

Depends on #59992, which is required for this new regression test to pass. TestSampledStatsCollection would previously only check that stats that are collected regardless of the sample rate are returned. These types of stats (rows/bytes read) are propagated using metadata, rather than the trace. This resulted in us silently failing to collect any stats when sampling was enabled once the tracing mode was reverted back to legacy. To avoid this kind of thing happening again, this commit adds a check that max memory usage is reported to be non-zero.

Release note: None (this is a new feature that has no user impact yet)

60626: kvserver: initialize propBuf LAI tracking r=andreimatei a=andreimatei

The initialization of the LAI tracking in the proposal buffer seems pretty lacking (see #60625). This patch adds initialization of propBuf.liBase at propBuf.Init() time, which is irrelevant for production, but will help future tests, which will surely want a propBuf's first assigned LAIs to have some relationship to the replica state.

Release note: None

Co-authored-by: Aditya Maru <adityamaru@gmail.com>
Co-authored-by: Alfonso Subiotto Marques <alfonso@cockroachlabs.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Fixes cockroachdb#60677, removing a stop-gap introduced in cockroachdb#59992.

We were previously leaking "async consensus" spans, which was possible when a given proposal was never flushed out of the replica's proposal buffer. On server shut down, this buffered proposal was never finished, and thus the embedded span never closed. We now add a closer to clean up after ourselves.

Release justification: bug fixes and low-risk updates to new functionality.
Release note: None
We no longer return errors from ImportRemoteSpans, and we saw in the [past] that it is not a good idea to do so - it's easy to mess up cross-version behavior.

[past]: #59992 (comment)

Release justification: low-risk code clarification
Release note: None
61279: kvserver: plug a tracing span leak r=irfansharif a=irfansharif

Fixes #60677, removing a stop-gap introduced in #59992. We were previously leaking "async consensus" spans, which was possible when a given proposal was never flushed out of the replica's proposal buffer. On server shut down, this buffered proposal was never finished, and thus the embedded span never closed. We now add a closer to clean up after ourselves.

Release justification: bug fixes and low-risk updates to new functionality.
Release note: None

---

+cc @angelapwen / @knz / @erikgrinaker for pod-visibility.

61344: sql: add estimated row count to all operators in EXPLAIN r=RaduBerinde a=RaduBerinde

This change adds the estimated row count to all operators in the vanilla EXPLAIN, as long as the tables involved had statistics. Fixes #61310.

Release justification: bug fixes and low-risk updates to new functionality
Release note (sql change): EXPLAIN now shows estimated row counts for all operators even without VERBOSE (except when we don't have statistics for the tables).

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
61327: tracing: don't return an error from ImportRemoteSpans r=erikgrinaker,angelapwen a=tbg

We no longer return errors from ImportRemoteSpans, and we saw in the [past] that it is not a good idea to do so - it's easy to mess up cross-version behavior.

[past]: #59992 (comment)

Release justification: low-risk code clarification
Release note: None

Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
We arrived at the previous default rate of 10% back in cockroachdb#59379. This was back when we were creating real tracing spans for all statements, and for sampled statements, we were propagating additional stats payloads. Consequently, what cockroachdb#59379 ended up measuring (and finding the overhead acceptable for) was the performance hit we would incur for propagating stats payloads for statements already using real tracing spans.

Since then, the landscape has changed. Notably we introduced cockroachdb#61777, which made it so that we were only using real tracing spans for sampled statements. This was done after performance analysis in cockroachdb#59424 showed that the use of real tracing spans in all statements resulted in tremendous overhead, for no real benefit.

What this now leaves us with is a sampling rate that was tuned by only considering the stats payload overhead. What we want now is to also consider the overhead of using real tracing spans for sampled statements, vs. not. Doing this analysis gives us a very different picture for what the default rate should be.

---

To find out what the overhead for sampled statements currently is, we experimented with kv95/enc=false/nodes=1/cpu=32. It's a simple benchmark that does little more than one-off statements, so it should give us a concise picture of the sampling overhead. We ran six experiments in total (each corresponding to a pair of read+write rows), done in groups of three (each group corresponding to a table below). Each run in turn is comprised of 10 iterations of kv95, and what's varied between each run is the default sampling rate. We pin a sampling rate of 0.0 as the baseline that effectively switches off sampling entirely (and tracing), and measure the throughput degradation as we vary the sampling rate.

                        ops/sec            ops/sec
    --------------------|------------------|------------------
     rate / op    / grp | median    diff   | mean      diff
    --------------------|------------------|------------------
     0.00 / read  / #1  | 69817.90         | 69406.37
     0.01 / read  / #1  | 69300.35  -0.74% | 68717.23  -0.99%
     0.10 / read  / #1  | 67743.35  -2.97% | 67601.81  -2.60%
     0.00 / write / #1  |  3672.55         |  3653.63
     0.01 / write / #1  |  3647.65  -0.68% |  3615.90  -1.03%
     0.10 / write / #1  |  3567.20  -2.87% |  3558.90  -2.59%

                        ops/sec            ops/sec
    --------------------|------------------|------------------
     rate / op    / grp | median    diff   | mean      diff
    --------------------|------------------|------------------
     0.00 / read  / #2  | 69440.80         | 68893.24
     0.01 / read  / #2  | 69481.55  +0.06% | 69463.13  +0.82% (probably in the noise margin)
     0.10 / read  / #2  | 67841.80  -2.30% | 66992.55  -2.76%
     0.00 / write / #2  |  3652.45         |  3625.24
     0.01 / write / #2  |  3657.55  -0.14% |  3654.34  +0.80%
     0.10 / write / #2  |  3570.75  -2.24% |  3526.04  -2.74%

The results above suggest that the current default rate of 10% is too high, and a 1% rate is much more acceptable.

---

The fact that the cost of sampling is largely dominated by tracing is extremely unfortunate. We have ideas for how that can be improved (prototyped in cockroachdb#62227), but they're much too invasive to backport to 21.1.

It's unfortunate that we only discovered the overhead this late in the development cycle. It was due to two major reasons:

- cockroachdb#59992 landed late in the cycle, and enabled tracing for realsies (by propagating real tracing spans across rpc boundaries). We had done sanity checking for the tracing overhead before this point, but failed to realize that cockroachdb#59992 would merit re-analysis.
- The test that alerted us to the degradation (tpccbench) had been persistently failing for a myriad of other reasons, so we didn't learn until too late that tracing was the latest offender. tpccbench also doesn't deal with VM overload well (something cockroachdb#62361 hopes to address), and after tracing was enabled for realsies, this was the dominant failure mode. This resulted in perf data not making its way to roachperf, which further hid possible indicators that we had a major regression on our hands. We also didn't have a healthy process looking at roachperf on a continual basis, something we're looking to rectify going forward. We would've picked up on this regression had we been closely monitoring the kv95 charts.

Release note: None
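For reference, the essence of how a per-statement sampling rate like this gates the tracing cost (a hedged sketch; shouldSample is an illustrative helper, not the actual cluster-setting plumbing):

```go
package main

import (
	"fmt"
	"math/rand"
)

// shouldSample decides, per statement, whether to pay for a real (non-noop)
// tracing span. With rate=0.01, only ~1% of statements incur the tracing cost
// measured above.
func shouldSample(rate float64, rng *rand.Rand) bool {
	return rate > 0 && rng.Float64() < rate
}

func main() {
	rng := rand.New(rand.NewSource(1))
	sampled := 0
	const n = 100000
	for i := 0; i < n; i++ {
		if shouldSample(0.01, rng) {
			sampled++
		}
	}
	fmt.Printf("sampled %d of %d statements (~%.2f%%)\n", sampled, n, 100*float64(sampled)/n)
}
```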
We were previously propagating RPCs across boundaries only when they
were verbose. We now propagate any span (except the noop span)
regardless of current verbosity.
This ensures that SQL (which always creates real, non-verbose spans by
default) can trace the entirety of its operations.
This enables the background collection of contention metadata in 21.1
and more generally, the idea of always-on tracing. Note that we had
previously considered always-on tracing to mean avoiding the noop span
altogether; due to memory retention issues [1] encountered, we are
taking a step back from that idea for the 21.1 release.
Release note: None
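Conceptually, the propagation change boils down to something like the following sketch (hypothetical helper and header names; the real logic lives in the tracing carrier and gRPC interceptor code):

```go
package main

import "fmt"

type span struct {
	noop    bool
	traceID uint64
	spanID  uint64
	verbose bool
}

// injectSpanMeta decides whether to attach trace metadata to outgoing RPC
// headers. Before this PR, only verbose spans crossed the wire; after it, any
// real (non-noop) span does, regardless of verbosity.
func injectSpanMeta(sp *span, headers map[string]string) {
	if sp == nil || sp.noop {
		return // nothing to propagate
	}
	headers["crdb-tracing-traceid"] = fmt.Sprintf("%x", sp.traceID) // hypothetical header names
	headers["crdb-tracing-spanid"] = fmt.Sprintf("%x", sp.spanID)
}

func main() {
	h := map[string]string{}
	injectSpanMeta(&span{traceID: 7, spanID: 9, verbose: false}, h)
	fmt.Println(h) // propagated even though the span isn't verbose
}
```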