Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: improve memory usage by using metadata-only child span types #62020

Closed
irfansharif opened this issue Mar 15, 2021 · 3 comments
Closed
Labels
A-tracing Relating to tracing in CockroachDB. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@irfansharif
Copy link
Contributor

Describe the problem

#61777 was a late addition to 21.1 to bound the memory usage for the tracing package. It was necessary to address #59424, where we observed that creating real tracing spans for each statement resulted in a lot of memory pressure for not much real benefit.

With TPC-C 1k, we were observing ~1000 root spans but around ~100k total spans. Most local root spans had ~270 children. Almost all of these child spans are completely empty, and really could be "metadata only". We can imagine a child span type that only held a pointer to the root span, and a span ID:

type childSpan struct {
    spanID uint64 // and maybe operation name?
    rootSpan *Span
}

Today what we're doing is a lot more dire. Each child span is constructed using the full struct, which is quite large. It's comprised of the following structures:

type Span struct {
// Span itself is a very thin wrapper around spanInner whose only job is
// to guard spanInner against use-after-Finish.
i spanInner
numFinishCalled int32 // atomic
}

type spanInner struct {
tracer *Tracer // never nil
// Internal trace Span; nil if not tracing to crdb.
// When not-nil, allocated together with the surrounding Span for
// performance.
crdb *crdbSpan
// x/net/trace.Trace instance; nil if not tracing to x/net/trace.
netTr trace.Trace
// External opentracing compatible tracer such as lightstep, zipkin, jaeger;
// zero if not using one.
ot otSpan
}

// crdbSpan is a span for internal crdb usage. This is used to power SQL session
// tracing.
type crdbSpan struct {
traceID uint64 // probabilistically unique
spanID uint64 // probabilistically unique
parentSpanID uint64
goroutineID uint64
operation string
startTime time.Time
// logTags are set to the log tags that were available when this Span was
// created, so that there's no need to eagerly copy all of those log tags
// into this Span's tags. If the Span's tags are actually requested, these
// logTags will be copied out at that point.
//
// Note that these tags have not gone through the log tag -> Span tag
// remapping procedure; tagName() needs to be called before exposing each
// tag's key to a user.
logTags *logtags.Buffer
mu crdbSpanMu
testing *testingKnob
}

type crdbSpanMu struct {
syncutil.Mutex
// duration is initialized to -1 and set on Finish().
duration time.Duration
recording struct {
// recordingType is the recording type of the ongoing recording, if any.
// Its 'load' method may be called without holding the surrounding mutex,
// but its 'swap' method requires the mutex.
recordingType atomicRecordingType
logs sizeLimitedBuffer // of *tracingpb.LogRecords
structured sizeLimitedBuffer // of Structured events
// dropped is true if the span has capped out it's memory limits for
// logs and structured events, and has had to drop some. It's used to
// annotate recordings with the _dropped tag, when applicable.
dropped bool
// children contains the list of child spans started after this Span
// started recording.
children []*crdbSpan
// remoteSpan contains the list of remote child span recordings that
// were manually imported.
remoteSpans []tracingpb.RecordedSpan
}
// tags are only set when recording. These are tags that have been added to
// this Span, and will be appended to the tags in logTags when someone
// needs to actually observe the total set of tags that is a part of this
// Span.
// TODO(radu): perhaps we want a recording to capture all the tags (even
// those that were set before recording started)?
tags opentracing.Tags
// The Span's associated baggage.
baggage map[string]string
}

Expected behavior

That's all wildly excessive for most child spans that will never record anything, and could very well be "metadata only". Instead of #61407, but perhaps in the same spirit, we could "collapse" where the recordings are stored into the root span. In #59424 (comment), which an approximation of that idea, we observe that doing so can significantly relieve memory pressure (think "root span centered storage", which also has the nice property that there's no contention across traces).

As for the pesky fact that in regular crdb code we often want to grab a handle on a span from the available context (tracing.SpanFromContext), we don't want the return type be an interface that abstracts over the root span and child span type. The reason for that is there's no way to define methods on nil interfaces, where as for structs we can:

type Example struct {
}

func (e *Example) f() {
    if e == nil { }
}

Plumbing through interfaces would force callers to check whether or not the value is nil before doing anything. Luckily we have a work around. We could wrap the root span type in a stand-alone child span type that holds a reference to the root span, and has the same span ID. Given that, perhaps the type we want for the "child span" is type Span, and type rootSpan for the root, with rootSpanInner instead of today's spanInner.

@irfansharif irfansharif added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-tracing Relating to tracing in CockroachDB. labels Mar 15, 2021
@angelapwen angelapwen removed their assignment Apr 21, 2021
@angelapwen
Copy link

I'm tying up loose ends as I offboard this week, and unassigning myself for now. Thanks again for filing this Irfan 😸

@andreimatei
Copy link
Contributor

As for the pesky fact that in regular crdb code we often want to grab a handle on a span from the available context (tracing.SpanFromContext), we don't want the return type be an interface that abstracts over the root span and child span type.

I'm not sure that I follow this. If you want a no-op span accessed through an interface, you can have the interface be implemented by a *noopSpan struct{}, and you can allocate a singleton noopSpan.

@irfansharif
Copy link
Contributor Author

I tried some version of this as part of #64233 and realized I deluded myself writing this. While it's unfortunate child span objects are larger than they need to be, go's GC doesn't care too much about object size, just object count, and reducing total # of bytes allocated here doesn't much help performance. So it's unlikely we'll do anything here. What we did in #59424 (comment) by collapsing all child spans into the root ends up foregoing all trace structure and really reduces the utility tracing provides.

I'm not sure that I follow this.

That bit was more about not wanting to introduce a separate Span interface, working instead with the concrete types we have today. Reading it again, I'm not sure what I'm talking about either.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tracing Relating to tracing in CockroachDB. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

No branches or pull requests

3 participants