stop,log: tracing improvements #70370

Merged
merged 3 commits into from
Oct 8, 2021

Commits on Oct 8, 2021

  1. util/tracing: introduce the sterile span

    This patch introduces a new option for tracing spans: a sterile span
    does not admit child spans. Trying to create a child of a sterile span
    results in a root span. In other words, in:
    
    parent := t.StartSpan(WithSterile())
    child1 := t.StartSpan(WithParentAndAutoCollection(parent))
    child2 := t.StartSpan(WithParentAndManualCollection(parent.Meta()))
    
    both child1 and child2 are root spans.
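    
    The rule above can be illustrated with a self-contained toy model. This
    is only a sketch: the Span type, StartSpan, WithParent and the children
    slice below are simplified stand-ins invented for the example, not the
    util/tracing API or its implementation.
    
    ```go
    package main
    
    import "fmt"
    
    // Span is a toy stand-in for a tracing span (hypothetical type).
    type Span struct {
    	name     string
    	sterile  bool
    	parent   *Span
    	children []*Span
    }
    
    // IsRoot reports whether the span has no parent.
    func (s *Span) IsRoot() bool { return s.parent == nil }
    
    type options struct {
    	parent  *Span
    	sterile bool
    }
    
    // Option configures StartSpan.
    type Option func(*options)
    
    // WithParent asks for the new span to be a child of p (hypothetical helper).
    func WithParent(p *Span) Option { return func(o *options) { o.parent = p } }
    
    // WithSterile marks the new span as refusing children.
    func WithSterile() Option { return func(o *options) { o.sterile = true } }
    
    // StartSpan creates a span. A sterile parent is ignored, so the new span
    // becomes a root and the parent does not accumulate it.
    func StartSpan(name string, opts ...Option) *Span {
    	var o options
    	for _, opt := range opts {
    		opt(&o)
    	}
    	s := &Span{name: name, sterile: o.sterile}
    	if o.parent != nil && !o.parent.sterile {
    		s.parent = o.parent
    		o.parent.children = append(o.parent.children, s)
    	}
    	return s
    }
    
    func main() {
    	longRunning := StartSpan("pgwire-conn", WithSterile())
    	query := StartSpan("query", WithParent(longRunning))
    	fmt.Println(query.IsRoot(), len(longRunning.children)) // true 0
    
    	regular := StartSpan("request")
    	sub := StartSpan("sub-op", WithParent(regular))
    	fmt.Println(sub.IsRoot(), len(regular.children)) // false 1
    }
    ```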
    
    This is useful for modelling long-running operations that don't want a
    relationship with all the sub-operations they spawn (think long-running
    stopper tasks or pgwire connections). In particular, these long
    operations don't want to accumulate the recordings of their children,
    and they more generally don't want to be associated with their children
    in a trace collector because that leads to infinitely long traces which
    don't display well. For example, Zipkin doesn't make a distinction
    between our ChildOf and FollowsFrom relationships, displaying both types
    of relationships the same way. This makes long-running tasks with
    children quite awkward.  Given the tools at our disposal, maintaining
    the relationship between random operations and their long-running
    parents is more trouble than it's worth. The modelling of different
    kinds of relationships between spans is an active topic of debate in the
    OpenTelemetry project, so in the future I expect that we'll take note of
    what has been decided there and apply it to our tracing.
    
    Without this new option, it is difficult to decouple a span from what
    would otherwise be its parent. I've explored different options that push
    the responsibility of detaching to the point where the would-be child is
    created, but the practicality sucks. Sometimes the code creating a span
    doesn't know what context it's running in, and so it can't really make a
    decision about whether it wants its span to be a child or not (i.e.
    sometimes it wants a child, sometimes it doesn't). Instead, this patch
    lets the parent brutally say "I never want any children", and then the
    sub-operations can use functions like EnsureChildSpan() to get a child
    span or a root span depending on the decision the parent made for them.
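    
    From the point of view of a sub-operation, this might look like the
    sketch below. It is an illustration only: ensureChildSpan, withSpan and
    spanFromContext are hypothetical helpers written for this example, and
    they do not reproduce the signature or behavior of the real
    EnsureChildSpan() beyond the child-or-root decision described above.
    
    ```go
    package main
    
    import (
    	"context"
    	"fmt"
    )
    
    // Span is a toy stand-in for a tracing span (hypothetical type).
    type Span struct {
    	name    string
    	sterile bool
    	parent  *Span
    }
    
    type spanKey struct{}
    
    // withSpan returns a context carrying s.
    func withSpan(ctx context.Context, s *Span) context.Context {
    	return context.WithValue(ctx, spanKey{}, s)
    }
    
    // spanFromContext returns the span carried by ctx, or nil if there is none.
    func spanFromContext(ctx context.Context) *Span {
    	s, _ := ctx.Value(spanKey{}).(*Span)
    	return s
    }
    
    // ensureChildSpan always returns a span. The callee does not need to know
    // what it is running under: it gets a child if the context carries a
    // non-sterile span, and a root span otherwise.
    func ensureChildSpan(ctx context.Context, name string) (context.Context, *Span) {
    	s := &Span{name: name}
    	if p := spanFromContext(ctx); p != nil && !p.sterile {
    		s.parent = p
    	}
    	return withSpan(ctx, s), s
    }
    
    func main() {
    	taskCtx := withSpan(context.Background(), &Span{name: "stopper-task", sterile: true})
    	_, a := ensureChildSpan(taskCtx, "sub-op")
    	fmt.Println(a.parent == nil) // true: the sterile parent opted out
    
    	reqCtx := withSpan(context.Background(), &Span{name: "request"})
    	_, b := ensureChildSpan(reqCtx, "sub-op")
    	fmt.Println(b.parent.name) // request
    }
    ```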
    
    The next commits start using this new option for long-running stopper
    tasks and such.
    
    Release note: None
    andreimatei committed Oct 8, 2021
    fef8c39
  2. stop: add more span options for RunAsyncTask

    Before this patch, RunAsyncTask would not create a span for the task if
    the caller didn't have a span. If the caller did have a span, the method
    could create either a child or a follows-from span.
    
    This patch makes RunAsyncTask create a root span by default when the
    caller doesn't have a span. This seems like the right thing to do; async
    tasks should generally have spans. Then, the patch adds a new option
    for the kind of span to be created: a sterile span (as introduced in
    the previous commit). The sterile option results in the task having a
    sterile root span. The sterile option is used for some long-running
    tasks started under the server's startup span. Having a relationship
    between these tasks and the server's startup span didn't serve any
    purpose, and made the startup span look very confusing in Zipkin (Zipkin
    displays the follows-from spans as regular children). Similarly, having a
    relationship between the children of these long-running tasks and the
    long-running task was also a mostly bad thing.
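    
    The resulting choice of span can be summarized as a small decision
    function. This is a sketch of the behavior described in this message,
    not the stopper's code: SpanOption, its constants, spanPlan and
    planTaskSpan are names made up for the illustration.
    
    ```go
    package main
    
    import "fmt"
    
    // SpanOption is the kind of span the caller asks for (hypothetical names).
    type SpanOption int
    
    const (
    	FollowsFrom SpanOption = iota
    	Child
    	SterileRoot
    )
    
    // spanPlan describes the span that the task ends up with.
    type spanPlan struct {
    	Root     bool
    	Sterile  bool
    	Relation string // "", "child" or "follows-from"
    }
    
    // planTaskSpan models the decision: the sterile option always yields a
    // sterile root span; a caller without a span yields a plain root span;
    // otherwise the task's span is related to the caller's span.
    func planTaskSpan(callerHasSpan bool, opt SpanOption) spanPlan {
    	switch {
    	case opt == SterileRoot:
    		return spanPlan{Root: true, Sterile: true}
    	case !callerHasSpan:
    		return spanPlan{Root: true}
    	case opt == Child:
    		return spanPlan{Relation: "child"}
    	default:
    		return spanPlan{Relation: "follows-from"}
    	}
    }
    
    func main() {
    	// A long-running task started under the server's startup span.
    	fmt.Printf("%+v\n", planTaskSpan(true, SterileRoot))
    	// A task whose caller has no span now gets a root span.
    	fmt.Printf("%+v\n", planTaskSpan(false, FollowsFrom))
    	// A task whose caller has a span keeps a relationship to it.
    	fmt.Printf("%+v\n", planTaskSpan(true, Child))
    }
    ```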
    
    Release note: None
    andreimatei committed Oct 8, 2021
    2ad2bee
  3. kvserver: improve tracing for queues

    This patch makes the purgatories' spans be sterile, just like the
    regular queue task already was. This means that the spans representing
    the processing of individual ranges in the purgatory are no longer
    children of the long-running purgatory operation.
    
    Also, this patch cleans up the creation of spans for the processing of a
    replica. Before, the span creation was awkward because it was using the
    queue's AmbientCtx after a lower-level AmbientCtx (the replica's) had
    been used to annotate the ctx.
    
    Release note: None
    andreimatei committed Oct 8, 2021
    e0e83ca