Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colflow: clean up vectorized stats propagation #61380

Merged
merged 1 commit into from
Mar 4, 2021

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Mar 3, 2021

Previously, in order to propagate execution statistics we were creating
temporary tracing spans, setting the stats on them, and finishing the
spans right away. This allowed for using (to be more precise, abusing)
the existing infrastructure. The root of the problem is that in the
vectorized engine we do not start per-operator span if stats collection
is enabled at the moment, so we had to get around that limitation.

However, this way is not how tracing spans are intended to be used and
creates some performance slowdown in the hot path, so this commit
refactors the situation. Now we are ensuring that there is always
a tracing span available at the "root" components (either root
materializer or an outbox), so when root components are finishing the
vectorized stats collectors for their subtree of operators, there is
a span to record the stats into.

This required the following minor adjustments:

  • in the materializer, we now delegate attachment of the stats to the
    tracing span to the drainHelper (which does so on ConsumerDone). Note
    that the drainHelper doesn't get the recording from the span and leaves
    that to the materializer (this is needed in order to avoid collecting
    duplicate trace data).
  • in the outbox, we now start a "remote child span" (if there is a span
    in the parent context) in the beginning of Run method, and we attach
    that stats in sendMetadata.

Addresses: #59379.
Fixes: #59555.

Release justification: low-risk update to existing functionality.

Release note: None

@yuzefovich yuzefovich added the do-not-merge bors won't merge a PR with this label. label Mar 3, 2021
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the span-improvements branch 2 times, most recently from 3e25dcb to e1a5889 Compare March 3, 2021 00:46
@yuzefovich yuzefovich removed the do-not-merge bors won't merge a PR with this label. label Mar 3, 2021
@yuzefovich yuzefovich requested review from asubiotto, a team and RaduBerinde March 3, 2021 00:46
@yuzefovich yuzefovich force-pushed the span-improvements branch 2 times, most recently from 03279ec to f31dda0 Compare March 3, 2021 02:57
Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 22 of 25 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @yuzefovich)


pkg/sql/colexec/materializer.go, line 223 at r1 (raw file):

	// trailing meta callback, so we tell the ProcessorBase to skip trace data.
	// If we don't do so, then we would get duplicate traces.
	m.SkipTraceData = true

I don't like this knob. The underlying problem is that we pass on the recording before adding the stats. I think there are two better ways to fix this:

  1. Add stats in drainHelper's ConsumerDone method above. This will be called before the trailing meta stage.
  2. Call StartInternalNoSpan on the ProcessorBase in Start and do manual span management in the materializer.

I think 1) will be cleaner, but I prefer either to having to add a new knob to ProcessorBase if we can avoid it, because I see this as a specific need rather than a more general one.


pkg/sql/colflow/vectorized_flow.go, line 365 at r1 (raw file):

	vectorizedStatsCollectors []vectorizedStatsCollector,
) []*execinfrapb.ComponentStats {
	result := make([]*execinfrapb.ComponentStats, 0, len(vectorizedStatsCollectors))

Should we pool this?


pkg/sql/colflow/vectorized_flow.go, line 992 at r1 (raw file):

			// appended to.
			vscq := append([]vectorizedStatsCollector(nil), s.vectorizedStatsCollectorsQueue...)
			getStats = func() []*execinfrapb.ComponentStats {

nit: I wonder if keeping this as a callback is the best way forward. It seems a bit hacky when we could just pass down the vectorized stats collectors and they could be finished by the outbox/materializer just like metadata sources are drained explicitly, instead of passing down a callback that will drain metadata sources. I don't feel strongly about this though, but it might be nice to keep "execution" code outside the planning part.


pkg/sql/colflow/colrpc/outbox.go, line 306 at r1 (raw file):

			}
		}
		o.span.Finish()

nit: seems like you could also remove these two lines and rely on the defer above since we don't need to Finish the span before grabbing the recording.


pkg/sql/logictest/testdata/logic_test/inverted_index_geospatial, line 64 at r1 (raw file):

            └── • scan
                  cluster nodes: <hidden>
                  actual row count: 4

🤔 is this right?

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @RaduBerinde)


pkg/sql/colexec/materializer.go, line 223 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I don't like this knob. The underlying problem is that we pass on the recording before adding the stats. I think there are two better ways to fix this:

  1. Add stats in drainHelper's ConsumerDone method above. This will be called before the trailing meta stage.
  2. Call StartInternalNoSpan on the ProcessorBase in Start and do manual span management in the materializer.

I think 1) will be cleaner, but I prefer either to having to add a new knob to ProcessorBase if we can avoid it, because I see this as a specific need rather than a more general one.

Thanks for the suggestions, I implemented option 1.


pkg/sql/colflow/vectorized_flow.go, line 365 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Should we pool this?

This is a perf optimization that is somewhat orthogonal to the cleanup. Left a TODO for this.


pkg/sql/colflow/vectorized_flow.go, line 992 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

nit: I wonder if keeping this as a callback is the best way forward. It seems a bit hacky when we could just pass down the vectorized stats collectors and they could be finished by the outbox/materializer just like metadata sources are drained explicitly, instead of passing down a callback that will drain metadata sources. I don't feel strongly about this though, but it might be nice to keep "execution" code outside the planning part.

I prefer the callback because it allows us to hide everything about the stats collectors from all other code. Namely, this allows us to keep the stats code in colflow package and expose only this callback to colexec and colrpc. I think such layout is acceptable because we only two places need to collect stats whereas metadata sources is more general concept which we need to hide behind an interface.


pkg/sql/colflow/colrpc/outbox.go, line 306 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

nit: seems like you could also remove these two lines and rely on the defer above since we don't need to Finish the span before grabbing the recording.

Done. I originally thought that it is a good practice to Finish the span before getting the recording, but it is not necessary.


pkg/sql/logictest/testdata/logic_test/inverted_index_geospatial, line 64 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

🤔 is this right?

I was also surprised to see this change, and now it is correct (see that KV rows read: 4), also confirmed with vec-off config.

I didn't dive into why it was wrong before though - the vectorized stats propagation was probably broken in some edge cases.

Previously, in order to propagate execution statistics we were creating
temporary tracing spans, setting the stats on them, and finishing the
spans right away. This allowed for using (to be more precise, abusing)
the existing infrastructure. The root of the problem is that in the
vectorized engine we do not start per-operator span if stats collection
is enabled at the moment, so we had to get around that limitation.

However, this way is not how tracing spans are intended to be used and
creates some performance slowdown in the hot path, so this commit
refactors the situation. Now we are ensuring that there is always
a tracing span available at the "root" components (either root
materializer or an outbox), so when root components are finishing the
vectorized stats collectors for their subtree of operators, there is
a span to record the stats into.

This required the following minor adjustments:
- in the materializer, we now delegate attachment of the stats to the
tracing span to the drainHelper (which does so on `ConsumerDone`). Note
that the drainHelper doesn't get the recording from the span and leaves
that to the materializer (this is needed in order to avoid collecting
duplicate trace data).
- in the outbox, we now start a "remote child span" (if there is a span
in the parent context) in the beginning of `Run` method, and we attach
that stats in `sendMetadata`.

Release justification: low-risk update to existing functionality.

Release note: None
Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 25 files at r1, 4 of 4 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto and @RaduBerinde)

@yuzefovich
Copy link
Member Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 4, 2021

Build succeeded:

@craig craig bot merged commit d7748a9 into cockroachdb:master Mar 4, 2021
@yuzefovich yuzefovich deleted the span-improvements branch March 4, 2021 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: incomplete vectorized traces
3 participants