-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: span use-after-Finish in "batch flow coordinator" #75425
Comments
I think this problem is detected as #75334 when span reuse is enabled. |
Thanks for the help! But unfortunately it would appear that there's still some problem, even after #75479 :(. But hopefully whatever is still wrong is similar to what you've fixed before. The failures look like this (first stack is where the span is Finished(), the second one is the access after
and
|
Alright, I'll take another look. Previously, I didn't try running #75424 with the original fix since I thought I saw the problem clearly. What's confusing to me on the first glance is that we have a mix of the vectorized and row-based flows in the stack traces. |
I see what the problem is. First, we have a vectorized flow that is created in order to evaluate the query Now the question is what do we do about this. We could derive a new context with a separate span in it with something like:
We'll then need to do a similar trick in other places where one flow creates another flow that outlives its creator. I'm not sure whether there is a more principled way to do this, but I'll push this issue back to you Andrei to talk with CDC team on what to do. |
cc @cockroachdb/cdc |
Note that #75479 did fix an actual problem in the infrastructure owned by the SQL Queries, now it seems that we need to fix things in the users of that infrastructure. |
Fixing in #76084 |
76084: sql: run planHooks in a new tracing span r=andreimatei a=andreimatei For some reason, plan hooks run in a new goroutine; that goroutine communicates results through a channel, that's adapted to look like a planNode by the hookFnNode. Before this patch, that goroutine was capturing the caller's ctx, and hence the caller's tracing span. This was leading to span-use-after-Finish because the goroutine in question was sometimes outliving the caller. Although not written anywhere, I think the expectation is that generally the goroutine will not outlive the the hookFnNode by much, since hookFnNode.Next() is supposed to be consumed fully. Still, at the very least, there were races related to ctx cancellation, as the hookFnNode listens for cancellation in parallel with the goroutine running (which is probably a bad idea). This patch fixes ths span use-after-finish by giving the goroutine a new span. I took the opportunity to give all the plan hooks names, and use that as the span name. Since these hooks are all weirdos, it seems like a good idea to particularly reflect them in the trace, regardless of the bug that's being fixed. Fixes #75425 Release note: None 76178: roachtest: add new pgjdbc passing tests r=rafiss a=rafiss fixes #76061 Release note: None Co-authored-by: Andrei Matei <andrei@cockroachlabs.com> Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
#75424 adds some more span use-after-Finish detection. With that patch, the
streamproducer
package immediately finds a violation. The span that's being used after having beenFinish()
ed is thebatch flow coordinator
one.@yuzefovich will you please help me with this? :)
The span was finished at:
and later used at:
interestingly, if I run a race test, I get a different scenario, but for the same span:
The text was updated successfully, but these errors were encountered: