Bug Report: CheckMySQL indefinitely gets stuck if there is a long-running callback in StreamExecute #11915
Comments
Fixing this issue isn't trivial. Here are my findings, along with the problems that have come up along the way.
```go
// In StreamExecute
ctx, cancel := context.WithCancel(ctx)
defer cancel()
// Register the cancel func so that closeAll can unblock this request.
tabletserver.ContextCancels = append(tabletserver.ContextCancels, cancel)
err = q.server.StreamExecute(ctx, request.Target, request.Query.Sql, request.Query.BindVariables, request.TransactionId, request.ReservedId, request.Options, func(reply *sqltypes.Result) error {
	// Buffered so the sending goroutine can still exit if ctx wins the select below.
	errChan := make(chan error, 1)
	log.Errorf("Sending a stream now with %d rows", len(reply.Rows))
	// stream.Send may block on flow control, so run it in a goroutine and race it against ctx.
	go func() {
		errChan <- stream.Send(&querypb.StreamExecuteResponse{
			Result: sqltypes.ResultToProto3(reply),
		})
	}()
	select {
	case err2 := <-errChan:
		log.Errorf("Finished sending a stream now, error = %v", err2)
		return err2
	case <-ctx.Done():
		return ctx.Err()
	}
})

// And in closeAll we also call cancelContexts before waiting for empty requests.
// (PoC only: ContextCancels is not guarded by a mutex and is never pruned.)
var ContextCancels []context.CancelFunc

func cancelContexts() {
	for _, cancel := range ContextCancels {
		cancel()
	}
}
```

The code above is just a PoC and isn't production grade; I don't even clean up the cancel funcs after use. But I have verified that it does unblock CheckMySQL.
On further investigation, @harshit-gangal and I have gleaned more information. @harshit-gangal had an epiphany and remembered an old bug we had a long time ago, #5497. We verified that this is the underlying issue! If a MySQL client is connected to a vtgate, executes a streaming query, and then closes the entire connection, we see that vtgate and the vttablets are stuck in

The steps for reproduction are as follows:
Notice that vtgate is indefinitely stuck on:
and so is the vttablet:
We had Wireshark running during this test, and we see that when the terminal is closed, we get a

The problem is therefore immediately evident: vtgate is indefinitely waiting to send a packet on a connection that no longer has a listener.
Overview of the Issue
CheckMySQL can get indefinitely stuck if there is a long-running callback in StreamExecute.
The `StreamExecute` RPC from vtgate works as follows: vtgate and vttablet communicate via gRPC, with vttablet calling `stream.Send` and vtgate calling the counterpart, `stream.Recv`. After receiving a result set, vtgate runs the callback function on it. Suppose the callback function is long-running and takes a lot of time.
vttablet, on the other hand, keeps fetching results from MySQL and calling `stream.Send` repeatedly. According to the docs of `stream.Send`, even though `SendMsg` doesn't block waiting for the receiver to receive the message, it does block for flow control. So if the streaming query has a lot of results to send, the first few calls don't block, but once the flow-control window is used up, the next one does.
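The flow-control behavior can be approximated with a buffered Go channel whose capacity stands in for the flow-control window. This is a simplified model, not gRPC itself: real HTTP/2 windows are measured in bytes and replenished as the receiver consumes data, while here each slot is one message. `trySend` is a hypothetical helper that reports whether a send would have completed without blocking.

```go
package main

import "fmt"

// trySend reports whether v could be enqueued into the window without
// blocking; a false result means a real send would block here.
func trySend(window chan<- int, v int) bool {
	select {
	case window <- v:
		return true
	default:
		return false
	}
}

func main() {
	window := make(chan int, 3) // models a flow-control window of 3 messages
	sent := 0
	for trySend(window, sent) {
		sent++
	}
	fmt.Println("sent before blocking:", sent) // 3

	// The slow callback finally returns and the receiver takes one message,
	// which frees exactly one slot for the sender.
	<-window
	fmt.Println("next send succeeds:", trySend(window, sent))   // true
	fmt.Println("and the one after:", trySend(window, sent+1)) // false
}
```

The last two lines mirror the behavior described below: each completed callback lets exactly one more send through.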
If MySQL crashes now, StreamExecute stays blocked. Since StreamExecute is blocked on this call, it never finishes executing `execRequest`. As a result, the wait for requests (and with it CheckMySQL) stays blocked until the callback function in vtgate completes. When it does, vtgate receives one more message, freeing space in the buffer for StreamExecute. StreamExecute then attempts to read one more packet from MySQL, which finally fails and unblocks everything.
Reproduction Steps
Use `select * from table` as the query and give it a very long callback (e.g. add a `time.Sleep` to it).
Binary Version
Operating System and Environment details
Log Fragments
No response