fix(replication): slave blocks until keepalive timer is reached when master is gone without fin/rst notification #2662
…master is gone without fin/rst notification
…recv_timeout configurable
…into fix-slave-block
By the way, can we add a Go integration test that removes the master and uses a connection timeout of about 5s?
If EvbufferRead returns an error, we can't tell whether the result is EOF or a genuine error: when the underlying I/O syscall hits EOF, errno is not set. So I added a new EOF status that breaks the loop when the connection reaches EOF. @git-hulk @PragmaTwice please have a look.
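To illustrate the EOF-versus-error distinction, here is a minimal sketch using plain `read(2)` rather than the actual EvbufferRead wrapper. The names `ReadStatus` and `ReadOnce` are hypothetical and are not taken from this PR; the sketch only shows why a separate EOF status helps: a return value of 0 means the peer closed the connection and errno carries no information, while -1 means errno must be inspected.

```cpp
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstddef>

// Hypothetical status codes for illustration; not the enum used in the PR.
enum class ReadStatus { kData, kEOF, kTimeout, kError };

// Hypothetical helper: performs one read and classifies the outcome.
// Assumes len > 0, so a return value of 0 from read() can only mean EOF.
ReadStatus ReadOnce(int fd, char *buf, size_t len, ssize_t *nread) {
  assert(fd >= 0 && buf != nullptr && nread != nullptr && len > 0);
  for (;;) {
    ssize_t n = ::read(fd, buf, len);
    if (n > 0) {
      *nread = n;
      return ReadStatus::kData;
    }
    *nread = 0;
    if (n == 0) return ReadStatus::kEOF;  // peer closed; errno is not set
    if (errno == EINTR) continue;         // interrupted by a signal, retry
    if (errno == EAGAIN || errno == EWOULDBLOCK) return ReadStatus::kTimeout;
    return ReadStatus::kError;            // genuine error, errno says why
  }
}
```

A caller looping over `ReadOnce` can then break on `kEOF` instead of retrying forever on a connection that will never deliver more data.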
…into fix-slave-block
Co-authored-by: Twice <twice@apache.org>
If the master is lost while full-sync SST files are being received, the replication thread blocks until the keepalive timer expires. If a 'clusterx setnodes' command is executed in the meantime, it holds an exclusive lock until the replication thread stops, which blocks all other worker threads.
My solution is to enable a socket read timeout on the file descriptor that receives the SST files. If a timeout occurs and the replication thread has been marked as stopped, the receive loop is aborted.
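The idea can be sketched roughly as follows. This is not the code from this PR: `SetRecvTimeout`, `RecvExactly`, `RecvResult`, and the `stop_flag` parameter are hypothetical names, and the sketch assumes a blocking POSIX socket with the standard `SO_RCVTIMEO` option. Each timed-out `recv` gives the loop a chance to check the stop flag instead of waiting for TCP keepalive to notice the dead peer.

```cpp
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#include <atomic>
#include <cassert>
#include <cerrno>
#include <cstddef>

// Hypothetical helper: make blocking recv() calls on fd give up after timeout_ms.
bool SetRecvTimeout(int fd, int timeout_ms) {
  assert(fd >= 0 && timeout_ms > 0);
  timeval tv{};
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;
  return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
}

// Hypothetical result codes for illustration only.
enum class RecvResult { kDone, kStopped, kClosed, kError };

// Hypothetical receive loop: reads exactly `want` bytes unless the peer closes
// the connection, a real error occurs, or a timeout fires while stop_flag is set.
RecvResult RecvExactly(int fd, char *buf, size_t want,
                       const std::atomic<bool> &stop_flag) {
  size_t got = 0;
  while (got < want) {
    ssize_t n = ::recv(fd, buf + got, want - got, 0);
    if (n > 0) {
      got += static_cast<size_t>(n);
      continue;
    }
    if (n == 0) return RecvResult::kClosed;  // orderly shutdown by the peer
    if (errno == EINTR) continue;            // interrupted by a signal, retry
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
      // SO_RCVTIMEO expired: abort only if the thread was asked to stop.
      if (stop_flag.load()) return RecvResult::kStopped;
      continue;
    }
    return RecvResult::kError;
  }
  return RecvResult::kDone;
}
```

With this shape, the worst-case delay before the thread notices a stop request is one timeout interval, which is why making the timeout configurable matters.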