Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(replication): slave blocks until keepalive timer is reached when master is gone without fin/rst notification #2662

Merged
merged 17 commits into from
Nov 16, 2024

Conversation

sryanyuan
Copy link
Contributor

If the master is lost, the replication thread will block until the keepalive timer is reached when receiving full-sync SST files. At the same time, if we execute a 'clusterx setnodes' command, it will hold an exclusive lock until the replication thread is stopped. This will cause all other worker threads to block.

My solution is to enable the socket read timeout on the file descriptor receiving SST files, if a timeout occurs and the replication thread is marked as stopped, the receiving action will be broken.

src/cluster/replication.cc Outdated Show resolved Hide resolved
kvrocks.conf Outdated Show resolved Hide resolved
kvrocks.conf Outdated Show resolved Hide resolved
@mapleFU
Copy link
Member

mapleFU commented Nov 15, 2024

By the way, can we add a test with master removed and about 5s' connection timeout in go integration?

git-hulk
git-hulk previously approved these changes Nov 15, 2024
@sryanyuan
Copy link
Contributor Author

sryanyuan commented Nov 15, 2024

We can't distinguish whether the result is EOF or error if EvbufferRead returns an error. When the underlying I/O syscall returns EOF, the errno will not be set. So, I added a new EOF status to break the loop if the connection is EOF.

434f148

@git-hulk @PragmaTwice please have a look

git-hulk
git-hulk previously approved these changes Nov 15, 2024
src/common/status.h Outdated Show resolved Hide resolved
src/common/io_util.cc Outdated Show resolved Hide resolved
src/common/io_util.cc Outdated Show resolved Hide resolved
Co-authored-by: Twice <twice@apache.org>
PragmaTwice
PragmaTwice previously approved these changes Nov 16, 2024
git-hulk
git-hulk previously approved these changes Nov 16, 2024
src/config/config.h Outdated Show resolved Hide resolved
src/config/config.h Outdated Show resolved Hide resolved
Co-authored-by: Twice <twice@apache.org>
@sryanyuan sryanyuan dismissed stale reviews from git-hulk and PragmaTwice via fdacabe November 16, 2024 12:20
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
37.9% Coverage on New Code (required ≥ 50%)

See analysis details on SonarQube Cloud

@PragmaTwice PragmaTwice merged commit 5e9db79 into apache:unstable Nov 16, 2024
31 of 32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants