[Test Failure] Sync should continue if not all slaves dropped dual-channel-replication #1153

Closed
madolson opened this issue Oct 11, 2024 · 3 comments · Fixed by #1164
Labels
test-failure An issue indicating a test failure

Comments

@madolson
Member

The test "Sync should continue if not all slaves dropped dual-channel-replication no" failed in tests/integration/dual-channel-replication.tcl.

https://github.com/valkey-io/valkey/actions/runs/11283922852/job/31417235703#step:5:7269

This doesn't look like a dual-channel bug specifically; it seems to be a problem with the test.

@madolson madolson added the test-failure An issue indicating a test failure label Oct 11, 2024
@ranshid
Member

ranshid commented Oct 14, 2024

The failure seems to happen because, without dual-channel, the replica's client output buffer (COB) on the primary fills up during the test. For example:

30566:M 11 Oct 2024 01:04:08.153 # Client id=6 addr=127.0.0.1:63067 laddr=127.0.0.1:21984 fd=16 name= age=48 idle=48 flags=S db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=15865 omem=268435800 tot-mem=268437720 events=r cmd=psync user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=241 tot-net-out=22 tot-cmds=5 scheduled to be closed ASAP for overcoming of output buffer limits.
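To make the overrun in that log line concrete, here is a minimal Python sketch that pulls out the `omem` field and compares it with a 256mb hard limit. The 256mb figure is an assumption (the log does not state the configured limit), and `parse_client_fields` is a hypothetical helper written for this illustration, not part of the test suite.

```python
# Hypothetical helper: collect the key=value tokens of a client info line.
def parse_client_fields(line):
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # keep only tokens that actually contain '='
            fields[key] = value
    return fields


# Shortened copy of the log line above (only some fields kept).
LOG = "id=6 addr=127.0.0.1:63067 flags=S oll=15865 omem=268435800 tot-mem=268437720 cmd=psync"

# Assumption: the replica hard limit in effect was 256mb.
HARD_LIMIT = 256 * 1024 * 1024

fields = parse_client_fields(LOG)
omem = int(fields["omem"])
overrun = omem - HARD_LIMIT

print("over limit:", omem > HARD_LIMIT)
print("bytes over:", overrun)
```

Under that assumption, the replica client (`flags=S`) had just crossed the hard limit when it was scheduled to be closed.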

I can think of two fixes:

  1. Configure a high COB limit for the replica (256mb).
  2. The test has hard checks on the number of sync/partial_sync_ok. If there are random disconnects during the test, those checks can make it flaky. I think we might want to change them to '>' instead of '=='.

@naglera WDYT?
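The second idea can be illustrated with a small Python sketch. It is not the Tcl test code: the function names and the counter values are hypothetical, and it uses `>=` (an "at least" check) rather than a plain `>`, since a strict `>` would also require lowering the expected value by one.

```python
def strict_check(observed, expected):
    # Passes only when the counter matches exactly.
    return observed == expected


def tolerant_check(observed, minimum):
    # Passes when the counter reached at least the expected value,
    # so an extra sync caused by a random disconnect is accepted.
    return observed >= minimum


# Hypothetical counter values: one sync expected by the test.
expected_syncs = 1
clean_run = 1   # no disconnects
noisy_run = 2   # one extra sync after a random disconnect

print("strict:  ", strict_check(clean_run, expected_syncs), strict_check(noisy_run, expected_syncs))
print("tolerant:", tolerant_check(clean_run, expected_syncs), tolerant_check(noisy_run, expected_syncs))
```

The trade-off is that a tolerant check no longer catches a run that performs more syncs than intended, so it only fits where extra syncs are harmless for what the test is verifying.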

enjoy-binbin added a commit that referenced this issue Oct 14, 2024
…el-replication (#1164)

Sometimes, when dual-channel is turned off, the tested replica might
disconnect on COB overrun. Disable the replica COB limit in order to
prevent such cases.

Fixes: #1153

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
eifrah-aws pushed a commit to eifrah-aws/valkey that referenced this issue Oct 20, 2024