Avoid retrying on IO errors when it’s unclear if the server received the request #2479
Signed-off-by: barshaul <barshaul@amazon.com>
@@ -7,7 +7,7 @@ use crate::connection::{
     resp2_is_pub_sub_state_cleared, resp3_is_pub_sub_state_cleared, ConnectionAddr, ConnectionInfo,
     Msg, RedisConnectionInfo,
 };
-#[cfg(any(feature = "tokio-comp"))]
+#[cfg(feature = "tokio-comp")]
Nice catch 👍🏽
@@ -83,7 +83,7 @@ macro_rules! reconnect_if_dropped {
 macro_rules! reconnect_if_io_error {
nit: macro should be renamed to reflect the change: e.g. reconnect_if_conn_dropped
I'll change it, but in a different PR we need to remove this whole file; we have no use for redis-rs's connection_manager
…the request (#2479) * Avoid retrying on IO errors when it’s unclear if the server received the request Signed-off-by: barshaul <barshaul@amazon.com>
…the request (valkey-io#2479) * Avoid retrying on IO errors when it’s unclear if the server received the request Signed-off-by: barshaul <barshaul@amazon.com> Signed-off-by: avifenesh <aviarchi1994@gmail.com>
Main Change:
Until now, on I/O errors, we would reconnect to the failed node and retry the request if there were retries left. This approach had a critical issue: we couldn’t reliably determine whether the server had already received the request before the connection broke. Retrying in such cases could result in duplicate command execution.
Example:
The client sends INCR key and the connection breaks before a reply arrives. The client reconnects and retries INCR key. If the server had already executed the first command, the key is incremented twice.
This PR differentiates between errors where it’s safe to retry and those where it’s not. Specifically, with multiplexed connections, if the send function returns an error, the server is guaranteed never to have received the data, so retrying is safe (see https://docs.rs/tokio/latest/tokio/sync/mpsc/struct.Sender.html#method.send). For other errors, where we can’t be certain, retries are unsafe and are no longer attempted automatically. Instead, these errors are now returned to the user, who must retry manually if they determine it’s safe.
Test Changes:
Since I/O errors are now returned to the user, tests that previously killed the server now loop to retry the request, simulating the handling of I/O errors on the user side.
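As a rough illustration of the retry rule from the main change and of the user-side loop the tests now rely on, the sketch below uses the standard library's std::sync::mpsc channel as a stand-in for the tokio channel linked above (both hand the unsent value back inside the error when the receiver is gone). Failure, submit, safe_to_retry, and retry_until_ok are hypothetical names chosen for illustration, not identifiers from this PR.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical classification of a failed request (not the PR's actual types).
#[derive(Debug, PartialEq)]
enum Failure {
    /// The request never left the client, so a retry cannot duplicate it.
    NotSent,
    /// The request was handed to the connection but no reply arrived;
    /// the server may or may not have executed it.
    ReplyLost,
}

/// Only failures that prove the server never saw the request are retried automatically.
fn safe_to_retry(failure: &Failure) -> bool {
    matches!(failure, Failure::NotSent)
}

/// Queue a command for the connection task.
/// `send` fails only when the receiving side is gone, and in that case the
/// value is never received, so the failure is classified as `NotSent`.
fn submit(tx: &Sender<String>, cmd: &str) -> Result<(), Failure> {
    tx.send(cmd.to_string()).map_err(|_| Failure::NotSent)
}

/// User-side retry loop, similar in spirit to what the updated tests do:
/// call `op` until it succeeds or `max_attempts` is reached.
/// `op` is always called at least once.
fn retry_until_ok<T, E>(
    max_attempts: usize,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => attempt += 1,
        }
    }
}
```

A caller would wrap only the commands it knows are safe to repeat in retry_until_ok; a non-idempotent command such as INCR that fails with a ReplyLost-style error needs a decision by the caller rather than a blind retry.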
Refresh Slots Change:
While testing this fix, I found that when all connections were unavailable and refresh_slots was called, it didn’t raise the expected allConnectionsUnavailable error. This has been fixed by updating the random_connections function to return an Option. Now, if no connections are found, refresh_slots raises the allConnectionsUnavailable error immediately. The state then shifts to reconnecting to the initial nodes, and slot refreshes are attempted using the new connections.
Out of Scope:
A future PR could introduce a new configuration option to enable retries on connection error, allowing users to control this behavior at the client level.
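For reference, the Refresh Slots change described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the PR's code: connections are modeled as a slice of node addresses, the selection takes the first entries instead of sampling randomly, and ClusterError and both function signatures are hypothetical.

```rust
/// Hypothetical error type standing in for the client's real error.
#[derive(Debug, PartialEq)]
enum ClusterError {
    AllConnectionsUnavailable,
}

/// Pick up to `amount` connections, or `None` when there are none to pick from.
/// Assumption: the real function samples randomly; this sketch takes the
/// first entries to stay dependency-free.
fn random_connections(addresses: &[String], amount: usize) -> Option<Vec<&String>> {
    if addresses.is_empty() {
        return None;
    }
    Some(addresses.iter().take(amount).collect())
}

/// Fail fast when no connection is available instead of proceeding with an
/// empty selection. Returns the number of nodes that would be queried.
fn refresh_slots(addresses: &[String]) -> Result<usize, ClusterError> {
    let picked = random_connections(addresses, 2)
        .ok_or(ClusterError::AllConnectionsUnavailable)?;
    Ok(picked.len())
}
```

Returning an Option makes the "no connections" case explicit at the call site, so the caller can surface the error and move on to reconnecting to the initial nodes.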