scan_iter family commands gives inconsistent result when using Sentinel connection pool #3197

agnesnatasya · 2024-04-01T06:51:31Z

Version: What redis-py and what redis version is the issue happening on?
redis-py 4.5.0

Platform: What platform / version? (For example Python 3.5.1 on Windows 7 / Ubuntu 15.10 / Azure)
Python 3.10

Description: Description of your issue, stack traces from errors and code that reproduces the issue

The scan_iter family of commands (scan_iter, sscan_iter, hscan_iter, zscan_iter) may give inconsistent results when the client is created with a connection pool and there are multiple concurrent requests.

Assume we have this setup:

- 2 replicas, host A and host B
- SentinelConnectionPool is used to manage connections to the different servers
- 2 concurrent scan_iter commands, each of which issues multiple scan commands. The scan commands issued by these scan_iter commands are labelled scan (1) and scan (2) below.

What might happen is:

1. scan (1) is issued
2. scan (1) gets a connection from the pool
   - The pool is empty so it creates a new connection
   - For the sentinel connection pool, creating a new connection means getting the next replica in the connection_pool.rotate_slaves rotation.
   - Since this can return any replica in the rotation, let's say it arbitrarily connects to host A
3. scan (1) is executed on host A
4. scan (2) is issued in the meantime
5. scan (2) gets a connection from the pool
   - The pool is empty (1 connection was created but it's still in use) so it creates a new connection
   - Get the next replica in the connection_pool.rotate_slaves rotation.
   - Since this can return any replica in the rotation, let's say it arbitrarily connects to host B
6. scan (2) is executed on host B
7. scan (1) is finished. The connection to host A is put back into the pool
8. scan (2) is finished. The connection to host B is put back into the pool
9. scan (1) gets a connection from the connection pool, and it gets the connection to host B (since the connection pool will just pop() the last element from the available connections)
10. scan (1) is executed on host B

Step 9 is the bug. All scan commands coming from the same scan_iter command need to go to the same replica. This is because the 'state' of the scan_iter command is stored in the cursor, and different replicas may store keys in a different order.
Hence, if we use the cursor from host A to do a scan on host B, we'll get an inconsistent result.
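
To make the interleaving above concrete, here is a minimal simulation that does not use redis-py at all. `ToyConnection` and `ToySentinelPool` are hypothetical stand-ins, written only to mimic the two behaviours described in this issue (new connections follow a replica rotation, released connections are reused last-in-first-out). It is a sketch of the failure mode, not the library's actual code.

```python
import itertools


class ToyConnection:
    """Hypothetical stand-in for a connection bound to one replica."""

    def __init__(self, host):
        self.host = host


class ToySentinelPool:
    """Hypothetical, simplified pool: new connections follow a round-robin
    rotation over the replicas, released connections are reused LIFO."""

    def __init__(self, replicas):
        self._rotation = itertools.cycle(replicas)
        self._available = []

    def get_connection(self):
        if self._available:
            # Reuse whichever connection was released last.
            return self._available.pop()
        # Pool is empty: connect to the next replica in the rotation.
        return ToyConnection(next(self._rotation))

    def release(self, conn):
        self._available.append(conn)


def simulate():
    pool = ToySentinelPool(["A", "B"])
    hosts_used_by_scan_1 = []

    # Both iterations issue their first scan while the pool is empty.
    conn_1 = pool.get_connection()  # new connection, lands on host A
    hosts_used_by_scan_1.append(conn_1.host)
    conn_2 = pool.get_connection()  # conn_1 is still in use, so host B

    # scan (1) finishes first, then scan (2).
    pool.release(conn_1)
    pool.release(conn_2)

    # scan (1) continues with the cursor it got from host A,
    # but pop() hands back the connection to host B.
    conn_1 = pool.get_connection()
    hosts_used_by_scan_1.append(conn_1.host)
    pool.release(conn_1)
    return hosts_used_by_scan_1


print(simulate())  # ['A', 'B']
```

The second entry being a different host than the first is exactly the situation where a cursor from one replica is replayed against another.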

There are 3 different base implementations of a connection pool: ConnectionPool, SentinelConnectionPool and BlockingConnectionPool. All of them do something similar when getting a new connection from the pool: they create a 'dummy' connection object and call connection.connect(), which actually connects to the intended replica.

There are 4 different implementations of a connection: Connection, SSLConnection, SentinelManagedConnection, and SentinelManagedSSLConnection.

For SentinelManagedConnection and SentinelManagedSSLConnection, this is fixable by having SentinelConnectionPool maintain a mapping from an id of the scan_iter command to the host it has previously issued commands to.
For Connection and SSLConnection, the behaviour of connection.connect() depends on the connection class's implementation of .connect(), but by default it connects to the connection's self.host and self.port.
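
As a rough illustration of the proposed direction, the toy pool below keeps a mapping from an iteration id to the host chosen for its first scan, and only hands out connections to that host for later scans of the same iteration. Everything here (`PinnedToyPool`, the `iter_id` argument, `finish_iteration`) is a hypothetical name invented for this sketch; it is not the redis-py API and not necessarily how the linked PR implements it.

```python
import itertools


class ToyConnection:
    """Hypothetical stand-in for a connection bound to one replica."""

    def __init__(self, host):
        self.host = host


class PinnedToyPool:
    """Hypothetical pool that pins each scan iteration to a single replica."""

    def __init__(self, replicas):
        self._rotation = itertools.cycle(replicas)
        self._available = {}  # host -> list of idle connections
        self._iter_host = {}  # iter_id -> host used by that iteration

    def get_connection(self, iter_id):
        host = self._iter_host.get(iter_id)
        if host is None:
            # First scan of this iteration: pick a replica and remember it.
            host = next(self._rotation)
            self._iter_host[iter_id] = host
        idle = self._available.get(host)
        if idle:
            return idle.pop()
        return ToyConnection(host)

    def release(self, conn):
        self._available.setdefault(conn.host, []).append(conn)

    def finish_iteration(self, iter_id):
        # Forget the pin once the iteration is done.
        self._iter_host.pop(iter_id, None)


def simulate_pinned():
    pool = PinnedToyPool(["A", "B"])
    conn_1 = pool.get_connection(iter_id=1)  # pinned to host A
    conn_2 = pool.get_connection(iter_id=2)  # pinned to host B
    first_host = conn_1.host
    pool.release(conn_1)
    pool.release(conn_2)

    # The release order no longer matters: iteration 1 stays on its host.
    conn_1 = pool.get_connection(iter_id=1)
    second_host = conn_1.host
    pool.release(conn_1)
    pool.finish_iteration(1)
    return [first_host, second_host]


print(simulate_pinned())  # ['A', 'A']
```

A real implementation would also need to handle the pinned replica becoming unavailable mid-iteration and make the bookkeeping thread-safe, which this sketch leaves out.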

This was referenced Apr 30, 2024

fix scan iter command issued to different replicas agnesnatasya/redis-py#1

Open

fix scan iter command issued to different replicas #3220

Open

gerzse self-assigned this May 29, 2024
