python: cluster-require-full-coverage no - scan fail #2436

Open
raphaelauv opened this issue Oct 11, 2024 · 6 comments

Comments

@raphaelauv

raphaelauv commented Oct 11, 2024

Describe the bug

When I scan a Redis cluster with a missing node, the scan does not work even though the cluster is configured with

cluster-require-full-coverage no

Expected Behavior

With cluster-require-full-coverage no, a Redis scan should still work when a node is missing.


Current Behavior

The scan fails with an error about the missing node:

  File "/usr/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/raphael/REPO/xx/yy/src/redis_clean.py", line 42, in main2
    cursor, keys = await client.scan(cursor,match=match_regex, count=100)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raphael/REPO/xx/yy/venv/lib/python3.11/site-packages/glide/async_commands/cluster_commands.py", line 1139, in scan
    await self._cluster_scan(cursor, match, count, type),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raphael/REPO/xx/yy/venv/lib/python3.11/site-packages/glide/glide_client.py", line 562, in _cluster_scan
    response = await self._write_request_await_response(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raphael/REPO/xx/yy/venv/lib/python3.11/site-packages/glide/glide_client.py", line 446, in _write_request_await_response
    await response_future
glide.exceptions.ConnectionError: Received connection error `Requested connection not found - ConnectionNotFoundForRoute: 172.20.0.2:6379`. Will attempt to reconnect

Reproduction Steps

import asyncio
import logging
from logging import getLogger
from glide import GlideClusterClientConfiguration, NodeAddress, GlideClusterClient, ClusterScanCursor

logger = getLogger(__name__)
logging.basicConfig(level=logging.INFO)


async def main2():
    # Connect through one seed node; the client discovers the rest of the cluster.
    addresses = [NodeAddress("localhost", 9379)]
    config = GlideClusterClientConfiguration(addresses)
    client = await GlideClusterClient.create(config)

    # Iterate the cluster-wide cursor until the scan reports it is finished.
    cursor = ClusterScanCursor()
    all_keys = []
    while not cursor.is_finished():
        cursor, keys = await client.scan(cursor, match="{toto}*", count=100)
        all_keys.extend(keys)
    print(all_keys)


if __name__ == '__main__':
    asyncio.run(main2())

Possible Solution

No response

Additional Information/Context

No response

Client version used

1.1.0

Engine type and version

8.0.1

OS

ubuntu 22.04

Language

Python

Language Version

3.11

Cluster information

No response

Logs

No response

Other information

No response

raphaelauv added the bug label Oct 11, 2024
@avifenesh
Collaborator

@raphaelauv Hi,
We discussed this when we designed the cluster scan, which is not a regular scan.
Can you explain your use case?
Did the cluster happen to be in a bad state while you wanted to keep scanning, or is it a state you are aware of ahead of time?

Less relevant, but the improvement to scan with match can be done (internally by Valkey) per node, not per cluster.
At this point there is no built-in functionality for scanning a whole cluster.

@raphaelauv
Author

hi, thanks @avifenesh

We want our scan to work even with missing nodes (we have big clusters, and a missing node is a common thing).

It would be great if we could explicitly run a scan with cluster-require-full-coverage set to no.

@avifenesh
Collaborator

> We want our scan to work even with missing nodes (we have big clusters, and a missing node is a common thing).

The challenge with missing nodes and cluster scan is that in such cases we can't provide any guarantee about the validity of the scan.

Since Valkey currently provides no cluster-wide scan, we need fairly complex logic to offer one with the same guarantees as a regular scan.
The way to do that is to track the covered slots.
If a cluster has missing slots, we cannot validate that the scan is over and that everything is covered.
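
As an illustration of why missing slots break that guarantee, here is a minimal sketch of a coverage check. It is not Glide's actual implementation; the function name `uncovered_slots` and its input shape (inclusive slot ranges owned by the reachable nodes) are assumptions made for the example. The only fixed fact is that a Redis or Valkey cluster has 16384 hash slots.

```python
from typing import Iterable, List, Tuple

TOTAL_SLOTS = 16384  # a Redis/Valkey cluster always has 16384 hash slots


def uncovered_slots(owned_ranges: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the inclusive slot ranges that no reachable node owns.

    `owned_ranges` is a hypothetical input: (start, end) pairs, inclusive,
    one per range served by a node the client can still reach.
    """
    gaps: List[Tuple[int, int]] = []
    next_expected = 0  # lowest slot not yet known to be covered
    for start, end in sorted(owned_ranges):
        if start > next_expected:
            gaps.append((next_expected, start - 1))
        next_expected = max(next_expected, end + 1)
    if next_expected < TOTAL_SLOTS:
        gaps.append((next_expected, TOTAL_SLOTS - 1))
    return gaps
```

If the result is non-empty, a scan that relies on slot tracking has no way to mark those slots as done, so it can never truthfully report that it is finished.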

The solution in such a case would probably be to iterate blindly over the cluster nodes and ignore nodes that have connection issues, without giving guarantees; users would need to opt in to this scan type ahead of time with an optional flag or similar.
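
A rough sketch of that idea, under stated assumptions: this is not a Glide API. The `NodeScan` callable is a hypothetical stand-in for "run one SCAN step against a single node", and the built-in `ConnectionError` stands in for whatever error a real client raises when a node is unreachable.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

# Hypothetical per-node scan step: takes a cursor, returns (next_cursor, keys).
# A returned cursor of 0 means that node's iteration is finished, as with SCAN.
NodeScan = Callable[[int], Awaitable[Tuple[int, List[str]]]]


async def best_effort_scan(nodes: Dict[str, NodeScan]) -> Tuple[List[str], List[str]]:
    """Scan each node in turn, skipping nodes that raise ConnectionError.

    Returns (keys, skipped_addresses). There is no completeness guarantee:
    keys that live on skipped nodes are simply absent from the result.
    """
    keys: List[str] = []
    skipped: List[str] = []
    for address, scan in nodes.items():
        cursor = 0
        try:
            while True:
                cursor, batch = await scan(cursor)
                keys.extend(batch)  # batches read before a failure are kept
                if cursor == 0:
                    break
        except ConnectionError:
            skipped.append(address)
    return keys, skipped
```

Returning the skipped addresses alongside the keys lets the caller see exactly which part of the cluster was not covered, which is the trade-off described above: the scan completes, but without a guarantee.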

In your case, would you prefer a scan with no guarantees over a scan with guarantees that cannot scan a cluster without full coverage?

I'm just trying to get the picture and to understand the needs better.

If you are fine with sharing this information, I would also like to hear why missing nodes (a full shard, if you have replicas) are common in your case.
In my experience, missing shards are not supposed to be common.
If you prefer to discuss the specifics privately, we can chat on the Valkey Discord, and you can approach me there in private as well.

@raphaelauv
Author

raphaelauv commented Oct 11, 2024

> The solution in such a case would probably be to iterate blindly over the cluster nodes and to ignore nodes that have connection issues, without giving guarantees, and users will need to choose this scan type ahead with an optional flag or so.

Yes, this is the missing feature.

avifenesh self-assigned this Oct 11, 2024
avifenesh added the Feature, Rust core, and 1_3_candidate labels and removed the bug label Oct 11, 2024
avifenesh added this to the 1.3 milestone Oct 11, 2024
@asafpor

asafpor commented Oct 12, 2024

Avi, is there any workaround we can provide?

@avifenesh
Collaborator

@asafpor I will check. I can also try to speed up a patch.

@raphaelauv Is this currently blocking you from working with Glide?
