
Cluster Scan - glide-core and python #1623

Merged
merged 1 commit into from
Jul 2, 2024

Conversation

avifenesh
Collaborator

@avifenesh avifenesh commented Jun 20, 2024

Scan command API implementation in glide-core and command implementation in Python.
The PR contains the general implementation of the core communication with the command interface in redis-rs, translating the protobuf arguments into the arguments expected by the cluster scan command in redis-rs, which is a different API than the CMD interface and needs different handling.
It also contains the implementation of the container for cursor references, along with the functionality for adding and removing references from the container in order to drop the objects held by redis-rs.

On the Python side, the PR contains the wrapper implementation, which wraps the cursor id in a special object that implements get and remove functions. The remove function is part of the object's __del__ method and is triggered when the user drops the object and the language's GC cleans it up, which invokes __del__.
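A minimal sketch of the wrapper described above; the class name, the get_cursor accessor, and the removal callback are illustrative, not the actual glide API. In the real client the callback would cross into Rust to drop the ScanState held in the cursor container:

```python
class ClusterScanCursor:
    """Illustrative wrapper around the cursor id returned for a cluster SCAN."""

    def __init__(self, cursor_id: str, remove_callback=None):
        self._cursor_id = cursor_id
        # In the real client this callback would call into Rust to remove
        # the cursor reference from the container so the ScanState drops.
        self._remove_callback = remove_callback

    def get_cursor(self) -> str:
        return self._cursor_id

    def __del__(self):
        # Triggered when the user drops the object and the GC collects it,
        # releasing the Rust-side reference.
        if self._remove_callback is not None:
            self._remove_callback(self._cursor_id)
```

In CPython, reference counting makes the __del__ call (and therefore the Rust-side removal) happen as soon as the last reference is dropped.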

It contains the implementation of SCAN for CME (cluster mode enabled) and SCAN for standalone.
Please note that the commands are completely different and perform completely different logic.

Profiled the memory-consumption differences between cluster scan and standalone scan with a fairly large number of keys, sets, hashes, and zsets (10,000 each), hence many ScanState objects being created and stored during the CME SCAN.
For both, arbitrary sleeps were added in order to create scenarios in which the ScanState is held for a while. The difference is about 11%, which is exactly the same difference observed when setting the keys, hashes, sets, and zsets; hence the difference most likely comes from the different server types and the extra work done in cluster mode, while the use of the cluster scan logic and the ScanState object does not affect memory.

The PR is based on the work done in redis-rs and the API it provides. That PR can be found in the link below:
redis-rs impl of Cluster Scan

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@avifenesh avifenesh requested a review from a team as a code owner June 20, 2024 22:03
@avifenesh avifenesh marked this pull request as draft June 20, 2024 22:03
@avifenesh avifenesh force-pushed the Py/Commands/ClusterScan branch 6 times, most recently from 0d4ee26 to 61933b8 on June 27, 2024 16:35
@avifenesh avifenesh marked this pull request as ready for review June 27, 2024 16:35
@avifenesh avifenesh changed the title draft of cluster scan impl Cluster Scan - glide-core and python Jun 27, 2024
@avifenesh avifenesh force-pushed the Py/Commands/ClusterScan branch 3 times, most recently from f703258 to b253eda on June 27, 2024 16:42
@avifenesh avifenesh force-pushed the Py/Commands/ClusterScan branch 5 times, most recently from 42581bc to dcd6da0 on June 28, 2024 10:03
@jduo
Collaborator

jduo commented Jul 1, 2024

Hi @avifenesh ,

I'm working on the Java port of this in #1737.

I'm not sure if the memory management is quite right. When the Python GC runs on a cursor object, we'll call back to Rust to remove the container for the cursor state, correct?

However every call to cluster_scan() creates a new ClusterScanCursor object, even if they are continuing the same cursor. Would each of these all point to the same cursor string?

If so, you've got duplicate ownership of the same cursor, so if any of them get GC'd before the user is done iterating through all cursors (perhaps the user reuses the same local variable to store cursors), the cursor could get removed prematurely.

The specific use case I think we can run into problems with is if the user runs cluster_scan, gets a Rust cursor and puts it in a local variable, runs cluster_scan again, assigns the updated cursor to a local variable, waits for a while until the GC kicks in, then runs cluster_scan again. If the cursor is shared, it'll get terminated prematurely.

Perhaps what we need to do is change the function to return something more like an iterator that the user can close when they are done with:

  • The client could have a cluster_scan method that takes all parameters other than the cursor.
  • That function returns a cursor and no data directly.
  • The cursor object has an accessor for the current data.
  • The cursor object has a next() function that takes in the scan parameters and under-the-hood submits another scan request, then updates the data.
  • The cursor object has a method to close early and release resources. It should work with python with statements.
  • The cursor object has a method to check if it was the last result, like it currently does.
  • The cursor string is completely abstracted from the user.

This has the added benefit of allowing deterministic resource clean-up instead of relying on the garbage collector. The design above would work both in the case where the same cursor string is used by Rust or if it changes from call-to-call. It'd be the same API from the caller's perspective and the cursor-management mechanics would just be an implementation detail.
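The proposal above could look roughly like the following sketch. All names here are hypothetical, and the fake client with a scan_once method and the "finished" sentinel cursor are assumptions for illustration only:

```python
class ScanIterator:
    """Illustrative iterator-style cursor: hides the cursor string,
    supports explicit close(), and works with `with` statements."""

    def __init__(self, client, match_pattern=None, count=None):
        self._client = client
        self._match = match_pattern
        self._count = count
        self._cursor = "0"      # internal cursor string, hidden from the user
        self._data = []
        self._finished = False

    def next(self) -> bool:
        """Submit another scan request; return True while more data may follow."""
        if self._finished:
            return False
        self._cursor, self._data = self._client.scan_once(
            self._cursor, self._match, self._count
        )
        self._finished = self._cursor == "finished"
        return not self._finished

    def data(self):
        return self._data

    def close(self):
        # Deterministic clean-up instead of relying on the garbage collector.
        self._finished = True
        self._data = []

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


class FakeClient:
    """Stand-in for the real client, returning two pages then finishing."""

    def __init__(self):
        self._pages = [("c1", ["key-a"]), ("finished", ["key-b"])]

    def scan_once(self, cursor, match_pattern, count):
        return self._pages.pop(0)
```

From the caller's perspective the cursor string never appears, so the shared-ownership problem described above cannot arise.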

What do you think?

string cursor = 1;
optional string match_pattern = 2;
optional int64 count = 3;
optional string object_type = 4;
Collaborator Author


It is a bit problematic. Enums in protobuf are numbers, while an enum in Python has to choose between strings and numeric values (as far as I could tell when I tried).
I wanted to use the same enum in Python for both standalone, which takes strings, and cluster mode, so I couldn't use a numeric enum in Python.
The first attempt was an enum as you suggest, but then I ran into this issue.
Do you have any other solution?
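One possible workaround, sketched here with hypothetical names: keep a single string-valued Python enum shared by the standalone and cluster wrappers, and send its value into the protobuf message as the plain optional string field rather than a numeric protobuf enum:

```python
from enum import Enum


class ObjectType(Enum):
    """Hypothetical shared enum for the SCAN TYPE argument; the member
    values match the strings the server expects."""

    STRING = "string"
    LIST = "list"
    SET = "set"
    ZSET = "zset"
    HASH = "hash"


def to_proto_arg(object_type: ObjectType) -> str:
    # Because the `optional string object_type` protobuf field is a plain
    # string, the enum's value can be sent directly for either command flavor.
    return object_type.value
```

This keeps one enum on the Python side while sidestepping protobuf's numeric-enum restriction, at the cost of losing protobuf-level validation of the value.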

@jduo
Collaborator

jduo commented Jul 1, 2024


@avifenesh @barshaul , I've updated my PR at #1737 to implement the design proposed above. I've ported one of your integration tests to Java using this design as well to demonstrate how this works from a caller perspective.

With this:

  • The user never has to create a dummy starting cursor. They just ask for a cluster scan cursor from their cluster client.
  • The cursor API has a next() function that takes in the scan parameters. It asynchronously returns whether there is more data available server-side.
  • The cursor object updates itself with new cursor handles it receives.
  • The cursor object has an accessor for the data.
  • The cursor object can clean up the Rust side automatically when it gets the last cursor (sees "finished").
  • The user can explicitly clean up the cursor with a try-with-resources block. This is a standard practice when working with database cursors so it fits well here.
  • Two cursor objects can never be built from the same iteration of a server-side cursor, so there's never any concerns about shared ownership.

Collaborator

@barshaul barshaul left a comment


Left some last comments.
🚀🚀🚀

@avifenesh avifenesh force-pushed the Py/Commands/ClusterScan branch 3 times, most recently from 98fe470 to ffb967e on July 2, 2024 17:27
@avifenesh avifenesh merged commit 905a17b into valkey-io:main Jul 2, 2024
66 of 67 checks passed
cyip10 pushed a commit to Bit-Quill/valkey-glide that referenced this pull request Jul 16, 2024
@avifenesh avifenesh deleted the Py/Commands/ClusterScan branch October 20, 2024 10:04
5 participants