
Cluster Scan - glide-core and python #1623

Merged
merged 1 commit into from
Jul 2, 2024

Conversation

avifenesh
Collaborator

@avifenesh avifenesh commented Jun 20, 2024

Scan command API implementation in glide-core and command implementation in Python.
The PR contains the general implementation of the core communication with the command interface in redis-rs, translating the protobuf arguments into the arguments expected by the cluster scan command in redis-rs, which is a different API than the CMD interface and needs different handling.
It also contains the implementation of the container for cursor references, along with the functionality for adding and removing references from the container in order to drop the objects held by redis-rs.

On the Python side, the PR contains the wrapper implementation, which wraps the cursor id in a special object that implements get and remove functions. The remove function is part of the object's __del__ method and is triggered when the user drops the object and the language's GC cleans it up, which invokes __del__.
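A minimal sketch of the wrapper described above; the class name, the get_cursor accessor, and the removal callback are illustrative, not the actual glide API. In the real client the callback would cross into Rust to drop the ScanState held in the cursor container:

```python
class ClusterScanCursor:
    """Illustrative wrapper around the cursor id returned for a cluster SCAN."""

    def __init__(self, cursor_id: str, remove_callback=None):
        self._cursor_id = cursor_id
        # In the real client this callback would call into Rust to remove
        # the cursor reference from the container so the ScanState drops.
        self._remove_callback = remove_callback

    def get_cursor(self) -> str:
        return self._cursor_id

    def __del__(self):
        # Triggered when the user drops the object and the GC collects it,
        # releasing the Rust-side reference.
        if self._remove_callback is not None:
            self._remove_callback(self._cursor_id)
```

In CPython, reference counting makes the __del__ call (and therefore the Rust-side removal) happen as soon as the last reference is dropped.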

It contains the implementation of SCAN for CME (cluster mode enabled) and SCAN for standalone.
Please note that the commands are completely different and perform completely different logic.

Profiled the memory-consumption differences between cluster scan and standalone scan with a fairly large number of keys, sets, hashes, and zsets (10,000 each), hence many ScanState objects being created and stored during the CME SCAN.
For both, arbitrary sleeps were added in order to create scenarios in which the ScanState is held for a while. The difference is about 11%, which is exactly the same difference observed when setting the keys, hashes, sets, and zsets; hence the difference most likely comes from the different server types and the extra work done in cluster mode, while the use of the cluster scan logic and the ScanState object does not affect memory.

The PR is based on the work done in redis-rs and the API it provides. That PR can be found in the link below:
redis-rs impl of Cluster Scan

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@avifenesh avifenesh requested a review from a team as a code owner June 20, 2024 22:03
@avifenesh avifenesh marked this pull request as draft June 20, 2024 22:03
@avifenesh avifenesh force-pushed the Py/Commands/ClusterScan branch 6 times, most recently from 0d4ee26 to 61933b8 on June 27, 2024 16:35
@avifenesh avifenesh marked this pull request as ready for review June 27, 2024 16:35
@avifenesh avifenesh changed the title draft of cluster scan impl Cluster Scan - glide-core and python Jun 27, 2024
@avifenesh avifenesh force-pushed the Py/Commands/ClusterScan branch 3 times, most recently from f703258 to b253eda on June 27, 2024 16:42
@avifenesh avifenesh force-pushed the Py/Commands/ClusterScan branch 5 times, most recently from 42581bc to dcd6da0 on June 28, 2024 10:03
@jduo
Collaborator

jduo commented Jul 1, 2024

Hi @avifenesh ,

I'm working on the Java port of this in #1737.

I'm not sure if the memory management is quite right. When the Python GC runs on a cursor object, we'll call back to Rust to remove the container for the cursor state, correct?

However every call to cluster_scan() creates a new ClusterScanCursor object, even if they are continuing the same cursor. Would each of these all point to the same cursor string?

If so, you've got duplicate ownership of the same cursor, so if any of them get GC'd before the user is done iterating through all cursors (perhaps the user reuses the same local variable to store cursors), the cursor could get removed prematurely.

The specific use case I think we can run into problems with is if the user runs cluster_scan, gets a Rust cursor and puts it in a local variable, runs cluster_scan again, assigns the updated cursor to a local variable, waits for a while until the GC kicks in, then runs cluster_scan again. If the cursor is shared, it'll get terminated prematurely.

Perhaps what we need to do is change the function to return something more like an iterator that the user can close when they are done with:

  • The client could have a cluster_scan method that takes all parameters other than the cursor.
  • That function returns a cursor and no data directly.
  • The cursor object has an accessor for the current data.
  • The cursor object has a next() function that takes in the scan parameters and under-the-hood submits another scan request, then updates the data.
  • The cursor object has a method to close early and release resources. It should work with python with statements.
  • The cursor object has a method to check if it was the last result, like it currently does.
  • The cursor string is completely abstracted from the user.

This has the added benefit of allowing deterministic resource clean-up instead of relying on the garbage collector. The design above would work both in the case where the same cursor string is used by Rust or if it changes from call-to-call. It'd be the same API from the caller's perspective and the cursor-management mechanics would just be an implementation detail.
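The proposal above could look roughly like the following sketch. All names here are hypothetical, and the fake client with a scan_once method and the "finished" sentinel cursor are assumptions for illustration only:

```python
class ScanIterator:
    """Illustrative iterator-style cursor: hides the cursor string,
    supports explicit close(), and works with `with` statements."""

    def __init__(self, client, match_pattern=None, count=None):
        self._client = client
        self._match = match_pattern
        self._count = count
        self._cursor = "0"      # internal cursor string, hidden from the user
        self._data = []
        self._finished = False

    def next(self) -> bool:
        """Submit another scan request; return True while more data may follow."""
        if self._finished:
            return False
        self._cursor, self._data = self._client.scan_once(
            self._cursor, self._match, self._count
        )
        self._finished = self._cursor == "finished"
        return not self._finished

    def data(self):
        return self._data

    def close(self):
        # Deterministic clean-up instead of relying on the garbage collector.
        self._finished = True
        self._data = []

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


class FakeClient:
    """Stand-in for the real client, returning two pages then finishing."""

    def __init__(self):
        self._pages = [("c1", ["key-a"]), ("finished", ["key-b"])]

    def scan_once(self, cursor, match_pattern, count):
        return self._pages.pop(0)
```

From the caller's perspective the cursor string never appears, so the shared-ownership problem described above cannot arise.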

What do you think?

string cursor = 1;
optional string match_pattern = 2;
optional int64 count = 3;
optional string object_type = 4;
Collaborator Author


It is a bit problematic. Enums in protobuf are numbers, while an enum in Python has to choose between strings and numeric values (as far as I could tell when I tried).
I wanted to use the same enum in Python for both standalone, which takes strings, and cluster mode, so I couldn't use a numeric enum in Python.
The first attempt was an enum as you suggest, but then I ran into this issue.
Do you have any other solution?
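One possible workaround, sketched here with hypothetical names: keep a single string-valued Python enum shared by the standalone and cluster wrappers, and send its value into the protobuf message as the plain optional string field rather than a numeric protobuf enum:

```python
from enum import Enum


class ObjectType(Enum):
    """Hypothetical shared enum for the SCAN TYPE argument; the member
    values match the strings the server expects."""

    STRING = "string"
    LIST = "list"
    SET = "set"
    ZSET = "zset"
    HASH = "hash"


def to_proto_arg(object_type: ObjectType) -> str:
    # Because the `optional string object_type` protobuf field is a plain
    # string, the enum's value can be sent directly for either command flavor.
    return object_type.value
```

This keeps one enum on the Python side while sidestepping protobuf's numeric-enum restriction, at the cost of losing protobuf-level validation of the value.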

@jduo
Collaborator

jduo commented Jul 1, 2024


@avifenesh @barshaul , I've updated my PR at #1737 to implement the design proposed above. I've ported one of your integration tests to Java using this design as well to demonstrate how this works from a caller perspective.

With this:

  • The user never has to create a dummy starting cursor. They just ask for a cluster scan cursor from their cluster client.
  • The cursor API has a next() function that takes in the scan parameters. It asynchronously returns whether there is more data available server-side.
  • The cursor object updates itself with new cursor handles it receives.
  • The cursor object has an accessor for the data.
  • The cursor object can clean up the Rust side automatically when it gets the last cursor (sees "finished").
  • The user can explicitly clean up the cursor with a try-with-resources block. This is a standard practice when working with database cursors so it fits well here.
  • Two cursor objects can never be built from the same iteration of a server-side cursor, so there's never any concerns about shared ownership.

Collaborator

@barshaul barshaul left a comment


Left some last comments.
🚀🚀🚀

@avifenesh avifenesh force-pushed the Py/Commands/ClusterScan branch 3 times, most recently from 98fe470 to ffb967e on July 2, 2024 17:27
@avifenesh avifenesh merged commit 905a17b into valkey-io:main Jul 2, 2024
66 of 67 checks passed
cyip10 pushed a commit to Bit-Quill/valkey-glide that referenced this pull request Jul 16, 2024
@avifenesh avifenesh deleted the Py/Commands/ClusterScan branch October 20, 2024 10:04
5 participants