Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterating all entries in a cache #93

Closed
tatsuya6502 opened this issue Feb 20, 2022 · 5 comments · Fixed by #101
Closed

Iterating all entries in a cache #93

tatsuya6502 opened this issue Feb 20, 2022 · 5 comments · Fixed by #101
Labels
enhancement New feature or request
Milestone

Comments

@tatsuya6502
Copy link
Member

Split from #92. There are some needs for iterating all entries ((&K, &V)) in a cache.

@Dessix
Copy link

Dessix commented Feb 28, 2022

In order to do this locklessly, it's worth noting that either all keys will be sampled once but values may change, or that the iterator may or may not miss keys that were added during iteration. &V should be sampled by lookup per-key during a streaming iteration if possible, with now-missing keys skipped if the iterator reaches a pair which is no longer present.

@tatsuya6502
Copy link
Member Author

I am adding iterator to newly added moka::dash::Cache, which uses dashmap::DashMap as the central storage. Its iterator is a simple wrapper of DashMap iterator, which employs read lock on currently iterating shard of the map.

Here is the pull request:

At this moment, I think this will be good enough. If somebody wants iterator, they could consider using moka::dash::Cache.

I will revisit iterator for other concurrent caches such as moka::sync::Cache and moka::future::Cache. I will do it after I finish implementing other features (e.g. notification).

I will post some thoughts in next comment.

@tatsuya6502
Copy link
Member Author

This is my current thoughts on iterators for concurrent lock-free caches.

or that the iterator may or may not miss keys that were added during iteration

Yeah. It will be very hard to guarantee not to miss keys that were added during iteration.

There may be a way. Moka records operation logs for writes (upserts and deletes) in a channel. These logs are used for batch-updating cache policy’s data structures (doc). Iterators could subscribe the operation logs for newly added keys. For example, it will firstly iterate thorough a snapshot of keys, and then iterate thorough newly added keys retrieved from the operation logs. It will repeat until catching up to the operation logs.

However, there is short delay until those operation logs can be read from the channel. So even though iterator will do the above, it can miss last few new keys.

Another idea is to relax the guarantee; making iterator to allow missing some new keys, but do its best to minimize the amount of missed keys.

I think DashMap is taking this approach by making iterator to acquire a read lock on the shard of the map that it is currently iterating. The read lock ensures the shard to block updates, so iterator can guarantee that no new keys are missed within the shard. Iterator visits every shards in the map, one by one, and never go back to the shards it has already visited. So it will miss new keys that has been added to a shard that the iterator has already visited.

These lock-free Moka caches could take a similar approach to DashMap. Moka's cht::HashMap has multiple internal segments (like the shards in DashMap). It will be possible to take a consistent snapshot of keys and values in a segment, although it is a lock-free data structure. And, in Moka, each key and value are wrapped by Arc (The internal data structures as of v0.7.2). So, by cloning these Arcs, a snapshot can retain keys and values that have been deleted from the cache after taking the snapshot.

I do not know which way will be better yet.

@tatsuya6502
Copy link
Member Author

tatsuya6502 commented Mar 20, 2022

Hi @Milo123459 — I am going to review and merge #101 to add iterator to moka::dash::Cache. Once it is merged, you can start trying dash::Cache including iterator.

Add the following to the [dependencies] section of your Cargo.toml:

[dependencies]
moka = { git = "https://github.com/moka-rs/moka", branch = "master", features = ["dash"] }

Then generate the Rust doc using the following command:

$ cargo doc

Open ./target/doc/moka/index.html with a web browser, and navigate to moka::dash::Cache. You may want to start with the example on the page.

I would strongly recommend to try it now (before I publish the new version v0.8.0 to crates.io) to ensure it has everything you need. Once published, it will become harder to change APIs although I will state in the doc that dash::Cache is highly experimental and the API will likely be changed in the future.

I still have some other works to do for v0.8.0 release, so I will not be able to publish it until next weekend (March 27th).

@tatsuya6502 tatsuya6502 added this to the v0.8.0 milestone Mar 20, 2022
@tatsuya6502 tatsuya6502 added the enhancement New feature or request label Mar 20, 2022
@tatsuya6502
Copy link
Member Author

I will revisit iterator for other concurrent caches such as moka::sync::Cache and moka::future::Cache. I will do it after I finish implementing other features (e.g. notification).

Just FYI. I added iterators to all other caches via #114 because somebody else asked for it (#112). I released Moka v0.8.2 with them.

For the concurrent caches, I took the simplest design that I could came up with. The iterators do not guarantee to return new keys if they were inserted after the iterator was created. (It may or may not return them)

Please see #114 or the doc for more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants