
Try seize as a replacement of crossbeam-epoch? #385

Open · tatsuya6502 opened this issue Jan 20, 2024 · 13 comments

tatsuya6502 (Member) commented Jan 20, 2024

Try the seize crate as a replacement for the crossbeam-epoch crate.

The moka cache uses crossbeam-epoch in its lock-free concurrent hash table, called cht. crossbeam-epoch is an epoch-based garbage collector that helps implement high-performance concurrent data structures like cht. However, we have been struggling for years with crossbeam-epoch's spec, which does not guarantee that the destructors of pointed-to objects will run.

https://docs.rs/crossbeam-epoch/0.9.18/crossbeam_epoch/struct.Guard.html#method.defer_destroy

There is no guarantee when exactly the destructor will be executed. The only guarantee is that it won’t be executed until all currently pinned threads get unpinned. In theory, the destructor might never run, but the epoch-based garbage collection will make an effort to execute it reasonably soon.

Naturally, many moka users expect cached entries to be dropped as soon as they are evicted from the cache. However, this crossbeam-epoch behavior causes evicted entries to be dropped with some delay, or never dropped at all.
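
For illustration, here is a minimal sketch of this kind of deferred destruction with crossbeam-epoch, assuming a simplified single-slot structure (the Entry type and replace_entry function are made up for this example). The key point is that defer_destroy only schedules the drop:

```rust
use std::sync::atomic::Ordering;

use crossbeam_epoch::{self as epoch, Atomic, Owned};

// Hypothetical entry type, standing in for a cached value.
struct Entry {
    value: String,
}

// Swap in a new entry and schedule the old one for destruction.
fn replace_entry(slot: &Atomic<Entry>, new_value: String) {
    let guard = epoch::pin();
    let old = slot.swap(Owned::new(Entry { value: new_value }), Ordering::AcqRel, &guard);
    if !old.is_null() {
        // This only *schedules* the destructor. crossbeam-epoch does not say
        // when it will run; in theory it might never run at all.
        unsafe { guard.defer_destroy(old) };
    }
}
```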

We have added mitigations for it (notes), and they mostly work, at the cost of some performance degradation. However, there are still some corner cases that seem difficult to solve or mitigate. It might be time to try alternatives like seize, which is based on the Hyaline reclamation scheme and is used in a concurrent hash table called flurry.

tatsuya6502 (Member, Author) commented Jan 20, 2024

Unlike crossbeam-epoch, seize provides the following (a configuration sketch follows the list):

  1. A configurable batch size: Collector::batch_size
    • This controls the tradeoff between throughput and memory efficiency.
    • Maybe the moka cache should allow users to change this value when creating a cache instance.
  2. Protection against stalled threads: Collector::epoch_frequency
    • This ensures that evicted cache entries are eventually dropped.
  3. No global collector:
    • Each moka cache instance will have a dedicated seize collector.
    • This ensures that all cached entries are dropped when the cache instance is dropped.
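
A rough sketch of what a per-cache setup could look like with seize's builder-style Collector API; the Cache struct, its constructor, and the concrete numbers are hypothetical placeholders:

```rust
use std::num::NonZeroU64;

use seize::Collector;

// Hypothetical per-cache wrapper; each instance owns its own collector,
// so dropping the cache drops the collector and any remaining retired entries.
struct Cache {
    collector: Collector,
    // ... hash table and the rest of the cache state would live here ...
}

impl Cache {
    fn new(batch_size: usize) -> Self {
        let collector = Collector::new()
            // Larger batches favor throughput, smaller ones favor memory efficiency.
            .batch_size(batch_size)
            // Advance the epoch periodically so stalled threads cannot block
            // reclamation forever; evicted entries are then eventually dropped.
            .epoch_frequency(NonZeroU64::new(128));
        Self { collector }
    }
}
```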

tatsuya6502 (Member, Author) commented Jan 21, 2024

I am doing some experiments in a separate repository.

tatsuya6502 self-assigned this Jan 21, 2024
tatsuya6502 (Member, Author) commented:

Memo:

Ideally, I want a concurrent memory reclamation library to provide the following (a hypothetical API sketch follows the list):

  1. Control over whether a deferred destruction request is buffered in thread-local storage or not.
    • We want some requests to bypass the buffer, while others are buffered.
  2. An API to trigger a GC explicitly.
  3. When a GC runs, pull deferred destruction requests from the local buffers of all registered threads,
    • instead of letting each thread push them to the global queue.
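
A purely hypothetical Rust sketch of that wish list; none of these names exist in crossbeam-epoch or seize today:

```rust
// Hypothetical trait; every name here is made up to illustrate the wish list.
trait ReclaimControl {
    /// Item 1: defer destruction of `ptr`, and let the caller decide whether
    /// the request may sit in a thread-local buffer or must be made visible
    /// to the collector immediately.
    ///
    /// # Safety
    /// `ptr` must be valid and no longer reachable by new readers.
    unsafe fn defer_destroy(&self, ptr: *mut (), drop_fn: unsafe fn(*mut ()), buffered: bool);

    /// Items 2 and 3: explicitly trigger a GC pass that *pulls* pending
    /// requests from the local buffers of all registered threads, instead of
    /// having each thread push them to a global queue.
    fn collect(&self);
}
```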

ibraheemdev commented Jul 1, 2024

Hi! I happened to come across this issue and can share some details about seize.

One of the biggest benefits of seize over crossbeam-epoch is predictable latency. Epoch-based reclamation has the issue that you have to check whether it is safe to free objects every once in a while when pinning/unpinning a thread, and this check is very expensive as you have to access shared memory from other threads. This means you run into a fundamental tradeoff between latency and memory efficiency. Seize, on the other hand, only performs this check once when retiring a batch (by pushing the objects to other threads), and uses reference counting to free memory.

This is an important consideration for a cache as you want an optimal balance between performance and memory usage.

Control over whether a deferred destruction request is buffered in thread-local storage or not.

This is something I've been thinking about. The problem is that seize requires at least as many objects as the number of active threads to be able to retire a batch, as each object acts as a node in a thread-local linked list. Skipping the thread-local buffer would require allocating a large chunk of memory to perform the retirement process, which may be confusing.

What you can do is try to reclaim by calling Guard::flush after large objects are retired.
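
A minimal sketch of that suggestion, assuming the large object has already been retired through the collector elsewhere; it relies only on Collector::enter and the guard's flush method mentioned above:

```rust
use seize::Collector;

// After retiring something large elsewhere, eagerly ask seize to attempt
// reclamation instead of waiting for the next batch to fill up.
fn flush_after_large_retire(collector: &Collector) {
    let guard = collector.enter();
    guard.flush();
}
```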

ibraheemdev commented:

I also recently released papaya, a hash table built on seize. I'd be interested in porting moka to use papaya and seeing the results; I'd expect significant performance improvements compared to the internal cht, especially for readers. Is that something you'd be open to?

tatsuya6502 (Member, Author) commented:

@ibraheemdev

Is that something you'd be open to?

Thanks. Yes, I am open to it. I was not aware of papaya, but at first glance, it sounds like a good idea to use it in moka!

tatsuya6502 (Member, Author) commented Jan 16, 2025

I briefly checked the papaya API for the first time.

  • A HashMap can have a custom seize collector via the builder. (nice)
    • The moka cache should expose this feature to the users.
  • seize::Guard::flush
    • The moka cache should expose a similar method to the users.
    • The cache would internally call self.map.guard().flush()? (See the sketch after this list.)
  • It just got the Equivalent trait (not yet released at the time of writing; now in v0.1.8).
  • As of v0.1.8, the MSRV is 1.72.0.
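
For example, the first two bullets could be wired together roughly like this; HashMap::builder and its collector option are taken from the description above (the exact method names are assumptions), and flush_cache is a hypothetical stand-in for a user-facing flush method:

```rust
use papaya::HashMap;
use seize::{Collector, Guard as _};

// Build a map with a cache-specific seize collector via the builder.
fn build_map() -> HashMap<String, Vec<u8>> {
    HashMap::builder()
        .collector(Collector::new().batch_size(64))
        .build()
}

// What a user-facing "flush" method on the cache could call internally.
fn flush_cache(map: &HashMap<String, Vec<u8>>) {
    map.guard().flush();
}
```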

In some use cases, it is very important to be able to customize the collector and/or to call flush. Users may store large values in the cache and want them to be dropped as soon as they are removed from the cache:

#125 (comment)

Our cache entries are quite large and not that many so having about ~100 of them hanging around might look like a memory leak to us.

quickwit-oss/quickwit#1735

Unfortunately they do not fit our need.

They target a highly concurrent small k,v use case, and do not strictly respect the limits they are assigned. All data is stored, and eventually GCed by a background task.

In our use case, we get/store very large expensive objects, at a pace of few hundreds per seconds.

tatsuya6502 (Member, Author) commented:

It just got the Equivalent trait (not yet released at the time of writing; now in v0.1.8).

I am adding support for the Equivalent trait to moka:

tatsuya6502 (Member, Author) commented Jan 18, 2025

CC: @ibraheemdev

I started experimenting with papaya in the moka cache using a separate repository.

So far, so good.

  • Replaced the main concurrent hash table in sync caches with papaya.
    • There are four concurrent hash tables used in a cache:
      • The main hash table to store cached entries.
      • One for the waiter locks used by the get_with and entry and_compute methods.
      • One for the key locks used by the eviction listener.
      • One for the predicate store used by the invalidate_entries_if.
  • TODOs
    • Need to replace crossbeam_epoch::pin().flush() with something like self.map.guard().flush().
    • Implement and re-enable the iterator.
    • Remove redundant hash calculations (if any).
    • Do the same things for the future cache.

Some thoughts:

  • I got a good impression of papaya:
    • It has nice APIs.
    • The GC algorithm of seize looks promising.
  • Currently, the resize mode is hard-coded to Blocked; otherwise I get very poor benchmark results from mokabench, including single-threaded ones.
  • Need the ability to shrink the capacity.
  • It would be nice to have a remove_entry_if method in papaya to remove an entry by key based on a predicate.
    • The moka cache's eviction policy relies heavily on this method.
    • Currently, I use the compute method instead of remove_entry_if (code), but it is not ideal because it requires an owned key instead of a borrowed key (see the sketch after this list).
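
A sketch of that workaround, assuming papaya's compute API with the Operation enum; the key/value types and the expiry predicate here are made up for illustration:

```rust
use papaya::{HashMap, Operation};

// Remove an entry only when a predicate on its value holds. Note that
// compute takes the key by value (owned), which is the pain point above.
fn remove_if_expired(map: &HashMap<String, u64>, key: String, now: u64) {
    let _ = map.pin().compute(key, |entry| match entry {
        // Stale: ask papaya to remove the entry.
        Some((_key, &expires_at)) if expires_at <= now => Operation::Remove,
        // Fresh or absent: leave the map unchanged.
        _ => Operation::Abort(()),
    });
}
```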

I need to think about how to transition users to a papaya-based implementation.

  1. Provide a crate feature to switch between the current cht and papaya implementations (a sketch follows below).
    • e.g. legacy-cht, which will be a default feature. In the future, it will become non-default and then eventually be removed.
  2. Or, do the same thing at runtime via a builder option.
    • Choose a concurrent hash table implementation per cache instance.

Perhaps do 2, as it gives more flexibility to the users?
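
For illustration, the crate-feature route in option 1 could boil down to a cfg-gated alias like the one below (option 2 would instead branch inside the cache at construction time); the legacy-cht feature name comes from the list above, while the cht module path is hypothetical:

```rust
// Hypothetical compile-time switch between the two hash-table backends,
// gated on a `legacy-cht` cargo feature (option 1 above). The `cht` path
// is illustrative; moka's real internal layout may differ.
#[cfg(feature = "legacy-cht")]
use crate::cht::SegmentedHashMap as BackendMap;

#[cfg(not(feature = "legacy-cht"))]
use papaya::HashMap as BackendMap;
```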

ibraheemdev commented:

Exciting!

Currently, the resize mode is hard-coded to Blocked; otherwise I get very poor benchmark results from mokabench, including single-threaded ones.

Yeah, this is somewhat expected in microbenchmarks. However, it can be useful in some applications where writes are rare and latency is more important than throughput. Ideally this would be exposed as a configuration option.

Need the ability to shrink the capacity.

We can do this dynamically based on the number of live entries in a table when resizing. Is there a heuristic that you use currently to determine when to shrink?

It would be nice to have a remove_entry_if method in papaya to remove an entry by key based on a predicate.

Ah yeah, using a borrowed key here would be nice. I opened an issue here: ibraheemdev/papaya#53.

I need to think about how to transition users to a papaya-based implementation.

The feature approach sounds good. Ideally, papaya doesn't have any downsides compared to the current implementation, so there shouldn't be any burden on the user to choose the internal data structure.


Another thought I had is that it might be possible for moka to set epoch_frequency to None by default. This can greatly improve performance, and if seize exposed a compile-time feature it could also remove a usize of metadata per entry in the map. The main downside is that you lose protection against stalled threads (a thread that is holding a guard for a long time and preventing reclamation of new objects that are being allocated). However, I'm not sure how useful this is in practice, because moka doesn't hold guards for very long and just clones values. The only time a guard is held for a long time, from what I can tell, is for iterators, which are likely to access new values anyway. I'll leave this up to you, but I think there should at least be a compile-time feature for this due to the memory usage savings. I opened an issue for this here: ibraheemdev/seize#33.

I also opened an issue about force flushing a guard (ibraheemdev/seize#34), because this seems like it might be important to some users.

tatsuya6502 (Member, Author) commented:

Thanks for the feedback and opening the issues!

Need the ability to shrink the capacity.

We can do this dynamically based on the number of live entries in a table when resizing. Is there a heuristic that you use currently to determine when to shrink?

papaya expands the capacity when the number of live entries exceeds half of the capacity, and then it doubles the capacity. So the next allocation will start at 1/4 (25%) full. How about shrinking the capacity when the number of live entries is less than 1/8 (12.5%) or 1/16 (6.25%) of the capacity? Also, I think it should respect the initial capacity when shrinking the capacity; the table should not be shrunk below the initial capacity.

So the new heuristic would be:

```rust
let next_capacity = match cfg!(papaya_stress) {
    true => ...,

    // Double the table capacity if we are at least 50% (1/2) full.
    false if self.len() >= (table.len() >> 1) => table.len() << 1,

    // Halve the table capacity if we are at most 12.5% (1/8) full,
    // but don't halve below the minimum capacity.
    false if self.len() <= (table.len() >> 3) => table.min_cap().max(table.len() >> 1),

    // Otherwise, keep the table capacity the same.
    false => table.len(),
};
```
  • If the table was created with an initial capacity of 0, the minimum capacity min_cap is 1.
  • Otherwise, the min_cap is the initial capacity.
  • The min_cap can be updated when the reserve method is called with an additional capacity.

tatsuya6502 (Member, Author) commented Jan 19, 2025

Another thought I had is that it might be possible for moka to set epoch_frequency to None by default. […]

Thanks for the suggestion. I implemented enough of the moka sync cache to run some micro benchmarks. I compared the elapsed time to process a dataset (a cache trace) between the original moka (with cht) and the papaya version. I set the papaya epoch_frequency to None for this run.

The results are below. moka with papaya clearly outperforms the one with the original cht. Great job!

[Benchmark charts: mokabench1-2025-01-18.png, mokabench2-2025-01-18.png, mokabench3-2025-01-18.png]

  • Ran on a Linux x86_64 PC with an Intel Core i7-12700F (20 hyperthreads).
    • I also tried macOS arm64 on an 8-core Apple M1 chip and got similar results.
  • For a fair comparison, I removed all the crossbeam_epoch::pin().flush() calls in the original moka. (commit)
  • The configuration of the papaya HashMap:
    • The resize mode: Blocked
    • The epoch frequency: None

ibraheemdev commented:

[…] How about shrinking the capacity when the number of live entries is less than 1/8 (12.5%) or 1/16 (6.25%) of the capacity? Also, I think it should respect the initial capacity when shrinking the capacity; the table should not be shrunk below the initial capacity.

Shrinking at 1/8 seems reasonable. My only concern was that in a workload where you continually insert and clear the map, shrinking would lead to unnecessary extra growth. I suppose that respecting the initial capacity is a solution for that.
