
Entry count never reaches zero even when calling run_pending_tasks() in a loop [moka future 0.12] #345

Closed
peter-scholtens opened this issue Nov 12, 2023 · 14 comments · Fixed by #348

@peter-scholtens
Contributor

While converting the eviction listener example from the sync version to a future version, I see some weird behavior. Depending on the presence of a second insertion of the same key, entry_count() never reaches zero, even after calling run_pending_tasks() in a loop. See the example below. Uncommenting line 48 of the example (the line after the one with the triple question mark) keeps entry_count() above zero forever, causing the loop to continue forever.

Some remarks:

  1. Is this expected behavior?
  2. Would it be an idea to let run_pending_tasks() return true when it is finished, and false if more than 500 items were processed and it is still not finished? (Finished from the perspective of, e.g., the timestamp at which the function was called; later insertions can be ignored for this judgement.)
  3. I wrote this example to study the possibility of using the information from Arc::strong_count() in the eviction listener, see discussion #344, where I want to avoid orphaned entries in the B-tree with shared Arcs (a small sketch follows the example below).
use moka::future::Cache;
use std::collections::BTreeMap;
use std::sync::Arc;
use std::thread::sleep;
use std::time::Duration;

#[tokio::main]
async fn main() {
    // Store user names in a B-tree.
    let mut tree = BTreeMap::new();
    let arc1 = Arc::new(String::from("Alice"));
    let arc2 = Arc::new(String::from("Bob"));
    let arc3 = Arc::new(String::from("Charlie"));
    let arc4 = Arc::new(String::from("Dave"));
    let arc5 = Arc::new(String::from("Eve"));
    let arc6 = Arc::new(String::from("Faythe"));
    tree.insert(1, arc1.clone());
    tree.insert(2, arc2.clone());
    tree.insert(3, arc3.clone());
    tree.insert(4, arc4.clone());
    tree.insert(5, arc5.clone());
    tree.insert(6, arc6.clone());

    // Make an artificially small cache and 1-second ttl to observe eviction listener.
    let ttl = 1;
    {
        let cache = Cache::builder()
            .max_capacity(2)
            .time_to_live(Duration::from_secs(ttl))
            .eviction_listener(|key, value, cause| {
                println!("Evicted ({:08X},{:?}) because {:?}", *key, value, cause)
            })
            .build();
        // Create some simple CRC-32 session keys with Bash:
        //   for ((i = 1; i < 7 ; i++)); do rhash <(echo "$i")|tail -1; done
        cache.insert(0x6751FC53, arc1.clone()).await;
        cache.insert(0x4C7CAF90, arc2.clone()).await;
        cache.insert(0x55679ED1, arc3.clone()).await;
        cache.insert(0x1A260816, arc4.clone()).await;
        cache.insert(0x033D3957, arc5.clone()).await;
        cache.insert(0x28106A94, arc6.clone()).await;
        // Due to a race condition, this reinsertion may cause the entry to be
        // evicted twice, with causes Replaced and Size.
        cache.insert(0x4C7CAF90, arc2.clone()).await;
        // With 1-second ttl, keys 0 and 1 will be evicted if we wait long enough.
        sleep(Duration::from_secs(ttl));
        println!("Wake up!");
        // UNCOMMENTING THE NEXT LINE WILL BLOCK THE run_pending_tasks() LOOP LATER ON?        <------ ???
        //cache.insert(0x55679ED1,arc3.clone()).await;

        // Remove from cache and return value:
        if let Some(v) = cache.remove(&0x55679ED1).await {
            println!("Removed: {v}")
        };
        // Or remove from cache without returning the value.
        cache.invalidate(&0x1A260816).await;

        // invalidate_all() does not remove entries immediately; entries are removed
        // when pending tasks are processed, so there will be some delay before the
        // eviction listener is called. If you want to remove all entries immediately,
        // call the run_pending_tasks() method repeatedly like the loop below.
        cache.invalidate_all();
        loop {
            // Synchronization is limited to at most 500 entries for each call.
            cache.run_pending_tasks().await;
            // Check if all is done. Calling entry_count() requires calling
            // run_pending_tasks() first!
            if cache.entry_count() == 0 {
                break;
            }
        }

        cache.insert(0x28106A94, arc6.clone()).await;
        // When the cache is dropped, the eviction listener is not called. Either
        // call invalidate_all() or wait longer than the ttl.
        sleep(Duration::from_secs(ttl + 1));
    } // cache is dropped here.

    println!("Cache structure removed.");
    sleep(Duration::from_secs(1));
    println!("Exit program.");
}
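
Regarding remark 3, here is a minimal sketch of how Arc::strong_count() could be consulted inside the eviction listener. The threshold of 2 is purely illustrative and assumes that only the B-tree clone and the listener's own value argument remain:

use moka::future::Cache;
use std::sync::Arc;

#[tokio::main]
async fn main() {
    let cache: Cache<u32, Arc<String>> = Cache::builder()
        .max_capacity(2)
        .eviction_listener(|key, value: Arc<String>, cause| {
            // Illustrative threshold: if only the B-tree clone and this
            // `value` argument are left, no session still uses the entry,
            // so the B-tree entry could be garbage-collected as well.
            if Arc::strong_count(&value) <= 2 {
                println!(
                    "Evicted ({:08X},{:?}) because {:?}; no other users remain",
                    *key, value, cause
                );
            }
        })
        .build();

    cache.insert(0x6751FC53, Arc::new(String::from("Alice"))).await;
    cache.invalidate(&0x6751FC53).await;
    cache.run_pending_tasks().await;
}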
@tatsuya6502 tatsuya6502 self-assigned this Nov 12, 2023
@tatsuya6502
Member

@peter-scholtens

Hi. Thank you for reporting this issue.

Uncommenting line 48 of the example (the line after the one with the triple question mark) keeps entry_count() above zero forever, causing the loop to continue forever.

I am afraid that I cannot reproduce the issue with moka v0.12.1 and v0.12.0. I uncommented the line, but entry_count() still returned zero. Can you please check whether the issue still occurs on your side? I am on macOS (target aarch64-apple-darwin).

Thanks!

around-line-48
$ cargo r --release
    Finished release [optimized] target(s) in 0.04s
     Running `target/release/repro`
Evicted (4C7CAF90,"Bob") because Replaced
Wake up!
Evicted (55679ED1,"Charlie") because Replaced
Evicted (55679ED1,"Charlie") because Size
Evicted (1A260816,"Dave") because Size
Evicted (033D3957,"Eve") because Size
Evicted (28106A94,"Faythe") because Size
Evicted (55679ED1,"Charlie") because Size
Evicted (6751FC53,"Alice") because Explicit
Evicted (4C7CAF90,"Bob") because Explicit
[src/main.rs:68] cache.entry_count() = 0
Cache structure removed.
Exit program
$ rustc -Vv
rustc 1.73.0 (cc66ad468 2023-10-03)
binary: rustc
commit-hash: cc66ad468955717ab92600c770da8c1601a4ff33
commit-date: 2023-10-03
host: aarch64-apple-darwin
release: 1.73.0
LLVM version: 17.0.2

$ cargo tree -i moka
moka v0.12.1
└── repro v0.1.0 (/Users/tatsuya/ ... /repro)

$ cargo tree -i tokio
tokio v1.34.0
└── repro v0.1.0 (/Users/tatsuya/ ... /repro)

@peter-scholtens
Contributor Author

I use the same moka version, v0.12.1, but my host triple is x86_64-unknown-linux-gnu. Maybe the problem is related to that? Some further experiments:

  1. I was using the nightly toolchain; I switched back to stable (rustc 1.73.0) and still see the problem.
  2. I also ran cargo update today, so tokio was upgraded from v1.32.0 to v1.34.0, and I still see the problem.
  3. I copied the code sample above from GitHub back into the file and uncommented the line (to make sure it was not a weird typo on my side); still the same problem.

(For your information: I placed the file in the examples directory of moka, so I actually run cargo run --example eviction_listener_async --features="future".)

@tatsuya6502
Member

Thank you for experimenting further.

but have a host triple like x86_64-unknown-linux-gnu. Maybe the problem is related to that?

That was it. Now I can reproduce the issue by running it on x86_64-unknown-linux-gnu.

$ rustc -Vv
rustc 1.73.0 (cc66ad468 2023-10-03)
binary: rustc
commit-hash: cc66ad468955717ab92600c770da8c1601a4ff33
commit-date: 2023-10-03
host: x86_64-unknown-linux-gnu
release: 1.73.0
LLVM version: 17.0.2

$ rg 'Core.*i7' /proc/cpuinfo | head -1
model name      : 12th Gen Intel(R) Core(TM) i7-12700F
$ cargo r --release
...
Evicted (4C7CAF90,"Bob") because Replaced
Wake up!
Evicted (55679ED1,"Charlie") because Expired
Evicted (55679ED1,"Charlie") because Size
Evicted (1A260816,"Dave") because Size
Evicted (033D3957,"Eve") because Size
Evicted (28106A94,"Faythe") because Size
Evicted (6751FC53,"Alice") because Expired
Evicted (4C7CAF90,"Bob") because Expired
[src/main.rs:68] cache.entry_count() = 1
[src/main.rs:68] cache.entry_count() = 1
[src/main.rs:68] cache.entry_count() = 1
[src/main.rs:68] cache.entry_count() = 1
[src/main.rs:68] cache.entry_count() = 1
[src/main.rs:68] cache.entry_count() = 1
...

So, on Linux x86_64, I do not see Charlie being evicted after Faythe.

# macOS arm64
...
Evicted (28106A94,"Faythe") because Size
Evicted (55679ED1,"Charlie") because Size   // <= This one is missing on Linux x86_64.
Evicted (6751FC53,"Alice") because Explicit
...

I will look into this hopefully later today. (It is 8:50 AM in my timezone UTC+0800 now)

@tatsuya6502
Member

I can reproduce it on aarch64-unknown-linux-gnu too. So this issue should depend on the OS, not the CPU architecture.

@tatsuya6502
Member

A quick update:

  1. I found the root cause of the issue in moka.
    • I will spend a few days examining some options to address the issue.
  2. I found the reason why the issue could not be reproduced on macOS arm64.
    • It is an issue with quanta::Instant on macOS.
      • I found quanta::Instant::elapsed on macOS tends to return a duration that is shorter than the one returned by std::time::Instant::elapsed.
      • moka uses quanta::Instant by default for measuring duration for key expiration.
    • If we change the reproducer code as follows, it will reproduce issue (1) on macOS too.
let ttl = 1;

      // With 1-second ttl, keys 0 and 1 will be evicted if we wait long enough.
-     sleep(Duration::from_secs(ttl));
+     sleep(Duration::from_millis(ttl * 1000 + 100));
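
For reference, the clock difference can be observed directly by timing the same sleep with both clocks. A minimal sketch, assuming quanta is added as a direct dependency (moka uses it internally):

use std::time::Duration;

fn main() {
    // Start both clocks, sleep for one second, then compare the readings.
    // On macOS, quanta's reading tends to come out slightly shorter than
    // std's, so an exact 1-second sleep may not cross a 1-second TTL.
    let std_start = std::time::Instant::now();
    let quanta_start = quanta::Instant::now();

    std::thread::sleep(Duration::from_secs(1));

    println!("std::time::Instant::elapsed: {:?}", std_start.elapsed());
    println!("quanta::Instant::elapsed:    {:?}", quanta_start.elapsed());
}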

@tatsuya6502 tatsuya6502 added the bug Something isn't working label Nov 14, 2023
@tatsuya6502 tatsuya6502 added this to the v0.12.2 milestone Nov 14, 2023
@tatsuya6502
Member

tatsuya6502 commented Nov 15, 2023

  1. Is this expected behavior?

No, it is not. It is a bug in moka related to the timing between (1) reinserting an existing key when the cache is full, and (2) processing the pending tasks for that key. I will fix it soon.

  2. Would it be an idea to let run_pending_tasks() return true when it is finished, and false if more than 500 items were processed and it is still not finished? (Finished from the perspective of, e.g., the timestamp at which the function was called; later insertions can be ignored for this judgement.)

I have a plan to remove the 500-item limit from run_pending_tasks(). Once it is done, it will run until finished (but later insertions will be ignored). So you will not have to worry about it anymore.

moka/src/future/base_cache.rs

Lines 1456 to 1459 in ce70442

// TODO: When run_pending_tasks was called explicitly, do not stop evicting
// at the batch size.
if self.has_expiry() || self.has_valid_after() {
    self.evict_expired_entries_using_deqs(
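
In the meantime, a defensive variant of the draining loop could cap the number of rounds instead of spinning forever. This is only a sketch; drain_or_give_up is a made-up helper name, not a moka API:

use moka::future::Cache;
use std::sync::Arc;

/// Illustrative helper: run pending tasks repeatedly, but give up after
/// `max_rounds` so a stuck entry count cannot hang the program.
/// Returns true if the cache reached zero entries.
async fn drain_or_give_up(cache: &Cache<u32, Arc<String>>, max_rounds: usize) -> bool {
    for _ in 0..max_rounds {
        // Each call currently processes a bounded batch of pending tasks.
        cache.run_pending_tasks().await;
        if cache.entry_count() == 0 {
            return true;
        }
    }
    false
}

The unbounded loop in the reproducer would then become a single call to this helper plus a check of the returned flag.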

@peter-scholtens
Contributor Author

I have a plan to remove the 500-item limit from run_pending_tasks(). Once it is done, it will run until finished (but later insertions will be ignored)

Just some worries and suggestions from my side: I have no clear picture of how the eviction algorithm is implemented, but what I remember from (ri)stretto is that it was possible to invalidate a huge number of entries at once due to lifetime constraints. Can this happen in moka too? And does the eviction handler then block data or threads for so long that the end user observes a very long response time? If that is the case, would it not be handier to give the user some control over this behavior? Either beforehand, by giving the run_pending_tasks() function a u64 input parameter for the maximum number of items that can be processed, and/or afterwards, by returning a boolean or a u64 with the number of items that still need to be processed...?

@tatsuya6502
Member

tatsuya6502 commented Nov 15, 2023

Just some worries and suggestions from my side: I have no clear picture of how the eviction algorithm is implemented, but what I remember from (ri)stretto is that it was possible to invalidate a huge number of entries at once due to lifetime constraints. Can this happen in moka too?

Thank you for sharing your thoughts. I am not familiar with the issue that (ri)stretto has with its eviction algorithms. I read this article a couple of years ago, but I have not updated my knowledge since then.

Here are the cases in moka:

  • Reads are never blocked.
  • Writes will be blocked when one of the following conditions is met:
    • Writing a key that is being processed by the eviction listener.
      • One key at a time is blocked.
    • Or, the buffer for write operation logs gets full.
      • All writes will be blocked.
      • For some information about the buffer, see this diagram in the migration guide.
      • The buffer size is currently hard coded to 3,072 writes per cache.
      • The write logs are processed when run_pending_tasks is called, either implicitly or explicitly. (See the migration guide for when they are processed.)
  • Even if a write has been blocked, the result of the write (insert/remove) is immediately visible to other threads/async tasks (see the small sketch after this list).
    • The write is blocked at the last step of write, which is to push the write operation log to the buffer.
  • As the diagram shows, entries become ready for eviction policies once their write logs in the buffer are processed.
  • All data structures and algorithms for eviction policies (size, TTL, TTI, and per-entry expiration) provide O(1) time complexity for finding the next victim (= candidate to evict).
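
A minimal sketch of the visibility point above: the written value is readable immediately, while entry_count() may lag until run_pending_tasks() processes the write log.

use moka::future::Cache;

#[tokio::main]
async fn main() {
    let cache: Cache<u32, String> = Cache::new(10);

    cache.insert(1, String::from("Alice")).await;

    // The inserted value is visible to readers right away, even though its
    // write operation log has not been processed yet.
    assert_eq!(cache.get(&1).await.as_deref(), Some("Alice"));

    // entry_count() is maintained by the housekeeping tasks, so it may still
    // report 0 until run_pending_tasks() has processed the write log.
    println!("before run_pending_tasks: {}", cache.entry_count());
    cache.run_pending_tasks().await;
    println!("after run_pending_tasks:  {}", cache.entry_count());
}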

And does the eviction handler then block data or threads for so long that the end user observes a very long response time?

If the provided eviction listener closure is really slow and the cache is receiving a high volume of writes, the write operation log buffer can get full. The end user will then observe very long response times.

Before moka v0.12, it had an optional notification delivery mode called Queued delivery mode, which used a dedicated thread to call the eviction listener closure to avoid blocking cache writes. It was removed from v0.12 for simplicity, but we are open to adding it back. See the migration guide for more details.
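
One user-land way to approximate the removed queued mode is to keep the listener itself cheap and push notifications into a channel that a separate task drains. A sketch only, assuming a tokio runtime with the sync and time features enabled; this is not a mechanism provided by moka:

use moka::future::Cache;
use moka::notification::RemovalCause;
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // The listener only performs a cheap, non-blocking channel send; the
    // potentially slow handling happens in the background task below.
    let (tx, mut rx) = mpsc::unbounded_channel::<(u32, Arc<String>, RemovalCause)>();

    tokio::spawn(async move {
        while let Some((key, value, cause)) = rx.recv().await {
            // Slow work (logging, B-tree maintenance, ...) goes here.
            println!("Evicted ({:08X},{:?}) because {:?}", key, value, cause);
        }
    });

    let cache: Cache<u32, Arc<String>> = Cache::builder()
        .max_capacity(2)
        .eviction_listener(move |key, value: Arc<String>, cause| {
            // Ignore the error that occurs if the receiver was dropped.
            let _ = tx.send((*key, value, cause));
        })
        .build();

    cache.insert(0x6751FC53, Arc::new(String::from("Alice"))).await;
    cache.invalidate(&0x6751FC53).await;
    cache.run_pending_tasks().await;

    // Give the background task a moment to drain the channel before exiting.
    tokio::time::sleep(Duration::from_millis(100)).await;
}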

If that is the case, would it not be handier to give the user some control over this behavior? Either beforehand, by giving the run_pending_tasks() function a u64 input parameter for the maximum number of items that can be processed, and/or afterwards, by returning a boolean or a u64 with the number of items that still need to be processed...?

So, without queued delivery mode, cache writes could be blocked in the above case. As you can see in the migration guide, run_pending_tasks does many types of tasks, so we could provide a struct with some fields to specify the maximum number of items to process per task type. As for the return type, it is expensive (e.g. O(n)) to count the number of items that still need to be processed, so we would probably return a struct with counters for the number of items that were processed, per task type.
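
Purely to illustrate that shape, a hypothetical sketch (these types and the method name do not exist in moka today):

/// Hypothetical input: an upper bound per task type; `None` means unlimited.
#[derive(Debug, Default)]
pub struct PendingTaskLimits {
    pub max_write_logs: Option<usize>,
    pub max_expired_entries: Option<usize>,
    pub max_invalidated_entries: Option<usize>,
}

/// Hypothetical output: how many items were actually processed per task type.
/// (Counting what remains would be O(n), so only completed work is reported.)
#[derive(Debug, Default)]
pub struct PendingTaskCounters {
    pub write_logs: usize,
    pub expired_entries: usize,
    pub invalidated_entries: usize,
}

// A call site might then look like this (hypothetical method name):
//
//     let counters = cache
//         .run_pending_tasks_with_limits(PendingTaskLimits {
//             max_write_logs: Some(500),
//             ..Default::default()
//         })
//         .await;
//     println!("processed: {:?}", counters);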

@tatsuya6502
Member

Hi @peter-scholtens,

I am working on the fix for the original issue here: #348

Some unimportant tests are failing for some 32-bit targets now, and I will fix them later. But I believe #348 is already good enough for you to try; you will no longer see the weird behavior in your example program.

Use the following dependency to try it out:

[dependencies]
moka = { git = "https://github.com/moka-rs/moka.git", branch = "fix-race-in-handle-upseart", features = ["future"] }

@peter-scholtens
Contributor Author

peter-scholtens commented Nov 19, 2023

It seems now both:

moka = { git = "https://github.com/moka-rs/moka.git", branch = "fix-race-in-handle-upseart", features = ["future"] }

and also this work correctly... (I tried to reproduce the error, and failed)

moka = { version = "0.12.1", features = ["future"] }

Could it have been solved by an update of other dependencies? (I assume I also did a cargo update, but I'm not sure.)

@tatsuya6502
Member

Thank you for trying!

Could it have been solved by an update of other dependencies? (I assume I also did a cargo update, but I'm not sure.)

No. It was clearly Moka's bug.

In your program, try to sleep a bit longer than the TTL. It may help to reproduce the issue:
#345 (comment)

let ttl = 1;

      // With 1-second ttl, keys 0 and 1 will be evicted if we wait long enough.
-     sleep(Duration::from_secs(ttl));
+     sleep(Duration::from_millis(ttl * 1000 + 100));

Also, I put a minimized version of your program as a test:
https://github.com/moka-rs/moka/pull/348/files#diff-338f772312880199137f8bbb508f15dd7f83457565cc983b85043cddd71b2f24R3065-R3101

I confirmed it fails on v0.12.1, and passes with #348.

@peter-scholtens
Contributor Author

Confirmed! (I forgot to uncomment my own code).

@tatsuya6502
Member

FYI, I published moka v0.12.2 with the fix to crates.io.

@tatsuya6502
Member

  • I found quanta::Instant::elapsed on macOS tends to return a duration that is shorter than the one returned by std::time::Instant::elapsed.
  • moka uses quanta::Instant by default for measuring duration for key expiration.

FYI, I reported the above issue to the quanta project: metrics-rs/quanta#96
