Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch cache to custom without write-lock for reads. #5576

Merged
merged 2 commits into from
Nov 18, 2022

Conversation

absoludity
Copy link
Contributor

Signed-off-by: Michael Nelson minelson@vmware.com

Follows on from #5518, this time replacing the cached package with a custom credential cache.

Description of the change

After further digging, I found that one cause of the slow handling of 50 concurrent requests going through the pinniped-proxy was that:

  • We were caching a function with an async/await signature which means that even the cached version must have that signature as well - which means a blocking i/o call (which switches the task), and
  • The Cached trait specifies that even a cache_get operation mutates the cache (in our case, just for statistics of hits/misses), which, as a result, requires acquiring a write lock to the cache to read a cached value.

For more details, please see the discussion with the Cached author.

To avoid both of those issues, this PR:

  1. Adds a cache module that provides a generic read/write LockableCache (for multiple readers, single writer) and builds on that with a PruningCache that will prune entries (given a test function) when they should no longer be cached,
  2. Uses (1) to create a single CredentialCache on startup (in main.rs) specifically for caching TokenCredentialRequest objects and pruning expired entries, and then passes this through for use in different threads concurrently.
  3. Uses the cache to fetch the credentials.

Benefits

Fetching from the cache is now non-blocking (generally, except when an entry is being added) and so leads to less task switching, improving the total query time by ~2s (down to 3-4).

There is still something else using significant CPU when creating the client itself (cert-related), which I'm investigating now in a separate PR.

Possible drawbacks

Applicable issues

Additional information

Example log when using RUST_LOG=info,pinniped_proxy::pinniped=debug which shows the cache being used after the first request. I've not included it in the output generally, but the cache get is now always under a millisecond. As above, the significant delays (some calls to prepare_and_call_pinniped_exchange only 4ms, others 98ms) are what I'll look at next.

2022-10-27T01:42:47.820245 [INFO] - Listening on http://0.0.0.0:3333
2022-10-27T01:43:05.077116 [DEBUG] - prepare_and_call_pinniped_exchange took 17ms. Used cache?: false
2022-10-27T01:43:05.085273 [INFO] - GET https://kubernetes.default/api?timeout=32s 200 OK
2022-10-27T01:43:05.091663 [DEBUG] - prepare_and_call_pinniped_exchange took 5ms. Used cache?: true
2022-10-27T01:43:05.100437 [INFO] - GET https://kubernetes.default/apis?timeout=32s 200 OK
2022-10-27T01:43:05.106005 [DEBUG] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true
2022-10-27T01:43:05.209952 [DEBUG] - prepare_and_call_pinniped_exchange took 21ms. Used cache?: true
2022-10-27T01:43:05.299424 [DEBUG] - prepare_and_call_pinniped_exchange took 5ms. Used cache?: true
2022-10-27T01:43:05.311599 [DEBUG] - prepare_and_call_pinniped_exchange took 5ms. Used cache?: true
2022-10-27T01:43:05.493269 [DEBUG] - prepare_and_call_pinniped_exchange took 98ms. Used cache?: true
2022-10-27T01:43:05.593683 [DEBUG] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true
2022-10-27T01:43:05.604348 [DEBUG] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true
2022-10-27T01:43:05.697828 [DEBUG] - prepare_and_call_pinniped_exchange took 87ms. Used cache?: true
2022-10-27T01:43:05.811590 [DEBUG] - prepare_and_call_pinniped_exchange took 20ms. Used cache?: true
2022-10-27T01:43:06.004358 [DEBUG] - prepare_and_call_pinniped_exchange took 94ms. Used cache?: true
2022-10-27T01:43:06.098603 [DEBUG] - prepare_and_call_pinniped_exchange took 5ms. Used cache?: true
2022-10-27T01:43:06.108756 [DEBUG] - prepare_and_call_pinniped_exchange took 4ms. Used cache?: true

@netlify
Copy link

netlify bot commented Oct 27, 2022

Deploy Preview for kubeapps-dev ready!

Name Link
🔨 Latest commit c848f59
🔍 Latest deploy log https://app.netlify.com/sites/kubeapps-dev/deploys/63775589d3e4d50009462191
😎 Deploy Preview https://deploy-preview-5576--kubeapps-dev.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

/// Importantly, checking the cache does not require a write-lock
/// (unlike the [`Cached` trait's `cache_get`](https://github.com/jaemk/cached/blob/f5911dc3fbc03e1db9f87192eb854fac2ee6ac98/src/lib.rs#L203))
#[derive(Default)]
struct LockableCache<K, V>(RwLock<HashMap<K, V>>);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struct is a NewType pattern which is a zero-cost abstraction (ie. no penalty at run-time) that defines a new type as a thin wrapper - a 1-tuple of another type, so that we can add our caching functions (get, insert) to this type without needing an extra struct. So in this case, a LockableCache is really just a read-write lock wrapping a hash, but one which behaves like a cache.

record.args()
)
})
.init();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woops - I'd left the .init() off when I re-enabled logging in my last PR, but didn't check it (so nothing will be logged without this change).

Copy link
Contributor

@antgamdia antgamdia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for digging into the root cause; it's a pity we had to drop the cached crait and implement it on our own. I didn't expect a read operation to be blocking :S

@absoludity
Copy link
Contributor Author

Thanks for digging into the root cause; it's a pity we had to drop the cached crait and implement it on our own. I didn't expect a read operation to be blocking :S

Yeah - either did I, though I did afterwards remember that the cached trait keeps statistics of misses etc.

Signed-off-by: Michael Nelson <minelson@vmware.com>
Signed-off-by: Michael Nelson <minelson@vmware.com>
@absoludity absoludity merged commit a9d3601 into main Nov 18, 2022
@absoludity absoludity deleted the new-pinniped-cache-2 branch November 18, 2022 10:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants