Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transforms Rust SDK: Add caching layer to Schema Registry client (LRU) #19859

Merged

Conversation

oleiman
Copy link
Member

@oleiman oleiman commented Jun 16, 2024

.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

Improvements

  • Add an LRU caching layer to Rust transform SDK Schema Registry client

@oleiman oleiman self-assigned this Jun 16, 2024
@github-actions github-actions bot added the area/wasm WASM Data Transforms label Jun 16, 2024
@oleiman oleiman force-pushed the xform/sdk/core-2771/rust-caching-sr-client branch 5 times, most recently from 5462866 to a090d2d Compare June 18, 2024 21:54
@oleiman oleiman marked this pull request as ready for review June 18, 2024 22:07
@oleiman oleiman changed the title DO NOT MERGE: Xform/sdk/core 2771/rust caching sr client Transforms Rust SDK: Add caching layer to Schema Registry client (LRU) Jun 20, 2024
@oleiman oleiman requested a review from rockwotj July 2, 2024 23:13
@oleiman
Copy link
Member Author

oleiman commented Jul 2, 2024

@rockwotj - this has been hangin' out for a little bit. back burnered for things with core dependencies. would be interested to hear your thoughts. particularly on the unfortunate reality of making mut& infecting everything 🤷

@rockwotj
Copy link
Contributor

rockwotj commented Jul 3, 2024

I will take a closer look next week. We should do LRU caching, but should consider interior mutability if this design is too invasive: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html

@oleiman
Copy link
Member Author

oleiman commented Jul 3, 2024

I will take a closer look next week. We should do LRU caching, but should consider interior mutability if this design is too invasive: https://doc.rust-lang.org/book/ch15-05-interior-mutability.html

Ah yeah, that's what i was looking for but didn't find. Nice one, thanks!

@oleiman oleiman force-pushed the xform/sdk/core-2771/rust-caching-sr-client branch from a090d2d to 00f1bd6 Compare July 3, 2024 04:23
@@ -33,6 +33,7 @@ use redpanda_transform_sdk_sr_types::*;
use redpanda_transform_sdk_varint as varint;

use lru::LruCache;
use std::cell::UnsafeCell;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ace 🎯

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use RefCell here please, I don't think the safety checks are a performance concern (and let's prove it with benchmarks first).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I will note that I chose unsafecell because I was not aware refcell could also do this (i.e. not for perf reasons). Easy benchmark tho.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, no need to benchmark, let's just stick with RefCell for now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wound up kinda gross. get operations on the cache return optional borrows, but as I understand it those are bound to the lifetime of the RefMut that we peel off from RefCell, so you can't (?) return them (easily?).

I suspect there's a way around it with Ref::map & friends, but then there's other weird stuff with passing in a RefMut to get a Ref out, etc.

Took a copy for now, but open to feedback.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kinda gross

or totally fine. fact remains that I didn't find a common pattern for this exact usage, but we've not introduced any additional clones afaict: #19859 (comment)

@oleiman oleiman force-pushed the xform/sdk/core-2771/rust-caching-sr-client branch 2 times, most recently from 472f2b8 to f453848 Compare July 3, 2024 21:41
@@ -33,6 +33,7 @@ use redpanda_transform_sdk_sr_types::*;
use redpanda_transform_sdk_varint as varint;

use lru::LruCache;
use std::cell::UnsafeCell;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use RefCell here please, I don't think the safety checks are a performance concern (and let's prove it with benchmarks first).

src/transform-sdk/rust/sr-sys/src/lib.rs Outdated Show resolved Hide resolved
src/transform-sdk/rust/sr/src/lib.rs Outdated Show resolved Hide resolved
src/transform-sdk/rust/sr-sys/src/lib.rs Outdated Show resolved Hide resolved
@oleiman oleiman force-pushed the xform/sdk/core-2771/rust-caching-sr-client branch 2 times, most recently from c9a89db to a2aae64 Compare July 12, 2024 22:21
impl SchemaRegistryClientImpl for CachingSchemaRegistryClient {
fn lookup_schema_by_id(&self, id: SchemaId) -> Result<Schema> {
if let Some(schema) = self.schema_by_id_cache.get(&id) {
Ok(schema.clone())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, i guess we're cloning these out anyway. now cache::get will return a value type so we can just relocate into the Result.

@oleiman oleiman force-pushed the xform/sdk/core-2771/rust-caching-sr-client branch from a2aae64 to 6a88bbb Compare July 12, 2024 22:56
@oleiman
Copy link
Member Author

oleiman commented Jul 12, 2024

force push contents:

  • move cache code into sr crate and make private
  • remove unnecessary Cache trait
  • s/UnsafeCell/RefCell/ and minor changes to make copacetic
  • remove some unnecessary copies

@oleiman oleiman requested a review from rockwotj July 15, 2024 22:28
src/transform-sdk/rust/sr/src/lib.rs Outdated Show resolved Hide resolved
}

fn get(&self, k: &Key) -> Option<Value> {
if let Some(v) = self.underlying.borrow_mut().get(k) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need a mutable borrow here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, presumably since the cache needs to update it's recency accounting. It's in the API anyway.

src/transform-sdk/rust/sr/src/lib.rs Outdated Show resolved Hide resolved
underlying: RefCell<LruCacheImpl<K, V>>,
}

#[allow(dead_code)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the dead code? Is this because we always pass max_entries as None? if that's so can we either fix that or move the scope of this allow to that function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this to silence a warning in the commit that introduced the struct. removed now at first usage.

oleiman added 5 commits July 25, 2024 11:11
For convenience during schema-based record serde.

Also deprecates the slightly clunkier extract_id and migrates
the SR/rust example transform to the new version.

Includes simple roundtrip unit tests.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
The idea here is to wrap an RefCell<LruCache<>> so that
we can have interior mutability for cache operations

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Encapsulates a non-caching client impl AND LruCaches keyed on SchemaId
and SubjectVersion.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman oleiman force-pushed the xform/sdk/core-2771/rust-caching-sr-client branch from 6a88bbb to d74df0f Compare July 25, 2024 18:13
@oleiman
Copy link
Member Author

oleiman commented Jul 25, 2024

force push CR changes

@oleiman oleiman requested a review from rockwotj July 25, 2024 18:15
@oleiman oleiman merged commit e44dece into redpanda-data:dev Jul 25, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/wasm WASM Data Transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants