-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CORE-5092] Transform SDK: Schema Registry client in C++ #21292
[CORE-5092] Transform SDK: Schema Registry client in C++ #21292
Conversation
0310bde
to
e78412b
Compare
e78412b
to
482fef6
Compare
force push: minor QoL tweak to simple_named_type. Hopefully we don't wind up reimplementing the whole thing 😅 |
schema_version version; | ||
|
||
private: | ||
friend bool operator==(const reference&, const reference&) = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this to disable comparison?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, this should generate a default overload for the reference struct.
Per the standard, if it's not a member it must be a friend. For some "reason" i mildly prefer the friend formulation to the non-static member 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, the privateness? doesn't matter since it's not a member. Private friend feels natural to me for whatever reason
[[nodiscard]] const sr::schema& schema() const { return _schema; } | ||
[[nodiscard]] const std::string& subject() const { return _subject; } | ||
[[nodiscard]] schema_version version() const { return _version; } | ||
[[nodiscard]] schema_id id() const { return _id; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice. +1 nodiscard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All credit to surrounding code. But yeah, feels good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very cool
be_id.resize(sizeof(schema_id)); | ||
std::memcpy(be_id.data(), sid.data(), sizeof(schema_id)); | ||
bytes result = MAGIC_BYTES; | ||
result.append_range(be_id | std::views::reverse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also htonl
(https://linux.die.net/man/3/htonl) but this is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already require C++23 because of std::expected, so let's use byteswap: https://en.cppreference.com/w/cpp/numeric/byteswap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoided hton
because I think the swap is conditional on host endianness, which shouldn't be an issue in practice but feels like a mild (if extremely common) abuse of the API. might be wrong about that though.
byteswap
killer. wasn't aware
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well if the device is big endian and we receive data in network byte order (which is big endian) hton* is a no-op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that's kinda my point - the endianness is well-defined on both ends (Wasm vs SR header), so what I want is a byteswap, unconditionally. no functional difference in this case though.
be_id.resize(sizeof(schema_id)); | ||
std::memcpy(be_id.data(), sid.data(), sizeof(schema_id)); | ||
bytes result = MAGIC_BYTES; | ||
result.append_range(be_id | std::views::reverse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already require C++23 because of std::expected, so let's use byteswap: https://en.cppreference.com/w/cpp/numeric/byteswap
const uint32_t raw_id = (static_cast<uint32_t>(id_bytes[3]) << 0U) | ||
| (static_cast<uint32_t>(id_bytes[2]) << 8U) | ||
| (static_cast<uint32_t>(id_bytes[1]) << 16U) | ||
| (static_cast<uint32_t>(id_bytes[0]) << 24U); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Torn on casting to uint32 and using byteswap... this is probably fine :)
I always get tripped on removing UB in these things: https://justine.lol/endian.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya, I think as long as there's no intermediate promotion to signed anything, we're all good. if i can avoid typing memcpy
in these situations I'm basically happy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only reservation here is that this assumes we're always running on a little endian device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasm is little endian.
See list here: https://webassembly.org/docs/portability/
Straightforward approximation of named_type, intended for the schema registry client interface and providing access to the value of and a non-const pointer to the underlying value.
482fef6
to
7ec02d3
Compare
force push CR comments, various and sundry:
|
7ec02d3
to
b3e6799
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
- enum class schema_format - schema_id (named int32) - schema_version (named int32) - struct reference{name, subject, version} - class schema{raw, format, reference[]} - class subject_schema{schema, subject, version, ID}
With simple tests
We don't have an easy way to support avro serde from c++/wasm, so this isn't suitable for the existing integration test in its current form. However, it's useful for manual verification and as a reference. Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
b3e6799
to
1d94ebf
Compare
force push s/references_c/reference_container/ |
Primarily in service of extending Schema Registry support to the JavaScript SDK, this commit implements the full SR ABI v0 in the C++ SDK.
TODO (exclusive of mundane cleanup):
schema_registry_client::new_client()
could be a free functionFinish building outpunting on this as the javascript version will test a superset of functionality.examples/schema_registry.cc
Avro dependency for deser (ingress)JSON dependency for ser (egress)Enable SR integration testBackports Required
Release Notes
Improvements