[CORE-5092] Transform SDK: Schema Registry client in C++ #21292

Merged (8 commits), Jul 17, 2024

@oleiman (Member) commented Jul 8, 2024

Primarily in service of extending Schema Registry support to the JavaScript SDK, this commit implements the full SR ABI v0 in the C++ SDK.

TODO (exclusive of mundane cleanup):

  • Type documentation
  • improve error code forwarding across the SDK API boundary
  • organization of namespaces could use a second look
  • schema_registry_client::new_client() could be a free function
  • Finish building out examples/schema_registry.cc (punting on this, since the JavaScript version will test a superset of the functionality)
    • Avro dependency for deser (ingress)
    • JSON dependency for ser (egress)
    • Enable SR integration test

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

Improvements

  • Add schema registry support to experimental Data Transforms C++ SDK

@oleiman self-assigned this Jul 8, 2024
@github-actions bot added the area/wasm (WASM Data Transforms) label Jul 8, 2024
@oleiman changed the title Transform SDK: Schema Registry client in C++ to [CORE-5092] Transform SDK: Schema Registry client in C++ Jul 8, 2024
@oleiman force-pushed the xform/sdk/core-5092/sr-c++ branch 8 times, most recently from 0310bde to e78412b, July 15, 2024 22:18
@oleiman marked this pull request as ready for review July 15, 2024 22:26
@oleiman requested review from a team, michael-redpanda and rockwotj and removed request for a team July 15, 2024 22:26
@oleiman force-pushed the xform/sdk/core-5092/sr-c++ branch from e78412b to 482fef6, July 15, 2024 23:31
@oleiman (Member Author) commented Jul 15, 2024:

force push: minor QoL tweak to simple_named_type. Hopefully we don't wind up reimplementing the whole thing 😅

schema_version version;

private:
friend bool operator==(const reference&, const reference&) = default;
Member:

is this to disable comparison?

@oleiman (Member Author):

Nah, this should generate a defaulted operator== for the reference struct.

Per the standard, a defaulted comparison operator that isn't a member must be a friend. For some "reason" I mildly prefer the friend formulation to the non-static member 🤷‍♂️

@oleiman (Member Author):

Oh, the privateness? It doesn't matter, since a friend isn't a member and access specifiers don't apply to it. A private friend just feels natural to me for whatever reason.

Comment on lines 339 to 342
[[nodiscard]] const sr::schema& schema() const { return _schema; }
[[nodiscard]] const std::string& subject() const { return _subject; }
[[nodiscard]] schema_version version() const { return _version; }
[[nodiscard]] schema_id id() const { return _id; }
Member:

nice. +1 nodiscard

@oleiman (Member Author):

All credit to the surrounding code. But yeah, it feels good.

@michael-redpanda (Contributor) left a comment:

very cool

src/transform-sdk/cpp/src/transform_sdk.cc (outdated, resolved)
be_id.resize(sizeof(schema_id));
std::memcpy(be_id.data(), sid.data(), sizeof(schema_id));
bytes result = MAGIC_BYTES;
result.append_range(be_id | std::views::reverse);
Contributor:

There's also htonl (https://linux.die.net/man/3/htonl), but this is fine.

Contributor:

We already require C++23 because of std::expected, so let's use byteswap: https://en.cppreference.com/w/cpp/numeric/byteswap

@oleiman (Member Author), Jul 16, 2024:

I avoided hton because I think the swap is conditional on host endianness. That shouldn't be an issue in practice, but it feels like a mild (if extremely common) abuse of the API. I might be wrong about that, though.

Re byteswap: killer, I wasn't aware of it.

Contributor:

Well, if the device is big-endian and we receive data in network byte order (which is big-endian), hton* is a no-op.

@oleiman (Member Author):

Yeah, that's kind of my point: the endianness is well-defined on both ends (little-endian Wasm vs. the big-endian SR header), so what I want is an unconditional byteswap. No functional difference in this case, though.

src/transform-sdk/cpp/include/redpanda/transform_sdk.h (8 outdated, resolved threads)

Comment on lines +783 to +812
const uint32_t raw_id = (static_cast<uint32_t>(id_bytes[3]) << 0U)
| (static_cast<uint32_t>(id_bytes[2]) << 8U)
| (static_cast<uint32_t>(id_bytes[1]) << 16U)
| (static_cast<uint32_t>(id_bytes[0]) << 24U);
Contributor:

Torn on casting to uint32 and using byteswap... this is probably fine :)

I always get tripped up on removing UB in these things: https://justine.lol/endian.html

@oleiman (Member Author):

Yeah, I think as long as there's no intermediate promotion to a signed type, we're all good. If I can avoid typing memcpy in these situations, I'm basically happy.

Contributor:

My only reservation here is that this assumes we're always running on a little-endian device.

Contributor:

Wasm is little-endian.

See the list here: https://webassembly.org/docs/portability/
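
The decode hunk in this thread can be wrapped into a self-contained helper. This is a sketch under the same assumed header layout (a `0x00` magic byte followed by a 4-byte big-endian schema ID); `decode_schema_header` is a hypothetical name, and `std::optional` stands in for whatever error type the SDK actually returns. Each byte goes through `unsigned char` and then `uint32_t` before shifting, so no promotion to a signed `int` occurs and `<< 24` cannot overflow. Because the value is assembled arithmetically from the input's byte order, this form gives the same result regardless of host endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Assumed header layout: 0x00 magic byte + 4-byte big-endian schema ID.
inline constexpr std::size_t header_size = 5;

inline std::optional<int32_t> decode_schema_header(std::string_view buf) {
    if (buf.size() < header_size || buf[0] != '\0') {
        return std::nullopt;  // too short, or wrong magic byte
    }
    // char may be signed: go through unsigned char so 0xFF becomes 255
    // rather than 0xFFFFFFFF, then widen so shifts happen on uint32_t.
    auto byte = [buf](std::size_t i) {
        return static_cast<uint32_t>(static_cast<unsigned char>(buf[i]));
    };
    const uint32_t raw_id = (byte(1) << 24U) | (byte(2) << 16U)
                          | (byte(3) << 8U) | (byte(4) << 0U);
    // Since C++20, converting an out-of-range uint32_t to int32_t wraps
    // modulo 2^32.
    return static_cast<int32_t>(raw_id);
}
```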

Straightforward approximation of named_type, intended for the
schema registry client interface, providing access to the
underlying value and a non-const pointer to it.
@oleiman (Member Author) commented Jul 17, 2024:

force push: address CR comments, various and sundry:

  • supporting classes can be plain structs
  • some minor modernizations (views, C++23 features, constants)
@oleiman force-pushed the xform/sdk/core-5092/sr-c++ branch from 7ec02d3 to b3e6799, July 17, 2024 18:16
@oleiman (Member Author) commented Jul 17, 2024:

and fix SR example

@rockwotj (Contributor) previously approved these changes Jul 17, 2024:

LGTM

src/transform-sdk/cpp/include/redpanda/transform_sdk.h (outdated, resolved)
oleiman added 2 commits July 17, 2024 11:36
- enum class schema_format
- schema_id (named int32)
- schema_version (named int32)
- struct reference{name, subject, version}
- class schema{raw, format, reference[]}
- class subject_schema{schema, subject, version, ID}
oleiman added 5 commits July 17, 2024 11:36
We don't have an easy way to support Avro serde from C++/Wasm, so
this isn't suitable for the existing integration test in its current
form.

However, it's useful for manual verification and as a reference.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
@oleiman (Member Author) commented Jul 17, 2024:

force push: s/references_c/reference_container/

@michael-redpanda self-requested a review July 17, 2024 18:43
@oleiman merged commit 6032904 into redpanda-data:dev Jul 17, 2024
20 checks passed
Labels: area/wasm (WASM Data Transforms)
Projects: none yet
4 participants