-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
iceberg: initial data structures for logical data types #21415
Conversation
77d6336
to
66cb6c2
Compare
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/51570#0190ba63-c9d2-43f5-9d32-452b1fe0b226 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/51614#0190beee-8ac8-421c-ae3d-3e6f758c5e7c |
a1460f6
to
6c1b97f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! This looks really good.
I left a couple minor comments, but otherwise this looks ready to merge.
} | ||
bool operator==(const nested_field& lhs, const nested_field& rhs) { | ||
return lhs.id == rhs.id && lhs.required == rhs.required | ||
&& lhs.name == rhs.name && lhs.type == rhs.type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to require the names be equal? What is this used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I'm just using these to ensure my serialization is correct.
You're right though that schema type equivalence may have a different criteria. I'd argue though that if we want equivalence of just types, it should be a dedicated comparison method, rather than operator==.
6c1b97f
to
70d5400
Compare
Adds initial logical types[1] that will be used to represent an Iceberg schema. These include "primitives" (basic types like int, float, string, etc) as well as complex types (types that are composed of other types, like list, map, struct). Complex types are represented in Iceberg as composed of "nested fields" which are types that are themselves composed of other types (primitives or complex types). This PR introduces the recursive definition of these types, with a basic equality operator and ostream operator (note, this is for JSON serialization, that will come in a following commit). The implementation of these types is similar in style to the Iceberg Rust library, though we are using std::variants instead of Rust enums, since C++ enums aren't expressive enough to express complex nested enums with members.
Adds some helpers for operating on the json::Value class, that will be useful in building JSON parsing for Iceberg metadata types. Long term it may make sense to move these to some general purpose module, but since this is only used in Iceberg for now, and to avoid distractions, the utilities are just added to the Iceberg module.
70d5400
to
93b1ffd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
.match("binary", binary_type{}); | ||
} | ||
if (!v.IsObject()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handle the default match case with an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This already appears to be handled?
redpanda/src/v/strings/tests/string_switch_test.cc
Lines 58 to 70 in 015afa7
BOOST_CHECK_EXCEPTION( | |
string_switch<int8_t>("ccc").match("a", 0).match("b", 1). | |
operator int8_t(), | |
std::runtime_error, | |
[](const std::runtime_error& e) { | |
// check that the error string includes the string we were searching | |
// for as a weak hint to where the error occurred | |
if (!std::string(e.what()).ends_with("ccc")) { | |
BOOST_TEST_FAIL( | |
"Expected error message to end with ccc but was: " << e.what()); | |
}; | |
return true; | |
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh i thought it just fell through or something. sgtm
Very cool |
Adds JSON serialization for newly added Iceberg data types. A test is added to demonstrate the roundtrip with a type comprised of multiple nested fields.
93b1ffd
to
f4f231a
Compare
new failures in https://buildkite.com/redpanda/redpanda/builds/51624#0190bf88-4433-4b6d-b663-2861e98c87c9:
|
CI failure: #20224 |
Adds initial logical types[1] that will be used to represent an Iceberg schema. These include "primitives" (basic types like int, float, string, etc) as well as complex types (types that are composed of other types, like list, map, struct). Complex types are represented in Iceberg as composed of "nested fields" which are types that are themselves composed of other types (primitives or complex types).
This PR introduces the recursive definition of these types, and initial code to serialize them to and from JSON (Iceberg manifests include table schemas represented as JSON). The implementation of these types is similar in style to the Iceberg Rust library, though we are using std::variants instead of Rust enums, since C++ enums aren't expressive enough to express complex nested enums with members
For reference, here are implementations in other languages:
[1] https://iceberg.apache.org/spec/#schemas-and-data-types
Backports Required
Release Notes