-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cloud_storage: serde for segment_meta_cstore #9102
cloud_storage: serde for segment_meta_cstore #9102
Conversation
a345a36
to
b4e8c4d
Compare
f75c1d8
to
71ae315
Compare
@@ -18,6 +18,7 @@ | |||
#include <seastar/core/circular_buffer.hh> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
serde/serde.h: added support for std::array as size,elements (+ throw)
what does + throw
mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means that it will throw a serde_exception if the serialized size does not match the std::array size
src/v/serde/serde.h
Outdated
@@ -287,6 +288,9 @@ void write(iobuf& out, T t); | |||
template<typename T> | |||
void write(iobuf& out, std::vector<T> t); | |||
|
|||
template<typename T, size_t Size> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::size_t
is used in the concept, but size_t
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -31,30 +31,33 @@ static constexpr uint32_t FOR_buffer_depth = 16; | |||
*/ | |||
struct delta_xor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with a nttp value
what is nttp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acronym for Non-Type Template Parameter. It's a really bad name for a template parameter that is a value and not a type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh, of course. thanks
@@ -713,6 +729,20 @@ void read_nested(iobuf_parser& in, T& t, std::size_t const bytes_left_limit) { | |||
for (auto i = 0U; i < size; ++i) { | |||
t.push_back(read_nested<value_type>(in, bytes_left_limit)); | |||
} | |||
} else if constexpr (reflection::is_std_array<Type>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whats the implication of this? Curious i don't know myself but would the compiler generate a new branch of generated code here once per every type of std::array with a different size parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but would the compiler generate a new branch of generated code here once per every type of std::array with a different size parameter?
yeh, that's the way i understand it. i think that alone is a good argument for not supporting std::array. OTOH, we probably have very few instances of it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should isntead support std::span<T, std::dynamic_extent>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's probably ok unless we have any good alternative ideas. the most important thing is that we keep an on-disk format that doesn't constrain us much. and i think we have here: its the same as vector, for exmaple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes std::span seems a good idea. let me try
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://gcc.godbolt.org/z/sf3GoeoGj did a quick test. gcc will produce reasonable code, but clang needs attribute((noinline)).
If it's ok i can use this code.
another option is to only add support for std::span in serde, and i can use serde_write/serde_read to convert std::array to std::span.
let me know if I should go with the first or the second option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah i'd vote to keep it as it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: benchmark test for the serialisation will be very helpful going forward, otherwise LGTM
Since I have to fix conflicts and modify how std::array is serialized, I'll also add a benchmark test |
serialization is done as std::vector, deserialization will check that size matches the std::tuple_size<std_array_t>, and will throw if different. This is done as a safety measure in case of a size change without a change in decoding logic of old binary data. It's meant to catch bugs during testing, not as a recoverable error.
this in done in a retro-compatible way, so as to not touch preexisting run-time use. delta_for_encoder can now be used with a runtime version of delta_* algo or with a nttp value. at the same time, delta_for_encoder and delta_stream_pos_t are serde-enabled, and serde is aware if delta_* is nttp or not, reducing slightly the amount of data used
serde is done naively via serde::envelope. std::list and absl::btree_map are not serde enabled, they are processed in their containing class via serde_write/read mirroring what it's done for other sequence/map types (size,[elements...]/[(key,value)...]). the nttp version of delta_* algorithm is used to remove the need to serde it. this has a cascading effect of removing the need for some user defined constructors. column_store::hint_vec_t is transformed from a std::tuple to a std::array since all the elements of the tuple were of type hint_t. memory and performance wise there should be no difference, but it's compatible with serde. a unit test is added that serializes and deserializes a segment_meta_cstore and checks the elements
71ae315
to
ee99486
Compare
force push: rebase and benchmark for serialization. |
ducktape failure https://buildkite.com/redpanda/redpanda/builds/24707#0186c6dd-483a-457d-b52d-def05a028b21 issue is #9315 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR builds upon #7898
Fixes #9232
Adds segment_meta_cstore::from_iobuf/to_iobuf via serde of the sub-objects.
most of the work is done via serde_fields and serde::envelope, but some manual work is done via serde_write/serde_read, to support std::list and absl::btree_map
std::array is added to the list of containers recognized by serde. serialization is done like other sequence containers, as size,[elements...]. Deserialization first checks the serialized size against std::tuple_size<std_array_t> and throws in case of mismatch. this is done to catch bugs where the size of the std::array is changed in code, but the writer didn't add decoding logic for older binary data. it's not meant to be a recoverable error.
a new unit test checks round-trip serialization/deserialization
a benchmark check serialization/deserialization against a memcpy baseline
Backports Required
UX Changes
Release Notes