Serialize Message Metadata into Flatbuffers #32

Benjamin-Philip · 2023-08-03T13:18:30Z

This commit adds support to serialize message metadata into flatbuffers.

Benjamin-Philip · 2023-08-04T00:21:21Z

@josevalim This PR is still a draft...

josevalim · 2023-08-04T07:38:24Z

And the plan is looking good so far!

Benjamin-Philip · 2023-08-04T11:53:57Z

I don't think I'm going to make any Erlang changes after this, so just as well I guess. :)

This commit adds the implementation and tests to serialize a RecordBatch.

In addition to serializing schema fields, there have been some soft changes. In creating this commit, I've come to realise that atoms are a bad thing to keep in Rust structs, so I've also changed Name and Dictionary to use enums, instead of an atom, to represent undefined.
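As a rough illustration of that change (not the code from this PR), a minimal sketch of the enum approach could look like the following. The variant names, the `as_option` helper, and the `From` impl are assumptions made for the example, and the decoding of the Erlang atom through the NIF layer is left out.

```rust
/// Hypothetical stand-in for the PR's `Name` type: an enum variant, rather
/// than a stored atom, marks the case where Erlang passed `undefined`.
#[derive(Debug, Clone, PartialEq)]
pub enum Name {
    /// Corresponds to the Erlang atom `undefined`.
    Undefined,
    /// An actual name supplied by the caller.
    Value(String),
}

impl Name {
    /// View the name as an `Option`, which is convenient when the
    /// serialized form treats the field as optional.
    pub fn as_option(&self) -> Option<&str> {
        match self {
            Name::Undefined => None,
            Name::Value(name) => Some(name.as_str()),
        }
    }
}

impl From<Option<String>> for Name {
    /// Build a `Name` from an optional string (assumed helper).
    fn from(value: Option<String>) -> Self {
        match value {
            Some(name) => Name::Value(name),
            None => Name::Undefined,
        }
    }
}
```

With a type like this the struct holds plain Rust data only, so it can be built and compared in ordinary Rust unit tests without going through the NIF.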

This commit serializes messages to their planus, or Rust form. It does not serialize it to proper flatbuffers yet.

This commit adds the final piece required to serialize message metadata to IPC. Note, though, that this function has not yet been integrated with the `serde_arrow_ipc_message:to_ipc` function.

Benjamin-Philip · 2023-09-12T14:21:17Z

@josevalim, there haven't been any Erlang changes since you last reviewed, but you may still want to take a look.

polvalente · 2023-09-19T04:13:50Z

native/arrow_format_nif/src/utils.rs

 pub type CustomMetadata = Vec<HashMap<String, String>>;
+
+/// Returns an example `Message` of a `Schema`
+pub fn schema() -> Message {


I generally don't like "utils" as a module name because it tends to become a black hole for all the things that don't quite have a place.

If I understood correctly, schema and message_batch are just dummy functions that will be replaced, where they are currently called, by actual code in the future.

If this is indeed the case, I think it would be best if they were prefixed with dummy_ and set in the message module instead.

They are functions which return testing data, and are used solely in testing.
Presently they are used in the test_decode function in the crate, and in various Rust-specific tests.

Should I still prefix them with dummy_ and move them, or is keeping them in utils or in a test_utils module good enough?

I think this is even more indicative that this module should be something else :)
Maybe test/fixtures?

Then you won't need to prefix. Not sure if this would work, but maybe calling it as:

use test::fixtures ... fixtures::schema(...)

would be readable enough

Thanks, sounds good.

However, test_decode is not a Rust-specific test. It's included in the NIF and tested from Erlang, which is why I'm not sure what to do with it.

This means that schema and record_batch are used outside of Rust tests.

The functions prefixed with arrow_ can probably go to test/fixtures.

Should test_decode actually be exposed in the NIF?

Yes, it's a function to test if everything decodes properly into Erlang.

This PR only implements the encoding part of this NIF, and I will have to deal with decoding later.

So that function will be removed as soon as the decoding is implemented, right?
I believe it's fine to leave it where it is then :)

I've put the arrow_* functions into a test::fixtures module in lib.rs. I'm hoping this is what you had in mind when you said test::fixtures? I wasn't able to find a way to keep them in a file in the tests dir and use them in unit tests in the src dir.
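For anyone following along, the layout described here could be sketched roughly as below. This is only an illustration under assumptions: the `Message` fields and the `example_header` caller are made up for the example, and the `#[cfg(test)]` gate on the module is shown as a comment so the snippet compiles on its own.

```rust
/// Stand-in for the crate's `Message` type; both fields are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub header: String,
    pub body_length: i64,
}

// In the crate this module would sit behind `#[cfg(test)]` so that it is
// only built by `cargo test`. The attribute is left off in this sketch so
// that the example can be compiled and called as a standalone file.
pub mod test {
    pub mod fixtures {
        use super::super::Message;

        /// Returns an example `Message` of a `Schema`.
        pub fn schema() -> Message {
            Message {
                header: String::from("Schema"),
                body_length: 0,
            }
        }
    }
}

/// Hypothetical caller: importing the module and calling
/// `fixtures::schema()` keeps the call site readable without a `dummy_`
/// prefix on the function name.
pub fn example_header() -> String {
    use self::test::fixtures;
    fixtures::schema().header
}
```

Unit tests elsewhere in src can then import the fixtures module and call `fixtures::schema()` directly.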

Since these are test-only functions, it's better to keep them in a test-specific module.

Since the code in schema and record_batch is no longer dead, we need not allow dead code in the clippy checks.

polvalente · 2023-09-21T01:02:02Z

native/arrow_format_nif/src/lib.rs

@@ -49,10 +46,156 @@ mod atoms {
    }
 }

+#[cfg(test)]
+pub mod test {


I believe you could have this extracted to a test.rs file in the src directory. This is a non-blocker, though :)

Benjamin-Philip added 2 commits August 3, 2023 18:45

Add Erlang side testing of serialize_message

2333233

Fix Clippy errors

18a0914

josevalim approved these changes Aug 3, 2023


Benjamin-Philip added 5 commits August 5, 2023 19:59

Serialize Record batches

8fd382d

This commit adds the implementation and tests to serialize a RecordBatch.

Serialize Types

f757522

Serialize Schemas

a901f42

Reduce test data duplication

477734b

Benjamin-Philip force-pushed the bp-rust-serde branch from 6558c27 to 477734b Compare September 11, 2023 01:38

Benjamin-Philip added 5 commits September 11, 2023 16:52

Fix failing test

ccf2549

Serialize Headers

0530d76

Serialize Messages to their Rust form

858e3e7

This commit serializes messages to their planus, or Rust form. It does not serialize it to proper flatbuffers yet.

Serialize messages to IPC

f8d1c33

This commit adds the final piece required to serialize message metadata to IPC. Note, though, that this function has not yet been integrated with the `serde_arrow_ipc_message:to_ipc` function.

Fix lints

c6d45ac

Benjamin-Philip marked this pull request as ready for review September 12, 2023 12:40

polvalente reviewed Sep 19, 2023


Benjamin-Philip added 3 commits September 20, 2023 19:24

Move utils::arrow_* functions to test::fixtures

192cb23

Since these are test-only functions, it's better to keep them in a test-specific module.

Remove FIXMEs for dead code

0491ae7

Since the code in schema and record_batch is no longer dead, we need not allow dead code in the clippy checks.

cargo fmt

d4bd388

Benjamin-Philip requested a review from polvalente September 20, 2023 14:00

polvalente reviewed Sep 21, 2023


polvalente approved these changes Sep 21, 2023


Benjamin-Philip merged commit 44f9091 into main Sep 21, 2023
7 checks passed
