Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize Message Metadata into Flatbuffers #32

Merged
merged 15 commits into from
Sep 21, 2023
Merged

Conversation

Benjamin-Philip
Copy link
Owner

This commit adds support to serialize message metadata into flatbuffers.

@Benjamin-Philip
Copy link
Owner Author

@josevalim This PR is still a draft...

@josevalim
Copy link
Collaborator

And the plan is looking good so far!

@Benjamin-Philip
Copy link
Owner Author

Benjamin-Philip commented Aug 4, 2023

I don't think I'm going to make any Erlang changes after this, so just as well I guess. :)

This commit adds the implementation and tests to serialize a RecordBatch.
In addition to serializing schema fields, there have been some soft
changes. In creating this commit, I've come to realise that Atoms are a
bad thing to keep in Rust structs. So, in this commit, I've also changed
Name and Dictionary to use enums to represent undefined instead of an
atom.
This commit serializes messages to their planus, or Rust form.
It does not serialize it to proper flatbuffers yet.
This commit adds the final commit required to serialize message metadata
to IPC. Note though, that this function has not been integrated with
`serde_arrow_ipc_message:to_ipc` function.
@Benjamin-Philip Benjamin-Philip marked this pull request as ready for review September 12, 2023 12:40
@Benjamin-Philip
Copy link
Owner Author

@josevalim, There haven't been any Erlang changes since you last reviewed, but you still may want to take a look.

pub type CustomMetadata = Vec<HashMap<String, String>>;

/// Returns an example `Message` of a `Schema`
pub fn schema() -> Message {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally don't like "utils" as a module name because it tends to become a blackhole of all things that don't quite have a place.

If I understood correctly, schema and message_batch a just dummy functions that will be replaced where they are currently called by actual code in the future.

If this is indeed the case, I think it would be best if they were prefixed with dummy_ and set in the message module instead.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are functions which return testing data, and are used solely in testing.
Presently they are are used in the test_decode function in crate, and in various Rust specific tests.

Should I still prefix them with dummy and move them or is keeping them in utils or in a test_utils good enough?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is even more indicative that this module should be something else :)
Maybe test/fixtures?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you won't need to prefix. Not sure if this would work, but maybe calling it as:

use test::fixtures
...
fixtures::schema(...)

would be readable enough

Copy link
Owner Author

@Benjamin-Philip Benjamin-Philip Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks sound good.

However, test_decode is not a Rust specific test - it's included in the NIF, and tested from Erlang which is why I'm not very sure on what to do with it.

This means that schema and record_batch are used out of Rust tests.

The functions prefixed arrow_ can probably go to test/fixtures.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should test_decode actually be exposed in the NIF?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's a function to test if everything decodes properly into Erlang.

This PR only implements the encoding part of this NIF, and I will have to deal with decoding later.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that function will be removed as soon as the decoding is implemented, right?
I believe it's fine to leave it where it is then :)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've put the arrow_* into a test::fixtures module in lib.rs. I hoping this is what you had in mind when you said test::fixtures? I wasn't able to find a way to keep them in a file in the tests dir and use them in unit tests in the src dir.

Since these are test only functions, it's better to keep them in a test
specific module.
Since the code in schema and record_batch are no longer dead, we need
not allow dead code in the clippy checks.
@@ -49,10 +46,156 @@ mod atoms {
}
}

#[cfg(test)]
pub mod test {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you could have this extracted to a test.rs file in the src directory. This is a non-blocker, though :)

@Benjamin-Philip Benjamin-Philip merged commit 44f9091 into main Sep 21, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants