Documentation examples to also include serializing to Rust examples #1077
Replies: 6 comments
-
Good point. I have been trying to improve the docs to achieve that. In main we have something like that already: https://github.com/jorgecarleitao/arrow2/blob/main/src/array/utf8/mod.rs#L37 but improvements are always welcome ^^
-
Awesome! I'll take notes on how I convert them and maybe write a guide; it would be a handy addition to the existing getting started guide.
-
I'd also love to see documentation on this, particularly how to convert
-
@jkcoxson, thanks for your interest! Could you provide the function signature that you are looking for? I ask because the Arrow spec is quite extensive, which makes it difficult to narrow this issue down to a concrete use case.
-
I found the guide super helpful for figuring out how to read the underlying data, and the arrow2 code is incredibly helpful for reading the different types. Understanding how each type needs to be mapped from Arrow is important, so you need to know which types you are working with. For example, I need to read an Arrow stream file into chunks, then get the arrays out. I don't need every array type Arrow supports, as my implementation only needs Utf8Array, Binary, Boolean, etc. The guide here is a great starting point for reading chunks.

Now onto your question: the examples directory should help, and reading the code comments is super useful (the utf8 code above is fantastic). What I do is loop over the chunks, then over the arrays in each chunk:

    for chunk in chunks {
        // I need the array index for something else here; if you don't, you could simplify this
        for (array_index, array) in chunk.arrays().iter().enumerate() {

Then I can match on the array's data type (you can assign the result to a variable, do more work with it, whatever):

    match array.data_type() {
        DataType::Int8 => {
            array
                .as_any()
                .downcast_ref::<PrimitiveArray<i8>>()
                .unwrap()
                .iter()
                .collect::<Vec<Option<&i8>>>();
        }
        ...

I've found it really fast to convert to Rust types this way. The slowest part of my application on the Rust side is reading the file into bytes (file size is 10-20 MB for me), not the actual processing 🎉.
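For readers who want to try the match-then-downcast pattern above without pulling in arrow2, here is a minimal std-only sketch. Everything in it (`DataType`, `Array`, `Int8Array`, `BooleanArray`, `Column`, `to_rust`) is a hypothetical stand-in written for illustration, not arrow2's actual types. It also assumes you would rather get `None` on a type mismatch than panic, so it uses `?` where the snippet above uses `unwrap()`.

```rust
use std::any::Any;

// Hypothetical stand-ins for illustration only; these are NOT arrow2's types.
#[derive(Debug, PartialEq)]
enum DataType {
    Int8,
    Boolean,
}

// A type-erased array: it reports its logical type and exposes itself as `Any`.
trait Array {
    fn data_type(&self) -> DataType;
    fn as_any(&self) -> &dyn Any;
}

struct Int8Array {
    values: Vec<Option<i8>>,
}

struct BooleanArray {
    values: Vec<Option<bool>>,
}

impl Array for Int8Array {
    fn data_type(&self) -> DataType {
        DataType::Int8
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Array for BooleanArray {
    fn data_type(&self) -> DataType {
        DataType::Boolean
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Plain Rust values extracted from a type-erased array.
#[derive(Debug, PartialEq)]
enum Column {
    Int8(Vec<Option<i8>>),
    Boolean(Vec<Option<bool>>),
}

// Match on the logical type, then downcast to the concrete array type.
// Returns `None` if the reported type and the concrete type disagree.
fn to_rust(array: &dyn Array) -> Option<Column> {
    match array.data_type() {
        DataType::Int8 => {
            let typed = array.as_any().downcast_ref::<Int8Array>()?;
            Some(Column::Int8(typed.values.clone()))
        }
        DataType::Boolean => {
            let typed = array.as_any().downcast_ref::<BooleanArray>()?;
            Some(Column::Boolean(typed.values.clone()))
        }
    }
}
```

Calling `to_rust` on a boxed `Int8Array` returns `Some(Column::Int8(..))` with the nulls preserved as `None`. Collecting every supported type into one enum like `Column` is just one possible design; returning typed vectors from separate functions would work equally well.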
-
Might be worth converting this into a discussion @jorgecarleitao :)
-
Hi!
For the arrays there is a great line about conceptualising how an Arrow concept fits into Rust, for example this handy snippet:
Are there thoughts around also providing examples to convert this back to a Rust Vec<Option<String>> (or whatever the context is) in an efficient/recommended way?
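To make the question concrete, here is a minimal std-only sketch of what such a conversion has to do. It models the Arrow variable-size UTF-8 layout (an offsets buffer, a values buffer, and an optional least-significant-bit-first validity bitmap) with a hypothetical `Utf8Column` struct. This is not arrow2's `Utf8Array` and makes no claim about arrow2's recommended API; with arrow2 the analogous step would presumably be iterating the array and mapping each item to an owned `String`, but treat that as an assumption.

```rust
// Hypothetical model of the Arrow UTF-8 layout; NOT arrow2's Utf8Array.
struct Utf8Column {
    // One more entry than there are slots; slot i spans offsets[i]..offsets[i + 1].
    offsets: Vec<i32>,
    // All string bytes concatenated back to back.
    values: Vec<u8>,
    // LSB-first bitmap, 1 = valid. `None` means every slot is valid.
    validity: Option<Vec<u8>>,
}

impl Utf8Column {
    // Number of slots (strings or nulls) in the column.
    fn len(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }

    // Whether slot `i` holds a value rather than a null.
    fn is_valid(&self, i: usize) -> bool {
        match &self.validity {
            None => true,
            Some(bits) => (bits[i / 8] >> (i % 8)) & 1 == 1,
        }
    }

    // Borrow slot `i` as a &str, or None if the slot is null.
    // In this sketch, bytes that are not valid UTF-8 also yield None.
    fn get(&self, i: usize) -> Option<&str> {
        if !self.is_valid(i) {
            return None;
        }
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        std::str::from_utf8(&self.values[start..end]).ok()
    }

    // Copy every slot into an owned Vec<Option<String>>.
    fn to_vec(&self) -> Vec<Option<String>> {
        (0..self.len())
            .map(|i| self.get(i).map(str::to_owned))
            .collect()
    }
}
```

Borrowing through `get` avoids allocation entirely, so `to_vec` is only needed when the strings must outlive the column. That trade-off (one allocation per non-null string) is the main cost of converting to `Vec<Option<String>>`.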