
Benjamin-Philip/serde_arrow


serde_arrow

Erlang implementation of the Apache Arrow in-memory columnar format.

At present, serde_arrow only provides serialization (write) of Erlang data structures to Arrow. Support for deserialization (read) will be added soon.

We support the Apache Arrow Columnar Format and the Apache Arrow IPC Format. Flight RPC, Flight SQL, and conversion of Arrow into other formats such as Apache Parquet, Apache Avro, CSV and JSON are out of scope for the project.

Build

In addition to an Erlang installation, you will need a Rust installation with Cargo. You can then add serde_arrow to the deps list in your rebar.config:

{deps, [
    {serde_arrow, {git, "https://github.com/Benjamin-Philip/serde_arrow.git"}}
]}.

And compile!

$ rebar3 compile

Format Support

This implementation is still a work in progress. As mentioned earlier, only write is supported for now; read is not.

We support the following primitive data types:

  • Int 8/16/32/64
  • UInt 8/16/32/64
  • Float 32/64
  • Fixed Size Binary
  • Binary
  • Large Binary
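As an illustration of what the fixed-width primitive types look like in memory, here is a minimal Rust sketch (standalone, not serde_arrow's API; the helper names are hypothetical) of a nullable Int32 column under the Arrow columnar spec: a validity bitmap with one bit per slot, least-significant bit first, plus a little-endian values buffer. Padding every buffer to 64 bytes follows the spec's recommendation, and writing zeros into null slots is just one convenient choice, since the spec leaves their contents undefined.

```rust
/// Round `len` up to the next multiple of 64, the buffer padding the Arrow
/// columnar spec recommends.
fn padded_len(len: usize) -> usize {
    (len + 63) / 64 * 64
}

/// Validity bitmap: bit i (least-significant bit first within each byte)
/// is 1 when slot i holds a value and 0 when it is null.
fn validity_bitmap(slots: &[Option<i32>]) -> Vec<u8> {
    let mut bitmap = vec![0u8; padded_len((slots.len() + 7) / 8)];
    for (i, slot) in slots.iter().enumerate() {
        if slot.is_some() {
            bitmap[i / 8] |= 1u8 << (i % 8);
        }
    }
    bitmap
}

/// Values buffer: each Int32 as 4 little-endian bytes. Null slots are
/// written as 0 here; the spec leaves their contents undefined.
fn int32_values(slots: &[Option<i32>]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(padded_len(slots.len() * 4));
    for slot in slots {
        buf.extend_from_slice(&slot.unwrap_or(0).to_le_bytes());
    }
    let padded = padded_len(buf.len());
    buf.resize(padded, 0);
    buf
}

/// Number of null slots, recorded alongside the array's length.
fn null_count(slots: &[Option<i32>]) -> usize {
    slots.iter().filter(|slot| slot.is_none()).count()
}
```

For the column [1, null, 2, 4, 8], the first bitmap byte is 0b0001_1101 and the null count is 1. When a column has no nulls, the spec allows the validity bitmap to be omitted entirely.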

and the following nested data types:

  • Fixed Size List
  • List
  • Large List

Support for the other data types (both primitive and nested) will be added soon.
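The variable-size types listed above share one mechanism: an offsets buffer with length + 1 entries, where slot j spans offsets[j]..offsets[j + 1] of a child array (List) or a byte buffer (Binary). Large List and Large Binary differ only in using 64-bit offsets, while Fixed Size List and Fixed Size Binary need no offsets because every slot has the same width. The Rust sketch below (standalone, not serde_arrow's API; all function names are hypothetical) shows those rules; filling null fixed-size slots with zero placeholders is one valid choice, not a requirement.

```rust
use std::convert::TryFrom;

/// Length of each slot; a null slot contributes 0 elements (or bytes).
fn slot_lengths<T>(slots: &[Option<Vec<T>>]) -> Vec<usize> {
    slots.iter().map(|s| s.as_ref().map_or(0, |v| v.len())).collect()
}

/// Offsets buffer: length + 1 entries, starting at 0, where slot j spans
/// offsets[j]..offsets[j + 1] of the child array or data buffer.
fn offsets_from_lengths(lengths: &[usize]) -> Vec<usize> {
    let mut offsets = Vec::with_capacity(lengths.len() + 1);
    let mut end = 0usize;
    offsets.push(end);
    for len in lengths {
        end += len;
        offsets.push(end);
    }
    offsets
}

/// List and Binary use 32-bit signed offsets; None means the column is
/// too large and needs Large List / Large Binary instead.
fn offsets_i32(offsets: &[usize]) -> Option<Vec<i32>> {
    offsets.iter().map(|&o| i32::try_from(o).ok()).collect()
}

/// Large List and Large Binary use 64-bit signed offsets.
fn offsets_i64(offsets: &[usize]) -> Option<Vec<i64>> {
    offsets.iter().map(|&o| i64::try_from(o).ok()).collect()
}

/// Child values (List) or data bytes (Binary): non-null slots concatenated.
fn child_values<T: Clone>(slots: &[Option<Vec<T>>]) -> Vec<T> {
    slots.iter().flatten().flat_map(|v| v.iter().cloned()).collect()
}

/// Fixed Size List: no offsets; slot j occupies child elements
/// j * size .. (j + 1) * size, so null slots still take `size` placeholders.
fn fixed_size_child(slots: &[Option<Vec<i32>>], size: usize) -> Vec<i32> {
    let mut child = Vec::with_capacity(slots.len() * size);
    for slot in slots {
        match slot {
            Some(v) => {
                assert_eq!(v.len(), size, "every slot must hold exactly `size` values");
                child.extend_from_slice(v);
            }
            None => child.extend(std::iter::repeat(0).take(size)),
        }
    }
    child
}
```

For the List column [[12, -7, 25], null, [0, -127, 127, 50], []], the offsets are [0, 3, 3, 7, 7]. A null slot and an empty slot both repeat the previous offset; only the validity bitmap tells them apart.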

IPC Format Support

We currently support all three IPC "formats":

  • Encapsulated Message Format
  • Stream Format
  • File Format

and the following message types:

  • Schema
  • RecordBatch
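All three IPC formats are built from the same framing. The Rust sketch below follows the Arrow IPC specification: an encapsulated message is a 0xFFFFFFFF continuation marker, a little-endian int32 metadata size, the Message flatbuffer padded to an 8-byte boundary, and then the body. A stream is messages back to back ending in an end-of-stream marker, and a file wraps a stream between "ARROW1" magic strings, followed by a footer and its int32 length. The Schema/RecordBatch flatbuffers and the footer are treated here as opaque, already-serialized bytes (building them needs a FlatBuffers library), and the function names are hypothetical, not serde_arrow's API.

```rust
/// Continuation marker that starts every encapsulated message.
const CONTINUATION: [u8; 4] = [0xFF; 4];
/// Magic string at both ends of an Arrow IPC file.
const MAGIC: &[u8] = b"ARROW1";

/// Append zero bytes until `buf.len()` is a multiple of 8.
fn pad_to_8(buf: &mut Vec<u8>) {
    while buf.len() % 8 != 0 {
        buf.push(0);
    }
}

/// Encapsulated message: marker, int32 LE metadata size (flatbuffer plus
/// padding), the flatbuffer padded to 8 bytes, then the body.
fn encapsulate(metadata: &[u8], body: &[u8]) -> Vec<u8> {
    // Keeps the whole message a multiple of 8 bytes, as the spec requires.
    assert!(body.len() % 8 == 0, "body length must be a multiple of 8");
    let padded_metadata = (metadata.len() + 7) / 8 * 8;
    let mut out = Vec::with_capacity(8 + padded_metadata + body.len());
    out.extend_from_slice(&CONTINUATION);
    out.extend_from_slice(&(padded_metadata as i32).to_le_bytes());
    out.extend_from_slice(metadata);
    pad_to_8(&mut out); // the 8-byte prefix is already aligned
    out.extend_from_slice(body);
    out
}

/// Stream format: messages back to back (Schema first), then the
/// end-of-stream marker: continuation followed by a zero length.
fn stream(messages: &[Vec<u8>]) -> Vec<u8> {
    let mut out = Vec::new();
    for message in messages {
        out.extend_from_slice(message);
    }
    out.extend_from_slice(&CONTINUATION);
    out.extend_from_slice(&0i32.to_le_bytes());
    out
}

/// File format: magic padded to 8 bytes, the stream, the footer,
/// the footer's int32 LE length, and the magic again.
fn file(stream_bytes: &[u8], footer: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(MAGIC);
    pad_to_8(&mut out);
    out.extend_from_slice(stream_bytes);
    out.extend_from_slice(footer);
    out.extend_from_slice(&(footer.len() as i32).to_le_bytes());
    out.extend_from_slice(MAGIC);
    out
}
```

In a real writer the footer also holds a copy of the schema and the location of each record batch, so a reader can seek straight to any batch instead of scanning the stream.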

Support for the following will be added shortly:

  • Buffer compression
  • Endianness conversion
  • Custom schema metadata

Support for the following will be added after v0.1.0:

  • Dictionaries
  • Replacement dictionaries
  • Delta dictionaries
  • Tensors
  • Sparse Tensors
