This repository has been archived by the owner on Feb 16, 2021. It is now read-only.

uwdata/arquero-arrow

arquero-arrow

NOTE: This package has been consolidated into the uwdata/arquero package, where future development and issues will be handled. This repository has been archived and is now read-only.


Arrow serialization support for Arquero. The toArrow(data) method encodes either an Arquero table or an array of objects into the Apache Arrow format. This package provides a convenient interface to the apache-arrow JavaScript library, while also providing more performant encoders for standard integer, float, date, boolean, and string dictionary types.

For more examples and context, see the Arquero and Apache Arrow notebook.

API Documentation

# aq.toArrow(input, options)

Create an Apache Arrow table for an input dataset. The input data can be either an Arquero table or an array of standard JavaScript objects. This method will throw an error if type inference fails or if the generated columns have differing lengths.

  • input: An input dataset to convert to Arrow format. If array-valued, the data should consist of an array of objects where each entry represents a row and named properties represent columns. Otherwise, the input data should be an Arquero table.
  • options: Options for Arrow encoding.
    • columns: Ordered list of column names to include. If function-valued, the function should accept the input data as a single argument and return an array of column name strings.

    • limit: The maximum number of rows to include (default Infinity).

    • offset: The row offset indicating how many initial rows to skip (default 0).

    • types: An optional object indicating the Arrow data type to use for named columns. If specified, the input should be an object with column names for keys and Arrow data types for values. If a column's data type is not explicitly provided, type inference will be performed.

      Type values can be either instantiated Arrow DataType instances (for example, new Float64(), new DateMillisecond(), etc.) or type enum codes (Type.Float64, Type.Date, Type.Dictionary). For convenience, arquero-arrow re-exports the apache-arrow Type enum object (see examples below). High-level types map to specific data type instances as follows:

      • Type.Date: new DateMillisecond()
      • Type.Dictionary: new Dictionary(new Utf8(), new Int32())
      • Type.Float: new Float64()
      • Type.Int: new Int32()
      • Type.Interval: new IntervalYearMonth()
      • Type.Time: new TimeMillisecond()

      Types that require additional parameters (including List, Struct, and Timestamp) cannot be specified using type codes. Instead, use data type constructors from apache-arrow, such as new List(new Int32()).
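
To make the columns, limit, and offset options concrete, here is a minimal plain-JavaScript sketch of how they could narrow an array of row objects before encoding. The selectData helper is hypothetical and is not part of arquero-arrow; it also assumes that offset is applied before limit and that, when no columns option is given, the keys of the first row are used.

```javascript
// Hypothetical helper (not part of arquero-arrow): illustrates how the
// columns, limit, and offset options could narrow an array of row objects.
function selectData(rows, options = {}) {
  const { columns, limit = Infinity, offset = 0 } = options;

  // Resolve column names: call a function-valued option with the input data,
  // use an explicit array as-is, or fall back to the first row's keys
  // (the fallback is an assumption made for this sketch).
  const names = typeof columns === 'function' ? columns(rows)
    : Array.isArray(columns) ? columns
    : Object.keys(rows[0] || {});

  // Skip `offset` initial rows, then keep at most `limit` rows.
  const end = Math.min(rows.length, offset + limit);
  return rows.slice(offset, end).map(row => {
    const out = {};
    for (const name of names) out[name] = row[name];
    return out;
  });
}

const rows = [
  { x: 1, y: 3.4, z: 'a' },
  { x: 2, y: 1.6, z: 'b' },
  { x: 3, y: 5.4, z: 'c' },
  { x: 4, y: 7.1, z: 'd' }
];

// Ordered column list: keep 'y' then 'x', skip one row, keep two rows.
const picked = selectData(rows, { columns: ['y', 'x'], limit: 2, offset: 1 });

// Function-valued columns: receives the input data, returns column names.
const numeric = selectData(rows, {
  columns: data => Object.keys(data[0]).filter(k => typeof data[0][k] === 'number')
});
```

In this sketch the order of the columns array determines the property order of each output row, mirroring the "ordered list" wording of the option.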

Examples

Encode Arrow data from an input Arquero table:

const { table } = require('arquero');
const { toArrow, Type } = require('arquero-arrow');

// create Arquero table
const dt = table({
  x: [1, 2, 3, 4, 5],
  y: [3.4, 1.6, 5.4, 7.1, 2.9]
});

// encode as an Arrow table (infer data types)
// here, infers Uint8 for 'x' and Float64 for 'y'
const at1 = toArrow(dt);

// encode into Arrow table (set explicit data types)
const at2 = toArrow(dt, {
  types: {
    x: Type.Uint16,
    y: Type.Float32
  }
});

// serialize Arrow table to a transferable byte array
const bytes = at1.serialize();
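
The comment in the example above notes that Uint8 is inferred for 'x' and Float64 for 'y'. One plausible way to arrive at such a result is to pick the narrowest integer type that covers the observed value range and to fall back to Float64 otherwise. The sketch below illustrates that idea only; inferNumberType is a hypothetical function, not the library's actual inference code, and its fallback rules are assumptions.

```javascript
// Hypothetical sketch, NOT arquero-arrow's actual inference code.
// Returns the name of a narrow numeric type able to hold all values.
function inferNumberType(values) {
  let min = Infinity;
  let max = -Infinity;

  for (const v of values) {
    if (v == null) continue; // ignore missing values
    if (typeof v !== 'number') {
      throw new Error('Type inference failed: non-numeric value');
    }
    if (!Number.isInteger(v)) return 'Float64'; // fractions force a float type
    if (v < min) min = v;
    if (v > max) max = v;
  }
  if (min > max) throw new Error('Type inference failed: no values');

  if (min >= 0) {
    // Unsigned integer types, narrowest first.
    if (max < 2 ** 8) return 'Uint8';
    if (max < 2 ** 16) return 'Uint16';
    if (max < 2 ** 32) return 'Uint32';
  } else {
    // Signed integer types, narrowest first.
    if (min >= -(2 ** 7) && max < 2 ** 7) return 'Int8';
    if (min >= -(2 ** 15) && max < 2 ** 15) return 'Int16';
    if (min >= -(2 ** 31) && max < 2 ** 31) return 'Int32';
  }
  // Assumption for this sketch: larger integers fall back to Float64.
  return 'Float64';
}

const xType = inferNumberType([1, 2, 3, 4, 5]);
const yType = inferNumberType([3.4, 1.6, 5.4, 7.1, 2.9]);
```

When inference of this kind does not produce the type you want, the types option shown above lets you set the column type explicitly.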

Register a toArrow() method for all Arquero tables (requires Arquero v2.3 or higher):

const { addPackage, table } = require('arquero');

// install the package bundle exported by arquero-arrow,
// adding a toArrow method to all Arquero tables
addPackage(require('arquero-arrow'));

// create Arquero table, encode as an Arrow table (infer data types)
const at = table({
  x: [1, 2, 3, 4, 5],
  y: [3.4, 1.6, 5.4, 7.1, 2.9]
}).toArrow();

Encode Arrow data from an input object array:

const { toArrow } = require('arquero-arrow');

// encode object array as an Arrow table (infer data types)
const at = toArrow([
  { x: 1, y: 3.4 },
  { x: 2, y: 1.6 },
  { x: 3, y: 5.4 },
  { x: 4, y: 7.1 },
  { x: 5, y: 2.9 }
]);
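
Arrow is a columnar format, so an array of row objects has to be pivoted into one array per column before encoding, and the API description above says an error is thrown if the generated columns have differing lengths. The following sketch shows that pivot and length check in plain JavaScript. Both helpers are hypothetical and are not arquero-arrow internals; filling missing properties with null is an assumption of the sketch.

```javascript
// Hypothetical helpers for illustration only; not arquero-arrow internals.

// Pivot an array of row objects into an object of column arrays.
function toColumns(rows, names = Object.keys(rows[0] || {})) {
  const columns = {};
  for (const name of names) {
    // Missing properties become null so each column has one entry per row.
    columns[name] = rows.map(row => (row[name] === undefined ? null : row[name]));
  }
  return columns;
}

// Verify that all columns share one length and return that row count.
function checkLengths(columns) {
  const lengths = new Set(Object.values(columns).map(col => col.length));
  if (lengths.size > 1) throw new Error('Columns have differing lengths');
  return lengths.size ? [...lengths][0] : 0;
}

const cols = toColumns([{ x: 1, y: 3.4 }, { x: 2 }, { x: 3, y: 5.4 }]);
const numRows = checkLengths(cols);
```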

Build Instructions

To build and develop locally:
