chore(arrow): Support WKT and WKB geoarrow encodings (v4.1) (#2798)
Signed-off-by: Xun Li <lixun910@gmail.com>
Co-authored-by: Xun Li <lixun910@gmail.com>
ibgreen and lixun910 authored Nov 21, 2023
1 parent 4dc810f commit 323bc86
Showing 37 changed files with 291 additions and 142 deletions.
1 change: 1 addition & 0 deletions docs/modules/arrow/README.md
@@ -18,6 +18,7 @@ npm install @loaders.gl/core @loaders.gl/arrow
| -------------------------------------------------------------------- |
| [`ArrowLoader`](/docs/modules/arrow/api-reference/arrow-loader) |
| [`ArrowWorkerLoader`](/docs/modules/arrow/api-reference/arrow-loader) |
| [`GeoArrowLoader`](/docs/modules/arrow/api-reference/geoarrow-loader) |

| Writer |
| -------------------------------------------------------------- |
2 changes: 0 additions & 2 deletions docs/modules/arrow/api-reference/arrow-loader.md
@@ -2,8 +2,6 @@

![arrow-logo](../images/apache-arrow-small.png)

> The Arrow loaders are still under development.
The `ArrowLoader` parses the Apache Arrow columnar table format.

| Loader | Characteristic |
29 changes: 29 additions & 0 deletions docs/modules/arrow/api-reference/geoarrow-loader.md
@@ -0,0 +1,29 @@
# GeoArrowLoader

![arrow-logo](../images/apache-arrow-small.png)

The `GeoArrowLoader` parses files in the Apache Arrow columnar table format and looks for `GeoArrow` extension types to parse geometries from the table.

| Loader | Characteristic |
| --------------------- | ------------------------------------------------------------------------- |
| File Format | [IPC: Encapsulated Message Format](https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc) |
| Data Format | [Geometry Table](/docs/specifications/category-table) |
| File Extension | `.arrow` |
| File Type | Binary |
| Decoder Type | `load`, `parse`, `parseSync`, `parseInBatches` |
| Worker Thread Support | Yes |
| Streaming Support | Yes |

## Usage

```typescript
import {GeoArrowLoader} from '@loaders.gl/arrow';
import {load} from '@loaders.gl/core';

const data = await load(url, GeoArrowLoader, options);
```
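
In Arrow, an extension type is declared in a field's metadata under the `ARROW:extension:name` key, and GeoArrow uses names such as `geoarrow.point` or `geoarrow.wkb`. The following is a minimal sketch of how geometry columns might be detected from that metadata. It assumes a hypothetical `FieldLike` stand-in for an Arrow schema field and is not the loader's actual implementation.

```typescript
// Hypothetical stand-in for an Arrow schema field. apache-arrow's Field exposes
// `name` and `metadata` (a Map<string, string>) in a similar shape.
interface FieldLike {
  name: string;
  metadata: Map<string, string>;
}

// Standard Arrow field-metadata key that names an extension type.
const EXTENSION_NAME_KEY = 'ARROW:extension:name';

/** Returns {columnName: encoding} for every field tagged with a geoarrow.* extension. */
function findGeoArrowColumns(fields: FieldLike[]): Record<string, string> {
  const columns: Record<string, string> = {};
  for (const field of fields) {
    const extensionName = field.metadata.get(EXTENSION_NAME_KEY)?.toLowerCase();
    if (extensionName && extensionName.startsWith('geoarrow.')) {
      columns[field.name] = extensionName;
    }
  }
  return columns;
}

// Example: a WKB-encoded geometry column next to a plain property column.
const geoColumns = findGeoArrowColumns([
  {name: 'id', metadata: new Map()},
  {name: 'geometry', metadata: new Map([[EXTENSION_NAME_KEY, 'geoarrow.wkb']])}
]);
console.log(geoColumns); // { geometry: 'geoarrow.wkb' }
```

The lowercased encoding string is the kind of value that `parseGeometryFromArrow` switches on.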

## Options

| Option | Type | Default | Description |
| ------ | ---- | ------- | ----------- |
9 changes: 8 additions & 1 deletion docs/upgrade-guide.md
@@ -1,6 +1,13 @@
# Upgrade Guide

## Upgrading to loaders.gl v4.0
## Upgrading to v4.1

**@loaders.gl/wkt**

- `WKBLoader`/`TWKBLoader`/`HexWKBLoader` - The default `shape` is now `geojson-geometry` rather than `binary-geometry`. If you rely on `binary-geometry` output, pass the `shape: 'binary-geometry'` option explicitly, as in `load(..., WKBLoader, {wkb: {shape: 'binary-geometry'}})`.
- The `geometry` shape is deprecated; use `geojson-geometry` instead.
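
For code that must behave the same before and after this change, one option is to always pass the shape explicitly. A small sketch, assuming a hypothetical `migrateWKBShape` helper (not part of loaders.gl) that keeps the old `binary-geometry` default and renames the deprecated `geometry` value:

```typescript
// The two shapes named in the v4.1 upgrade notes.
type WKBShape = 'geojson-geometry' | 'binary-geometry';
// 'geometry' is the deprecated name for 'geojson-geometry'.
type LegacyWKBShape = WKBShape | 'geometry';

// Hypothetical migration helper: keeps v4.0 behavior when no shape was set
// (v4.0 defaulted to 'binary-geometry') and renames the deprecated value.
function migrateWKBShape(shape?: LegacyWKBShape): WKBShape {
  if (shape === 'geometry') {
    return 'geojson-geometry';
  }
  return shape ?? 'binary-geometry';
}

// Options object suitable for the third argument of load(..., WKBLoader, options).
function makeWKBOptions(shape?: LegacyWKBShape): {wkb: {shape: WKBShape}} {
  return {wkb: {shape: migrateWKBShape(shape)}};
}

console.log(makeWKBOptions().wkb.shape); // binary-geometry
console.log(makeWKBOptions('geometry').wkb.shape); // geojson-geometry
```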

## Upgrading to v4.0

**Node.js v18+**

9 changes: 9 additions & 0 deletions docs/whats-new.mdx
@@ -1,5 +1,14 @@
# What's New

## v4.1 (In development)

Target Release Date: Early 2024

**@loaders.gl/arrow**

- New [`GeoArrowLoader`](/docs/modules/arrow/api-reference/geoarrow-loader) supports loading [GeoArrow](/docs/modules/arrow/formats/geoarrow) files.
- New documentation for [Arrow](/docs/modules/arrow/formats/arrow) and [GeoArrow](/docs/modules/arrow/formats/geoarrow) formats.

## v4.0

Release Date: Oct 30, 2023
1 change: 0 additions & 1 deletion fruits.parquet

This file was deleted.

1 change: 1 addition & 0 deletions modules/arrow/package.json
@@ -50,6 +50,7 @@
"@loaders.gl/gis": "4.0.4",
"@loaders.gl/loader-utils": "4.0.4",
"@loaders.gl/schema": "4.0.4",
"@loaders.gl/wkt": "4.0.4",
"@math.gl/polygon": "4.0.0",
"apache-arrow": "^13.0.0"
},
4 changes: 0 additions & 4 deletions modules/arrow/src/geoarrow-loader.ts
@@ -8,10 +8,6 @@ import type {ArrowTable, ArrowTableBatch} from './lib/arrow-table';
import {parseGeoArrowSync} from './parsers/parse-geoarrow-sync';
import {parseGeoArrowInBatches} from './parsers/parse-geoarrow-in-batches';

// __VERSION__ is injected by babel-plugin-version-inline
// @ts-ignore TS2304: Cannot find name '__VERSION__'.
const VERSION = typeof __VERSION__ !== 'undefined' ? __VERSION__ : 'latest';

export type GeoArrowLoaderOptions = LoaderOptions & {
arrow?: {
shape: 'arrow-table' | 'binary-geometry';
modules/arrow/src/geoarrow/convert-geoarrow-to-geojson-geometry.ts
@@ -1,81 +1,95 @@
// loaders.gl, MIT license
// Copyright (c) vis.gl contributors

import * as arrow from 'apache-arrow';
// import * as arrow from 'apache-arrow';
import {
Feature,
MultiPolygon,
Position,
Polygon,
MultiPoint,
Point,
MultiLineString,
LineString
LineString,
Geometry,
BinaryGeometry
} from '@loaders.gl/schema';
import type {GeoArrowEncoding} from '@loaders.gl/gis';

type RawArrowFeature = {
data: arrow.Vector;
encoding?: GeoArrowEncoding;
};
import {binaryToGeometry, type GeoArrowEncoding} from '@loaders.gl/gis';
import {WKBLoader, WKTLoader} from '@loaders.gl/wkt';

/**
* parse geometry from arrow data that is returned from processArrowData()
* NOTE: this function could be duplicated with the binaryToFeature() in deck.gl,
* it is currently only used for picking because currently deck.gl returns only the index of the feature
* So the following functions could be deprecated once deck.gl returns the feature directly for binary geojson layer
* NOTE: this function could be deduplicated with the binaryToFeature() in deck.gl,
* it is currently used for deck.gl picking because currently deck.gl returns only the index of the feature
*
* @param rawData the raw geometry data returned from processArrowData, which is an object with two properties: encoding and data
* @see processArrowData
* @param data data extracted from arrow vector representing a geometry
* @param encoding the geoarrow encoding of the geometry column
* @returns Feature or null
*/
export function parseGeometryFromArrow(rawData: RawArrowFeature): Feature | null {
const encoding = rawData.encoding?.toLowerCase() as typeof rawData.encoding;
const data = rawData.data;
if (!encoding || !data) {
export function parseGeometryFromArrow(
arrowCellValue: any,
encoding?: GeoArrowEncoding
): Geometry | null {
// sanity
encoding = encoding?.toLowerCase() as GeoArrowEncoding;
if (!encoding || !arrowCellValue) {
return null;
}

let geometry;
let geometry: Geometry;

switch (encoding) {
case 'geoarrow.multipolygon':
geometry = arrowMultiPolygonToFeature(data);
geometry = arrowMultiPolygonToFeature(arrowCellValue);
break;
case 'geoarrow.polygon':
geometry = arrowPolygonToFeature(data);
geometry = arrowPolygonToFeature(arrowCellValue);
break;
case 'geoarrow.multipoint':
geometry = arrowMultiPointToFeature(data);
geometry = arrowMultiPointToFeature(arrowCellValue);
break;
case 'geoarrow.point':
geometry = arrowPointToFeature(data);
geometry = arrowPointToFeature(arrowCellValue);
break;
case 'geoarrow.multilinestring':
geometry = arrowMultiLineStringToFeature(data);
geometry = arrowMultiLineStringToFeature(arrowCellValue);
break;
case 'geoarrow.linestring':
geometry = arrowLineStringToFeature(data);
geometry = arrowLineStringToFeature(arrowCellValue);
break;
case 'geoarrow.wkb':
throw Error(`GeoArrow encoding not supported ${encoding}`);
geometry = arrowWKBToFeature(arrowCellValue);
break;
case 'geoarrow.wkt':
throw Error(`GeoArrow encoding not supported ${encoding}`);
geometry = arrowWKTToFeature(arrowCellValue);
break;
default: {
throw Error(`GeoArrow encoding not supported ${encoding}`);
}
}
return {
type: 'Feature',
geometry,
properties: {}
};

return geometry;
}
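
`parseGeometryFromArrow` now takes the raw cell value plus an encoding and returns a bare `Geometry` (or `null`), leaving it to the caller to wrap the result in a `Feature`. The same dispatch can be expressed as a lookup table. The sketch below is a hypothetical table-driven equivalent with a stub converter, not the loaders.gl code. It keeps the same lowercasing, the same `null` result for a missing encoding or empty cell, and the same error for unsupported encodings.

```typescript
// GeoArrow encodings handled by parseGeometryFromArrow after this change.
type GeoArrowEncodingName =
  | 'geoarrow.multipolygon'
  | 'geoarrow.polygon'
  | 'geoarrow.multipoint'
  | 'geoarrow.point'
  | 'geoarrow.multilinestring'
  | 'geoarrow.linestring'
  | 'geoarrow.wkb'
  | 'geoarrow.wkt';

type Converter<G> = (cellValue: unknown) => G;

// Hypothetical table-driven variant of the switch statement.
function makeGeometryParser<G>(converters: Partial<Record<GeoArrowEncodingName, Converter<G>>>) {
  return (cellValue: unknown, encoding?: string): G | null => {
    const key = encoding?.toLowerCase();
    if (!key || !cellValue) {
      return null;
    }
    // Own-property check so keys such as 'constructor' never hit Object.prototype.
    const convert = Object.prototype.hasOwnProperty.call(converters, key)
      ? converters[key as GeoArrowEncodingName]
      : undefined;
    if (!convert) {
      throw new Error(`GeoArrow encoding not supported ${encoding}`);
    }
    return convert(cellValue);
  };
}

// Example with a single point converter registered.
type PointGeometry = {type: 'Point'; coordinates: number[]};
const parsePoint = makeGeometryParser<PointGeometry>({
  'geoarrow.point': (value) => ({type: 'Point', coordinates: Array.from(value as ArrayLike<number>)})
});
console.log(parsePoint(new Float64Array([1, 2]), 'GeoArrow.Point'));
```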

function arrowWKBToFeature(arrowCellValue: any) {
// The actual WKB array buffer starts from byteOffset and ends at byteOffset + byteLength
const arrayBuffer: ArrayBuffer = arrowCellValue.buffer.slice(
arrowCellValue.byteOffset,
arrowCellValue.byteOffset + arrowCellValue.byteLength
);
const binaryGeometry = WKBLoader.parseSync?.(arrayBuffer)! as BinaryGeometry;
const geometry = binaryToGeometry(binaryGeometry);
return geometry;
}
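
`arrowWKBToFeature` copies `byteLength` bytes starting at `byteOffset` because a binary cell read from an Arrow column is typically a view into a larger buffer shared with neighboring cells. A minimal sketch of why that slice matters, using a toy decoder that only understands 2D WKB Points. The decoder is a stand-in for `WKBLoader`, not its implementation.

```typescript
// Copies exactly one cell's bytes into a standalone ArrayBuffer.
// `cell.buffer` alone would also contain the neighboring cells' bytes.
function cellToArrayBuffer(cell: Uint8Array): ArrayBuffer {
  return cell.buffer.slice(cell.byteOffset, cell.byteOffset + cell.byteLength) as ArrayBuffer;
}

// Toy decoder for a 2D WKB Point only (21 bytes):
// byte 0 = byte order (1 = little endian), bytes 1-4 = geometry type (1 = Point),
// bytes 5-12 = x, bytes 13-20 = y (IEEE 754 doubles).
function parseWKBPoint(buffer: ArrayBuffer): [number, number] {
  const view = new DataView(buffer);
  const littleEndian = view.getUint8(0) === 1;
  const geometryType = view.getUint32(1, littleEndian);
  if (geometryType !== 1) {
    throw new Error(`Expected a WKB Point (type 1), got type ${geometryType}`);
  }
  return [view.getFloat64(5, littleEndian), view.getFloat64(13, littleEndian)];
}

// Simulate a column buffer in which this cell starts at byte offset 8.
const backing = new ArrayBuffer(40);
const writer = new DataView(backing);
writer.setUint8(8, 1); // little endian
writer.setUint32(9, 1, true); // geometry type: Point
writer.setFloat64(13, 10.5, true); // x
writer.setFloat64(21, -3, true); // y
const cell = new Uint8Array(backing, 8, 21);

console.log(parseWKBPoint(cellToArrayBuffer(cell))); // [ 10.5, -3 ]
```

Passing `cell.buffer` directly would make the decoder start at byte 0 of the shared buffer, which here is padding, so the type check fails.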

function arrowWKTToFeature(arrowCellValue: any) {
const string: string = arrowCellValue;
return WKTLoader.parseTextSync?.(string)!;
}
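
WKT cells arrive as JavaScript strings and are handed to `WKTLoader.parseTextSync`. To show what that step produces in the simplest case, here is a toy parser for 2D `POINT` text only. It is an illustration under that narrow assumption, not how `WKTLoader` works internally.

```typescript
type PointGeometry = {type: 'Point'; coordinates: [number, number]};

// Toy parser for 2D WKT points such as "POINT (30 10)".
// It handles no other geometry types, no EMPTY, and no Z/M coordinates.
function parseWKTPoint(wkt: string): PointGeometry {
  const match = /^\s*POINT\s*\(\s*([-+0-9.eE]+)\s+([-+0-9.eE]+)\s*\)\s*$/i.exec(wkt);
  if (!match) {
    throw new Error(`Not a 2D WKT POINT: ${wkt}`);
  }
  const x = Number(match[1]);
  const y = Number(match[2]);
  if (Number.isNaN(x) || Number.isNaN(y)) {
    throw new Error(`Invalid coordinates in: ${wkt}`);
  }
  return {type: 'Point', coordinates: [x, y]};
}

console.log(parseWKTPoint('POINT (30 10)')); // { type: 'Point', coordinates: [ 30, 10 ] }
```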

/**
* convert Arrow MultiPolygon to geojson Feature
*/
function arrowMultiPolygonToFeature(arrowMultiPolygon: arrow.Vector): MultiPolygon {
function arrowMultiPolygonToFeature(arrowMultiPolygon: any): MultiPolygon {
const multiPolygon: Position[][][] = [];
for (let m = 0; m < arrowMultiPolygon.length; m++) {
const arrowPolygon = arrowMultiPolygon.get(m);
@@ -102,7 +116,7 @@ function arrowMultiPolygonToFeature(arrowMultiPolygon: arrow.Vector): MultiPolyg
/**
* convert Arrow Polygon to geojson Feature
*/
function arrowPolygonToFeature(arrowPolygon: arrow.Vector): Polygon {
function arrowPolygonToFeature(arrowPolygon: any): Polygon {
const polygon: Position[][] = [];
for (let i = 0; arrowPolygon && i < arrowPolygon.length; i++) {
const arrowRing = arrowPolygon.get(i);
@@ -124,7 +138,7 @@ function arrowPolygonToFeature(arrowPolygon: arrow.Vector): Polygon {
/**
* convert Arrow MultiPoint to geojson MultiPoint
*/
function arrowMultiPointToFeature(arrowMultiPoint: arrow.Vector): MultiPoint {
function arrowMultiPointToFeature(arrowMultiPoint: any): MultiPoint {
const multiPoint: Position[] = [];
for (let i = 0; arrowMultiPoint && i < arrowMultiPoint.length; i++) {
const arrowPoint = arrowMultiPoint.get(i);
@@ -133,29 +147,27 @@ function arrowMultiPointToFeature(arrowMultiPoint: arrow.Vector): MultiPoint {
multiPoint.push(coord);
}
}
const geometry: MultiPoint = {
return {
type: 'MultiPoint',
coordinates: multiPoint
};
return geometry;
}

/**
* convert Arrow Point to geojson Point
*/
function arrowPointToFeature(arrowPoint: arrow.Vector): Point {
function arrowPointToFeature(arrowPoint: any): Point {
const point: Position = Array.from(arrowPoint);
const geometry: Point = {
return {
type: 'Point',
coordinates: point
};
return geometry;
}

/**
* convert Arrow MultiLineString to geojson MultiLineString
*/
function arrowMultiLineStringToFeature(arrowMultiLineString: arrow.Vector): MultiLineString {
function arrowMultiLineStringToFeature(arrowMultiLineString: any): MultiLineString {
const multiLineString: Position[][] = [];
for (let i = 0; arrowMultiLineString && i < arrowMultiLineString.length; i++) {
const arrowLineString = arrowMultiLineString.get(i);
@@ -169,17 +181,16 @@ function arrowMultiLineStringToFeature(arrowMultiLineString: arrow.Vector): Mult
}
multiLineString.push(lineString);
}
const geometry: MultiLineString = {
return {
type: 'MultiLineString',
coordinates: multiLineString
};
return geometry;
}

/**
* convert Arrow LineString to geojson LineString
*/
function arrowLineStringToFeature(arrowLineString: arrow.Vector): LineString {
function arrowLineStringToFeature(arrowLineString: any): LineString {
const lineString: Position[] = [];
for (let i = 0; arrowLineString && i < arrowLineString.length; i++) {
const arrowCoord = arrowLineString.get(i);
@@ -188,9 +199,8 @@ function arrowLineStringToFeature(arrowLineString: arrow.Vector): LineString {
lineString.push(coords);
}
}
const geometry: LineString = {
return {
type: 'LineString',
coordinates: lineString
};
return geometry;
}
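
The nested-geometry converters in this file follow one pattern: walk an Arrow list value using its `length` and `get(i)`, skip empty entries, and copy each coordinate with `Array.from`. A sketch of the LineString case against a hypothetical `ListLike` interface, a stand-in for the Arrow vector values that the converters receive as `any`:

```typescript
// Hypothetical stand-in for the nested Arrow list values the converters receive:
// anything with a `length` and an index accessor `get(i)`.
interface ListLike<T> {
  length: number;
  get(i: number): T;
}

type Position = number[];
type LineStringGeometry = {type: 'LineString'; coordinates: Position[]};

// Null entries are skipped and each coordinate (x, y[, z]) is copied
// into a plain number[].
function listToLineString(line: ListLike<ArrayLike<number> | null>): LineStringGeometry {
  const coordinates: Position[] = [];
  for (let i = 0; i < line.length; i++) {
    const coord = line.get(i);
    if (coord) {
      coordinates.push(Array.from(coord));
    }
  }
  return {type: 'LineString', coordinates};
}

// Wraps a plain array so it satisfies ListLike (for the example only).
function asList<T>(items: T[]): ListLike<T> {
  return {length: items.length, get: (i) => items[i]};
}

const line = listToLineString(asList([new Float64Array([0, 0]), null, new Float64Array([1, 2])]));
console.log(line.coordinates); // [ [ 0, 0 ], [ 1, 2 ] ]
```

The polygon and multi-geometry converters add one more level of the same loop per nesting level.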
2 changes: 1 addition & 1 deletion modules/arrow/src/index.ts
@@ -60,7 +60,7 @@ export {

export {updateBoundsFromGeoArrowSamples} from './geoarrow/get-arrow-bounds';

export {parseGeometryFromArrow} from './geoarrow/convert-geoarrow-to-geojson';
export {parseGeometryFromArrow} from './geoarrow/convert-geoarrow-to-geojson-geometry';

export {convertArrowToGeoJSONTable} from './tables/convert-arrow-to-geojson-table';

23 changes: 16 additions & 7 deletions modules/arrow/src/tables/convert-arrow-to-geojson-table.ts
@@ -2,7 +2,7 @@
// Copyright (c) vis.gl contributors

import type {Feature, GeoJSONTable} from '@loaders.gl/schema';
import type * as arrow from 'apache-arrow';
import * as arrow from 'apache-arrow';
import type {ArrowTable} from '../lib/arrow-table';
import {serializeArrowSchema, parseGeometryFromArrow} from '@loaders.gl/arrow';
import {getGeometryColumnsFromSchema} from '@loaders.gl/gis';
@@ -34,15 +34,24 @@ export function convertArrowToGeoJSONTable(table: ArrowTable): GeoJSONTable {

const features: Feature[] = [];

for (let row = 0; row < arrowTable.numRows; row++) {
// get first geometry from arrow geometry column
const arrowGeometry = arrowTable.getChild('geometry')?.get(row);
const arrowGeometryObject = {encoding, data: arrowGeometry};
// Remove geometry columns
const propertyColumnNames = arrowTable.schema.fields
.map((field) => field.name)
// TODO - this deletes all geometry columns
.filter((name) => !(name in geometryColumns));
const propertiesTable = arrowTable.select(propertyColumnNames);

const arrowGeometryColumn = arrowTable.getChild('geometry');

for (let row = 0; row < arrowTable.numRows; row++) {
// get the geometry value from arrow geometry column
// Note that type can vary
const arrowGeometry = arrowGeometryColumn?.get(row);
// parse arrow geometry to geojson feature
const feature = parseGeometryFromArrow(arrowGeometryObject);
const feature = parseGeometryFromArrow(arrowGeometry, encoding);
if (feature) {
features.push(feature);
const properties = propertiesTable.get(row)?.toJSON() || {};
features.push({type: 'Feature', geometry: feature, properties});
}
}

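`convertArrowToGeoJSONTable` now selects the non-geometry columns once, then for each row parses the geometry cell and attaches that row's remaining values as `properties`. A sketch of the same assembly over plain row objects (a hypothetical `Row` type instead of an Arrow table), with the geometry column name passed in rather than hard-coded to `'geometry'`:

```typescript
type Row = Record<string, unknown>;

interface Feature<G> {
  type: 'Feature';
  geometry: G;
  properties: Row;
}

// Builds one Feature per row whose geometry cell parses to a non-null value.
// Every key of `geometryColumns` is excluded from the properties, mirroring the
// `!(name in geometryColumns)` filter in the diff.
function rowsToFeatures<G>(
  rows: Row[],
  geometryColumns: Record<string, unknown>,
  geometryColumnName: string,
  parseGeometry: (cell: unknown) => G | null
): Feature<G>[] {
  const features: Feature<G>[] = [];
  for (const row of rows) {
    const geometry = parseGeometry(row[geometryColumnName]);
    if (geometry === null) {
      continue;
    }
    const properties: Row = {};
    for (const [name, value] of Object.entries(row)) {
      if (!(name in geometryColumns)) {
        properties[name] = value;
      }
    }
    features.push({type: 'Feature', geometry, properties});
  }
  return features;
}

// Example: the second row has no geometry and is dropped.
const features = rowsToFeatures(
  [
    {id: 1, name: 'a', geometry: 'POINT (0 0)'},
    {id: 2, name: 'b', geometry: null}
  ],
  {geometry: {}},
  'geometry',
  (cell) => (typeof cell === 'string' ? {wkt: cell} : null)
);
console.log(features.length); // 1
```

Selecting the property columns once, as the diff does with `arrowTable.select`, avoids repeating the column filter for every row.
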
18 changes: 18 additions & 0 deletions modules/arrow/test/data/README.md
@@ -3,3 +3,21 @@
- `dictionary.arrow`, `simple.arrow`, `struct.arrow` - Apache 2 License (copied from https://github.com/wesm/arrow-1)

- `biogrid-nodes.arrow` - from graphistry.


## geoarrow

```sh
ogr2ogr point_wkb.arrow point.arrow -f Arrow -lco COMPRESSION=NONE -lco GEOMETRY_ENCODING=WKB
ogr2ogr line_wkb.arrow line.arrow -f Arrow -lco COMPRESSION=NONE -lco GEOMETRY_ENCODING=WKB
ogr2ogr polygon_wkb.arrow polygon.arrow -f Arrow -lco COMPRESSION=NONE -lco GEOMETRY_ENCODING=WKB
ogr2ogr multipolygon_wkb.arrow multipolygon.arrow -f Arrow -lco COMPRESSION=NONE -lco GEOMETRY_ENCODING=WKB
ogr2ogr multipolygon_hole_wkb.arrow multipolygon_hole.arrow -f Arrow -lco COMPRESSION=NONE -lco GEOMETRY_ENCODING=WKB


ogr2ogr point_wkt.arrow point.arrow -f Arrow -lco COMPRESSION=NONE -lco GEOMETRY_ENCODING=WKT
ogr2ogr line_wkt.arrow line.arrow -f Arrow -lco COMPRESSION=NONE -lco GEOMETRY_ENCODING=WKT
ogr2ogr polygon_wkt.arrow polygon.arrow -f Arrow -lco COMPRESSION=NONE -lco GEOMETRY_ENCODING=WKT
ogr2ogr multipolygon_wkt.arrow multipolygon.arrow -f Arrow -lco COMPRESSION=NONE -lco GEOMETRY_ENCODING=WKT
ogr2ogr multipolygon_hole_wkt.arrow multipolygon_hole.arrow -f Arrow -lco COMPRESSION=NONE -lco GEOMETRY_ENCODING=WKT
```
Binary file added modules/arrow/test/data/geoarrow/line_wkb.arrow
Binary file not shown.
Binary file added modules/arrow/test/data/geoarrow/line_wkt.arrow
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added modules/arrow/test/data/geoarrow/point_wkb.arrow
Binary file not shown.
Binary file added modules/arrow/test/data/geoarrow/point_wkt.arrow
Binary file not shown.
Binary file not shown.
Binary file not shown.