feat: add arrow backend
uses apache arrow to read parquets and convert them to json
dvirtz committed Jul 16, 2023
1 parent fbd6bcc commit 078103d
Showing 38 changed files with 8,284 additions and 2,826 deletions.
15 changes: 10 additions & 5 deletions .cspell.json
@@ -1,14 +1,18 @@
{
"version": "0.1",
"version": "0.2",
"language": "en",
"words": [
"commitlint",
"dvirtz",
"kbajalc",
"prebuilds",
"tempy",
"Yitzchaki",
"commitlint"
"Yitzchaki"
],
"ignorePaths": [
"node_modules/**"
"**/node_modules/**",
"**/build/**",
"**/dist/**"
],
"flagWords": [],
"languageSettings": [
@@ -24,7 +28,8 @@
{
"languageId": "git",
"ignoreRegExpList": [
"/#\\s.*/" // comments
"/#\\s.*/", // comments
"/`.*?`/" // inline code
]
}
]
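The two `ignoreRegExpList` entries for the `git` language hide comment lines and inline code from the spell checker. Below is a minimal TypeScript sketch of what those patterns match. cspell applies the patterns itself; the `stripIgnored` helper is hypothetical and assumes the patterns are applied globally.

```typescript
// Hypothetical helper: removes the spans that the two ignoreRegExpList
// entries ("/#\\s.*/" and "/`.*?`/") would hide from the spell checker.
const ignorePatterns: RegExp[] = [
  /#\s.*/g, // comments: "# " followed by the rest of the line
  /`.*?`/g, // inline code: the shortest span between two backticks
];

function stripIgnored(message: string): string {
  // Apply each pattern in turn to the progressively stripped text.
  return ignorePatterns.reduce((text, re) => text.replace(re, ''), message);
}

const msg = 'feat: add `arrowBackend` reader\n# Please enter the commit message';
console.log(JSON.stringify(stripIgnored(msg)));
```

The lazy `.*?` matters when a line has several code spans: a greedy `.*` would also swallow the prose between two spans. Note also that the comment pattern is not anchored with `^`, so it hides anything after a `# ` that appears mid-line.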
40 changes: 40 additions & 0 deletions .devcontainer/devcontainer.json
@@ -0,0 +1,40 @@
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
// README at: https://github.com/devcontainers/templates/tree/main/src/docker-existing-dockerfile
{
"name": "Existing Dockerfile",
"build": {
// Sets the run context to one level up instead of the .devcontainer folder.
"context": "..",
// Update the 'dockerFile' property if you aren't using the standard 'Dockerfile' filename.
"dockerfile": "../Dockerfile"
},

// Features to add to the dev container. More info: https://containers.dev/features.
// "features": {},

// Use 'forwardPorts' to make a list of ports inside the container available locally.
// "forwardPorts": [],

// Uncomment the next line to run commands after the container is created.
// "postCreateCommand": "cat /etc/os-release",

// Configure tool-specific properties.
// "customizations": {},

// Uncomment to connect as an existing user other than the container default. More info: https://aka.ms/dev-containers-non-root.
"remoteUser": "runneradmin",

"containerEnv": {
"DISPLAY": "${localEnv:DISPLAY}"
},

"mounts": [{
"source": "${localEnv:HOME}/.Xauthority",
"target": "/home/runneradmin/.Xauthority",
"type": "bind"
}],

"runArgs": [
"--network=host"
]
}
14 changes: 8 additions & 6 deletions .eslintrc.yml
@@ -1,14 +1,11 @@
env:
node: true
es2021: true
extends:
- 'eslint:recommended'
- 'plugin:@typescript-eslint/recommended'
parser: '@typescript-eslint/parser'
parserOptions:
ecmaVersion: 12
sourceType: module
project: 'tsconfig.json'
project:
- tsconfig.json
- packages/*/tsconfig.json
plugins:
- '@typescript-eslint'
rules:
@@ -17,3 +14,8 @@ rules:
- argsIgnorePattern: '^_'
'@typescript-eslint/no-floating-promises':
- warn
ignorePatterns:
- 'packages/*/build/**'
- 'dist/**'
- 'packages/*/dist/**'
- 'jest.config.ts'
64 changes: 58 additions & 6 deletions .github/workflows/main.yml
@@ -10,30 +10,49 @@ jobs:
strategy:
matrix:
os: [
# disable mac due to https://github.com/nodejs/node/issues/42154
# macos-latest,
macos-latest,
ubuntu-latest,
windows-latest,
]
env:
CONAN_USER_HOME: "${{ github.workspace }}/conan-cache"
CONAN_USER_HOME_SHORT: "${{ github.workspace }}/conan-cache/short"
runs-on: ${{ matrix.os }}
steps:
- name: Checkout
uses: actions/checkout@v3
with:
fetch-depth: 0 # for commit linting and semantic-release
persist-credentials: false
- if: ${{ env.ACT }}
name: Hack container for local development
run: |
curl -fsSL https://deb.nodesource.com/setup_16.x | sudo -E bash -
sudo apt-get install -y nodejs
- name: Install Node.js
uses: actions/setup-node@v3
with:
node-version: 16
cache: npm
- uses: actions/setup-python@v4
with:
python-version: '3.10'
cache: 'pipenv'
- name: Install pipenv
run: |
curl https://raw.githubusercontent.com/pypa/pipenv/master/get-pipenv.py | python
- name: Cache .vscode-test
uses: actions/cache@v3
with:
path: .vscode-test
path: packages/packet-viewer/.vscode-test
key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
restore-keys: |
${{ runner.os }}-node-
- name: Cache .conan
uses: actions/cache@v3
with:
path: ${{ env.CONAN_USER_HOME }}
key: ${{ runner.os }}-conan-${{ hashFiles('packages/parquet-reader/conanfile.txt') }}
- run: npm ci
- name: Static checks
run: |
@@ -44,20 +63,53 @@ jobs:
if: matrix.os == 'ubuntu-latest'
- run: npm test
if: matrix.os != 'ubuntu-latest'
- name: Upload prebuilds
if: '!env.ACT'
uses: actions/upload-artifact@v3
with:
name: prebuilds
path: packages/parquet-reader/prebuilds
retention-days: 1

release:
runs-on: ubuntu-latest
needs: build
steps:
- name: Checkout
uses: actions/checkout@v3
with:
fetch-depth: 0 # for commit linting and semantic-release
persist-credentials: false
- name: Install Node.js
uses: actions/setup-node@v3
with:
node-version: 16
cache: npm
- uses: actions/setup-python@v4
with:
python-version: '3.10'
cache: 'pipenv'
- name: Install pipenv
run: |
curl https://raw.githubusercontent.com/pypa/pipenv/master/get-pipenv.py | python
- run: npm ci
- uses: actions/download-artifact@v3
with:
name: prebuilds
path: packages/parquet-reader/prebuilds
- name: Release
if: matrix.os == 'ubuntu-latest'
env:
GH_TOKEN: ${{ secrets.RELEASE_PAT }}
VSCE_PAT: ${{ secrets.VSCE_PAT }}
OVSX_PAT: ${{ secrets.OVSX_PAT }}
run: npx semantic-release
- name: Package
if: github.ref != 'refs/heads/master' && matrix.os == 'ubuntu-latest'
if: github.ref != 'refs/heads/master'
uses: lannonbr/vsce-action@master
with:
args: "package"
- name: Upload
if: github.ref != 'refs/heads/master' && matrix.os == 'ubuntu-latest' && !env.ACT
if: github.ref != 'refs/heads/master' && !env.ACT
uses: actions/upload-artifact@v3
with:
name: vscode-package
9 changes: 6 additions & 3 deletions .gitignore
@@ -1,14 +1,17 @@
# vscode
.vscode-test/
.vscode/
!test/workspace/.vscode
/.vscode

# npm
out
dist/
node_modules

# output
*.vsix

# act
.secrets

# addon
build/
prebuilds/
3 changes: 3 additions & 0 deletions .npmrc
@@ -0,0 +1,3 @@
if-present=true
workspaces=true
include-workspace-root=true
13 changes: 10 additions & 3 deletions .vscodeignore
@@ -1,12 +1,19 @@
# hidden files and folder in root
.*
.*/
**/.*
**/.*/
# typescript files
src/**
test/**
**/tsconfig.json
**/tslint.json
**/*.map
**/*.ts
**/*.tsbuildinfo
# test outputs
out/test/**
dist/test/**
# cmake output
packages/*/build/**
Dockerfile
packages/parquet-reader/*
!packages/parquet-reader/dist/*.js
!packages/parquet-reader/prebuilds
8 changes: 7 additions & 1 deletion Dockerfile
@@ -1,4 +1,4 @@
FROM catthehacker/ubuntu:act-latest
FROM catthehacker/ubuntu:runner-latest

SHELL [ "/bin/bash", "-c" ]

@@ -10,3 +10,9 @@ RUN sudo apt-get update \
libgbm1 \
libasound2 \
default-jre

# install pipenv
USER runneradmin
ADD --chown=runneradmin https://raw.githubusercontent.com/pypa/pipenv/master/get-pipenv.py /tmp/
RUN python /tmp/get-pipenv.py \
&& rm /tmp/get-pipenv.py
26 changes: 20 additions & 6 deletions README.md
@@ -14,12 +14,25 @@ After closing the JSON view, it is possible to reopen it by clicking on the link

![command](images/reopen.gif)

## Requirements
## Backends

The extension used to require [parquet-tools](https://mvnrepository.com/artifact/org.apache.parquet/parquet-tools).
Now the extension uses the [parquets](https://github.com/dvirtz/parquets) TypeScript library to do parse the files.
The extension supports three different backends for parsing the files:

If you still want to use `parquet-tools`, you should set `parquet-viewer.useParquetTools` to `true` and `paruqet-tools` should be in your `PATH`, or pointed by the `parquet-viewer.parquetToolsPath` setting.
### parquets

This is the default backend. It uses the [parquets](https://github.com/dvirtz/parquets) TypeScript library, which is a fork of the unmaintained [kbajalc/parquets](https://github.com/kbajalc/parquets) library with some bug fixes.

It only supports parquet version 1.0.0.

### arrow

This backend is a thin wrapper around the [Apache Arrow C++](https://github.com/apache/arrow/tree/main/cpp) implementation, so it should support the latest parquet features.

It is currently experimental. To use it, set the `parquet-viewer.backend` setting to `arrow`.

### parquet-tools

This is a legacy Java backend that uses [parquet-tools](https://mvnrepository.com/artifact/org.apache.parquet/parquet-tools). To use it, set `parquet-viewer.backend` to `parquet-tools` and make sure `parquet-tools` is in your `PATH` or pointed to by the `parquet-viewer.parquetToolsPath` setting.

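The `parquet-viewer.backend` setting takes one of the three backend names above and defaults to `parquets`. A minimal TypeScript sketch of validating the raw setting value follows; the `Backend` type and the `isBackend`/`resolveBackend` helpers are hypothetical, and falling back to the default on an unknown value is an assumption, not necessarily what the extension does.

```typescript
// Hypothetical union of the three backend names described above.
type Backend = 'parquets' | 'arrow' | 'parquet-tools';

const BACKENDS: readonly Backend[] = ['parquets', 'arrow', 'parquet-tools'];

// Type guard: true only for one of the known backend names (case-sensitive).
function isBackend(value: unknown): value is Backend {
  return typeof value === 'string' && (BACKENDS as readonly string[]).includes(value);
}

// Assumed behaviour: anything unrecognised falls back to the default backend.
function resolveBackend(value: unknown): Backend {
  return isBackend(value) ? value : 'parquets';
}

console.log(resolveBackend('arrow'));
console.log(resolveBackend(undefined));
```

Inside a VS Code extension, the raw value would typically be read with `vscode.workspace.getConfiguration('parquet-viewer').get('backend')` before being passed to a helper like this.
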
![settings](images/settings.png)

@@ -29,11 +42,12 @@ The following setting options are available:

|name|default|description|
|----|-------|-----------|
|`parquet-viewer.parquetToolsPath`|`parquet-tools`|The name of the parquet-tools executable or a path to the parquet-tools jar|
|`parquet-viewer.useParquetTools`|`false`|Use the legacy `parquet-tools` application for reading the files|
|`parquet-viewer.backend`|`parquets`|Which backend to use for reading the files|
|`parquet-viewer.logging.panel`|`false`|Whether to write diagnostic logs to an output panel|
|`parquet-viewer.logging.folder`|empty|Write diagnostic logs under the given directory|
|`parquet-viewer.logging.level`|info|Diagnostic log level. Choose between: `off`, `fatal`, `error`, `warn`, `info`, `debug` or `trace`|
|`parquet-viewer.jsonSpace`|0|JSON indentation space, passed to `JSON.stringify` as is, see [mdn](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify#parameters) for details. Doesn't apply when `parquet-viewer.backend` is `parquet-tools`.|
|`parquet-viewer.parquetToolsPath`|`parquet-tools`|The name of the parquet-tools executable or a path to the parquet-tools jar|

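Because `parquet-viewer.jsonSpace` is passed to `JSON.stringify` as is, its effect can be previewed directly. A minimal TypeScript sketch follows; the `row` object is made-up sample data and `render` is a hypothetical helper, not the extension's code.

```typescript
// Made-up sample row standing in for one decoded parquet record.
const row = { id: 1, name: 'alice' };

// The third argument of JSON.stringify is the indentation: 0 gives compact
// output, a number n indents with n spaces (capped at 10), and a string is
// used verbatim as the indent (also capped at 10 characters).
function render(value: unknown, jsonSpace: number | string): string {
  return JSON.stringify(value, null, jsonSpace);
}

console.log(render(row, 0));
console.log(render(row, 2));
console.log(render(row, '\t'));
```

With the default of `0`, each row stays on a single line; any positive value spreads it over several lines.
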
### What's new
