Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parquetjs browser support #17

Merged
merged 23 commits into from
Jun 30, 2021
Merged

Parquetjs browser support #17

merged 23 commits into from
Jun 30, 2021

Conversation

shannonwells
Copy link
Collaborator

@shannonwells shannonwells commented Jun 23, 2021

Problem

Need to be able to use parquetjs in browser, Fixes #178456668

with @enddynayn , @acruikshank , @aneyzberg

Solution Summary

  1. Configuration for bundling parquetjs that works in browser, via esbuild
  2. Add to build targets
  3. Fix all the things that got broken
  4. Example server showing how it can be used
  5. Updates to README
  6. Disable LZO and Brotli in browser (see notes)
  7. Disable LZO in Node as a consequence of this attempt (see notes)

To verify:

  1. Build parquetjs
  2. Make sure tests all pass
  3. Use the esbuild instructions to serve up the built parquetjs
  4. Fire up the example server as shown in its README and play with it. Be sure to open your console. The only problems you should see are with anything marked "unsupported." Otherwise you should see the data in console.

Notes:

  • We attempted to get Brotli compression working in browser, almost got it, but hit a serious road bump with WebAssembly calls. There was a lot of work involved in the "almost", so since we may wish to get this working at a later date, rather than backing out of these changes and putting back the old brotli library, I have left these changes in just for NodeJS.
  • Turns out LZO is was already horribly broken, so it's disabled for NodeJS too.
  • Strangely, XXhash wasm library works just fine.
  • I had fixed a few other small bugs while trying to get this working, but I took the changes out of this PR and will submit later.

Example server working:

example-server-md

enddynayn and others added 13 commits June 18, 2021 14:15
Use wasm-brotli:
  * Use async import to load wasm before loading compression.js
  * requires async loading to get the wasm instanc
  * bubble up all the asyncs
  * make the tests pass again
Remove compiled bundle.js from repo
Remove browserify tsconfig
disable LZO completely due to overrun error
@shannonwells shannonwells changed the title WIP For browser sew Parquetjs browser support Jun 30, 2021
},
} = columnChunk;
return bloomFilterOffsetBuffer;
if (!columnChunk.column.meta_data.bloom_filter_offset) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the previous form was causing an error in the browser when meta_data was undefined.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should cause an error everywhere if columnChunk.column.meta_data is undefined. Did you mean to say when bloom_filter_offset is undefined? Also this should be redundant to the optional chain below unless you're really checking for falsey rather than undefined here. I would drop it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think it was actually bloom_filter_offset, and somehow the case I was looking at was getting through the if condition.

@shannonwells shannonwells marked this pull request as ready for review June 30, 2021 19:19
Copy link

@acruikshank acruikshank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great work. I left a few non-blocking requests.

@@ -0,0 +1,99 @@
/**
* Left here in case esbuild stops working for us when we try to re-enable

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is good, but this should be renamed webpack.config.js.example or all the code in this file should be commented out if we're not actually using it. It's likely a future developer will land in the middle of this file by searching or just miss the comment and get really confused about how the module is built. It might also be good to name check the actual build system (esbuild?) in this comment.

examples/server/app.js Outdated Show resolved Hide resolved
},
} = columnChunk;
return bloomFilterOffsetBuffer;
if (!columnChunk.column.meta_data.bloom_filter_offset) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should cause an error everywhere if columnChunk.column.meta_data is undefined. Did you mean to say when bloom_filter_offset is undefined? Also this should be redundant to the optional chain below unless you're really checking for falsey rather than undefined here. I would drop it.

lib/compression.js Outdated Show resolved Hide resolved
lib/reader.js Outdated
@@ -603,13 +601,12 @@ class ParquetEnvelopeReader {
if (colChunk.meta_data.dictionary_page_offset) {
const offset = +colChunk.meta_data.dictionary_page_offset;
const size = Math.min(+this.fileSize - offset, this.default_dictionary_size);
dictionary = this.read(offset, size, colChunk.file_path).then(buffer => decodePage({offset: 0, buffer, size: buffer.length}, opts).dictionary);
const buffer = await this.read(offset, size, colChunk.file_path)
const dict = await decodePage({offset: 0, buffer, size: buffer.length}, opts)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is is necessary to read and decode the dictionary if opts.dictionary is already set? It'd be nice to avoid it if we're just going to throw it away, but maybe it's required to start the next read in the right place.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not absolutely required, it just made it a little more readable.

test/integration.js Outdated Show resolved Hide resolved
* in reader.js, don't use throwaway variables, use nested "thens"
* rename webpack.config.js to webpack.config.js.example
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants