ZSTD compression is not supported #90

Closed
eriviere-b opened this issue Oct 5, 2023 · 22 comments

@eriviere-b

Currently, parquet-viewer does not support ZSTD compression, which is a standard compression method.

{"error":"while reading /tmp/SB17-er-ursf_gen-crawls-list.parquet: Error: invalid compression method: ZSTD"}

@dvirtz dvirtz self-assigned this Oct 10, 2023
@dvirtz
Owner

dvirtz commented Oct 10, 2023

Thanks for opening the issue.
What backend do you use (parquet-viewer.backend in the settings)?

@eriviere-b
Author

I am using the Parquets backend.

@uditrana

Ran into this issue today as well 👍🏾.

while reading path/to/file: Error: Failed to open path/to/file: Support for codec 'zstd' not built

Using arrow backend

dvirtz added a commit that referenced this issue Oct 13, 2023
@dvirtz dvirtz closed this as completed in 195a688 Oct 13, 2023
dvirtz pushed a commit that referenced this issue Oct 13, 2023
## [2.4.0](v2.3.5...v2.4.0) (2023-10-13)

### Features

* add zstd support to arrow backend ([195a688](195a688)), closes [#90](#90)
* make arrow the default backend ([547ac84](547ac84))

### Build and continuous integration

* fix conan recipe revisions ([bf15cd1](bf15cd1))
* upgrade use node 18 everywhere ([c50d64d](c50d64d))
@dvirtz
Owner

dvirtz commented Oct 13, 2023

🎉 This issue has been resolved in version 2.4.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@dvirtz
Owner

dvirtz commented Oct 14, 2023

This is now fixed for the arrow backend.

@eriviere-b
Author

When trying to open the Parquet file with the arrow backend, I receive this error:
"Error: cannot find prebuilt arrow module, either build the module or use another backend"
I am on macOS with an Apple Silicon M1 processor. How am I supposed to build the arrow module?

@dvirtz
Owner

dvirtz commented Oct 16, 2023

Sorry about that. There are no free M1 VMs currently available for GitHub Actions.
You can either try to use the parquet-tools backend or try to build the module as follows:

  1. make sure you have Node.js, pipenv and a C++ compiler installed
  2. check out the extension sources
  3. run npm i
  4. run npm run build
  5. copy the resulting module folder from packages/parquet-reader/prebuilds to <extension folder>/packages/parquet-reader/prebuilds (you can get the extension folder using the Extensions: Open Extensions Folder command)
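Step 5 can also be scripted. The following is a minimal Node.js sketch, not part of the extension: `copyPrebuilds` is a hypothetical helper, and both directories are supplied by the caller because the extension folder differs per machine. It uses the relative path given in step 5; error messages later in this thread show `node_modules/parquet-reader/prebuilds` for installed releases, so the layout may differ between versions.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: copies packages/parquet-reader/prebuilds from a source
// checkout into the installed extension folder, as described in step 5.
function copyPrebuilds(checkoutDir, extensionDir) {
  const relative = path.join("packages", "parquet-reader", "prebuilds");
  const source = path.join(checkoutDir, relative);
  const target = path.join(extensionDir, relative);

  if (!fs.existsSync(source)) {
    throw new Error(`no prebuilds found at ${source}; did "npm run build" succeed?`);
  }

  // Make sure the destination exists, then copy the folder contents recursively.
  // fs.cpSync requires Node.js 16.7 or newer.
  fs.mkdirSync(target, { recursive: true });
  fs.cpSync(source, target, { recursive: true });
  return target;
}
```

It could be called as `copyPrebuilds("/path/to/checkout", "/path/to/extension")`, with the second argument taken from the Extensions: Open Extensions Folder command.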

@dvirtz
Owner

dvirtz commented Oct 23, 2023

@eriviere-b the latest release has built-in support for Apple M1.
I'd be happy to get your feedback.

@eriviere-b
Author

@dvirtz for each engine:

  • parquet-tools: there is a pop-up saying "opening {my filename}. Source: parquet-viewer (Extension)", and a loading bar but nothing happens afterward.
  • parquet: I still have this error: {"error":"while reading /tmp/SB17-er-ursf_gen-crawls-list.parquet: Error: invalid compression method: ZSTD"}
  • arrow: I do not have the C++ compiler installed and I haven't tried to install it.

@dvirtz
Owner

dvirtz commented Oct 23, 2023

Thanks for that detailed feedback @eriviere-b
I specifically meant the arrow backend. It shouldn't require having a C++ compiler installed now as the module is prebuilt and packaged with the extension in CI thanks to codemagic.io supporting M1.

@eriviere-b
Author

I still receive the same error:
"Error: cannot find prebuilt arrow module, either build the module or use another backend"

@dvirtz
Owner

dvirtz commented Oct 25, 2023

Just to make sure: you're on version v2.4.1, right?

@eriviere-b
Author

Yes, I am loading the latest version of the extension every time I try.

@dvirtz
Owner

dvirtz commented Oct 25, 2023

Thanks. Can you please tell me what is printed when you run

node -e 'const os = require("os"); console.log(`${os.platform()}-${os.arch()}`)'
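The platform-architecture string printed by that command appears in the prebuilt module's folder name. A minimal sketch of how the expected path could be assembled follows; the layout is inferred from the error messages later in this thread rather than from the extension's source, and `expectedPrebuildPath` is a hypothetical helper.

```javascript
const os = require("os");
const path = require("path");

// Hypothetical helper: builds the path where the arrow backend's prebuilt
// module would be expected. The folder layout is an assumption taken from the
// dlopen errors quoted in this thread.
function expectedPrebuildPath(extensionDir, platform = os.platform(), arch = os.arch()) {
  return path.join(
    extensionDir,
    "node_modules",
    "parquet-reader",
    "prebuilds",
    `arrow-parquet-reader-${platform}-${arch}`,
    "node-napi-v6.node"
  );
}

// Example: the extension directory below is only illustrative.
const extensionDir = path.join(os.homedir(), ".vscode", "extensions", "dvirtz.parquet-viewer-2.4.2");
console.log(expectedPrebuildPath(extensionDir));
```

Checking whether that file exists, and what it contains, helps separate a missing prebuild from one that is present but cannot be loaded.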

@eriviere-b
Author

It prints darwin-arm64.

@dvirtz
Owner

dvirtz commented Oct 25, 2023

That's what's expected.
I released a new version, v2.4.2, with some more logging, if you don't mind trying it.
Also, could you turn on the logging to panel option (parquet-viewer.logging.panel) and paste the content of the parquet-viewer output window here?
Thanks for your patience.

@melaanya

I experienced the same problem as @eriviere-b today, on 2.4.2.

@dvirtz
Owner

dvirtz commented Oct 26, 2023

I managed to reproduce this on a friend's M1 machine. The error is:

dlopen(/Users/mgunda@roku.com/.vscode/extensions/dvirtz.parquet-viewer-2.4.2/node_modules/parquet-reader/prebuilds/arrow-parquet-reader-darwin-arm64/node-napi-v6.node, 0x0001): tried: '/Users/mgunda@roku.com/.vscode/extensions/dvirtz.parquet-viewer-2.4.2/node_modules/parquet-reader/prebuilds/arrow-parquet-reader-darwin-arm64/node-napi-v6.node' (not a mach-o file)
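The "not a mach-o file" reason means the macOS loader did not recognise the file as a Mach-O binary. One way to confirm that independently is to inspect the file's first four bytes. The sketch below is a diagnostic aid under that assumption; `looksLikeMachO` and `readHeader` are hypothetical names, and it does not explain why the packaged file was wrong.

```javascript
const fs = require("fs");

// Magic numbers that can start a Mach-O file, expressed as the big-endian
// value of the first four bytes on disk.
const MACH_O_MAGICS = new Set([
  0xfeedface, // 32-bit, big-endian
  0xfeedfacf, // 64-bit, big-endian
  0xcefaedfe, // 32-bit, little-endian
  0xcffaedfe, // 64-bit, little-endian (what an arm64 binary has on disk)
  0xcafebabe, // universal (fat) binary
  0xcafebabf, // universal (fat) binary with 64-bit offsets
]);

// Returns true when the given header bytes start with a Mach-O magic number.
function looksLikeMachO(header) {
  return header.length >= 4 && MACH_O_MAGICS.has(header.readUInt32BE(0));
}

// Reads up to `length` bytes from the start of a file.
function readHeader(file, length = 4) {
  const fd = fs.openSync(file, "r");
  try {
    const header = Buffer.alloc(length);
    const bytesRead = fs.readSync(fd, header, 0, length, 0);
    return header.subarray(0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}

// Usage: node check.js /path/to/node-napi-v6.node
if (process.argv[2]) {
  console.log(looksLikeMachO(readHeader(process.argv[2])) ? "Mach-O" : "not Mach-O");
}
```

Reading a longer header, for example `readHeader(file, 64)`, and printing it as text can show what the file actually is when the check fails.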

Not sure how it works on the CI machine.

@dvirtz
Owner

dvirtz commented Oct 29, 2023

M1 should be fixed with v2.4.3.

@melaanya

I have an M1 and am currently on v2.4.3 (just updated and reloaded), and the issue persists:

{"error":"while reading /Users/annaberger/Downloads/data.parquet: Error: cannot find prebuilt arrow module, either build the module or use another backend: Error: dlopen(/Users/annaberger/.vscode/extensions/dvirtz.parquet-viewer-2.4.3/node_modules/parquet-reader/prebuilds/arrow-parquet-reader-darwin-arm64/node-napi-v6.node, 0x0001): tried: '/Users/annaberger/.vscode/extensions/dvirtz.parquet-viewer-2.4.3/node_modules/parquet-reader/prebuilds/arrow-parquet-reader-darwin-arm64/node-napi-v6.node' (not a mach-o file), '/System/Volumes/Preboot/Cryptexes/OS/Users/annaberger/.vscode/extensions/dvirtz.parquet-viewer-2.4.3/node_modules/parquet-reader/prebuilds/arrow-parquet-reader-darwin-arm64/node-napi-v6.node' (no such file), '/Users/annaberger/.vscode/extensions/dvirtz.parquet-viewer-2.4.3/node_modules/parquet-reader/prebuilds/arrow-parquet-reader-darwin-arm64/node-napi-v6.node' (not a mach-o file)"}
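Long dlopen messages like the one above are easier to read once the "tried:" list is split into path and reason pairs. A minimal sketch follows, assuming the quoting format seen in this thread (each path in single quotes followed by a parenthesised reason); `parseDlopenAttempts` is a hypothetical helper and would mis-split paths that themselves contain a single quote.

```javascript
// Hypothetical helper: extracts { path, reason } pairs from the "tried:" part
// of a macOS dlopen error message.
function parseDlopenAttempts(message) {
  const start = message.indexOf("tried:");
  if (start === -1) {
    return [];
  }
  const attempts = [];
  // Matches: '<path>' (<reason>)
  const pattern = /'([^']+)' \(([^)]+)\)/g;
  for (const match of message.slice(start).matchAll(pattern)) {
    attempts.push({ path: match[1], reason: match[2] });
  }
  return attempts;
}

// Shortened sample in the same format as the error above; the paths are illustrative.
const sample =
  "dlopen(/ext/prebuilds/node-napi-v6.node, 0x0001): tried: " +
  "'/ext/prebuilds/node-napi-v6.node' (not a mach-o file), " +
  "'/System/Volumes/Preboot/Cryptexes/OS/ext/prebuilds/node-napi-v6.node' (no such file)";

for (const attempt of parseDlopenAttempts(sample)) {
  console.log(`${attempt.reason}: ${attempt.path}`);
}
```

Applied to the message above, this makes it clear that the module file exists in the extension folder but is rejected by the loader, while the Cryptexes location has no such file.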

@dvirtz
Owner

dvirtz commented Oct 30, 2023

Sorry about that; the fix was only integrated in v2.4.4.

@eriviere-b
Author

Now it works with the arrow engine. Thanks, @dvirtz!
