Add support for reading parquet file thanks to arrow-dataset #576 #577
base: master
158ed95 to 3c6e600
Hi, and thanks for the PR. I have nothing to add to the code, but I get this exception when running the test on Linux with both JDK 11 and 17. The issue seems to be on the Arrow side. Do you know of any requirements for it to work?
Hi @koperagen, it seems to be a JNI issue. I just checked, and it works well both on my MacBook Pro (M1) and on a PC with Windows 10 (Intel Core i7). What is the processor architecture on your computer? Normally
Yes, I do run them with Gradle. The processor is an Intel Core i7. I also tried to run the test on TeamCity, but it fails on Linux there as well :(
But the library doesn't depend on any protobuf library, so I assume it could be a linkage error on the Arrow side... maybe? Either that, or the project somehow needs a dependency on native protobuf.
Indeed, I also reproduced the issue with Docker; downgrading the Arrow dependency to the version
3c6e600 to 0dd7498
Can confirm, 14.0.2 works. I tried it and have some requests.
Looks like only URLs that point to files are valid? Can we make this parameter a
At the same time, it reads the sample file from the tests just fine.
Actually, URI parsing is done natively by Arrow, which supports only a few file systems; unfortunately, http(s) is not supported yet.
See the Arrow source code: https://github.com/apache/arrow/blob/2a87693134135a8af2ae2b6df41980176431b1c0/cpp/src/arrow/filesystem/filesystem.cc#L679
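Since the URI is resolved inside Arrow's native code, one option is to reject http(s) URIs on the JVM side before they reach JNI, so the caller gets a readable error. The following is only a minimal sketch in Java: `UriGuard`, `isHttp`, and `requireNonHttp` are hypothetical names that belong neither to this PR nor to Arrow, and the check covers only the http(s) case mentioned above; it makes no attempt to list the schemes Arrow does accept.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, not part of the PR or of Arrow.
public final class UriGuard {
    private UriGuard() {}

    /** Returns true when the URI uses the http or https scheme (case-insensitive). */
    public static boolean isHttp(URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            return false; // relative URI, no scheme to check
        }
        String lower = scheme.toLowerCase(Locale.ROOT);
        return lower.equals("http") || lower.equals("https");
    }

    /** Returns the URI unchanged, or throws if it is an http(s) URI. */
    public static URI requireNonHttp(URI uri) {
        if (isHttp(uri)) {
            throw new IllegalArgumentException(
                "http(s) URIs are not supported for Parquet reading: " + uri);
        }
        return uri;
    }
}
```

A reader function could call `UriGuard.requireNonHttp(uri)` first and pass the result on to Arrow, which keeps the failure in JVM code instead of surfacing it as a native error.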
I actually tried to read a local copy of that file and it failed with
Thanks for the clarification about the URI. Let's change that parameter type to
I hit the same issue: another problem with JNI (and threads)...
Hi, thanks for the PR. Sorry, I could not work out whether it will cover any Parquet file or only Parquet files that store data in the Arrow format. I will collect a few Parquet files and get back to you.
I confirm that it should cover all Parquet files. We are facing a JNI error with some Parquet files (not all). I created an issue on the Arrow repository: apache/arrow#20379
@fb64 we decided not to merge it immediately, but to wait until three things have happened:
Thanks again for your help and collaboration!
No problem!
Related Arrow issue for
71b06f5 to 8b8f706
For information, I just updated this PR to Arrow 16.0.0, which includes fixes for the two issues discovered previously:
8b8f706 to 5ce70b9
5ce70b9 to 79fd37d
Fixes #576