[C++][Parquet] Unable to read data from parquet file generated with parquetjs #42868
Comments
Hatem Helal / @hatemhelal:
Hatem Helal / @hatemhelal:
Wes McKinney / @wesm:
Wes McKinney / @wesm:
Tera G: I see that this fix has been made in Arrow's record reader (record_reader.cc). My application uses Parquet's low-level API to read data from Parquet files, and I am hitting exactly the problem fixed by this Jira in the low-level API (column_reader.cc). Since the fix has not been ported to the low-level Parquet API, are there any plans to bring these changes to it? Also, @rdmello, could I simply port the fixes you made to the low-level Parquet API myself? Would that work? We use the low-level API because it gives us more control over predicate push-down, filtering, and skipping data. Finally, does the open-source community recommend that developers use Arrow's Parquet API or the low-level Parquet API to access Parquet data? Thank you in advance for your response.
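To illustrate the kind of data skipping the comment above refers to, here is a minimal sketch of min/max-based row-group pruning. It is not the parquet-cpp API: `RowGroupStats` and `RowGroupsToRead` are hypothetical names, and it assumes per-row-group bounds for one INT64 column are already available (in a real reader they would come from the column chunk statistics in the file footer). A row group whose maximum cannot satisfy the predicate is skipped before any of its pages are decoded.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-row-group summary for a single INT64 column.
// In a real reader these bounds would be taken from the footer statistics.
struct RowGroupStats {
  int64_t min_value;
  int64_t max_value;
};

// Returns the indices of row groups that may contain a value > threshold.
// A row group whose max_value <= threshold cannot match the predicate,
// so it is skipped without reading or decoding any of its pages.
std::vector<int> RowGroupsToRead(const std::vector<RowGroupStats>& stats,
                                 int64_t threshold) {
  std::vector<int> selected;
  for (int i = 0; i < static_cast<int>(stats.size()); ++i) {
    if (stats[i].max_value > threshold) {
      selected.push_back(i);
    }
  }
  return selected;
}
```

For example, with bounds `{0, 99}`, `{100, 199}`, `{200, 299}` and the predicate `value > 150`, only row groups 1 and 2 are selected. The pruning is conservative: a selected row group may still contain no matching rows, so row-level filtering is still needed afterwards.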
Rylan Dmello / @rdmello: I'm not super familiar with the low-level API, but I think a similar set of changes might fix this issue there too. If you already have code that fixes it, I'd recommend sending a pull request. Otherwise, I can take a closer look at porting this fix to the low-level API tomorrow.
Rylan Dmello / @rdmello: I just opened a new Jira issue to add basic DataPageV2 support to the low-level API (https://issues.apache.org/jira/browse/PARQUET-1560). I can post updates there instead of here, since this issue is already resolved. I couldn't easily reproduce the problem when using the low-level API to read the 'feeds1kMicros.parquet' file generated by parquetjs. Either this has already been fixed in arrow/master, or I need to dig deeper to understand the problem. Do you have an example Parquet file that isn't readable with the low-level API? If so, feel free to attach it to the new Jira issue linked above.
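Since PARQUET-1560 is about DataPageV2 support, one layout difference a low-level reader has to respect is how level data sits in the page body, per the Parquet format specification. In a V1 data page the whole body is compressed, and each RLE-encoded level section is preceded by a 4-byte little-endian length. In a V2 data page the repetition and definition levels are never compressed and carry no length prefix; their byte lengths come from the page header (`repetition_levels_byte_length`, `definition_levels_byte_length`), and only the values section may be compressed. The sketch below is not Arrow's implementation; `PageSections`, `SplitV1`, and `SplitV2` are hypothetical names, and it assumes the V1 body has already been decompressed and that V1 levels use the RLE encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical view of one data page body split into its three sections.
struct PageSections {
  std::vector<uint8_t> rep_levels;
  std::vector<uint8_t> def_levels;
  std::vector<uint8_t> values;  // for V2 this may still be compressed
};

// Reads the 4-byte little-endian length prefix used before V1 RLE levels.
static uint32_t ReadLE32(const std::vector<uint8_t>& buf, size_t pos) {
  if (pos + 4 > buf.size()) throw std::runtime_error("truncated length prefix");
  return static_cast<uint32_t>(buf[pos]) |
         (static_cast<uint32_t>(buf[pos + 1]) << 8) |
         (static_cast<uint32_t>(buf[pos + 2]) << 16) |
         (static_cast<uint32_t>(buf[pos + 3]) << 24);
}

// V1: each present level section is [4-byte LE length][RLE data].
// A section is absent when the column's max level for it is 0.
PageSections SplitV1(const std::vector<uint8_t>& body, bool has_rep, bool has_def) {
  PageSections out;
  size_t pos = 0;
  auto take = [&](std::vector<uint8_t>& dst) {
    size_t len = ReadLE32(body, pos);
    pos += 4;
    if (pos + len > body.size()) throw std::runtime_error("truncated level data");
    dst.assign(body.begin() + pos, body.begin() + pos + len);
    pos += len;
  };
  if (has_rep) take(out.rep_levels);
  if (has_def) take(out.def_levels);
  out.values.assign(body.begin() + pos, body.end());
  return out;
}

// V2: no length prefixes; the section sizes come from the page header.
PageSections SplitV2(const std::vector<uint8_t>& body, uint32_t rep_len, uint32_t def_len) {
  size_t levels_end = static_cast<size_t>(rep_len) + def_len;
  if (levels_end > body.size()) throw std::runtime_error("level lengths exceed page size");
  PageSections out;
  out.rep_levels.assign(body.begin(), body.begin() + rep_len);
  out.def_levels.assign(body.begin() + rep_len, body.begin() + levels_end);
  out.values.assign(body.begin() + levels_end, body.end());
  return out;
}
```

The same definition levels and values appear in both layouts, but parsing a V2 body as if it were V1 reads the first level bytes as a bogus length prefix. Depending on the bytes, that either throws, as in this sketch, or silently misaligns every later section. That kind of misread could plausibly leave a reader with no usable values, though this sketch does not show that it is the cause of the behavior reported here.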
See the attached file. When I debug

    % ./parquet-reader feed1kMicros.parquet

I see that scanner->HasNext() always returns false.

Reporter: Hatem Helal / @hatemhelal
Assignee: Rylan Dmello / @rdmello
Original Issue Attachments:
PRs and other links:
Note: This issue was originally created as PARQUET-1482. Please see the migration documentation for further details.