
Bump parquet2 to 0.10.3 & replace compression algo LZ4 with LZ4Raw #4668

Closed
dantengsky opened this issue Apr 2, 2022 · 7 comments · Fixed by #4726
Assignees
Labels
A-storage Area: databend storage C-bug Category: something isn't working

Comments

dantengsky (Member) commented Apr 2, 2022

By the way, the parquet files created by Databend cannot be parsed by pyarrow:

import pyarrow.parquet as pq
pq.read_pandas("/usr/local/var/minio/databend/1/4/_b/2fbff1270aec4748941939bdb388be3a_v0.parquet")

This fails with:

OSError: Corrupt Lz4 compressed data.

At first glance it seems that pyarrow no longer supports the legacy LZ4 codec (which has been deprecated in favor of LZ4_RAW). More precisely, it looks like a compatibility issue around the LZ4 framing: pyarrow does support an LZ4 codec, as the error message clearly indicates, but apparently not the framing these files use.

https://lists.apache.org/thread/l15qq12v38w9jnkd6p9mdd11kr0nq3gr

Perhaps LZ4_RAW should be used instead:

https://issues.apache.org/jira/browse/PARQUET-2032

Originally posted by @dantengsky in #4654 (comment)

@dantengsky dantengsky changed the title Bump parquet2 to 10.0.3 & replace compression algo LZ4 with LZ4Raw Bump parquet2 to 0.10.3 & replace compression algo LZ4 with LZ4Raw Apr 2, 2022
@dantengsky dantengsky self-assigned this Apr 2, 2022
@BohuTANG BohuTANG added C-bug Category: something isn't working A-storage Area: databend storage labels Apr 2, 2022
BohuTANG (Member) commented Apr 6, 2022

#4689 has upgraded parquet2 to the latest version:

parquet2 = { version = "0.10.3", default_features = false }

dantengsky (Member, Author) commented

#4689 has upgraded parquet2 to the latest version:

parquet2 = { version = "0.10.3", default_features = false }

There are some inconsistencies between the parquet2 v0.10.3 published on crates.io and the version on GitHub (I do not know why); the parquet-format-async-temp crates they depend on are of different versions.

Even worse, some data types exported by parquet-format-async-temp are used directly in the code base...

I am trying to resolve the conflicts.

BohuTANG (Member) commented Apr 6, 2022

There are some inconsistencies between the parquet2 v0.10.3 published on crates.io and the version on GitHub (I do not know why); the parquet-format-async-temp crates they depend on are of different versions.

@jorgecarleitao can perhaps help here, thanks.

jorgecarleitao commented Apr 6, 2022

Thanks for the ping!

It seems that you are depending on the latest parquet2. There are some breaking changes from 0.10 to the current main / future 0.11. That may explain the conflicts.

dantengsky (Member, Author) commented

It seems that you are depending on the latest parquet2. There are some breaking changes from 0.10 to the current main / future 0.11. That may explain the conflicts.

@jorgecarleitao Thank you for your detailed explanations!

There are some breaking changes from 0.10 to the current main / future 0.11.

May I ask if there is a near-future plan for version 0.11? And will the arrow2 crate be upgraded as well? (We are trying to integrate the Lz4Raw feature, jorgecarleitao/parquet2#95.)

jorgecarleitao commented

Ahhh, got it. I am finishing the necessary changes in arrow2 to add support for page filter offsets, since parquet2 added support for that.

I suspect that it should be ready within 2-3 weeks.

dantengsky (Member, Author) commented

@jorgecarleitao roger & thanks
