
Python: Support date32 and decimal statistics in PyArrow writer #577

Closed
wjones127 opened this issue Mar 23, 2022 · 3 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@wjones127
Collaborator

Description

In PyArrow, date32 and decimal statistics from Parquet metadata aren't translated into the appropriate Python types; instead, they are returned as raw bytes. We can either decode them ourselves or wait for PyArrow to implement that conversion. Upstream issue: https://issues.apache.org/jira/browse/ARROW-7350
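If we decode them ourselves, a minimal sketch could look like the following. It assumes each statistic arrives as Parquet plain-encoded bytes: a DATE value is a little-endian INT32 counting days since 1970-01-01, and a DECIMAL stored as FIXED_LEN_BYTE_ARRAY is a big-endian two's-complement unscaled integer whose scale comes from the column's schema. The helper names `decode_date32` and `decode_decimal` are hypothetical and are not part of PyArrow or this project.

```python
import datetime
from decimal import Decimal

EPOCH = datetime.date(1970, 1, 1)


def decode_date32(raw: bytes) -> datetime.date:
    """Hypothetical helper: decode a plain-encoded Parquet DATE statistic.

    Assumes `raw` is a 4-byte little-endian signed INT32 of days since epoch.
    """
    days = int.from_bytes(raw, byteorder="little", signed=True)
    return EPOCH + datetime.timedelta(days=days)


def decode_decimal(raw: bytes, scale: int) -> Decimal:
    """Hypothetical helper: decode a byte-array Parquet DECIMAL statistic.

    Assumes `raw` is a big-endian two's-complement unscaled integer and
    `scale` is taken from the column's decimal type.
    """
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Shift the decimal point left by `scale` digits.
    return Decimal(unscaled).scaleb(-scale)
```

Decimals stored with an INT32 or INT64 physical type would not arrive as byte strings, so this sketch only covers the byte-array case.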

Use Case

Related Issue(s)

@wjones127 wjones127 added the enhancement New feature or request label Mar 23, 2022
@wjones127
Collaborator Author

wjones127 commented Apr 29, 2022

The upstream issue is solved and will be released soon in PyArrow 8.0.0.

We can modify the conditional here so that it also accepts date32 and decimal columns when `int(pyarrow.__version__.split('.', 1)[0]) >= 8`:

if logical_type not in ["STRING", "INT", "TIMESTAMP", "NONE"]:
    continue
# import pdb; pdb.set_trace()

We should also remove the leftover `import pdb; pdb.set_trace()` 🤦
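The version-gated conditional described above could be sketched as follows. The helper `stats_supported` is hypothetical, and the logical-type names `"DATE"` and `"DECIMAL"` are assumed to match what PyArrow reports for those columns. The version string is passed in rather than read from `pyarrow.__version__` inside the helper, so the check can be exercised without PyArrow installed.

```python
# Logical types whose min/max statistics were already usable before PyArrow 8.
BASE_LOGICAL_TYPES = ["STRING", "INT", "TIMESTAMP", "NONE"]

# Assumed names for the types PyArrow >= 8 decodes (see ARROW-7350).
PYARROW8_LOGICAL_TYPES = ["DATE", "DECIMAL"]


def pyarrow_major_version(version: str) -> int:
    """Return the major component of a version string such as "8.0.0"."""
    return int(version.split(".", 1)[0])


def stats_supported(logical_type: str, version: str) -> bool:
    """Hypothetical helper: can we use statistics for this logical type?"""
    supported = list(BASE_LOGICAL_TYPES)
    if pyarrow_major_version(version) >= 8:
        supported += PYARROW8_LOGICAL_TYPES
    return logical_type in supported
```

Inside the loop, the existing check would then become `if not stats_supported(logical_type, pyarrow.__version__): continue`.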

@wjones127 wjones127 added the good first issue Good for newcomers label Apr 29, 2022
@Bernolt
Contributor

Bernolt commented Jul 7, 2022

I will try to pick this up.

@wjones127
Collaborator Author

Oh, sorry @Bernolt, I think we've already addressed this in #659.


2 participants