fix: Reading Parquet with Null dictionary page #18112

Merged

coastalwhite
Collaborator

This fixes reading Parquet files produced by writers that emit dictionary pages for Null arrays (I have no idea why they do that).

Fixes #18085.
Fixes #18079.

Possibly also #18061.

@github-actions github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Aug 8, 2024

codecov bot commented Aug 8, 2024

Codecov Report

Attention: Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.

Project coverage is 80.33%. Comparing base (3dda47e) to head (de3db83).
Report is 10 commits behind head on main.

Files                                                  Patch %   Lines
...tes/polars-parquet/src/parquet/read/compression.rs  50.00%    3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #18112      +/-   ##
==========================================
- Coverage   80.37%   80.33%   -0.04%     
==========================================
  Files        1496     1496              
  Lines      197542   197683     +141     
  Branches     2820     2821       +1     
==========================================
+ Hits       158771   158811      +40     
- Misses      38249    38351     +102     
+ Partials      522      521       -1     


@ritchie46
Member

ritchie46 commented Aug 9, 2024

Nice.

Can you add this test?

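import numpy as np
import pandas as pd
import polars as pl

# Column 'C' is entirely None, so it is written as an all-null column.
test = pd.DataFrame([{'A': np.nan, 'B': 3, 'C': None}, {'A': np.nan, 'B': None, 'C': None}])

test.to_parquet('~/mre.parquet')

pl.scan_parquet('~/mre.parquet').collect()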

Slightly adapted so that it writes to a bytes buffer.
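
For reference, a bytes-buffer variant of the snippet above could look roughly like the following. This is only a sketch and not the test that was actually added in this PR: the function name `roundtrip_all_null_column` is hypothetical, the `pyarrow` engine is an assumption, and `pl.read_parquet` is used here instead of `pl.scan_parquet` on the assumption that an in-memory buffer is being read eagerly.

```python
import io


def roundtrip_all_null_column():
    """Write a frame with an all-None column to an in-memory Parquet file and read it back."""
    # Third-party imports are kept local so this sketch can be loaded even
    # where the optional packages are not installed.
    import numpy as np
    import pandas as pd
    import polars as pl

    # Column "C" holds only None, which is the all-null case from the report.
    frame = pd.DataFrame(
        [
            {"A": np.nan, "B": 3, "C": None},
            {"A": np.nan, "B": None, "C": None},
        ]
    )

    # Assumption: pandas writes through the pyarrow engine into the buffer.
    buffer = io.BytesIO()
    frame.to_parquet(buffer, engine="pyarrow")
    buffer.seek(0)  # rewind so the reader starts at the beginning of the file

    return pl.read_parquet(buffer)
```

Calling `roundtrip_all_null_column()` on a Polars build that includes this fix should return a frame with two rows and the columns A, B and C; on an affected build the read is expected to fail instead.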

@coastalwhite
Collaborator Author

Did it 👍

@coastalwhite coastalwhite force-pushed the fix-parquet-null-dictionary-page branch from 1c0523d to 3be3ac0 Compare August 9, 2024 12:14
@coastalwhite coastalwhite force-pushed the fix-parquet-null-dictionary-page branch 2 times, most recently from 6756f0c to c1eacf0 Compare August 9, 2024 12:31
@coastalwhite coastalwhite force-pushed the fix-parquet-null-dictionary-page branch from c1eacf0 to 9d2ed1a Compare August 9, 2024 12:34
@ritchie46 ritchie46 merged commit 9dd9569 into pola-rs:main Aug 9, 2024
25 checks passed
@coastalwhite coastalwhite deleted the fix-parquet-null-dictionary-page branch August 9, 2024 13:16