feat(parquet): process preview_offset and preview_nrows [TCTC-9803] #910

Merged
7 commits merged into main from preview-for-parquet-files on Nov 22, 2024

Conversation

@Fanaen Fanaen commented Nov 21, 2024

Change Summary

  • Add a dedicated reader function for parquet files (they were previously routed directly to pd.read_parquet).
  • Use pyarrow with .to_pandas() instead, to load the data lazily and efficiently.
  • Handle preview_nrows, preview_offset and columns.

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI and coverage remains at 100%
  • Documentation reflects the changes where applicable

@Fanaen Fanaen marked this pull request as draft November 21, 2024 17:01
@Fanaen Fanaen force-pushed the preview-for-parquet-files branch from 9f4574f to 54850a9 Compare November 21, 2024 17:07
@Fanaen Fanaen marked this pull request as ready for review November 21, 2024 17:09
@Fanaen Fanaen changed the title from "feat(parquet): process preview_offset and preview_nrows" to "feat(parquet): process preview_offset and preview_nrows [TCTC-9803]" Nov 22, 2024

@lukapeschke lukapeschke left a comment


Thanks!

Resolved review threads:

  • peakina/readers/parquet.py (outdated)
  • peakina/readers/parquet.py (outdated)
  • peakina/readers/parquet.py
  • tests/readers/test_parquet.py

@Fanaen Fanaen requested a review from lukapeschke November 22, 2024 10:54

@lukapeschke lukapeschke left a comment


👌

@Fanaen Fanaen merged commit efd86e1 into main Nov 22, 2024
8 checks passed
@Fanaen Fanaen deleted the preview-for-parquet-files branch November 22, 2024 11:01
2 participants