Projection pushdown for load_cdf #2681

Closed
PDEUXA opened this issue Jul 17, 2024 · 0 comments · Fixed by #2704
Labels
enhancement New feature or request

Comments


PDEUXA commented Jul 17, 2024

Description

Use Case

When reading a Delta table with the package, for example with load_cdf() followed by read_all(), the whole dataset is read.

It would be useful to be able to select only specific columns, so that the read benefits from projection pushdown (and therefore less I/O). The generated PyArrow table would then be limited to the requested fields.

It seems this could be done by passing arguments (pyarrow.ipc.IpcReadOptions) to RecordBatchFileReader.

Related Issue(s)

PDEUXA added the enhancement (New feature or request) label Jul 17, 2024
ion-elgreco changed the title from "Read Specific Field to benefit from parquet filter pushdown" to "Slice pushdown for load_cdf" Jul 22, 2024
ion-elgreco changed the title from "Slice pushdown for load_cdf" to "Projection pushdown for load_cdf" Jul 25, 2024
ion-elgreco added a commit that referenced this issue Jul 25, 2024
# Description
Adds columns field to CDF builder

# Related Issue(s)
- closes #2681