feat(python): add parameter to DeltaTable.to_pyarrow_dataset() #2465

Merged: 5 commits merged into delta-io:main on May 5, 2024

Conversation

adriangb
Contributor

@adriangb adriangb commented Apr 30, 2024

Otherwise there is no way to union this with another dataset.

@github-actions github-actions bot added the binding/python Issues for the Python package label Apr 30, 2024
@ion-elgreco
Collaborator

@adriangb it might be better to just pass as_large_types through to .to_pyarrow()

@adriangb
Contributor Author

They both seem useful, right? It seems like as_large_types just blindly makes all types large. I could have a mix of small and large types whose schema I already know, so passing in the schema explicitly (given that it's simple to do so) seems worth having as an option.

@ion-elgreco
Collaborator

@adriangb that's true. If you can fix the tests then we can merge

@adriangb
Contributor Author

It looks like the test fails only on older pyarrow versions, and only for the map type. How about I split it in two and skip the failing one on pyarrow < 10?
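One way the proposed split could look, as a sketch only. The helper names and the test name are hypothetical and are not taken from the PR's test suite; the version check uses only the standard library so it can be tried without pyarrow installed.

```python
import re


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '9.0.0'."""
    match = re.match(r"\d+", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group())


def needs_skip(pyarrow_version: str, minimum_major: int = 10) -> bool:
    """True when the given pyarrow version is older than the minimum major version."""
    return major_version(pyarrow_version) < minimum_major


# With pytest and pyarrow available, the map-type half of the split test
# could be guarded like this (hypothetical test name):
#
#   @pytest.mark.skipif(
#       needs_skip(pa.__version__),
#       reason="map type fails on pyarrow < 10",
#   )
#   def test_to_pyarrow_dataset_schema_with_map(): ...
```

The non-map half of the test stays unguarded, so it still runs on every supported pyarrow version.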

@ion-elgreco
Collaborator

@adriangb can you fix the tests? Then we can merge it :)

@@ -1022,6 +1022,8 @@ def to_pyarrow_dataset(
partitions: Optional[List[Tuple[str, str, Any]]] = None,
filesystem: Optional[Union[str, pa_fs.FileSystem]] = None,
parquet_read_options: Optional[ParquetReadOptions] = None,
schema: Optional[pyarrow.Schema] = None,
as_large_types: bool = False,
Collaborator

The doc description is missing for this param. I would also mention that if schema is passed, it takes precedence over as_large_types.

Contributor Author

done!
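
The precedence rule requested in the review comment can be sketched as below. This is a hypothetical helper written for illustration, not the library's actual implementation; the function and argument names are assumptions.

```python
from typing import Optional, TypeVar

S = TypeVar("S")


def pick_schema(
    explicit: Optional[S],
    table_schema: S,
    large_schema: S,
    as_large_types: bool = False,
) -> S:
    """Hypothetical helper: an explicitly passed schema wins over as_large_types."""
    if explicit is not None:
        return explicit
    # No explicit schema: fall back to the flag.
    return large_schema if as_large_types else table_schema
```

Under this rule, passing both schema and as_large_types=True silently ignores the flag, which is why the reviewer asked for it to be documented.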

@ion-elgreco ion-elgreco enabled auto-merge (squash) May 5, 2024 22:01
@ion-elgreco
Collaborator

Thanks @adriangb

@ion-elgreco ion-elgreco merged commit d0617b5 into delta-io:main May 5, 2024
23 checks passed