
feat: add support for pyarrow.ExtensionType #2885

Merged
merged 3 commits into from
Sep 17, 2024


fecet
Contributor

fecet commented Sep 14, 2024

Description


Related Issue(s)

Documentation

The Delta Lake Python bindings currently cannot write data whose schema contains a pyarrow ExtensionType, even though many libraries in the pyarrow ecosystem rely on extension types. This PR adds that support so these third-party libraries can write to Delta Lake.

For example, the following code, taken from Eventual-Inc/Daft#2827 (comment), writes correctly once this PR is applied:

```python
import daft

df_dogs = daft.from_pydict(
    {
        "urls": [
            "https://live.staticflickr.com/65535/53671838774_03ba68d203_o.jpg",
            "https://live.staticflickr.com/65535/53671700073_2c9441422e_o.jpg",
            "https://live.staticflickr.com/65535/53670606332_1ea5f2ce68_o.jpg",
            "https://live.staticflickr.com/65535/53671838039_b97411a441_o.jpg",
            "https://live.staticflickr.com/65535/53671698613_0230f8af3c_o.jpg",
        ],
        "full_name": [
            "Ernesto Evergreen",
            "James Jale",
            "Wolfgang Winter",
            "Shandra Shamas",
            "Zaya Zaphora",
        ],
        "dog_name": ["Ernie", "Jackie", "Wolfie", "Shaggie", "Zadie"],
    }
)
df_dogs = df_dogs.with_column(
    "image_bytes", df_dogs["urls"].url.download(on_error="null")
).with_column("image", daft.col("image_bytes").image.decode())
df_dogs.write_deltalake("path/to/file")
```

github-actions bot added the binding/python (Issues for the Python package) label Sep 14, 2024
@ion-elgreco
Collaborator

@fecet can you add some tests please?

@fecet
Contributor Author

fecet commented Sep 14, 2024

Thanks for the quick reply. I have added a test in test_schema.


codecov bot commented Sep 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.63%. Comparing base (73107a7) to head (125f9bf).
Report is 4 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2885   +/-   ##
=======================================
  Coverage   72.63%   72.63%           
=======================================
  Files         131      131           
  Lines       40016    40016           
  Branches    40016    40016           
=======================================
+ Hits        29064    29067    +3     
  Misses       9089     9089           
+ Partials     1863     1860    -3     


Collaborator

ion-elgreco left a comment


Thanks!

rtyler added this pull request to the merge queue Sep 17, 2024
Merged via the queue into delta-io:main with commit 546a344 Sep 17, 2024
24 checks passed
3 participants