Selectively overwrite data with python #1101

Merged: 7 commits merged on Feb 25, 2023

@ismoshkov (Contributor) commented Jan 25, 2023

Description

Currently, the high-level Python writer does not support partial partition overwrites.
This PR enables the use of partition filters when writing data, so that only the matching partitions are overwritten.

The functionality is similar to Databricks selective overwrite:
https://docs.databricks.com/delta/selective-overwrite.html
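To make the intended semantics concrete, here is a minimal in-memory sketch of a selective overwrite, assuming it behaves like the Databricks feature linked above: partitions that match the filters are dropped and replaced by the new data, and every other partition is kept. The table model (a dict from partition to rows), the operator set, and the helper names `matches` and `selective_overwrite` are hypothetical; a real Delta table tracks data files through its transaction log rather than holding rows.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical filter format mirroring the PR's example: (column, op, value).
Filter = Tuple[str, str, Any]
# Hypothetical partition key: a tuple of (column, value) pairs.
Partition = Tuple[Tuple[str, Any], ...]

_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda left, right: left in right,
    "not in": lambda left, right: left not in right,
}


def matches(partition: Partition, filters: List[Filter]) -> bool:
    """True if the partition satisfies every filter (filters are ANDed)."""
    values = dict(partition)
    return all(_OPS[op](values[col], val) for col, op, val in filters)


def selective_overwrite(
    table: Dict[Partition, List[dict]],
    new_data: Dict[Partition, List[dict]],
    filters: List[Filter],
) -> Dict[Partition, List[dict]]:
    """Replace the partitions matching `filters`; keep every other partition.

    Assumes every partition in `new_data` itself matches `filters`.
    """
    # Partitions outside the filter survive the overwrite untouched.
    result = {p: rows for p, rows in table.items() if not matches(p, filters)}
    # Matching partitions are gone; the new data takes their place.
    result.update(new_data)
    return result


table = {
    (("partition_a", "1"),): [{"x": 1}],
    (("partition_a", "2"),): [{"x": 2}],
    (("partition_a", "3"),): [{"x": 3}],
}
incoming = {(("partition_a", "2"),): [{"x": 20}]}
out = selective_overwrite(table, incoming, [("partition_a", ">", "1")])
# partition_a=1 is kept, 2 is replaced, 3 is removed (it matched but got no new rows).
print(out)
```

Values are compared exactly as given (here as strings), which is an assumption of the sketch rather than a statement about how the writer compares typed partition values.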

The writer checks that the data contains only partitions that pass the filters.
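The check described above can be sketched in plain Python. This is a minimal illustration, not the PR's implementation: the helper name `check_partition_values`, the dict-per-row input, the supported operators, and the error text are all assumptions. Filters are combined with AND, and values are compared as given.

```python
import operator
from typing import Any, Callable, Dict, Iterable, List, Mapping, Tuple

# Hypothetical filter format mirroring the PR's example: (column, op, value).
Filter = Tuple[str, str, Any]

_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda left, right: left in right,
    "not in": lambda left, right: left not in right,
}


def check_partition_values(
    rows: Iterable[Mapping[str, Any]], filters: List[Filter]
) -> None:
    """Raise ValueError if any row belongs to a partition the filters exclude."""
    for row in rows:
        for column, op, value in filters:
            if not _OPS[op](row[column], value):
                raise ValueError(
                    f"Data contains {column}={row[column]!r}, which does not "
                    f"satisfy the partition filter ({column!r}, {op!r}, {value!r})"
                )


filters = [("partition_a", ">", "1")]
check_partition_values([{"partition_a": "2", "value": 10}], filters)  # passes
try:
    check_partition_values([{"partition_a": "1", "value": 10}], filters)
except ValueError as err:
    print(err)
```

Failing fast before anything is written keeps a bad batch from deleting matching partitions and then landing rows outside them.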

Documentation

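```python
from deltalake import write_deltalake

# delta_path: an existing Delta table partitioned by partition_a.
# sample_data: new rows, all of which must satisfy the filter below.
# Only partitions matching the filter are replaced; others are kept.
write_deltalake(
    delta_path,
    sample_data,
    mode="overwrite",
    partitions_filters=[("partition_a", ">", "1")],
)
```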

@github-actions bot added the binding/python label (Issues for the Python package) on Jan 25, 2023

@wjones127 (Collaborator) left a comment

Thank you for writing this! I'd like a few more data types tested, and then this should be good to go. 😄

python/tests/test_writer.py (outdated, resolved)
@ismoshkov requested review from wjones127 and removed request for rtyler, houqp, fvaleye and roeap January 26, 2023 16:07

@wjones127 (Collaborator) left a comment

Thank you for testing the additional types. I had a few ideas for some quick additional tests, if you could add those too. Then I think this should be ready to go.

python/deltalake/writer.py (outdated, resolved)
python/deltalake/writer.py (outdated, resolved)
python/deltalake/writer.py (resolved)
python/tests/test_writer.py (outdated, resolved)
python/tests/test_writer.py (outdated, resolved)
python/tests/test_writer.py (outdated, resolved)
@ismoshkov force-pushed the python-partition-overwrite branch from ad8e2ff to 108ac88 February 19, 2023 14:22
@ismoshkov requested a review from wjones127 February 19, 2023 14:35

@wjones127 (Collaborator) left a comment

Thank you for refactoring the tests and adding more. They look good.

I think we can use the existing DeltaJSONEncoder and then fix some other small issues and we are good-to-go.

Could you make sure to rebase and re-run make format? We just switched our linter to ruff, so some new rules may be enabled :)
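The reviewer suggests reusing the library's existing DeltaJSONEncoder, whose code is not shown on this page. As a hedged illustration of why a custom encoder helps, the hypothetical `PartitionValueEncoder` below subclasses the standard library's `json.JSONEncoder` so that values it cannot serialize natively (dates, datetimes, decimals) become strings. Turning partition values into one string form before comparing them is one plausible use; this is not a description of how DeltaJSONEncoder actually works.

```python
import json
from datetime import date, datetime
from decimal import Decimal
from typing import Any


class PartitionValueEncoder(json.JSONEncoder):
    """Hypothetical encoder for values json.dumps rejects by default."""

    def default(self, obj: Any) -> Any:
        # datetime subclasses date, so this branch emits ISO 8601 for both.
        if isinstance(obj, date):
            return obj.isoformat()
        # str() keeps the exact decimal digits; float() could lose precision.
        if isinstance(obj, Decimal):
            return str(obj)
        # Anything else: let the base class raise TypeError.
        return super().default(obj)


print(json.dumps(
    {"day": date(2023, 2, 25), "ts": datetime(2023, 2, 25, 10, 39), "amount": Decimal("1.10")},
    cls=PartitionValueEncoder,
))
# {"day": "2023-02-25", "ts": "2023-02-25T10:39:00", "amount": "1.10"}
```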

python/deltalake/writer.py (outdated, resolved)
python/deltalake/writer.py (outdated, resolved)
python/tests/test_writer.py (outdated, resolved)
python/tests/test_writer.py (outdated, resolved)
@ismoshkov force-pushed the python-partition-overwrite branch from 108ac88 to 48e8737 February 25, 2023 09:57
@ismoshkov requested a review from wjones127 February 25, 2023 10:39

@wjones127 (Collaborator) left a comment

Excellent! I'm excited for when we release this :)
Thanks @ismoshkov!

@wjones127 merged commit ac1ce57 into delta-io:main Feb 25, 2023
chitralverma pushed a commit to chitralverma/delta-rs that referenced this pull request Mar 17, 2023
Co-authored-by: Ilya Moshkov <ilya.moshkov@exosfinancial.com>
Labels: binding/python (Issues for the Python package)
Participants: 2