Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete operation doesn't do type coercion #1921

Closed
echai58 opened this issue Nov 29, 2023 · 6 comments
Closed

Delete operation doesn't do type coercion #1921

echai58 opened this issue Nov 29, 2023 · 6 comments
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate enhancement New feature or request

Comments

@echai58
Copy link

echai58 commented Nov 29, 2023

Environment

Delta-rs version: 0.13.0

Binding: python bindings

Environment:

  • Cloud provider: n/a
  • OS: ubuntu 22.04
  • Other: testing locally via jupyter notebook

Bug

What happened:
It seems that when a string predicate is passed into DeltaTable.delete, when it gets parsed as a Datafusion Expression, it is not taking into account the schema of the table. For example, if there is a column with type pa.int32(), and you try to use a predicate like price = 100, it raises an error ValueError: Invalid comparison operation: Int32 <= Int64, which I assume is coming from 100 being parsed as a int64. This is supported by if I pass in price = CAST(100 as INT) instead, it works as expected.

What you expected to happen:
The parser should be schema-aware when converting the string predicate to a Datafusion Expression.

How to reproduce it:
This is a minimal reproduction:

import tempfile
import pandas as pd
import pyarrow as pa
from deltalake import DeltaTable
from deltalake.writer import write_deltalake

pandas_data = pd.DataFrame.from_dict(
        {
            "price": [100, 150],
            "qty": [15, 20],
        }
    )

schema = pa.schema(
    [
        ("price", pa.int32()),
        ("qty", pa.int32()),
    ]
)

with tempfile.TemporaryDirectory() as path:
    table = pa.Table.from_pandas(pandas_data, schema)
    write_deltalake(
        table_or_uri=path,
        data=table,
        mode="error",
    )

    delta_table = DeltaTable(path) 

    # this does not work
    delta_table.delete(predicate="price = 100")

    # this works
    delta_table.delete(predicate="price = CAST(100 as INT)")
@echai58 echai58 added the bug Something isn't working label Nov 29, 2023
@ion-elgreco
Copy link
Collaborator

It looks like we are building directly a physical plan here so that's why there is no type coercion.

We probably need to refactor this.

@ion-elgreco ion-elgreco added enhancement New feature or request binding/python Issues for the Python package binding/rust Issues for the Rust crate and removed bug Something isn't working labels Nov 30, 2023
@ion-elgreco ion-elgreco changed the title Delete predicate is not parsing the string based on the table schema Delete operation doesn't do type coercion Nov 30, 2023
@echai58
Copy link
Author

echai58 commented Mar 27, 2024

@ion-elgreco are there any updates on this?

@ion-elgreco
Copy link
Collaborator

@echai58 this PR got merged: f56d8c9, so it should be doable to refactor the delete operation now.

@peter-xu-zhou
Copy link

@ion-elgreco when will the predicate problem for delete/update operations be done ? I also met the problem.

@ion-elgreco
Copy link
Collaborator

@ion-elgreco when will the predicate problem for delete/update operations be done ? I also met the problem.

I dont have any plans to work on this. But feel free to pick it up

@ion-elgreco
Copy link
Collaborator

Resolved by DataFusion 39 it seems apache/datafusion#10716

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants