
ValueError: Invalid comparison operation: Utf8 == LargeUtf8 on write_deltalake in overwrite-mode #3139

Closed
gilgeorges opened this issue Jan 17, 2025 · 2 comments · Fixed by #3141
Labels
bug Something isn't working

Comments

@gilgeorges

Environment

Delta-rs version: 0.24.0

Binding: python

Environment:

  • Cloud provider: none - local
  • OS: Docker (Debian image) on WSL2 on Windows 10
  • Other: Python 3.13

Bug

What happened:
When calling write_deltalake in overwrite mode with a simple predicate of the form "label = value", everything works as expected if value is an integer: any existing records under the same label are overwritten. If, however, value is a string (see code below), write_deltalake (called through polars' write_delta) fails with "Invalid comparison operation: Utf8 == LargeUtf8":

Traceback (most recent call last):
  File "<python-input-11>", line 1, in <module>
    test()
    ~~~~^^
  File "<python-input-3>", line 5, in test
    d.write_delta(tempdir / 'test.delta', mode='overwrite', delta_write_options=dict(
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        engine='rust',
        ^^^^^^^^^^^^^^
        predicate='C = \'a\''
        ^^^^^^^^^^^^^^^^^^^^^
    ))
    ^^
  File "/workspaces/debian-2/.venv/lib/python3.13/site-packages/polars/dataframe/frame.py", line 4305, in write_delta
    write_deltalake(
    ~~~~~~~~~~~~~~~^
        table_or_uri=target,
        ^^^^^^^^^^^^^^^^^^^^
    ...<5 lines>...
        **delta_write_options,
        ^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/workspaces/debian-2/.venv/lib/python3.13/site-packages/deltalake/writer.py", line 323, in write_deltalake
    write_deltalake_rust(
    ~~~~~~~~~~~~~~~~~~~~^
        table_uri=table_uri,
        ^^^^^^^^^^^^^^^^^^^^
    ...<13 lines>...
        post_commithook_properties=post_commithook_properties,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
ValueError: Invalid comparison operation: Utf8 == LargeUtf8

What you expected to happen:
For the predicate to also accept string values, or at least for the error message to be less cryptic.

How to reproduce it:

import polars as pl
from pathlib import Path

tempdir = Path()

d = pl.DataFrame({'A': [1, 2], 'B': ['hi', 'hello'], 'C': ['a', 'b']})
d.write_delta(tempdir / 'test.delta')
d = pl.DataFrame({'A': [10], 'B': ['yeah'], 'C': ['a']})
d.write_delta(
    tempdir / 'test.delta',
    mode='overwrite',
    delta_write_options=dict(
        engine='rust',
        predicate='C = \'a\''
    )
)
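For reference, the two predicate shapes discussed in this report (integer value versus string value) can be built with a small helper. This is only a sketch: `equality_predicate` is a hypothetical name, not part of deltalake or polars, and doubling single quotes is the standard SQL escaping convention, which is assumed here to match what the predicate parser accepts.

```python
def equality_predicate(column, value):
    """Build a 'column = value' predicate string (hypothetical helper).

    Integers are emitted bare; strings are wrapped in single quotes,
    with embedded single quotes doubled (standard SQL escaping).
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("boolean values are not handled in this sketch")
    if isinstance(value, int):
        return f"{column} = {value}"
    if isinstance(value, str):
        escaped = value.replace("'", "''")
        return f"{column} = '{escaped}'"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


# Integer predicate: the form reported as working.
int_predicate = equality_predicate("A", 10)
# String predicate: the form that triggers the error in this report.
str_predicate = equality_predicate("C", "a")
```

The string form is equivalent to the predicate='C = \'a\'' argument in the reproduction above, so it hits the same error on 0.24.0.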

More details:
Use case: compiling periodic, discrete data dumps from an operational system into a central Delta table for further analysis. To avoid duplicate records when the same data dump file is accidentally processed twice, we added a column to the Delta table holding the name of the input file, and tried "overwrite" mode together with the "predicate" argument instead of "append".
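The behaviour this use case relies on can be modelled in plain Python, without deltalake. This is a sketch of the intended semantics only, under the assumption that overwrite with a predicate drops the rows matching the predicate and then adds the incoming rows; `replace_where` is a hypothetical name, and the rows reuse the data from the reproduction.

```python
def replace_where(existing, incoming, column, value):
    """Model overwrite-with-predicate on a list of row dicts.

    Rows where row[column] == value are dropped, then the incoming
    rows are appended. Assumed semantics, not the delta-rs code path.
    """
    kept = [row for row in existing if row[column] != value]
    return kept + list(incoming)


table = [
    {"A": 1, "B": "hi", "C": "a"},
    {"A": 2, "B": "hello", "C": "b"},
]
batch = [{"A": 10, "B": "yeah", "C": "a"}]

# Processing the same batch twice leaves the table unchanged the second time.
once = replace_where(table, batch, "C", "a")
twice = replace_where(once, batch, "C", "a")
```

With plain "append", the second run would add the batch again, which is the duplication the predicate is meant to prevent.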

@gilgeorges gilgeorges added the bug Something isn't working label Jan 17, 2025
@ion-elgreco
Collaborator

This is quite strange; I thought I had updated all the places where we do this type of expression coercion. I will dive a bit deeper into this over the weekend.

@ion-elgreco
Collaborator

@gilgeorges Ok I have a fix ready :D
