Cannot merge DeltaTable when predicate is Decimal #20009

Open
2 tasks done
ponychicken opened this issue Nov 26, 2024 · 2 comments

Labels
bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

ponychicken commented Nov 26, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
from datetime import datetime, date, timedelta
from deltalake import DeltaTable

# Define schema
schema = {
    "timestamp": pl.Datetime(time_unit="us", time_zone="UTC"),
    "date": pl.Date,
    "lon": pl.Decimal(precision=9, scale=6),
    "lat": pl.Decimal(precision=9, scale=6),
    "altitude": pl.Decimal(precision=6, scale=1),
    "course": pl.Decimal(precision=4, scale=1),
    "heading": pl.Decimal(precision=4, scale=1),
    "speed": pl.Decimal(precision=4, scale=1),
    "name": pl.String,
    "s3_key": pl.String,
}

# Create sample data
data = {
    "timestamp": [datetime(2024, 3, 20, 12, 30, 0)],
    "date": [date(2024, 3, 20)],
    "lon": [122.123456],
    "lat": [41.987654],
    "altitude": [150.5],
    "course": [45.5],
    "heading": [90.0],
    "speed": [12.5],
    "name": ["NAME"],
    "s3_key": ["s3"],
}

# Create DataFrame with sample data and schema
df = pl.DataFrame(data, schema=schema)

# Write DataFrame as Delta table
df.write_delta("B")

# Write again, skipping duplicates
df.write_delta(
    "B",
    mode="merge",
    delta_merge_options={
        "predicate": """
        t.timestamp = s.timestamp 
        AND t.lat = s.lat 
        AND t.lon = s.lon
    """,
        "source_alias": "s",
        "target_alias": "t",
    }
).when_matched_update_all().when_not_matched_insert_all().execute()

Log output

Traceback (most recent call last):
  File "<stdin>", line 13, in <module>
  File ".venv/lib/python3.12/site-packages/deltalake/table.py", line 1800, in execute
    metrics = self._table.merge_execute(self._builder)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_internal.DeltaError: Generic DeltaTable error: Unable to convert expression to string

Issue description

If I change the predicate to compare only the timestamp, or only a string column, the merge succeeds. It fails only when the predicate compares the Decimal columns (lat, lon).
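To make the two cases concrete, here is a minimal sketch that builds both predicates from a list of key columns. `build_merge_predicate` is a hypothetical helper written for this illustration (it is not part of Polars or deltalake); the column names and the `s`/`t` aliases are taken from the reproducible example.

```python
def build_merge_predicate(columns, source_alias="s", target_alias="t"):
    """Join per-column equality checks with AND (hypothetical helper)."""
    return " AND ".join(
        f"{target_alias}.{col} = {source_alias}.{col}" for col in columns
    )


# Predicate from the reproducible example: lat and lon are Decimal columns,
# and this is the form that raises the DeltaError.
failing_predicate = build_merge_predicate(["timestamp", "lat", "lon"])

# Variant reported to succeed: compare only the timestamp column.
working_predicate = build_merge_predicate(["timestamp"])

print(failing_predicate)
print(working_predicate)
```

Either string can be passed as the `"predicate"` entry of `delta_merge_options`. Note that matching on the timestamp alone changes which rows count as duplicates, so it only narrows down the bug and is not necessarily a drop-in workaround.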

Expected behavior

The merge write should succeed when the predicate compares Decimal columns.

Installed versions

--------Version info---------
Polars: 1.15.0
Index type: UInt32
Platform: Linux-6.11
Python: 3.12.7
LTS CPU: False

----Optional dependencies----
adbc_driver_manager
altair
boto3 1.35.69
cloudpickle
connectorx
deltalake 0.21.0
fastexcel
fsspec 2024.10.0
gevent
google.auth
great_tables
matplotlib
nest_asyncio
numpy 2.1.3
openpyxl 3.1.5
pandas 2.2.3
pyarrow 18.1.0
pydantic 2.9.2
pyiceberg
sqlalchemy 2.0.36
torch
xlsx2csv
xlsxwriter

ponychicken added the bug, needs triage, and python labels Nov 26, 2024
@ponychicken
Author

This is probably an upstream bug: delta-io/delta-rs#3033

@ion-elgreco
Contributor

It seems we are missing a match arm for decimal to allow round-tripping through the log.
