
feat(python): allow python objects to be passed as new values in .update() #1749

Merged


ion-elgreco
Collaborator

@ion-elgreco ion-elgreco commented Oct 21, 2023

Description

A user can now pass a new_values dictionary whose values are Python objects.
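As a rough illustration of what accepting Python objects involves, here is a minimal sketch that turns a new_values-style dictionary into the SQL expression strings that the updates parameter takes. The helpers `to_sql_literal` and `new_values_to_updates` are hypothetical and this is not the deltalake implementation; the literal formats (lowercase `true`/`false`, single quotes escaped by doubling) are assumptions about what DataFusion's SQL parser accepts.

```python
from typing import Any, Dict


def to_sql_literal(value: Any) -> str:
    """Render a scalar Python value as a SQL literal string (hypothetical helper)."""
    # bool is a subclass of int, so it has to be checked before int.
    if isinstance(value, bool):
        # Assumption: DataFusion accepts lowercase true/false literals.
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Quote the string and escape embedded single quotes by doubling them.
        escaped = value.replace("'", "''")
        return f"'{escaped}'"
    raise TypeError(f"unsupported type for a new value: {type(value).__name__}")


def new_values_to_updates(new_values: Dict[str, Any]) -> Dict[str, str]:
    """Map column -> Python value into column -> SQL expression string."""
    return {column: to_sql_literal(value) for column, value in new_values.items()}


updates = new_values_to_updates({"x": 10, "name": "O'Brien", "flag": True})
print(updates)  # {'x': '10', 'name': "'O''Brien'", 'flag': 'true'}
```

Doing the conversion in Python keeps the user-facing call free of SQL quoting, while the engine still only ever sees SQL strings.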

Some weird behaviors I noticed, probably related to DataFusion: updating a timestamp column has to be done by providing a Unix timestamp in microseconds. I personally find this very confusing; I was expecting to be able to pass "2012-10-01", for example, in the updates.
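Until a date string can be passed directly, "2012-10-01" has to be turned into microseconds since the Unix epoch before it goes into updates. A minimal sketch, assuming naive (timezone-less) inputs should be interpreted as UTC; the helper name `iso_to_unix_micros` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_unix_micros(text: str) -> str:
    """Convert an ISO-8601 date/datetime string to Unix microseconds, as a string."""
    moment = datetime.fromisoformat(text)
    if moment.tzinfo is None:
        # Assumption: naive timestamps are treated as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    # Dividing one timedelta by another yields an exact int, avoiding the
    # float rounding that datetime.timestamp() can introduce.
    micros = (moment - UNIX_EPOCH) // timedelta(microseconds=1)
    return str(micros)


print(iso_to_unix_micros("2012-10-01"))  # 1349049600000000
```

With a table handle `dt`, the result would then be passed as, e.g., `dt.update(updates={"ts_col": iso_to_unix_micros("2012-10-01")})`, where `ts_col` is a placeholder column name.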

Another weird behaviour is with list-of-string columns. I can pass {"list_of_string_col":"[1,2,3]"} or {"list_of_string_col":"['1','2','3']"} and both will work. I would expect the first one to raise an exception for an invalid datatype. Mixed datatypes such as "[1,2,'3']" luckily do raise an error in DataFusion.
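One way to get the stricter behaviour expected here is to validate a Python list on the client before rendering it as a SQL array literal, so that [1, 2, 3] is rejected for a list-of-string column instead of being silently accepted. This is a hypothetical sketch: `string_list_literal` is not a deltalake function, and it assumes the column's element type is already known to be string and that DataFusion accepts the bracketed ['a','b'] array syntax shown in the examples above.

```python
from typing import List, Sequence


def string_list_literal(values: Sequence[object]) -> str:
    """Render a list for a list<string> column, rejecting non-string elements."""
    quoted: List[str] = []
    for v in values:
        if not isinstance(v, str):
            raise TypeError(
                f"list<string> column expects str elements, got {v!r} ({type(v).__name__})"
            )
        # Escape embedded single quotes by doubling them, then quote.
        quoted.append("'" + v.replace("'", "''") + "'")
    # Assumption: DataFusion parses ['a','b'] as an array literal.
    return "[" + ",".join(quoted) + "]"


print(string_list_literal(["1", "2", "3"]))  # ['1','2','3']
```

Checking element types before any SQL is built means the error message can name the offending element, rather than relying on whatever coercion the engine applies.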

Related Issue(s)

@github-actions github-actions bot added the binding/python Issues for the Python package label Oct 21, 2023
Collaborator

@wjones127 wjones127 left a comment


Looks like a nice interface. I have some suggestions to make some of the parameter handling clearer.

python/deltalake/table.py: six review threads (five outdated), all resolved
@ion-elgreco ion-elgreco changed the title feat: allow python objects to be passed as new values in .update() feat(python): allow python objects to be passed as new values in .update() Nov 2, 2023
ion-elgreco and others added 4 commits November 2, 2023 13:32
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
@ion-elgreco
Collaborator Author

@wjones127 I've integrated all your feedback in the code

wjones127
wjones127 previously approved these changes Nov 4, 2023
Collaborator

@wjones127 wjones127 left a comment


One small nit on the docs, otherwise looks good :)

python/deltalake/table.py: one review thread (outdated), resolved
@ion-elgreco
Collaborator Author

One small nit on the docs, otherwise looks good :)

Your approval got dismissed with the commit 😛

@wjones127 wjones127 enabled auto-merge (squash) November 4, 2023 19:59
@wjones127 wjones127 merged commit 5a5dbcd into delta-io:main Nov 4, 2023
24 checks passed
@franz101
Contributor

franz101 commented Mar 3, 2024

Am I correct that this does not work with dictionaries?

import pyarrow as pa; from deltalake import write_deltalake
# `dt` is a DeltaTable opened earlier (not shown).
data = pa.table({"x": [1, 2, 3], "y": [{"a": "a"}, {"a": "b"}, {"a": "c"}]}, schema=dt.schema().to_pyarrow(False))
write_deltalake("tmp", data, mode="overwrite", overwrite_schema=True)
dt.update(predicate="id = 2", updates={"y": {"a": "banana"}})

Throws:
DeltaError: Generic DeltaTable error: Schema error: No field named id. Valid fields are x, y.

@ion-elgreco
Collaborator Author

ion-elgreco commented Mar 3, 2024

@franz101 if you want to use Python types, you need to use the new_values parameter; however, it doesn't support dictionaries.
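Since dictionaries are not supported as new values, a struct value such as {"a": "banana"} can be caught with a clear message before the call is made. A minimal pre-check sketch: `check_new_values` is a hypothetical helper, not part of deltalake, and the tuple of accepted types is an assumption for illustration rather than the library's authoritative list.

```python
from datetime import datetime
from typing import Any, Dict

# Assumption: the value types a new_values-style API might accept.
SUPPORTED_TYPES = (bool, int, float, str, datetime, list)


def check_new_values(new_values: Dict[str, Any]) -> None:
    """Raise a descriptive TypeError for unsupported values, e.g. dicts (structs)."""
    for column, value in new_values.items():
        if isinstance(value, dict):
            raise TypeError(f"column {column!r}: dict (struct) values are not supported")
        if not isinstance(value, SUPPORTED_TYPES):
            raise TypeError(f"column {column!r}: unsupported type {type(value).__name__}")


check_new_values({"x": 2})  # passes silently
try:
    check_new_values({"y": {"a": "banana"}})
except TypeError as err:
    print(err)  # column 'y': dict (struct) values are not supported
```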

Labels
binding/python Issues for the Python package
Development

Successfully merging this pull request may close these issues.

update don't seems to be working
3 participants