Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update don't seems to be working #1740

Closed
djouallah opened this issue Oct 19, 2023 · 5 comments · Fixed by #1749
Closed

update don't seems to be working #1740

djouallah opened this issue Oct 19, 2023 · 5 comments · Fixed by #1749
Labels
bug Something isn't working

Comments

@djouallah
Copy link

Environment

0.12 , google colab

using this code

import pandas as pd
from deltalake.writer import write_deltalake
df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "fruits": ["banana", "orange", "mango", "apple", "mandarin"],
    }
)
write_deltalake('./fruits', df,mode='overwrite')
from deltalake import DeltaTable
fruits_dl   = DeltaTable('./fruits')
fruits_dl.update(predicate="a = 5",updates={"fruits": "banana"})

I get this error

DeltaError                                Traceback (most recent call last)
[<ipython-input-28-8d483d464424>](https://localhost:8080/#) in <cell line: 1>()
----> 1 fruits_dl.update(predicate="a = 5",updates={"fruits": "banana"})

[/usr/local/lib/python3.10/dist-packages/deltalake/table.py](https://localhost:8080/#) in update(self, updates, predicate, writer_properties, error_on_type_mismatch)
    481         """
    482 
--> 483         metrics = self._table.update(
    484             updates, predicate, writer_properties, safe_cast=not error_on_type_mismatch
    485         )

DeltaError: Generic DeltaTable error: Schema error: No field named banana. Valid fields are a, fruits.

@djouallah djouallah added the bug Something isn't working label Oct 19, 2023
@wjones127
Copy link
Collaborator

The values of the updates parameter are SQL expressions. banana without any quotes is an identifier, which is why it says it's not a valid column. If you wrap it in single quotes, it will work as expected:

>>> fruits_dl.update(predicate="a = 5",updates={"fruits": "'banana'"})
{'num_added_files': 1, 'num_removed_files': 1, 'num_updated_rows': 1, 'num_copied_rows': 4, 'execution_time_ms': 26, 'scan_time_ms': 17}
>>> fruits_dl.to_pandas()
   a  fruits
0  1  banana
1  2  orange
2  3   mango
3  4   apple
4  5  banana

@wjones127
Copy link
Collaborator

wjones127 commented Oct 19, 2023

It does feel like it would be nice to be able to pass python objects here though. Something like:

table.update(
    predicate="a = 5",
    new_values={
        "fruits": "banana",
        "count": 2,
        "last_updated": datetime.now()
    }
)

But this should be provided by a different parameter. Otherwise we could end up interpreting the SQL strings as straight string values, and that wouldn't be great. cc @ion-elgreco

@ion-elgreco
Copy link
Collaborator

It does feel like it would be nice to be able to pass python objects here though. Something like:

table.update(
    predicate="a = 5",
    new_values={
        "fruits": "banana",
        "count": 2,
        "last_updated": datetime.now()
    }
)

But this should be provided by a different parameter. Otherwise we could end up interpreting the SQL strings as straight string values, and that wouldn't be great. cc @ion-elgreco

That could be useful, and then we could apply this to MERGE as well. I'll pick it up next week!

Want to wrap the vacuum commit thing first :)

@djouallah
Copy link
Author

slightly related, is there any documentation how to use merge in Python ?

@ion-elgreco
Copy link
Collaborator

ion-elgreco commented Oct 21, 2023

slightly related, is there any documentation how to use merge in Python ?

For now you can refer to the examples in under .merge() and TableMerger Class. https://delta-io.github.io/delta-rs/python/api_reference.html#deltalake.table.TableMerger

I will write some better usage documentation tomorrow on this and a blog post.

wjones127 added a commit that referenced this issue Nov 4, 2023
…date()` (#1749)

# Description
A user can now add a new_values dictionary that contains python objects
as a value.


Some weird behavior's I noticed, probably related to datafusion,
updating a timestamp column has to be done by providing a unix timestamp
in microseconds. I personally find this very confusing, I was expecting
to be able to pass "2012-10-01" for example in the updates.

Another weird behaviour is with list of string columns. I can pass
`{"list_of_string_col":"[1,2,3]"}` or
`{"list_of_string_col":"['1','2','3']"}` and both will work. I expect
the first one to raise an exception on invalid datatypes. Combined
datatypes `"[1,2,'3']"` luckily do raise an error by datafusion.



# Related Issue(s)
<!---
For example:

- closes #106
--->
- closes #1740

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants