
Cannot write to Delta Lake: This table’s min_writer_version is 7, but this method only supports version 2. #1535

Closed
yefetBenTili opened this issue Jul 14, 2023 · 9 comments
Labels
enhancement New feature or request

Comments

@yefetBenTili

yefetBenTili commented Jul 14, 2023

Environment

Delta-rs version: 0.10.0

Binding:

Environment:

  • Cloud provider: AWS
  • OS: MacOS
  • Other:

Bug

I am trying to write to an already existing Delta Lake table in S3 using delta-rs:

import os

import pandas as pd
from deltalake.writer import write_deltalake

# One-row DataFrame to append to the existing table
d = {"id": ["123"], "user_name": ["Batman"], "country": ["Gotham"]}
df = pd.DataFrame(d)

# Credentials are read from the environment
storage_options = {
    "AWS_DEFAULT_REGION": "eu-central-1",
    "AWS_ACCESS_KEY_ID": os.environ["AWS_ACCESS_KEY_ID"],
    "AWS_SECRET_ACCESS_KEY": os.environ["AWS_SECRET_ACCESS_KEY"],
    "AWS_S3_ALLOW_UNSAFE_RENAME": "true",
}

destination = "s3://some_s3_location"
write_deltalake(
    destination,
    df,
    mode="append",
    storage_options=storage_options,
    partition_by=["country"],
)

And I get this error:

-- Upgrades the reader protocol version to 1 and the writer protocol version to 3.
ALTER TABLE <table_identifier> SET TBLPROPERTIES('delta.minReaderVersion' = '1', 'delta.minWriterVersion' = '3')
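The message in the issue title comes from a protocol-version guard that runs before the Python writer touches the table. The following is a minimal, library-free sketch of that kind of check, not deltalake's actual code: the limit of 2 is taken from the error message, while the function name check_writer_protocol and the local DeltaProtocolError class are hypothetical stand-ins.

```python
# Hypothetical sketch of a writer-protocol guard; not deltalake's actual code.
MAX_SUPPORTED_WRITER_VERSION = 2  # limit taken from the error message in the title


class DeltaProtocolError(Exception):
    """Stand-in for the library's protocol error (assumption)."""


def check_writer_protocol(min_writer_version: int) -> None:
    """Refuse to write when the table requires a newer writer protocol."""
    if min_writer_version > MAX_SUPPORTED_WRITER_VERSION:
        raise DeltaProtocolError(
            f"This table's min_writer_version is {min_writer_version}, "
            f"but this method only supports version {MAX_SUPPORTED_WRITER_VERSION}."
        )


check_writer_protocol(2)  # a table at writer version 2 passes the guard
try:
    check_writer_protocol(7)  # a table-features table, as in this issue
except DeltaProtocolError as err:
    print(err)  # This table's min_writer_version is 7, but this method only supports version 2.
```

Because the guard compares a single integer, any table upgraded to writer version 7 is rejected, regardless of which individual features it actually uses.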
@yefetBenTili yefetBenTili added the bug Something isn't working label Jul 14, 2023
@roeap
Collaborator

roeap commented Jul 15, 2023

Hi @yefetBenTili,

Right now delta-rs does not support the mentioned writer version, so this behavior is expected. That said, we are hard at work extending the protocol support.

@roeap roeap added enhancement New feature or request and removed bug Something isn't working labels Jul 15, 2023
@FlavioDiasPs

Is there any chance the next release will target writer version 7 to enable the new clustering feature?
The current writer version is very old; I'm afraid we will remain far behind even after the next release.

@yefetBenTili
Author

@roeap Any update on this?

@roeap
Collaborator

roeap commented Oct 24, 2023

@yefetBenTili, yes, I am almost done with #1756, which updates the type and schema definitions. In some follow-ups I have prepared, we will then improve protocol support. We will have to see which features are most straightforward to support, but supporting table-feature tables seems straightforward (of course, this may be a different story for specific features). My personal priority right now is deletion vectors.

@wolliq

wolliq commented Mar 1, 2024

Hello @yefetBenTili,

Do you have any update on the protocol support improvements?
I'm having issues using the deltatorch library on recent Databricks runtimes, where the protocol versions are (3, 7), as is the case with deletion vectors. I basically cannot read the Delta tables, because the read goes through this check:

if self.protocol().min_reader_version > MAX_SUPPORTED_READER_VERSION:
    raise DeltaProtocolError(
        f"The table's minimum reader version is {self.protocol().min_reader_version} "
        f"but deltalake only supports up to version {MAX_SUPPORTED_READER_VERSION}."
    )

where MAX_SUPPORTED_READER_VERSION = 1 and self.protocol().min_reader_version = 3.
Thank you in advance

@ion-elgreco
Collaborator

@wolliq higher protocol support is only available in the Rust engine writer:

write_deltalake(destination, df, engine="rust")

@wolliq

wolliq commented Mar 1, 2024

@ion-elgreco the use case is a read operation using deltatorch.

The deltatorch library reads the Delta table and then calls
DeltaTable.to_pyarrow_dataset(..)
which is defined in the table.py module:

...
        delta_table = self.create_delta_table()
        scanner = delta_table.to_pyarrow_dataset().scanner(
            columns=self.arrow_fields, filter=_filter
        )
 ...

The to_pyarrow_dataset method checks that the table's minimum reader version is <= 1 before processing the data. Version 1 is the lowest possible version, and it is incompatible with the more recent features, as detailed here:
https://docs.delta.io/latest/versioning.html#features-by-protocol-version

Why does the maximum supported reader version need to stay at the lowest level? Is there any development planned to support the more recent protocol versions?

Thx
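The page linked above lists which protocol versions each feature requires. The small lookup below transcribes the legacy requirements from the Delta protocol as an assumption (verify against the current docs) and shows which features a client capped at reader version 1 and writer version 2, as in this thread, cannot handle. The dictionary FEATURE_REQUIREMENTS and the function unsupported_features are hypothetical names.

```python
# Hypothetical lookup of Delta protocol requirements, transcribed from the
# "features by protocol version" page; verify against the current docs.
FEATURE_REQUIREMENTS = {
    # feature: (min_reader_version, min_writer_version)
    "append-only tables": (1, 2),
    "column invariants": (1, 2),
    "CHECK constraints": (1, 3),
    "change data feed": (1, 4),
    "generated columns": (1, 4),
    "column mapping": (2, 5),
    "identity columns": (1, 6),
    "reader table features (e.g. deletion vectors)": (3, 7),
}


def unsupported_features(max_reader: int, max_writer: int) -> list[str]:
    """List features whose required protocol exceeds what a client supports."""
    return sorted(
        name
        for name, (reader, writer) in FEATURE_REQUIREMENTS.items()
        if reader > max_reader or writer > max_writer
    )


# A client that reads at version 1 and writes at version 2:
for name in unsupported_features(max_reader=1, max_writer=2):
    print(name)
```

Under this mapping, a table at (3, 7) sits above every legacy level, which is why a single-integer comparison against 1 rejects it outright.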

@ion-elgreco
Collaborator

ion-elgreco commented Mar 1, 2024

@wolliq I'll put in a fix. We can support reader version 3, since we then just need to check whether any table feature we don't support is enabled. Reader version 2, however, we can't support.
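The proposed fix can be sketched as a reader check that treats version 3 differently from the legacy versions. In this sketch, version 1 passes, version 2 (which the Delta protocol ties to column mapping) is rejected outright, and version 3 is accepted only when every reader feature listed by the table is one the client implements. This is not deltalake's implementation: check_reader_protocol and the local DeltaProtocolError are hypothetical, and the set of supported features is a parameter because this thread does not say which features deltalake supports. The feature name deletionVectors follows the protocol's naming.

```python
# Hypothetical sketch of the reader-protocol check proposed above;
# not deltalake's actual implementation.
from typing import Iterable, Optional


class DeltaProtocolError(Exception):
    """Stand-in for the library's protocol error (assumption)."""


def check_reader_protocol(
    min_reader_version: int,
    reader_features: Optional[Iterable[str]],
    supported_features: Iterable[str],
) -> None:
    """Raise if the client cannot safely read a table with this protocol."""
    if min_reader_version == 1:
        return  # legacy baseline, always readable
    if min_reader_version == 2:
        # Rejected outright, as stated in the comment above.
        raise DeltaProtocolError("Reader version 2 is not supported.")
    if min_reader_version == 3:
        # Table-features protocol: the table lists the reader features it uses,
        # so only features this client lacks cause a failure.
        missing = set(reader_features or ()) - set(supported_features)
        if missing:
            raise DeltaProtocolError(f"Unsupported reader features: {sorted(missing)}")
        return
    raise DeltaProtocolError(f"Unknown reader version {min_reader_version}.")


# A hypothetical client that implements no reader features yet.
SUPPORTED: set = set()
check_reader_protocol(1, None, SUPPORTED)  # legacy table: passes
try:
    check_reader_protocol(3, ["deletionVectors"], SUPPORTED)
except DeltaProtocolError as err:
    print(err)  # Unsupported reader features: ['deletionVectors']
```

The design choice is that version 3 no longer fails on the number alone: a (3, 7) table is readable as long as none of its enabled reader features are missing from the client.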

@wolliq

wolliq commented Mar 1, 2024

@ion-elgreco thank you, that will work for my use case.


5 participants