Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error with dynamo locking provider on 0.15 #2034

Closed
acontry opened this issue Jan 4, 2024 · 6 comments
Closed

Error with dynamo locking provider on 0.15 #2034

acontry opened this issue Jan 4, 2024 · 6 comments
Assignees
Labels
binding/python Issues for the Python package bug Something isn't working
Milestone

Comments

@acontry
Copy link

acontry commented Jan 4, 2024

Environment

Delta-rs version: 0.15.0

Binding: python

Environment:

  • Cloud provider: AWS
  • OS: AWS
  • Other: Py3.10

Bug

What happened: When writing to a partitioned deltalake table on s3 with dynamo as the locking provider, the following error is thrown:

Traceback (most recent call last):

...

    write_deltalake(
  File "/Users/alex/.pyenv/versions/3.10.3/envs/hft-3.10/lib/python3.10/site-packages/deltalake/writer.py", line 244, in write_deltalake
    table, table_uri = try_get_table_and_table_uri(table_or_uri, storage_options)
  File "/Users/alex/.pyenv/versions/3.10.3/envs/hft-3.10/lib/python3.10/site-packages/deltalake/writer.py", line 620, in try_get_table_and_table_uri
    table = try_get_deltatable(table_or_uri, storage_options)
  File "/Users/alex/.pyenv/versions/3.10.3/envs/hft-3.10/lib/python3.10/site-packages/deltalake/writer.py", line 633, in try_get_deltatable
    return DeltaTable(table_uri, storage_options=storage_options)
  File "/Users/alex/.pyenv/versions/3.10.3/envs/hft-3.10/lib/python3.10/site-packages/deltalake/table.py", line 319, in __init__
    self._table = RawDeltaTable(
_internal.DeltaError: Generic error: error in DynamoDb

What you expected to happen: Successful table update.

How to reproduce it: I have an s3/dynamo config set up per the production config instructions in the docs.

Pseudocode:

from deltalake.writer import write_deltalake

storage_options = {
  "AWS_REGION": "us-east-1",
  "AWS_ACCESS_KEY_ID": AWS_ACCESS_KEY,
  "AWS_SECRET_ACCESS_KEY": AWS_ACCESS_SECRET,
  "AWS_S3_LOCKING_PROVIDER": "dynamodb",
  "DYNAMO_LOCK_TABLE_NAME": "deltalake-lock",
  "DYNAMO_LOCK_PARTITION_KEY_VALUE": "table1"
}

write_deltalake(
  data_path, 
  df,  # Some dataframe
  mode="append", 
  storage_options=storage_options, 
  partition_by=partition_cols,  # partition columns for dataframe
  schema=table_schema  # schema defined for dataframe
)

The same codepath works in 0.14.

More details: N/A

@acontry acontry added the bug Something isn't working label Jan 4, 2024
@rtyler
Copy link
Member

rtyler commented Jan 5, 2024

Sooooooooo funny story. The instructions you reference are for setting up the DynamoDb table for what is conventionally referred to ta S3DynamoDbLogStore.

The native deltalake python and Rust libraries use a DynamoDb locking implementation that predates S3DynamoDbLogStore, and expects a dynamodb table that looks like this.

However @dispanser did some work to move the native libraries to use the same S3DynamoDbLogStore pattern. Can you share the schema/definition of your table that works with python 0.14? Based on the instructions that you mentioned following, I would expect that the write worked on 0.15, not 0.14, so I'm a bit confused

@rtyler rtyler added the binding/python Issues for the Python package label Jan 5, 2024
@acontry
Copy link
Author

acontry commented Jan 5, 2024

Ah, the one difference I see in dynamo config is that I am using attribute expireTime for TTL, while the attribute specified in the Terraform snippet is now leaseDuration. Will see if this is the culprit.

@acontry
Copy link
Author

acontry commented Jan 5, 2024

Ok I have realized that the different dynamo locking providers and associated docs have confused me quite a bit. Somehow I had previously setup my Dynamo table successfully with locking provider used in 0.14. So everything makes sense!

Now I am working on setting up a test environment to use with 0.15. I started with the same storage_options I listed above, but also hitting Generic error: error in DynamoDb. Digging through the changes to understand if different config is needed, but would welcome any advice from the experts.

@dispanser
Copy link
Contributor

I think the docs are currently incorrect, as they haven't been adapted to reflect the code changes yet. I was surprised by how quicly the release train arrived :-).

  • table name is currently configured via config setting DELTA_DYNAMO_TABLE_NAME.
  • DYNAMO_LOCK_PARTITION_KEY_VALUE has no effect in the new system

In any case, the error message is not very helpful, I'll take a look into making that better.

@rtyler: I have a small docs update in preparation. I'm not sure how the docs release process works, but I hope we can update the docs without creating a new release.

@rtyler rtyler self-assigned this Jan 6, 2024
@rtyler rtyler added this to the Rust v0.17 milestone Jan 7, 2024
@acontry
Copy link
Author

acontry commented Jan 8, 2024

I was able to get things working! Had to read the source code to discover that I needed additional IAM permissions (Query and UpdateItem), in addition to the config changes you pointed me to:

Dynamo permissions:

        {
            "Sid": "DeltalakeDynamoLock",
            "Effect": "Allow",
            "Action": [
                "dynamodb:GetItem",
                "dynamodb:PutItem",
                "dynamodb:DeleteItem",
                "dynamodb:Query",
                "dynamodb:UpdateItem"
            ],
            "Resource": "{my dynamo ARN}"
        }

Config:

  storage_options = {
      "AWS_REGION": "us-east-1",
      "AWS_ACCESS_KEY_ID": AWS_ACCESS_KEY,
      "AWS_SECRET_ACCESS_KEY": AWS_ACCESS_SECRET,
      "AWS_S3_LOCKING_PROVIDER": "dynamodb",
      "DELTA_DYNAMO_TABLE_NAME": "deltalake-lock",
  }

My issue is resolved - feel free to close this issue once you're satisfied.

@rtyler rtyler closed this as completed Jan 8, 2024
@dispanser
Copy link
Contributor

@rtyler : should this be added to the documentation? I can make a PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants