Trailing slash on AWS_ENDPOINT raises S3 Error #2656

Closed
dylanspag-lmco opened this issue Jul 9, 2024 · 2 comments · Fixed by #2775
Labels
binding/rust Issues for the Rust crate bug Something isn't working good first issue Good for newcomers

Comments

@dylanspag-lmco

Environment

Delta-rs version: 0.18.0

Binding: Python

Environment:

  • Cloud provider: Local/On-Prem lakeFS with Minio
  • OS: Ubuntu 24.04 LTS
  • Other: Python 3.10

Bug

What happened:

I added a trailing slash to the AWS_ENDPOINT entry of the storage_options dictionary that gets passed to deltalake.write_deltalake. This raised the following error and prevented any data from being written to the object store.

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[14], line 1
----> 1 deltalake.write_deltalake(table_or_uri=f"s3a://{repo_name}/main/userdata/",
      2                           data = subset,
      3                           mode='overwrite',
      4                           storage_options=storage_options)

File /opt/conda/lib/python3.10/site-packages/deltalake/writer.py:258, in write_deltalake(table_or_uri, data, schema, partition_by, mode, file_options, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group, name, description, configuration, schema_mode, storage_options, partition_filters, predicate, large_dtypes, engine, writer_properties, custom_metadata)
    171 def write_deltalake(
    172     table_or_uri: Union[str, Path, DeltaTable],
    173     data: Union[
   (...)
    201     custom_metadata: Optional[Dict[str, str]] = None,
    202 ) -> None:
    203     """Write to a Delta Lake table
    204
    205     If the table does not already exist, it will be created.
   (...)
    256         custom_metadata: Custom metadata to add to the commitInfo.
    257     """
--> 258     table, table_uri = try_get_table_and_table_uri(table_or_uri, storage_options)
    259     if table is not None:
    260         storage_options = table._storage_options or {}

File /opt/conda/lib/python3.10/site-packages/deltalake/writer.py:673, in try_get_table_and_table_uri(table_or_uri, storage_options)
    670     raise ValueError("table_or_uri must be a str, Path or DeltaTable")
    672 if isinstance(table_or_uri, (str, Path)):
--> 673     table = try_get_deltatable(table_or_uri, storage_options)
    674     table_uri = str(table_or_uri)
    675 else:

File /opt/conda/lib/python3.10/site-packages/deltalake/writer.py:686, in try_get_deltatable(table_uri, storage_options)
    682 def try_get_deltatable(
    683     table_uri: Union[str, Path], storage_options: Optional[Dict[str, str]]
    684 ) -> Optional[DeltaTable]:
    685     try:
--> 686         return DeltaTable(table_uri, storage_options=storage_options)
    687     except TableNotFoundError:
    688         return None

File /opt/conda/lib/python3.10/site-packages/deltalake/table.py:297, in DeltaTable.__init__(self, table_uri, version, storage_options, without_files, log_buffer_size)
    277 """
    278 Create the Delta Table from a path with an optional version.
    279 Multiple StorageBackends are currently supported: AWS S3, Azure Data Lake Storage Gen2, Google Cloud Storage (GCS) and local URI.
   (...)
    294
    295 """
    296 self._storage_options = storage_options
--> 297 self._table = RawDeltaTable(
    298     str(table_uri),
    299     version=version,
    300     storage_options=storage_options,
    301     without_files=without_files,
    302     log_buffer_size=log_buffer_size,
    303 )

OSError: Generic S3 error: Header: Content-Length Header missing from response


What you expected to happen:

deltalake.write_deltalake writes to the object store without error whether or not the AWS endpoint storage option has a trailing slash.

How to reproduce it:

On a machine with docker and docker-compose installed, run the following commands:

git clone https://github.com/treeverse/lakeFS-samples
cd lakeFS-samples
docker compose --profile local-lakefs up

Then open a web browser, and navigate to http://localhost:8888.

In JupyterLab open delta-lake-python.ipynb, and in the first code cell of the Jupyter notebook, replace lakefsEndPoint = 'http://lakefs:8000' with lakefsEndPoint = 'http://lakefs:8000/'.

If you then attempt to run all of the cells, the notebook should fail on the second cell under "Write the test data to the main branch as a Delta table".

Removing the trailing slash and re-running the entire notebook should succeed.
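
Until a fix lands, the same workaround can be applied in code rather than by editing the endpoint by hand. This is only a minimal sketch: the helper name `without_trailing_slash` is hypothetical and not part of deltalake, and the dictionary below contains only the `AWS_ENDPOINT` entry from this report (a real `storage_options` dictionary would carry credentials and other keys as well).

```python
def without_trailing_slash(storage_options, key="AWS_ENDPOINT"):
    """Return a copy of storage_options with trailing slashes removed from one entry.

    Hypothetical helper for illustration; it does not call into deltalake.
    """
    cleaned = dict(storage_options)  # copy so the caller's dict is left untouched
    if key in cleaned:
        cleaned[key] = cleaned[key].rstrip("/")
    return cleaned


# The endpoint value from the reproduction steps, with the offending slash.
storage_options = {"AWS_ENDPOINT": "http://lakefs:8000/"}
print(without_trailing_slash(storage_options))
# prints {'AWS_ENDPOINT': 'http://lakefs:8000'}
```

The cleaned dictionary would then be passed as `storage_options=` to `deltalake.write_deltalake` in place of the original one.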

@dylanspag-lmco dylanspag-lmco added the bug Something isn't working label Jul 9, 2024
@rtyler rtyler added good first issue Good for newcomers binding/rust Issues for the Rust crate labels Jul 9, 2024
@omkar-foss
Contributor

I'm able to reproduce this issue as per the steps provided above. Will dig deeper and raise a PR with the fix in the next few days.

omkar-foss added a commit to omkar-foss/delta-rs that referenced this issue Aug 14, 2024

This trims a trailing slash if present in a storage option's
value if its key ends with `_URL` (e.g. `HOST_URL`) or if the value
itself seems to be a URL (i.e. if it contains `://`).

This also adds a supporting test for this fix.
omkar-foss added a commit to omkar-foss/delta-rs that referenced this issue Aug 14, 2024

This trims a trailing slash if present in a storage option's
value if its key ends with `_URL` (e.g. `HOST_URL`) or if the value
itself seems to be a URL (i.e. starts with `http://` or `https://`).

This also adds a supporting test for this fix.
@omkar-foss
Contributor

omkar-foss commented Aug 14, 2024

PR to close this issue: #2775

omkar-foss added a commit to omkar-foss/delta-rs that referenced this issue Aug 14, 2024
This trims a trailing slash (/) if present in a storage option's
value if its key ends with `_URL` (e.g. `HOST_URL`) or if the value
itself seems to be a URL (i.e. starts with `http://` or `https://`).

This also adds a supporting test for this fix.
github-merge-queue bot pushed a commit that referenced this issue Aug 19, 2024
This trims a trailing slash if present in a storage option's value
if its key ends with `_URL` (e.g. `HOST_URL`) or if the value
itself seems to be a URL (i.e. starts with `http://` or `https://`).

This also adds a supporting test for this fix.
ion-elgreco pushed a commit to ion-elgreco/delta-rs that referenced this issue Aug 21, 2024
This trims a trailing slash if present in a storage option's value
if its key ends with `_URL` (e.g. `HOST_URL`) or if the value
itself seems to be a URL (i.e. starts with `http://` or `https://`).

This also adds a supporting test for this fix.
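
For readers on the Python side, the rule described in the commit message can be illustrated roughly as follows. This is a sketch under stated assumptions, not the merged code: the actual fix lives in the Rust crate, the function name `trim_url_options` is hypothetical, and whether the real implementation matches keys case-sensitively or strips more than one slash is not stated in this thread.

```python
def trim_url_options(storage_options):
    """Illustrative Python rendering of the rule from the commit message.

    A value loses its trailing slash when its key ends with `_URL`
    or when the value starts with `http://` or `https://`.
    """
    trimmed = {}
    for key, value in storage_options.items():
        looks_like_url = value.startswith(("http://", "https://"))
        # Assumption: exact, case-sensitive match on the `_URL` suffix.
        if key.endswith("_URL") or looks_like_url:
            # Assumption: rstrip removes every trailing slash, not just one.
            value = value.rstrip("/")
        trimmed[key] = value
    return trimmed


options = {
    "AWS_ENDPOINT": "http://lakefs:8000/",  # value from this issue
    "HOST_URL": "lakefs:8000/",             # example key from the commit message
    "EXAMPLE_PREFIX": "userdata/",          # hypothetical non-URL option, left alone
}
print(trim_url_options(options))
```

Note that `AWS_ENDPOINT` does not end with `_URL`, so in this sketch it is the value check (`http://` or `https://`) that covers the case reported here.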
3 participants