Write to Microsoft OneLake failed. #1764

Closed
RobinLin666 opened this issue Oct 24, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@RobinLin666
Contributor

Environment

Delta-rs version:
python-0.12.0

Binding:
python-0.12.0

Environment:

  • Cloud provider: Microsoft
  • OS: Mariner
  • Other:

Bug

What happened:
This simple example does not work:

df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})
write_deltalake("abfss://xxx@onelake.dfs.fabric.microsoft.com/test.Lakehouse/Tables/sample_table2", df,
 storage_options={"bearer_token": aadToken, "use_fabric_endpoint": "true"})

error:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[48], line 2
      1 df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})
----> 2 write_deltalake("abfss://xxx@onelake.dfs.fabric.microsoft.com/test.Lakehouse/Tables/sample_table2", df,
      3  storage_options={"bearer_token": aadToken, "use_fabric_endpoint": "true"})

File /nfs4/pyenv-515f53e0-5628-453e-a741-0c6f116d93b7/lib/python3.10/site-packages/deltalake/writer.py:153, in write_deltalake(table_or_uri, data, schema, partition_by, filesystem, mode, file_options, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group, name, description, configuration, overwrite_schema, storage_options, partition_filters, large_dtypes)
    150     else:
    151         data, schema = delta_arrow_schema_from_pandas(data)
--> 153 table, table_uri = try_get_table_and_table_uri(table_or_uri, storage_options)
    155 # We need to write against the latest table version
    156 if table:

File /nfs4/pyenv-515f53e0-5628-453e-a741-0c6f116d93b7/lib/python3.10/site-packages/deltalake/writer.py:417, in try_get_table_and_table_uri(table_or_uri, storage_options)
    414     raise ValueError("table_or_uri must be a str, Path or DeltaTable")
    416 if isinstance(table_or_uri, (str, Path)):
--> 417     table = try_get_deltatable(table_or_uri, storage_options)
    418     table_uri = str(table_or_uri)
    419 else:

File /nfs4/pyenv-515f53e0-5628-453e-a741-0c6f116d93b7/lib/python3.10/site-packages/deltalake/writer.py:430, in try_get_deltatable(table_uri, storage_options)
    426 def try_get_deltatable(
    427     table_uri: Union[str, Path], storage_options: Optional[Dict[str, str]]
    428 ) -> Optional[DeltaTable]:
    429     try:
--> 430         return DeltaTable(table_uri, storage_options=storage_options)
    431     except TableNotFoundError:
    432         return None

File /nfs4/pyenv-515f53e0-5628-453e-a741-0c6f116d93b7/lib/python3.10/site-packages/deltalake/table.py:250, in DeltaTable.__init__(self, table_uri, version, storage_options, without_files, log_buffer_size)
    231 """
    232 Create the Delta Table from a path with an optional version.
    233 Multiple StorageBackends are currently supported: AWS S3, Azure Data Lake Storage Gen2, Google Cloud Storage (GCS) and local URI.
   (...)
    247 
    248 """
    249 self._storage_options = storage_options
--> 250 self._table = RawDeltaTable(
    251     str(table_uri),
    252     version=version,
    253     storage_options=storage_options,
    254     without_files=without_files,
    255     log_buffer_size=log_buffer_size,
    256 )
    257 self._metadata = Metadata(self._table)

OSError: Encountered object with invalid path: Error parsing Path "test.Lakehouse/Tables/sample_table2/_delta_log/_commit_ed2503ff-f28f-40c2-9a41-5be43ede8930.json.tmp#1": Encountered illegal character sequence "#" whilst parsing path segment "_commit_ed2503ff-f28f-40c2-9a41-5be43ede8930.json.tmp#1"
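
The message points at an object inside `_delta_log` rather than at the table URI itself: an object named `_commit_....json.tmp#1` was encountered, and the path parser rejected the `#` in that segment. Below is a minimal Python sketch of that kind of per-segment check. It assumes only what the message states (that `#` is illegal in a path segment); `find_illegal_segments` and `ILLEGAL_SEQUENCES` are hypothetical names for illustration and are not part of deltalake or its storage layer.

```python
from typing import List, Tuple

# Assumption: only "#" is listed because that is what the error message reports;
# the real parser may reject other sequences as well.
ILLEGAL_SEQUENCES = ("#",)


def find_illegal_segments(path: str) -> List[Tuple[str, str]]:
    """Return (segment, illegal_sequence) pairs for each offending path segment."""
    offenders = []
    for segment in path.split("/"):
        for seq in ILLEGAL_SEQUENCES:
            if seq in segment:
                offenders.append((segment, seq))
    return offenders


# The path taken verbatim from the OSError above.
path = (
    "test.Lakehouse/Tables/sample_table2/_delta_log/"
    "_commit_ed2503ff-f28f-40c2-9a41-5be43ede8930.json.tmp#1"
)
result = find_illegal_segments(path)
```

Here `result` holds a single entry for the `.tmp#1` segment, while the table directory segments pass the check, which matches the error blaming the temporary commit object and not the table URI.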

What you expected to happen:

How to reproduce it:

More details:
Related to #1418 (comment)

RobinLin666 added the bug label on Oct 24, 2023
@djouallah

@RobinLin666 how did you get aadToken?

@RobinLin666
Contributor Author

> @RobinLin666 how did you get aadToken?

I use TridentTokenLibrary:

from trident_token_library_wrapper import PyTridentTokenLibrary
token = PyTridentTokenLibrary.get_access_token("storage")
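
Putting the two snippets in this thread together, the token can be passed straight into `storage_options`. The following is a rough sketch, not a confirmed fix: `onelake_storage_options` is a hypothetical helper, the option keys are the ones used in the original report, and `trident_token_library_wrapper` is assumed to be importable only inside a Fabric notebook, so the calls that depend on it are shown as comments.

```python
from typing import Dict


def onelake_storage_options(token: str) -> Dict[str, str]:
    """Build the storage_options dict used in the report above (hypothetical helper)."""
    if not token:
        raise ValueError("an access token is required")
    return {"bearer_token": token, "use_fabric_endpoint": "true"}


# Inside a Fabric notebook (assumed environment), usage might look like:
# from trident_token_library_wrapper import PyTridentTokenLibrary
# from deltalake import write_deltalake
# token = PyTridentTokenLibrary.get_access_token("storage")
# write_deltalake(table_uri, df, storage_options=onelake_storage_options(token))
```

Keeping the dict construction in one place makes it easier to refresh the token and rebuild the options before each write.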

@djouallah

@RobinLin666 thank you so much, it worked for me!

@RobinLin666
Contributor Author

Thank you @djouallah, it works somehow!
