Failure to delete dir and files #2703

Closed
pesmeriz opened this issue Jul 24, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@pesmeriz

pesmeriz commented Jul 24, 2024

Environment

Delta-rs version: 0.18.2

Binding: python

Environment:

  • Cloud provider: azure
  • OS: macOS arm64
  • Other:

Bug

What happened:
I am writing a polars dataset to a Delta table on Azure, but I am unable to delete the table directory and its files.

credentials = {
    "AZURE_STORAGE_CLIENT_ID": config("AZURE_STORAGE_CLIENT_ID"),
    "AZURE_STORAGE_CLIENT_SECRET": config("AZURE_STORAGE_CLIENT_SECRET"),
    "AZURE_STORAGE_TENANT_ID": config("AZURE_STORAGE_TENANT_ID"),
}
path = f'abfss://{config("AZURE_CONTAINER_NAME")}@{config("AZURE_ACCOUNT_NAME")}.dfs.core.windows.net/{path}'
dataset.write_delta(
    target=path,
    delta_write_options={"partition_by": "date_part"},
    storage_options=credentials
)
table = deltalake.DeltaTable(path, storage_options=credentials)
deltalake.fs.DeltaStorageHandler.from_table(table._table).delete_dir(path=path)

I've also tried

deltalake.fs.DeltaStorageHandler(path).delete_dir(path)

Both calls return None, and neither removes the directory or its files.

What you expected to happen: The directory containing all the data for the Delta table is removed.

@pesmeriz pesmeriz added the bug Something isn't working label Jul 24, 2024
@ion-elgreco
Collaborator

@pesmeriz the reason nothing happens is that the path provided to delete_dir is relative to the table path. So if you want to clear the table, you should provide path=""
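
To illustrate the point about relative paths, here is a minimal sketch that turns an absolute location into a path relative to the table root. The helper `to_table_relative` is hypothetical (it is not part of deltalake), and the sketch only assumes the handler resolves paths against the table root as described above; it does not verify that delete_dir behaves this way.

```python
def to_table_relative(table_uri: str, target: str) -> str:
    """Return `target` as a path relative to `table_uri`.

    Hypothetical helper for illustration only, not a deltalake API.
    The table root itself maps to the empty string.
    """
    root = table_uri.rstrip("/")
    target = target.rstrip("/")
    if target == root:
        # The table root is addressed by an empty relative path.
        return ""
    prefix = root + "/"
    if not target.startswith(prefix):
        raise ValueError(f"{target!r} is not inside table root {root!r}")
    return target[len(prefix):]
```

Under that assumption, passing the full abfss:// URL as the path would point at a location nested under the table root that does not exist, which would match the "returns None and nothing is deleted" behaviour reported above.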

@pesmeriz
Author

@ion-elgreco perhaps I am missing something, but I just tried that and it didn't work on my side:

deltalake.fs.DeltaStorageHandler.from_table(table._table).delete_dir(path="")

@pesmeriz
Author

pesmeriz commented Jul 25, 2024

I just tried going through azure.identity and azure.storage.filedatalake to check whether there was a permissions problem or something similar, and I managed to delete the file.

from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient
credentials = {
    "AZURE_STORAGE_CLIENT_ID": config("AZURE_STORAGE_CLIENT_ID"),
    "AZURE_STORAGE_CLIENT_SECRET": config("AZURE_STORAGE_CLIENT_SECRET"),
    "AZURE_STORAGE_TENANT_ID": config("AZURE_STORAGE_TENANT_ID"),
}
path = f'abfss://{config("AZURE_CONTAINER_NAME")}@{config("AZURE_ACCOUNT_NAME")}.dfs.core.windows.net/{path}'
dataset.write_delta(
    target=path,
    delta_write_options={"partition_by": "date_part"},
    storage_options=credentials
)
delete_credentials = ClientSecretCredential(
    client_id=config("AZURE_STORAGE_CLIENT_ID"),
    client_secret=config("AZURE_STORAGE_CLIENT_SECRET"),
    tenant_id=config("AZURE_STORAGE_TENANT_ID"),
)

service_client = DataLakeServiceClient(
    account_url=f"https://{config('AZURE_ACCOUNT_NAME')}.dfs.core.windows.net/",
    credential=delete_credentials,
)
file_system_client = service_client.get_file_system_client(config('AZURE_CONTAINER_NAME'))
directory_client = file_system_client.get_directory_client(path)
directory_client.delete_directory()

I'll stick with this for now, but I'd really like to know how to delete the Delta table (directory and contents) through deltalake.
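
For anyone reusing the workaround above: the snippet hands the same `path` variable (by then a full abfss:// URL) to get_directory_client. As far as I know that method takes a directory path inside the container rather than a full URL, so treat that as an assumption to check against the Azure SDK docs. Below is a small sketch that splits an abfss:// URL into the pieces the SDK calls need. `split_abfss_url` and `AbfssLocation` are hypothetical names, and only the standard library is used.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class AbfssLocation(NamedTuple):
    container: str    # file system (container) name
    account_url: str  # https endpoint of the storage account
    directory: str    # directory path inside the container, no leading slash


def split_abfss_url(url: str) -> AbfssLocation:
    """Split abfss://<container>@<account>.dfs.core.windows.net/<dir>.

    Hypothetical helper for illustration; not part of deltalake or the Azure SDK.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an abfs(s) URL: {url!r}")
    if not parts.username or not parts.hostname:
        raise ValueError(f"missing container or account host in {url!r}")
    return AbfssLocation(
        container=parts.username,
        account_url=f"https://{parts.hostname}/",
        directory=parts.path.strip("/"),
    )
```

The resulting `account_url`, `container`, and `directory` could then be passed to DataLakeServiceClient, get_file_system_client, and get_directory_client respectively, in place of the values built by hand in the snippet above.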
