dbx does not use credential passthrough #864

Open
mathurk1 opened this issue Apr 26, 2024 · 0 comments

Expected Behavior

I am working with Azure Databricks. I have a cluster with credential passthrough, which allows me to read data stored in ADLS Gen2 using my own identity. I can simply log into the Databricks workspace, attach a notebook to the cluster, and query the Delta tables in ADLS Gen2 without any setup.
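For example, a notebook cell like the one below (placeholder path, same shape as my actual read) works on that cluster with no service principal or storage key configured:

    # Notebook attached to the credential-passthrough cluster -- reads with my own
    # identity, nothing else configured (path is a placeholder):
    df = (
        spark.read.format("delta")
        .load("abfss://containername@storageaccount.dfs.core.windows.net/path/to/table")
    )
    display(df.limit(10))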

I would expect that when I submit dbx execute --cluster-id cluster123 --job jobABC to the same cluster, the job would be able to read those datasets from ADLS Gen2 using my identity.

Thanks!

Current Behavior

Currently, the job fails when I dbx execute it to the same cluster, with the following error:

Py4JJavaError: An error occurred while calling o469.load.
: com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token
        at com.databricks.backend.daemon.data.client.adl.AdlGen2UpgradeCredentialContextTokenProvider.$anonfun$getToken$1(AdlGen2UpgradeCredentialContextTokenProvider.scala:37)
        at scala.Option.getOrElse(Option.scala:189)
        at com.databricks.backend.daemon.data.client.adl.AdlGen2UpgradeCredentialContextTokenProvider.getToken(AdlGen2UpgradeCredentialContextTokenProvider.scala:31)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAccessToken(AbfsClient.java:1371)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:306)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.completeExecute(AbfsRestOperation.java:238)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.lambda$execute$0(AbfsRestOperation.java:211)
        at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation(IOStatisticsBinding.java:464)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:209)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAclStatus(AbfsClient.java:1213)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.services.AbfsClient.getAclStatus(AbfsClient.java:1194)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getIsNamespaceEnabled(AzureBlobFileSystemStore.java:437)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:1107)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:901)
        at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:891)

From my understanding, it is expecting a service principal or storage account keys to be configured, as sketched below.
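For comparison, this is roughly the kind of service-principal (OAuth) setup the error seems to expect, which is exactly what I am trying to avoid. All names below are placeholders, not my actual configuration:

    # Hypothetical workaround (NOT what I want): authenticate to ADLS Gen2 with a
    # service principal instead of credential passthrough. Placeholder values only.
    account = "storageaccount.dfs.core.windows.net"
    spark.conf.set(f"fs.azure.account.auth.type.{account}", "OAuth")
    spark.conf.set(
        f"fs.azure.account.oauth.provider.type.{account}",
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    )
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}", "<application-id>")
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}", "<client-secret>")
    spark.conf.set(
        f"fs.azure.account.oauth2.client.endpoint.{account}",
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    )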

Steps to Reproduce (for bugs)

  1. Clone the charming-aurora repo: https://github.com/gstaubli/dbx-charming-aurora
  2. Run dbx configure --token to set up the link with the Databricks workspace.
  3. Add a new job to the conf/deployment.yml file:
      - name: "my-test-job"
        spark_python_task:
          python_file: "file://charming_aurora/tasks/sample_etl_task.py"
          parameters: [ "--conf-file", "file:fuse://conf/tasks/sample_etl_config.yml" ]
  4. Update the sample ETL task to read an ADLS Delta table - https://github.com/gstaubli/dbx-charming-aurora/blob/main/charming_aurora/tasks/sample_etl_task.py
      # inside the sample task class; assumes `from pyspark.sql import functions as f`
      # at the top of the file
      def _write_data(self):
          df = (
              self.spark.read.format("delta")
              .load(
                  "abfss://containername@storageaccount.dfs.core.windows.net/path/to/table"
              )
              .filter(f.col("date") == "2024-01-01")
          )
          print(df.count())
  5. Submit the job: dbx execute --cluster-id=cluster-id-with-credential-passthrough --job my-test-job
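If it helps with triage, a small check like the one below could be added to the task to confirm the cluster reports passthrough as enabled when running via dbx execute (the conf key is from the Azure Databricks credential passthrough docs; this snippet is my own addition, not part of the sample task):

    # Quick sanity check inside the dbx-executed task: does the cluster report
    # credential passthrough as enabled, even though the ADLS token lookup fails?
    print(self.spark.conf.get("spark.databricks.passthrough.enabled", "not set"))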

Context

I specifically want to "dbx execute" against my interactive cluster rather than create a job cluster.

Your Environment

  • dbx version used: 0.8.18
  • Databricks Runtime version: 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12)