
[BUG] Spark job with 0.8.0 Maven jar fails with error commons-logging:commons-logging download failed. #705

Closed
blrchen opened this issue Sep 26, 2022 · 16 comments
Labels: bug (Something isn't working)

@blrchen
Collaborator

blrchen commented Sep 26, 2022

Willingness to contribute

Yes. I can contribute a fix for this bug independently.

Feathr version

0.8.0

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 20.0):
  • Python version:
  • Spark version, if reporting runtime issue: Databricks 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)

Describe the problem

Running the NYC taxi driver sample notebook with the 0.8.0 Maven jar, the Spark job fails with a DRIVER_LIBRARY_INSTALLATION_FAILURE error.

Note: this error only happens on Azure Databricks. Using the Maven jar on Synapse or local PySpark works fine.

This failure is caused by a conflict between packages pre-built into the Databricks runtime and Feathr's Elasticsearch dependencies.
To get it to work, users need to exclude the following packages in Databricks:
commons-logging:commons-logging,org.slf4j:slf4j-api,com.google.protobuf:protobuf-java,javax.xml.bind:jaxb-api
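
For reference, if the jar is installed through the Databricks Libraries API rather than the UI, the same exclusions can be expressed in the maven library spec. A minimal sketch (the workspace URL, token, and cluster ID below are placeholders, not values from this issue):

import requests

# Placeholders: fill in your own workspace URL, PAT token, and cluster ID.
workspace = "https://<your-workspace>.azuredatabricks.net"
token = "<personal-access-token>"

payload = {
    "cluster_id": "<cluster-id>",
    "libraries": [{
        "maven": {
            "coordinates": "com.linkedin.feathr:feathr_2.12:0.8.0",
            # Exclude the packages that conflict with the Databricks runtime:
            "exclusions": [
                "commons-logging:commons-logging",
                "org.slf4j:slf4j-api",
                "com.google.protobuf:protobuf-java",
                "javax.xml.bind:jaxb-api",
            ],
        }
    }],
}

resp = requests.post(
    f"{workspace}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
resp.raise_for_status()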

Workaround

In the YAML config, add a line for feathr_runtime_location. This makes the Spark cluster use the runtime jar from Azure Storage instead of resolving it from Maven; see the last line in the following example.

spark_config:
  # choice for spark runtime. Currently support: azure_synapse, databricks
  # The `databricks` configs will be ignored if `azure_synapse` is set and vice versa.
  spark_cluster: "azure_synapse"
  # configure number of parts for the spark output for feature generation job
  spark_result_output_parts: "1"

  databricks:
    # workspace instance
    workspace_instance_url: 'https://adb-6885802458123232.12.azuredatabricks.net/'
    # config string including run time information, spark version, machine size, etc.
    # the config follows the format in the databricks documentation: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/2.0/jobs#--request-structure-6
    # The fields marked as "FEATHR_FILL_IN" will be managed by Feathr. Other parameters can be customizable. For example, you can customize the node type, spark version, number of workers, instance pools, timeout, etc.
    config_template: '{"run_name":"FEATHR_FILL_IN","new_cluster":{"spark_version":"9.1.x-scala2.12","node_type_id":"Standard_D3_v2","num_workers":1,"spark_conf":{"FEATHR_FILL_IN":"FEATHR_FILL_IN"}},"libraries":[{"jar":"FEATHR_FILL_IN"}],"spark_jar_task":{"main_class_name":"FEATHR_FILL_IN","parameters":["FEATHR_FILL_IN"]}}'
    # workspace dir for storing all the required configuration files and the jar resources. All the feature definitions will be uploaded here
    work_dir: "dbfs:/feathr_getting_started"
    # This is the location of the runtime jar for Spark job submission. If you have compiled the runtime yourself, you need to specify this location.
    # Or use https://azurefeathrstorage.blob.core.windows.net/public/feathr-assembly-LATEST.jar so you don't have to compile the runtime yourself
    # Local path, path starting with `http(s)://` or `dbfs://` are supported. If not specified, the latest jar from Maven would be used
    feathr_runtime_location: "https://azurefeathrstorage.blob.core.windows.net/public/feathr-assembly-LATEST.jar"
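
If you run on Databricks rather than Synapse, the same override goes under the databricks section. A minimal sketch following the tempfile pattern used later in this thread (the workspace URL is a placeholder, and the config is trimmed to only the fields relevant to the workaround):

import tempfile

# Trimmed config: a real config also needs project_config, online_store, etc.
yaml_config = """
spark_config:
  spark_cluster: 'databricks'
  databricks:
    workspace_instance_url: 'https://<your-workspace>.azuredatabricks.net/'
    work_dir: 'dbfs:/feathr_getting_started'
    # Use the pre-built assembly jar instead of resolving the Maven package:
    feathr_runtime_location: "https://azurefeathrstorage.blob.core.windows.net/public/feathr-assembly-LATEST.jar"
"""

tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False)
with open(tmp.name, "w") as f:
    f.write(yaml_config)
# client = FeathrClient(config_path=tmp.name)  # after filling in the remaining sections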

Tracking information

2022-09-26 00:10:42.789 | ERROR    | feathr.spark_provider._databricks_submission:wait_for_completion:210 - Feathr job has failed. Please visit this page to view error message: https://adb-1996253548709298.18.azuredatabricks.net/?o=1996253548709298#job/464256084943565/run/253311
2022-09-26 00:10:42.789 | ERROR    | feathr.spark_provider._databricks_submission:wait_for_completion:212 - Error Code: Run result unavailable: job failed with error message
 Library installation failed for library due to user error for maven {
  coordinates: "com.linkedin.feathr:feathr_2.12:0.8.0"
}
. Error messages:
Library installation attempted on the driver node of cluster 0926-000613-mqjvrxxu and failed. Please refer to the following error message to fix the library or contact Databricks support. Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: Library resolution failed. Cause: java.lang.RuntimeException: commons-logging:commons-logging download failed.

Code to reproduce bug

No response

What component(s) does this bug affect?

  • Python Client: This is the client users use to interact with most of our API. Mostly written in Python.
  • Computation Engine: The computation engine that executes the actual feature join and generation work. Mostly in Scala and Spark.
  • Feature Registry API: The frontend API layer supporting SQL and Purview (Atlas) as storage. The API layer is in Python (FastAPI).
  • Feature Registry Web UI: The web UI for the feature registry. Written in React.
@blrchen blrchen added the bug Something isn't working label Sep 26, 2022
@blrchen
Collaborator Author

blrchen commented Sep 26, 2022

This seems to be an Azure Databricks issue: installing a public OSS Maven jar on the same Azure Databricks cluster produces a similar error.


@blrchen
Collaborator Author

blrchen commented Sep 26, 2022

Assigning to @Yuqing-cat as she is working with Azure Support on this.

@Uditanshu1612

Hello,
I'm also working on a POC of the Feathr feature store using Azure Databricks, and I'm facing the same issue mentioned above.
When fetching features from the offline store with a point-in-time join to generate a dataframe, I hit the same error.

Could you please suggest a way to overcome this error?

@Yuqing-cat
Collaborator

Workaround

Hi @Uditanshu1612 ,
This can be resolved by adding feathr_runtime_location: "https://azurefeathrstorage.blob.core.windows.net/public/feathr-assembly-LATEST.jar" to the databricks section of your config YAML file.
Please check the "workaround" section in the bug description for more details.

@Uditanshu1612

After adding feathr_runtime_location to the databricks section of our config YAML file, creating a FeathrClient fails with an error.


@Uditanshu1612

Hi @blrchen, can you please tell me how you run the Feathr feature store POC notebook locally with PySpark?

@Yuqing-cat
Collaborator

After adding feathr_runtime_location to the databricks section of our config YAML file, creating a FeathrClient fails with an error.

It looks like a config format problem. Could you share your config file (with sensitive info removed) so I can help find the root cause?

@Yuqing-cat
Collaborator

Hi @blrchen, can you please tell me how you run the Feathr feature store POC notebook locally with PySpark?

The local Spark provider is introduced here: https://feathr-ai.github.io/feathr/how-to-guides/local-spark-provider.html

@Uditanshu1612

This is the config file I'm using. I'm running it in local PySpark, but it says only Synapse and Databricks are supported when creating the FeathrClient object.

@Uditanshu1612

import tempfile

yaml_config = """
# Please refer to https://github.com/linkedin/feathr/blob/main/feathr_project/feathrcli/data/feathr_user_workspace/feathr_config.yaml for explanations on the meaning of each field.
api_version: 1
project_config:
  project_name: 'feathr_getting_started'
  required_environment_variables:
    - 'REDIS_PASSWORD': ""
    - 'AZURE_CLIENT_ID'
    - 'AZURE_TENANT_ID': ""
    - 'AZURE_CLIENT_SECRET'
  optional_environment_variables:
    - ADLS_ACCOUNT: ""
    - ADLS_KEY: ""
    - WASB_ACCOUNT: ""
    - WASB_KEY: ""
    - ODBC_DATABASE: ""
    - ODBC_USER: ""
    - ODBC_PASSWORD: ""
offline_store:
  # Please set 'enabled' flags to true (false by default) if any of the items under the same path are expected to be visited
  adls:
    adls_enabled: true
  wasb:
    wasb_enabled: true
  s3:
    s3_enabled: false
    s3_endpoint: 's3.amazonaws.com'
  sql_db:
    sql_db_enabled: true
    sql_db_endpoint: "mssql+pyodbc://azureuser:XXXXXXXXXX@syj4rhbt6eci3iuws1.database.windows.net:1433/syj4rhbt6eci3iuws1p1?driver=ODBC+Driver+17+for+SQL+Server&autocommit=True"
  odbc:
    odbc_enabled: true
    odbc_database: 'syj4rhbt6eci3iuws1p1'
  snowflake:
    snowflake_enabled: true
    url: "dqllago-ol19457.snowflakecomputing.com"
    user: "feathrintegration"
    role: "ACCOUNTADMIN"
spark_config:
  spark_cluster: 'local'
  spark_result_output_parts: '1'
  local:
    feathr_runtime_location:
online_store:
  redis:
    host: 'feathrazuretest3redis.redis.cache.windows.net'
    port: 6380
    ssl_enabled: True
feature_registry:
  api_endpoint: "https://testapp090922.azurewebsites.net"
"""

tmp = tempfile.NamedTemporaryFile(mode='w', delete=False)
with open(tmp.name, "w") as text_file:
    text_file.write(yaml_config)
feathr_output_path = 'abfss://dlssyj4rhbt6eci3iufs1@dlssyj4rhbt6eci3iu.dfs.core.windows.net'

@Uditanshu1612

I'm not providing the configuration for S3 and Snowflake because I'm using SQL DB as the offline store.

@Yuqing-cat
Collaborator

This is the config file I'm using. I'm running it in local PySpark, but it says only Synapse and Databricks are supported when creating the FeathrClient object.

Thanks for the info, @Uditanshu1612.
The local test config we use for our daily tests looks like this:

spark_config:
  # choice for spark runtime. Currently support: azure_synapse, databricks, local
  spark_cluster: 'local'
  spark_result_output_parts: '1'
  local:
    master: 'local[*]'
    feathr_runtime_location:

Your config seems correct. Please double-check the indentation. You could also print out client.spark_runtime to check what the client actually gets.
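
For example, a minimal check, assuming the config was written to tmp.name as in your snippet above:

from feathr import FeathrClient

client = FeathrClient(config_path=tmp.name)
print(client.spark_runtime)  # expect 'local' if the YAML parsed as intended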

To improve the user experience here, a PR has been submitted to refine the error message: #755. Your feedback is important to us :)

@Uditanshu1612

Thank you @Yuqing-cat, I'll try this, and if I face any issues I'll ping here.

@Uditanshu1612

Hello @Yuqing-cat,
I'm facing an issue while importing the feathr package. I hadn't hit this error before, but I'm running into it now.

My azure-core version is 1.26.0, and importing the module with that version throws an error.

From the error message, the module appears to require azure-core <= 1.22.1, but if I downgrade to 1.22.1 or lower I get a different error.

How can I resolve this and import the module successfully?

@Yuqing-cat
Collaborator

How can I resolve this and import the module successfully?

Hi @Uditanshu1612, the azure-core dependency has some known issues that can fail in certain environments, e.g. AML or an Ubuntu VM. We have separate PRs to make it more robust across platforms, like #763.

I highly recommend joining the Feathr Slack channel, where you can get faster help and stay up to date with the latest announcements: https://join.slack.com/t/feathrai/shared_invite/zt-1hy8m4def-w8w6SYNFxvTAuuihTvohVw

@blrchen
Collaborator Author

blrchen commented Jan 5, 2023

Closing as fixed in 0.9.0.

@blrchen blrchen closed this as completed Jan 5, 2023