
TypeError: 'JavaPackage' object is not callable error despite linking jars into Spark successfully #9

Open
NatMzk opened this issue Jun 7, 2022 · 7 comments


@NatMzk

NatMzk commented Jun 7, 2022

I have run the link_pmml4s_jars_into_spark.py script successfully:

[screenshot: successful script output]

and the pmml4s jar files are present in the SPARK_HOME location:

[screenshot: jar files listed under SPARK_HOME]

However, the TypeError: 'JavaPackage' object is not callable error still occurs:

[screenshot: error output]

I am running Java Version=1.8.0_302 and Spark Version=3.2.1.

I would appreciate any suggestions on what might be missing.

@scorebot
Member

scorebot commented Jun 8, 2022

@NatMzk ScoreModel.fromFile() expects a local path to the model file. Could you use other methods like fromBytes or fromString to load the model? You would first need to read the model from the dbfs:/... path yourself.
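A minimal sketch of this suggestion, assuming the model sits under a dbfs:/ path that Databricks also exposes through the local /dbfs/ mount (the path below is a placeholder):

```python
from pypmml_spark import ScoreModel

# dbfs:/FileStore/models/model.pmml is typically reachable by local file
# APIs on Databricks as /dbfs/FileStore/models/model.pmml (placeholder path).
with open("/dbfs/FileStore/models/model.pmml") as f:
    pmml_text = f.read()

# Hand the model content to pmml4s directly instead of a file path.
model = ScoreModel.fromString(pmml_text)
```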

@NatMzk
Author

NatMzk commented Jun 8, 2022

From my understanding, the dbfs path is Databricks's local path where the PMML XML model is located. I tried the fromBytes and fromString methods, but they result in the same error.

@scorebot
Member

scorebot commented Jun 8, 2022

@NatMzk Could you provide the full stack trace of the exception above? Also, try restarting the kernel before loading the model.

@NatMzk
Author

NatMzk commented Jun 8, 2022

I restarted the kernel by detaching & reattaching the notebook, with no results. The error trace is as follows:

[screenshot: full stack trace of the TypeError]

I am running Databricks Runtime Version 10.4 LTS on a single-node cluster (not pure Spark).
Apache Spark=3.2.1
Java Version=1.8.0_302 (Azul Systems, Inc.)

@scorebot
Member

I don't have the Databricks Runtime, but when I remove the links created by the link_pmml4s_jars_into_spark.py script, I can reproduce the same error on my side. So I suspect your issue has the same cause: the dependent jars of pmml4s are not found by Spark. There are several ways to try:

For details about the following configurations, see the official doc: https://spark.apache.org/docs/latest/configuration.html

All of these can be specified via the conf file or on the command line; check the doc for your environment. Taking the command line that launches pyspark as an example:

  1. Set spark.jars:
pyspark --conf spark.jars="$(echo /Path/To/pypmml_spark/jars/*.jar | tr ' ' ',')"
  2. Set spark.jars.packages:
pyspark --conf spark.jars.packages=org.pmml4s:pmml4s_2.12:0.9.16,org.pmml4s:pmml4s-spark_2.12:0.9.16,io.spray:spray-json_2.12:1.3.5,org.apache.commons:commons-math3:3.6.1
  3. Set spark.driver.extraClassPath and spark.executor.extraClassPath:
pyspark --conf spark.driver.extraClassPath="/Path/To/pypmml_spark/jars/*" --conf spark.executor.extraClassPath="/Path/To/pypmml_spark/jars/*"

I recommend options 1 and 2. If you build the SparkSession yourself, the same setting can be applied programmatically; see the sketch below.
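A minimal sketch of option 2 for environments where you create the session yourself (on Databricks the session is pre-created, so there you would use the cluster's Spark config instead). The coordinates mirror the command above:

```python
from pyspark.sql import SparkSession

# Pull the pmml4s jars and their dependencies from Maven at session startup.
spark = (
    SparkSession.builder
    .appName("pmml4s-spark-demo")
    .config(
        "spark.jars.packages",
        "org.pmml4s:pmml4s_2.12:0.9.16,"
        "org.pmml4s:pmml4s-spark_2.12:0.9.16,"
        "io.spray:spray-json_2.12:1.3.5,"
        "org.apache.commons:commons-math3:3.6.1",
    )
    .getOrCreate()
)
```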

@scorebot
Member

@NatMzk Did the methods above resolve your issue?

@jrauch-pros

Another relatively simple way on Databricks is to copy the jar files to /databricks/jars, for example in a cluster init script.
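A hedged sketch of that copy step as a one-off notebook cell; the staging folder is a placeholder, and dbutils only exists inside Databricks notebooks:

```python
# Copy each pmml4s jar from a DBFS staging folder (placeholder path) into
# the directory that Databricks puts on the driver/executor classpath.
for info in dbutils.fs.ls("dbfs:/FileStore/jars/pmml4s/"):
    if info.name.endswith(".jar"):
        dbutils.fs.cp(info.path, "file:/databricks/jars/" + info.name)
```

The copy only takes effect after the cluster restarts, which is one reason the init-script form suggested above is preferable in practice.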
