Env: Databricks 10.4ML LTS with 22.06GA jar.
The sample Pandas UDF below (note: this is NOT a cuDF Pandas UDF) hangs:
```python
df = spark.createDataFrame(
    [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)], ("id", "v"))

def subtract_mean(pdf):
    # pdf is a pandas.DataFrame
    v = pdf.v
    return pdf.assign(v=v - v.mean())

df.groupby("id").applyInPandas(subtract_mean, schema="id long, v double").show()
```
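For reference, the values this query should return once it no longer hangs can be worked out without Spark. The following is only a sketch in plain Python that mirrors what `subtract_mean` does per `id` group; the helper name `subtract_group_mean` is made up for illustration and is not part of the reproduction.

```python
from collections import defaultdict

# Same rows as the DataFrame in the reproduction: (id, v)
rows = [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)]


def subtract_group_mean(rows):
    """Return (id, v - mean of v within that id) for every input row."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    # Mean of v for each id
    means = {key: sum(values) / len(values) for key, values in groups.items()}
    return [(key, value - means[key]) for key, value in rows]


expected = subtract_group_mean(rows)
for key, value in expected:
    print(key, value)
```

Spark does not guarantee the row order printed by `show()`, so compare the output as a set of rows rather than line by line.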
I also tried setting the parameters below in the Spark config before starting the cluster, but then the query fails with "no module for cudf":
```
spark.rapids.sql.python.gpu.enabled true
spark.python.daemon.module rapids.daemon_databricks
spark.executorEnv.PYTHONPATH /databricks/jars/rapids-4-spark_2.12-22.06.0.jar:/databricks/spark/python
```
That is expected: the cuDF Python module is required to run a cuDF UDF.
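A quick way to confirm whether the cuDF Python module is visible to a given Python interpreter is to ask the import system for it. This is a minimal sketch using only the standard library; the helper name `has_module` is an illustration, and it only checks the interpreter it runs in, so it would have to run inside a UDF to say anything about the Python workers on the executors.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be located."""
    return importlib.util.find_spec(name) is not None


if has_module("cudf"):
    print("cudf is importable in this interpreter")
else:
    print("cudf is NOT importable in this interpreter")
```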
The integration tests run udf_test nightly and do not hit this issue, but I can reproduce it on a Databricks node.
A quick workaround is to add the config below:
```python
spark.conf.set("spark.databricks.execution.pandasZeroConfConversion.groupbyApply.enabled", "false")
```
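If the workaround should apply only around the affected query, the previous value can be saved and put back afterwards. The sketch below assumes `spark` is an active `SparkSession` and that the key can be changed at runtime, as the `spark.conf.set` call above suggests; the two helper names are hypothetical.

```python
WORKAROUND_KEY = (
    "spark.databricks.execution.pandasZeroConfConversion.groupbyApply.enabled"
)


def disable_zero_conf_groupby_apply(spark):
    """Set the workaround config to "false" and return the previous value."""
    # None is returned when the key was not set explicitly before
    previous = spark.conf.get(WORKAROUND_KEY, None)
    spark.conf.set(WORKAROUND_KEY, "false")
    return previous


def restore_zero_conf_groupby_apply(spark, previous):
    """Put the config back to what it was before the workaround."""
    if previous is None:
        spark.conf.unset(WORKAROUND_KEY)
    else:
        spark.conf.set(WORKAROUND_KEY, previous)
```

Typical use would be to call `disable_zero_conf_groupby_apply(spark)`, run the `applyInPandas` query, and then pass the returned value to `restore_zero_conf_groupby_apply`.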
I am still debugging it. I am not sure whether the protocol changes when pandasZeroConfConversion is set to true.
firestarman