Put DF_UDF plugin code into the main uber jar. #11634
Conversation
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
build
Is it necessary to move the code under sql-plugin? I thought we would just have the aggregator project add the df_udf project as a dependency to pull it into the dist jar, similar to how we treat the shuffle and udf plugin projects.
DF_UDF_README.md (Outdated)
To do this include com.nvidia:df_udf_plugin as a dependency for your project and also include it on the
classpath for your Apache Spark environment. Then include `com.nvidia.spark.DFUDFPlugin` in the config
`spark.sql.extensions`. Now you can implement a UDF in terms of Dataframe operations.
Accelerator for Apache Spark. As such you will need to select a scala version 2.12 or 2.13 that matches the
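For readers unfamiliar with the setup this README excerpt describes, here is a minimal sketch of enabling the extension and expressing UDF logic as Dataframe/Column operations. The session setup, column names, and example logic are illustrative assumptions; the plugin's own registration helpers are documented in DF_UDF_README.md.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object DFUDFExample {
  def main(args: Array[String]): Unit = {
    // Enable the plugin via spark.sql.extensions, as the README describes.
    val spark = SparkSession.builder()
      .appName("df-udf-example")
      .config("spark.sql.extensions", "com.nvidia.spark.DFUDFPlugin")
      .getOrCreate()
    import spark.implicits._

    // The "UDF" body is written purely as Column expressions rather than
    // opaque JVM code, so Catalyst (and the RAPIDS plugin) can see into it.
    val df = Seq((1, 2), (3, 4)).toDF("a", "b")
    df.select((col("a") + col("b")).alias("a_plus_b")).show()
  }
}
```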
It feels like something is missing from this sentence. It should at least be "RAPIDS Accelerator for Apache Spark", but my guess is there is a copy and paste error.
Thanks, I totally missed that.
@jlowe I put it under the SQL plugin because I heard some pushback on creating another API jar in addition to a shimmed implementation jar. I am fine with doing whatever people want me to do, but we just need to decide what that is. This patch was by far the simplest way to get something working, so I went with this.
What I'm proposing is even simpler. Given it's essentially a standalone component, we could leave everything in the existing df_udf project. We add the df_udf project to the aggregator project so it ends up being part of the dist jar. The df_udf project would depend on the sql-plugin-api so it can get access to ShimLoader, and we unshim the "front door" classes for df_udf. That would let us keep the code in a separate project, which would make it easier to separate if we ever decide to do that. I'm fine with it being part of sql-plugin if that's the way you want to go. I was wondering if you considered the aggregator approach and eliminated it for some reason.
@jlowe I thought about it, but I wasn't sure how the shimming would work. I am happy to give it a try and get back to you on it.
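To make the separate-project approach above more concrete, the following is a rough sketch of what an unshimmed "front door" class could look like. It is not the actual implementation: the `ShimLoader.newInstanceOf` call and the `DFUDFPluginImpl` class name are assumptions used only to illustrate the indirection through ShimLoader from sql-plugin-api.

```scala
// Hypothetical sketch: an unshimmed entry class in the df_udf project that
// delegates to a Spark-version-specific implementation at runtime.
package com.nvidia.spark

import org.apache.spark.sql.SparkSessionExtensions
import com.nvidia.spark.rapids.ShimLoader

class DFUDFPlugin extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    // Resolve the shimmed implementation through ShimLoader so this front
    // door class itself does not need to be built per Spark version.
    // The implementation class name below is an assumption for illustration.
    val impl = ShimLoader.newInstanceOf[SparkSessionExtensions => Unit](
      "com.nvidia.spark.rapids.shims.DFUDFPluginImpl")
    impl(extensions)
  }
}
```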
build
This puts the data frame UDF plugin code into the uber jar. It should now work just like our regular SQL plugin to get DF_UDF functionality.