Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Put DF_UDF plugin code into the main uber jar. #11634

Merged
merged 5 commits into from
Oct 24, 2024

Conversation

revans2
Copy link
Collaborator

@revans2 revans2 commented Oct 18, 2024

This puts the data frame UDF plugin code into the uber jar. It should now work just like our regular SQL plugin to get DF_UDF functionality.

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
@revans2
Copy link
Collaborator Author

revans2 commented Oct 18, 2024

build

Copy link
Contributor

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to move the code under sql-plugin? I thought we would just have the aggregator project add the df_udf project as a dependency to pull it into the dist jar, similar to how we treat the shuffle and udf plugin projects.

DF_UDF_README.md Outdated Show resolved Hide resolved
DF_UDF_README.md Outdated
To do this include com.nvidia:df_udf_plugin as a dependency for your project and also include it on the
classpath for your Apache Spark environment. Then include `com.nvidia.spark.DFUDFPlugin` in the config
`spark.sql.extensions`. Now you can implement a UDF in terms of Dataframe operations.
Accelerator for Apache Spark. As such you will need to select a scala version 2.12 or 2.13 that matches the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels something is missing from this sentence. It should at least be "RAPIDS Accelerator for Apache Spark", but my guess is there is a copy and paste error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks I totally missed that.

@sameerz sameerz added the feature request New feature or request label Oct 19, 2024
@revans2
Copy link
Collaborator Author

revans2 commented Oct 21, 2024

@jlowe I put it under the SQL plugin because I heard some push back on creating another API jar in addition to a shimed implementation jar. I am fine with doing whatever people want me to do, but we just need to decide what that is. This patch was by far the simplest way to get something working so I went with this.

@jlowe
Copy link
Contributor

jlowe commented Oct 21, 2024

This patch was by far the simplest way to get something working so I went with this.

What I'm proposing is even simpler. Given it's essentially a standalone component, we could leave everything in the existing df_udf project. We add the df_udf project to the aggregator project so it ends up being part of the dist jar. The df_udf project would depend on the sql-plugin-api so it can get access to ShimLoader, and we unshim the "front door" classes for df_udf. That would let us keep the code in a separate project which would make it easier to separate if we ever decide to do that.

I'm fine with it being part of sql-plugin if that's the way you want to go. I was wondering if you considered the aggregator approach and eliminated it for some reason.

@revans2
Copy link
Collaborator Author

revans2 commented Oct 21, 2024

@jlowe I thought about it, but I wasn't sure how the shimming would work. I am happy to give it a try and get back to you on it.

@revans2
Copy link
Collaborator Author

revans2 commented Oct 21, 2024

build

@revans2
Copy link
Collaborator Author

revans2 commented Oct 21, 2024

@jlowe and @abellina please take another look. I tried to do what Jason Suggested, but I ran into some issue with the public APIs not working as I expected. I think I could make it work, but this is already working and I don't want to spend too much more time on it right now.

@revans2 revans2 merged commit 05f40b5 into NVIDIA:branch-24.12 Oct 24, 2024
47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants