Put DF_UDF plugin code into the main uber jar. #11634
Conversation
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
build
Is it necessary to move the code under sql-plugin? I thought we would just have the aggregator project add the df_udf project as a dependency to pull it into the dist jar, similar to how we treat the shuffle and udf plugin projects.
DF_UDF_README.md (Outdated)
To do this include com.nvidia:df_udf_plugin as a dependency for your project and also include it on the
classpath for your Apache Spark environment. Then include `com.nvidia.spark.DFUDFPlugin` in the config
`spark.sql.extensions`. Now you can implement a UDF in terms of Dataframe operations.
Accelerator for Apache Spark. As such you will need to select a scala version 2.12 or 2.13 that matches the
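For readers unfamiliar with the setup this README excerpt describes, here is a minimal sketch of enabling the extension and expressing UDF logic as Dataframe/Column operations. The session setup, column names, and example logic are illustrative assumptions; the plugin's own registration helpers are documented in DF_UDF_README.md.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object DFUDFExample {
  def main(args: Array[String]): Unit = {
    // Enable the plugin via spark.sql.extensions, as the README describes.
    val spark = SparkSession.builder()
      .appName("df-udf-example")
      .config("spark.sql.extensions", "com.nvidia.spark.DFUDFPlugin")
      .getOrCreate()
    import spark.implicits._

    // The "UDF" body is written purely as Column expressions rather than
    // opaque JVM code, so Catalyst (and the RAPIDS plugin) can see into it.
    val df = Seq((1, 2), (3, 4)).toDF("a", "b")
    df.select((col("a") + col("b")).alias("a_plus_b")).show()
  }
}
```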
It feels like something is missing from this sentence. It should at least be "RAPIDS Accelerator for Apache Spark", but my guess is there is a copy and paste error.
Thanks, I totally missed that.
@jlowe I put it under the SQL plugin because I heard some pushback on creating another API jar in addition to a shimmed implementation jar. I am fine with doing whatever people want me to do, but we just need to decide what that is. This patch was by far the simplest way to get something working, so I went with this.
What I'm proposing is even simpler. Given it's essentially a standalone component, we could leave everything in the existing df_udf project. We add the df_udf project to the aggregator project so it ends up being part of the dist jar. The df_udf project would depend on the sql-plugin-api so it can get access to ShimLoader, and we unshim the "front door" classes for df_udf. That would let us keep the code in a separate project, which would make it easier to separate if we ever decide to do that. I'm fine with it being part of sql-plugin if that's the way you want to go. I was wondering if you considered the aggregator approach and eliminated it for some reason.
@jlowe I thought about it, but I wasn't sure how the shimming would work. I am happy to give it a try and get back to you on it.
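To make the separate-project approach above more concrete, the following is a rough sketch of what an unshimmed "front door" class could look like. It is not the actual implementation: the `ShimLoader.newInstanceOf` call and the `DFUDFPluginImpl` class name are assumptions used only to illustrate the indirection through ShimLoader from sql-plugin-api.

```scala
// Hypothetical sketch: an unshimmed entry class in the df_udf project that
// delegates to a Spark-version-specific implementation at runtime.
package com.nvidia.spark

import org.apache.spark.sql.SparkSessionExtensions
import com.nvidia.spark.rapids.ShimLoader

class DFUDFPlugin extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    // Resolve the shimmed implementation through ShimLoader so this front
    // door class itself does not need to be built per Spark version.
    // The implementation class name below is an assumption for illustration.
    val impl = ShimLoader.newInstanceOf[SparkSessionExtensions => Unit](
      "com.nvidia.spark.rapids.shims.DFUDFPluginImpl")
    impl(extensions)
  }
}
```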
build
This puts the data frame UDF plugin code into the uber jar. It should now work just like our regular SQL plugin to get DF_UDF functionality.