[SPARK-25560][SQL] Allow FunctionInjection in SparkExtensions #22576

Closed
wants to merge 1 commit into from

RussellSpitzer
Member

This allows an implementer of Spark Session Extensions to utilize a
method "injectFunction" which will add a new function to the default
Spark Session Catalogue.

What changes were proposed in this pull request?

Adds a new function to SparkSessionExtensions

def injectFunction(functionDescription: FunctionDescription)

Where function description is a new type

type FunctionDescription = (FunctionIdentifier, FunctionBuilder)

The functions are loaded in BaseSessionStateBuilder when the function registry has no parent
function registry to be cloned from.
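
As an illustration of how an extension author might call the new method, here is a minimal sketch. It assumes a Spark build that contains this change and a three-element description (identifier, ExpressionInfo, builder), which is what the `(name, expressionInfo, function)` destructuring in the diff suggests even though the description above lists two elements. The function name `answer` and the object `InjectFunctionExample` are hypothetical.

```scala
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.FunctionIdentifier
import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionInfo, Literal}

object InjectFunctionExample {
  // Hypothetical zero-argument SQL function "answer" that ignores its
  // arguments and always evaluates to the integer literal 42.
  // Assumption: the description is a 3-tuple, as the diff destructures it.
  val answer: (FunctionIdentifier, ExpressionInfo, Seq[Expression] => Expression) = (
    FunctionIdentifier("answer"),
    new ExpressionInfo(classOf[Literal].getName, "answer"),
    (_: Seq[Expression]) => Literal(42)
  )

  // Starts a local session with the function injected, calls it from SQL,
  // and stops the session again.
  def run(): Int = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .withExtensions((extensions: SparkSessionExtensions) => extensions.injectFunction(answer))
      .getOrCreate()
    try spark.sql("SELECT answer()").head().getInt(0)
    finally spark.stop()
  }
}
```

Because the function is registered when the root session state is built, no `CREATE FUNCTION` statement is needed before the query.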

How was this patch tested?

New unit tests are added for the extension in SparkSessionExtensionSuite

@RussellSpitzer
Member Author

@hvanhovell Made a full PR for the change we discussed. Also updated the signature to match the newly defined types for the registry and Identifier.

@SparkQA

SparkQA commented Sep 28, 2018

Test build #96728 has finished for PR 22576 at commit 5fdca38.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 28, 2018

Test build #96757 has finished for PR 22576 at commit 450dd73.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@RussellSpitzer
Member Author

RussellSpitzer commented Sep 28, 2018 via email

@RussellSpitzer
Member Author

Ah, I was registering functions with the built-in registry, which is not reset. I've changed it to register only with a clone of the built-in registry. This allows multiple extensions, and lets some sessions use extensions while others do not.
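
The difference between the two approaches can be sketched with a toy registry. This is only a model under stated assumptions: `ToyRegistry`, `sessionSharing`, and `sessionCloning` are hypothetical names, not Spark classes, and functions are reduced to a name and an arity.

```scala
import scala.collection.mutable

object RegistryLeakDemo {
  // Hypothetical stand-in for a function registry; not Spark's FunctionRegistry.
  final class ToyRegistry(private val functions: mutable.Map[String, Int]) {
    def register(name: String, arity: Int): Unit = functions(name) = arity
    def contains(name: String): Boolean = functions.contains(name)
    // Copies the underlying map so the copy can be mutated independently.
    def copy(): ToyRegistry = new ToyRegistry(functions.clone())
  }

  // Process-wide registry that outlives any single session or test.
  val builtin = new ToyRegistry(mutable.Map("sum" -> 1))

  // Before the fix: injected functions go straight into the shared registry,
  // so every later session (and every later test) sees them.
  def sessionSharing(injected: Seq[(String, Int)]): ToyRegistry = {
    injected.foreach { case (name, arity) => builtin.register(name, arity) }
    builtin
  }

  // After the fix: injected functions go only into a private clone.
  def sessionCloning(injected: Seq[(String, Int)]): ToyRegistry = {
    val registry = builtin.copy()
    injected.foreach { case (name, arity) => registry.register(name, arity) }
    registry
  }
}
```

With `sessionCloning`, a session built without extensions never sees functions injected by another session; with `sessionSharing`, it does.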

@SparkQA

SparkQA commented Sep 28, 2018

Test build #96769 has finished for PR 22576 at commit 718c8ec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

private[this] val injectedFunctions =
  mutable.Buffer.empty[FunctionDescription]

private[sql] def registerFunctions(functionRegistry: FunctionRegistry) = {
Contributor

Can you fix the indentation for the next 14 lines?

@@ -95,7 +95,8 @@ abstract class BaseSessionStateBuilder(
    * This either gets cloned from a pre-existing version or cloned from the built-in registry.
    */
   protected lazy val functionRegistry: FunctionRegistry = {
-    parentState.map(_.functionRegistry).getOrElse(FunctionRegistry.builtin).clone()
+    parentState.map(_.functionRegistry.clone())
+      .getOrElse{extensions.registerFunctions(FunctionRegistry.builtin.clone())}
Contributor

@hvanhovell hvanhovell Oct 11, 2018

NIT: Use parentheses instead of curly braces?

@@ -168,4 +173,22 @@ class SparkSessionExtensions {
   def injectParser(builder: ParserBuilder): Unit = {
     parserBuilders += builder
   }
+
+  private[this] val injectedFunctions =
Contributor

NIT: no new line?

Contributor

@hvanhovell hvanhovell left a comment

Couple of style remarks, but looks good in general.

This allows an implementer of Spark Session Extensions to utilize a
method "injectFunction" which will add a new function to the default
Spark Session Catalog.
@RussellSpitzer
Member Author

Cleaned up

@SparkQA

SparkQA commented Oct 16, 2018

Test build #97462 has finished for PR 22576 at commit 32e0a78.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Oct 19, 2018

Test build #97576 has finished for PR 22576 at commit 32e0a78.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

private[this] val injectedFunctions = mutable.Buffer.empty[FunctionDescription]

private[sql] def registerFunctions(functionRegistry: FunctionRegistry) = {
  for ((name, expressionInfo, function) <- injectedFunctions) {
Contributor

Can you move the code that changes the FunctionRegistry into the BaseSessionStateBuilder and just make this return the Seq[FunctionDescription]? The return type of this function being a FunctionRegistry sort of implies that you are getting back a new registry instead of a mutated one. If we are mutating, then I prefer to do that in the BaseSessionStateBuilder so it is obvious that this is safe to do because we are mutating a clone. It also makes this code more in line with the rest of the extension class (not mutating). Sorry for the late change of heart.
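
One possible shape for that split is sketched below with stand-in types. Everything here is hypothetical (`ToyRegistry`, `ToyExtensions`, `buildRegistry`, and the accessor `injectedFunctions` are illustrative names, not Spark code): the extensions object only hands back the collected descriptions, and the builder clones and mutates.

```scala
// Hypothetical stand-ins for Spark's types; none of these are Spark classes.
object RefactorSketch {
  type ToyFunction = Seq[Int] => Int
  type ToyDescription = (String, ToyFunction)

  final class ToyRegistry(initial: Map[String, ToyFunction]) {
    private var functions: Map[String, ToyFunction] = initial
    def register(description: ToyDescription): Unit = functions += description
    def names: Set[String] = functions.keySet
    def copy(): ToyRegistry = new ToyRegistry(functions)
  }

  // The extensions object only collects descriptions; it never touches a registry.
  final class ToyExtensions {
    private var injected: Vector[ToyDescription] = Vector.empty
    def injectFunction(description: ToyDescription): Unit = injected :+= description
    def injectedFunctions: Seq[ToyDescription] = injected
  }

  // Shared registry, analogous in role to the built-in one.
  val builtin = new ToyRegistry(Map("sum" -> ((xs: Seq[Int]) => xs.sum)))

  // The builder owns the clone, so the mutation is visibly confined to it.
  // With a parent, the parent's registry is cloned and nothing is re-registered.
  def buildRegistry(parent: Option[ToyRegistry], extensions: ToyExtensions): ToyRegistry =
    parent.map(_.copy()).getOrElse {
      val registry = builtin.copy()
      extensions.injectedFunctions.foreach(registry.register)
      registry
    }
}
```

Child sessions still see the injected functions in this sketch because they clone a parent registry that already contains them.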

Member Author

Ha, we just changed a function in the opposite direction on my other commit. The project should probably pick one form and put it in the style guide. I'll make the change.

Contributor

👍

@hvanhovell
Contributor

@RussellSpitzer I am merging this, can you address my comment in a follow up? Thanks!

@asfgit asfgit closed this in 6e0fc8b Oct 19, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
This allows an implementer of Spark Session Extensions to utilize a
method "injectFunction" which will add a new function to the default
Spark Session Catalogue.

## What changes were proposed in this pull request?

Adds a new function to SparkSessionExtensions

    def injectFunction(functionDescription: FunctionDescription)

Where function description is a new type

  type FunctionDescription = (FunctionIdentifier, FunctionBuilder)

The functions are loaded in BaseSessionBuilder when the function registry does not have a parent
function registry to get loaded from.

## How was this patch tested?

New unit tests are added for the extension in SparkSessionExtensionSuite

Closes apache#22576 from RussellSpitzer/SPARK-25560.

Authored-by: Russell Spitzer <Russell.Spitzer@gmail.com>
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>