[GLUTEN-7364][CORE] Simplify the RuleInjector #7365
Run Gluten Clickhouse CI
ping @zhouyuan @zzcclp @liuneng1994
👍
@beliefer Overall LGTM except for some minor issues. Thanks.
@deprecated("This class is deprecated and will be removed in future versions.", since = "1.3.0")
class SparkInjector private[injector] (val extensions: SparkSessionExtensions) {}
I think we can just remove minor unused APIs like this one, as Gluten doesn't officially guarantee API level backward compatibility yet.
As expected.
class RuleInjector(extensions: SparkSessionExtensions) {
  val spark: SparkInjector = new SparkInjector(extensions)
  val gluten: GlutenInjector = new GlutenInjector()

  private[extension] def inject(extensions: SparkSessionExtensions): Unit = {
    spark.inject(extensions)
    // The regular Spark rules already injected with the `injectRules` of `RuleApi` directly.
    // Only inject the Spark columnar rule here.
    gluten.inject(extensions)
  }
Both the constructor and the method #inject accept an extensions: SparkSessionExtensions. Can we remove the one in #inject?
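To illustrate the point, here is a minimal sketch of the injector once the duplicate parameter is dropped. It is not the code from this PR: `FakeExtensions`, `GlutenInjectorSketch`, `RuleInjectorSketch`, and the rule name string are hypothetical stand-ins so that the snippet compiles without Spark on the classpath.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for org.apache.spark.sql.SparkSessionExtensions.
// The real class takes rule builders; strings are used here for brevity.
final class FakeExtensions {
  private val injected = ArrayBuffer.empty[String]
  def injectColumnar(name: String): Unit = injected += name
  def columnarRules: List[String] = injected.toList
}

// Hypothetical simplification of GlutenInjector: installs one columnar rule.
final class GlutenInjectorSketch {
  def inject(extensions: FakeExtensions): Unit =
    extensions.injectColumnar("GlutenColumnarRule")
}

// `extensions` is received once, in the constructor...
final class RuleInjectorSketch(extensions: FakeExtensions) {
  val gluten = new GlutenInjectorSketch

  // ...so `inject` no longer needs a parameter of its own.
  def inject(): Unit = gluten.inject(extensions)
}

object RedundantParamDemo {
  def run(): List[String] = {
    val exts = new FakeExtensions
    new RuleInjectorSketch(exts).inject()
    exts.columnarRules
  }
}
```

With a single source for `extensions`, a caller cannot accidentally construct the injector with one instance and call `inject` with another.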
updated.
Thanks @beliefer, just some suggestions about naming left.
Though overall I will be +0 to this. Do you have some specific reasons to make the change?
The reason why we split the planning phase into spark and gluten is that we wanted to isolate Gluten's query planner from Spark to the maximum extent. We also wanted Gluten to inject only one single built-in Spark ColumnarRule through GlutenInjector, so we hid the #injectColumnar method by wrapping SparkSessionExtensions with SparkInjector. If we expose that API, we may need to pay more attention to Gluten PRs that add rules, to make sure they do things right.
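The isolation argument can be sketched as a small facade. This is an illustration under stated assumptions, not Gluten's actual classes: `StubExtensions` stands in for `SparkSessionExtensions` (whose real methods take rule builders rather than strings), and `SparkFacade` and `GlutenSide` are hypothetical names for the roles played by `SparkInjector` and `GlutenInjector`.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for SparkSessionExtensions; real builders are functions, not strings.
final class StubExtensions {
  val optimizerRules = ArrayBuffer.empty[String]
  val columnarRules = ArrayBuffer.empty[String]
  def injectOptimizerRule(rule: String): Unit = optimizerRules += rule
  def injectColumnar(rule: String): Unit = columnarRules += rule
}

// Facade in the spirit of SparkInjector: it forwards regular rule injection
// but deliberately offers no way to reach `injectColumnar`.
final class SparkFacade(extensions: StubExtensions) {
  def injectOptimizerRule(rule: String): Unit = extensions.injectOptimizerRule(rule)
}

// Only this component installs the single built-in columnar rule.
final class GlutenSide {
  def inject(extensions: StubExtensions): Unit =
    extensions.injectColumnar("GlutenColumnarRule")
}

object IsolationDemo {
  def run(): (List[String], List[String]) = {
    val exts = new StubExtensions
    val spark = new SparkFacade(exts)
    spark.injectOptimizerRule("CollectRewriteRule")
    spark.injectOptimizerRule("HLLRewriteRule")
    // spark.injectColumnar("Other") would not compile: the facade hides it.
    new GlutenSide().inject(exts)
    (exts.optimizerRules.toList, exts.columnarRules.toList)
  }
}
```

The trade-off discussed in this thread is visible here: handing rule authors the raw `StubExtensions` instead of the facade is simpler, but it also lets them call `injectColumnar` directly, which then has to be caught in review.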
injector.injectOptimizerRule(CollectRewriteRule.apply)
injector.injectOptimizerRule(HLLRewriteRule.apply)
injector.injectPostHocResolutionRule(ArrowConvertorRule.apply)
def injectSpark(injector: RuleInjector): Unit = {
Can we make it
-def injectSpark(injector: RuleInjector): Unit = {
+def injectSpark(exts: SparkSessionExtensions): Unit = {
? Thanks.
How about extensions?
@@ -19,12 +19,12 @@ package org.apache.gluten.extension.injector
 import org.apache.spark.sql.SparkSessionExtensions

 /** Injector used to inject query planner rules into Spark and Gluten. */
-class RuleInjector {
-  val spark: SparkInjector = new SparkInjector()
+class RuleInjector(val extensions: SparkSessionExtensions) {
-class RuleInjector(val extensions: SparkSessionExtensions) {
+class RuleInjector(val spark: SparkSessionExtensions) {
In the world of Spark, spark usually means SparkSession.
For consistency, let's use extensions.
The rules for isolating Spark and Gluten are not a problem. As you said, Gluten to inject only one single built-in Spark |
I meant after the PR, developer could use API |
Got it. We can reserve the |
@zhztheplayer @zhouyuan Thank you!
What changes were proposed in this pull request?
This PR proposes to simplify the RuleInjector. The change improves readability. There are two changes:
1. [...] SparkSessionExtensions and simplify the SparkInjector. The related RuleApi is also updated. In fact, we could remove the SparkInjector, but I suspect it has already become a public API.
2. [...] GlutenInjector.
How was this patch tested?
Integration tests.