Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-7364][CORE] Simplify the RuleInjector #7365

Merged
merged 1 commit into from
Sep 29, 2024

Conversation

beliefer
Copy link
Contributor

@beliefer beliefer commented Sep 26, 2024

What changes were proposed in this pull request?

This PR proposes to simplify the RuleInjector.
The change increases the readability. There are two change.

  1. Avoid copy code from SparkSessionExtensions and simplify the SparkInjector. The related RuleApi also updated. In fact, we can remove the SparkInjector, but I suspect it already became the public API.
  2. Simplify the GlutenInjector.

How was this patch tested?

integration tests

@github-actions github-actions bot added CORE works for Gluten Core VELOX CLICKHOUSE labels Sep 26, 2024
Copy link

#7364

Copy link

Run Gluten Clickhouse CI

@beliefer beliefer changed the title [GLUTEN-7364] Simplify the RuleInjector [GLUTEN-7364][VL][CH] Simplify the RuleInjector Sep 26, 2024
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@FelixYBW FelixYBW changed the title [GLUTEN-7364][VL][CH] Simplify the RuleInjector [GLUTEN-7364][CORE] Simplify the RuleInjector Sep 26, 2024
@beliefer
Copy link
Contributor Author

ping @zhouyuan @zzcclp @liuneng1994

@beliefer
Copy link
Contributor Author

cc @zhztheplayer @FelixYBW

zhouyuan
zhouyuan previously approved these changes Sep 27, 2024
Copy link
Contributor

@zhouyuan zhouyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beliefer Overall lgtm except for some minors. Thanks,

Comment on lines 22 to 23
@deprecated("This class is deprecated and will be removed in future versions.", since = "1.3.0")
class SparkInjector private[injector] (val extensions: SparkSessionExtensions) {}
Copy link
Member

@zhztheplayer zhztheplayer Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just remove minor unused APIs like this one, as Gluten doesn't officially guarantee API level backward compatibility yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As expected.

Comment on lines 22 to 30
class RuleInjector(extensions: SparkSessionExtensions) {
val spark: SparkInjector = new SparkInjector(extensions)
val gluten: GlutenInjector = new GlutenInjector()

private[extension] def inject(extensions: SparkSessionExtensions): Unit = {
spark.inject(extensions)
// The regular Spark rules already injected with the `injectRules` of `RuleApi` directly.
// Only inject the Spark columnar rule here.
gluten.inject(extensions)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both constructor and method #inject accepts a extensions: SparkSessionExtensions. Can we remove the one in #inject?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

Copy link

Run Gluten Clickhouse CI

zhztheplayer
zhztheplayer previously approved these changes Sep 27, 2024
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @beliefer, just some suggestions about naming left.

Though overall I will be +0 to this. Do you have some specific reasons to make the change?

The reason why we split the planning phase into spark and gluten is that we wanted to isolate Gluten's query planer with Spark, to the maximum extent. And we wanted Gluten to inject only one single built-in Spark ColumnarRule through GlutenInjector, so we hided #injectColumnar method by wrapping SparkSessionExtensions with SparkInjector. If we expose that API, we may need to pay more attention on Gluten PRs that add rules to make sure they do things right.

injector.injectOptimizerRule(CollectRewriteRule.apply)
injector.injectOptimizerRule(HLLRewriteRule.apply)
injector.injectPostHocResolutionRule(ArrowConvertorRule.apply)
def injectSpark(injector: RuleInjector): Unit = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make it

Suggested change
def injectSpark(injector: RuleInjector): Unit = {
def injectSpark(exts: SparkSessionExtensions): Unit = {

? Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about extensions?

@@ -19,12 +19,12 @@ package org.apache.gluten.extension.injector
import org.apache.spark.sql.SparkSessionExtensions

/** Injector used to inject query planner rules into Spark and Gluten. */
class RuleInjector {
val spark: SparkInjector = new SparkInjector()
class RuleInjector(val extensions: SparkSessionExtensions) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class RuleInjector(val extensions: SparkSessionExtensions) {
class RuleInjector(val spark: SparkSessionExtensions) {

Copy link
Contributor Author

@beliefer beliefer Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the world of Spark, spark usually means SparkSession.
For consistent, let's use extensions.

@beliefer
Copy link
Contributor Author

The reason why we split the planning phase into spark and gluten is that we wanted to isolate Gluten's query planer with Spark, to the maximum extent. And we wanted Gluten to inject only one single built-in Spark ColumnarRule through GlutenInjector, so we hided #injectColumnar method by wrapping SparkSessionExtensions with SparkInjector. If we expose that API, we may need to pay more attention on Gluten PRs that add rules to make sure they do things right.

The rules for isolating Spark and Gluten are not a problem. As you said, Gluten to inject only one single built-in Spark ColumnarRule through GlutenInjector, so we hided #injectColumnar method by wrapping SparkSessionExtensions with GlutenInjector. But SparkInjector only copy the API from SparkSessionExtensions, so this PR doesn't change the GlutenInjector and only remove the SparkInjector. This PR doesn't break what the Gluten want.

Copy link

Run Gluten Clickhouse CI

@zhztheplayer
Copy link
Member

But SparkInjector only copy the API from SparkSessionExtensions, so this PR doesn't change the GlutenInjector and only remove the SparkInjector. This PR doesn't break what the Gluten want.

I meant after the PR, developer could use API injector.extensions.injectColumnar to inject another ColumnarRule in VeloxRuleApi / CHRuleAPI, since we expose SparkSessionExtensions to developer. Before this PR, one cannot access SparkSessionExtensions in VeloxRuleApi / CHRuleAPI. Just to make sure we are on the same page here.

@beliefer
Copy link
Contributor Author

But SparkInjector only copy the API from SparkSessionExtensions, so this PR doesn't change the GlutenInjector and only remove the SparkInjector. This PR doesn't break what the Gluten want.

I meant after the PR, developer could use API injector.extensions.injectColumnar to inject another ColumnarRule in VeloxRuleApi / CHRuleAPI, since we expose SparkSessionExtensions to developer. Before this PR, one cannot access SparkSessionExtensions in VeloxRuleApi / CHRuleAPI. Just to make sure we are on the same page here.

Got it. We can reserve the SparkInjector. Let me refactor it.

zhztheplayer
zhztheplayer previously approved these changes Sep 29, 2024
Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@zhztheplayer zhztheplayer merged commit 35366f4 into apache:main Sep 29, 2024
44 checks passed
@beliefer
Copy link
Contributor Author

@zhztheplayer @zhouyuan Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLICKHOUSE CORE works for Gluten Core VELOX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants