[HUDI-6963] Fix class conflict of CreateIndex from Spark3.3 #9895
Conversation
cc @codope

| IN
| INDEX
| INDEXES
| IF
Do we still need some of these tokens for other SQL statements?
No, I checked and removed only the unneeded ones.
override def run(sparkSession: SparkSession): Seq[Row] = {
  val tableId = table.identifier
  val metaClient = createHoodieTableMetaClient(tableId, sparkSession)
  val columnsMap: java.util.LinkedHashMap[String, java.util.Map[String, String]] =
    new util.LinkedHashMap[String, java.util.Map[String, String]]()
-  columns.map(c => columnsMap.put(c._1.name, c._2.asJava))
+  columns.map(c => columnsMap.put(c._1.mkString("."), c._2.asJava))
Why change this? for nested fields?
Now the column name is Seq[String] instead of Attribute (UnresolvedAttribute, to be more specific), since this PR resolves column names here:

Line 208 in 6eb38c5
case (u: UnresolvedFieldName, prop) => resolveFieldNames(cmd.table, u.name, u) -> prop

UnresolvedAttribute is not needed anymore, so I directly use Seq[String] to represent the column name here.
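The change described above can be sketched as follows. This is a minimal, self-contained illustration (the column specs are hypothetical, not the PR's actual test data): each column is a name path, so a nested field arrives as a multi-part Seq[String] and is flattened with mkString(".") before going into the insertion-ordered map.

```scala
import java.util
import scala.jdk.CollectionConverters._

// Hypothetical column specs: each column is a name path (Seq[String], so
// nested fields are representable) paired with per-column index options.
val columns: Seq[(Seq[String], Map[String, String])] = Seq(
  (Seq("ts"), Map("order" -> "asc")),
  (Seq("fare", "amount"), Map.empty[String, String]) // nested field fare.amount
)

// Same shape as the PR's columnsMap: insertion-ordered and Java-friendly.
val columnsMap = new util.LinkedHashMap[String, util.Map[String, String]]()
columns.foreach { case (namePath, opts) =>
  // A nested path like Seq("fare", "amount") flattens to "fare.amount",
  // which is why mkString(".") replaces the old c._1.name.
  columnsMap.put(namePath.mkString("."), opts.asJava)
}
```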
Got it.
@@ -149,144 +149,4 @@ class HoodieSqlCommonAstBuilder(session: SparkSession, delegate: ParserInterface
  private def typedVisit[T](ctx: ParseTree): T = {
    ctx.accept(this).asInstanceOf[T]
  }

  /**
Is all of this code copied from Spark 3.3+? Wondering if the original PR introducing this intended to support CreateIndex for Spark 3.2 and below. cc @huberylee
This code was introduced in #5761; maybe it's only there to support the Index commands.
| )
| partitioned by(ts)
| location '$basePath'
if (HoodieSparkUtils.gteqSpark3_2) {
Looks like TestSecondaryIndex should also have a precondition on the Spark version.
yea, will fix it as well
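Such a precondition can be sketched with a hypothetical stand-in for HoodieSparkUtils.gteqSpark3_2 (the real helper lives in Hudi; this version just parses the Spark version string, for illustration):

```scala
// Hypothetical stand-in for HoodieSparkUtils.gteqSpark3_2: take the major
// and minor components of the running Spark version and compare against 3.2.
def gteqSpark3_2(sparkVersion: String): Boolean = {
  val parts = sparkVersion.split("\\.").take(2).map(_.toInt)
  parts(0) > 3 || (parts(0) == 3 && parts(1) >= 2)
}

// A test suite can then skip index tests on older Spark, e.g.:
// if (gteqSpark3_2(spark.version)) { ...run secondary-index tests... }
```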
@@ -3317,6 +3317,145 @@ class HoodieSpark3_2ExtendedSqlAstBuilder(conf: SQLConf, delegate: ParserInterface
  position = Option(ctx.colPosition).map(pos =>
    UnresolvedFieldPosition(typedVisit[ColumnPosition](pos))))
}

/**
Got it. So at least CreateIndex is still supported in Spark 3.2.
Yes
@@ -29,5 +29,12 @@ statement
| createTableHeader ('(' colTypeList ')')? tableProvider?
    createTableClauses
    (AS? query)? #createTable
| CREATE INDEX (IF NOT EXISTS)? identifier ON TABLE?
Could we still maintain the grammar in a single place for all Spark versions, but fail the logical plan of INDEX SQL statement in Spark 3.1 and below, so the grammar can be easily maintained?
We cannot put the grammar in HoodieSqlCommon.g4, since we would have to parse it in HoodieSqlCommonAstBuilder to build CreateIndex, and one of its params, FieldName, doesn't exist below Spark 3. We already have this kind of split for time travel, and the grammar is rarely modified, so is changing it like this acceptable?
Got it. @codope is working on functional indexes in Hudi and may extend the grammar here. So I'm thinking if it's not too hard to put the grammar in a common place, we should do that.
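The reviewer's suggestion, keeping the grammar common and failing the logical plan on older Spark, could look roughly like this. Both helpers here are hypothetical, not Hudi API:

```scala
// Hypothetical version check (Hudi's real helper is HoodieSparkUtils.gteqSpark3_2).
def gteqSpark3_2(sparkVersion: String): Boolean = {
  val parts = sparkVersion.split("\\.").take(2).map(_.toInt)
  parts(0) > 3 || (parts(0) == 3 && parts(1) >= 2)
}

// Hypothetical analysis-time guard: the INDEX grammar is always parsed,
// but the resulting logical plan is rejected on Spark below 3.2.
def checkIndexSupported(sparkVersion: String): Unit =
  if (!gteqSpark3_2(sparkVersion))
    throw new UnsupportedOperationException(
      s"CREATE INDEX requires Spark 3.2+, but running on Spark $sparkVersion")
```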
@@ -3327,6 +3327,145 @@ class HoodieSpark3_3ExtendedSqlAstBuilder(conf: SQLConf, delegate: ParserInterface
  position = Option(ctx.colPosition).map(pos =>
    UnresolvedFieldPosition(typedVisit[ColumnPosition](pos))))
}

/**
I assume the SQL parsing of INDEX SQL statement should not be different across Spark versions.
Yeah, it should be the same.
LGTM
Change Logs
CreateIndex was added in HUDI-4165, and Spark 3.3 also includes it via SPARK-36895. CreateIndex lives in the same package, org.apache.spark.sql.catalyst.plans.logical, in both Hudi and Spark 3.3, but the params are not the same, so using it can introduce class conflict issues. One simple modification to fix this would be to directly move CreateIndex to a different package path, but I don't think that's proper, given that CreateIndex belongs in org.apache.spark.sql.catalyst.plans.logical as a logical plan (like MergeInto). So here I keep the same package path as Spark but align the params with Spark's definition. org.apache.spark.sql.catalyst.analysis.FieldName, which CreateIndex requires, doesn't exist below Spark 3.2, so we only support Index related commands from Spark 3.2 (I can also add support for Spark 2, Spark 3.0, and Spark 3.1, but I'm not sure whether that's still needed).
Impact
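The conflict can be illustrated with two hypothetical plan classes (these are simplified shapes, not the real Hudi/Spark CreateIndex definitions): since the JVM serves exactly one class per fully-qualified name per classloader, code compiled against one constructor shape breaks at runtime if the classpath resolves the name to the other.

```scala
// Illustrative shapes only, not the actual CreateIndex definitions.
// Pre-fix (Hudi-style): columns keyed by a flat attribute name.
case class CreateIndexV1(indexName: String,
                         columns: Seq[(String, Map[String, String])])

// Post-fix (aligned with Spark 3.3's field-name style): columns keyed by a
// multi-part field path, so nested columns are representable too.
case class CreateIndexV2(indexName: String,
                         columns: Seq[(Seq[String], Map[String, String])])

// If both shapes shipped under one fully-qualified name, callers compiled
// against V1 would fail (e.g. NoSuchMethodError) when the classloader serves
// V2; hence the fix aligns Hudi's class with Spark's definition instead.
val idx = CreateIndexV2("idx_fare",
  Seq(Seq("fare", "amount") -> Map.empty[String, String]))
```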
Describe any public API or user-facing feature change or any performance impact.
Index related commands are no longer supported below Spark 3.2.
Risk level (write none, low medium or high below)
If medium or high, explain what verification was done to mitigate the risks.
low
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change
ticket number here and follow the instruction to make changes to the website.
Contributor's checklist