[SPARK-27108][SQL] Add parsed SQL plans for create, CTAS. #24029
Conversation
@cloud-fan, this is needed to add the v2 create and CTAS plans. We can get a start while waiting for the catalog identifiers to be committed.

cc @jzhuge, @mccheah, @gatorsmile
...alyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/ParsedLogicalPlan.scala
     * Parsed logical plans are located in Catalyst so that as much SQL parsing logic as possible is
     * kept in a [[org.apache.spark.sql.catalyst.parser.AbstractSqlParser]].
     */
    private[sql] abstract class ParsedLogicalPlan extends LogicalPlan {
Not sure if this is a useful hierarchy, but if so, we should document more clearly that these plans should not survive analysis.
Might be useful to add a special check for this, rather than relying only on `resolved`.
+1 - @rdblue these should only be inputs to the analyzer, not outputs. Would be helpful to write specific JavaDoc on this.
That is included above: "Parsed logical plans are not resolved because they must be converted to concrete logical plans."
Do you think that should be rephrased to be more clear?
Defer to @rxin but I'm ok with merging with the current docs. We can rephrase in a follow-up if our contributors have trouble with this wording.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/CreateTable.scala
     * }}}
     * TODO: Remove this. It is used because CreateTempViewUsing is not a Catalyst plan.
     * Either move CreateTempViewUsing into catalyst as a parsed logical plan, or remove it because
     * it is deprecated.
Just to double check: when did we deprecate `CreateTempViewUsing`?
Looks like it happened in 5effc01, 3 years ago.

We can always create a parsed plan for `CreateTempViewUsing` so that it can move to catalyst as well, but I thought that we could do it later, and only need to if this isn't going to be removed in 3.0.
Ah, so we deprecated `CREATE TABLE USING`, but not `CREATE TEMP VIEW USING`.
In class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging:

      val structField = StructField(identifier.getText, typedVisit(dataType), nullable = true)
      if (STRING == null) structField else structField.withComment(string(STRING))
    }

    /**
I didn't review this file carefully, assuming it's just copy-pasted code from `SparkSqlAstBuilder`.
Yes, this is moving what is needed from `SparkSqlAstBuilder`. The only real changes are in the `visitCreateTable` method.

These rules should have already been in the abstract class. I'm not sure why they were in `SparkSqlAstBuilder`, other than that it was the easiest place to put them when they were added.
     * converting that v1 metadata to the v2 equivalent, the sql [[CreateTable]] plan is produced by
     * the parser and converted once into both implementations.
     *
     * Parsed logical plans are not resolved because they must be converted to concrete logical plans.
Is it really necessary to have this parent class just to set the `resolved` bit? I think we can just put `override lazy val resolved = false` in the new `CreateTable` and `CreateTableAsSelect` classes, with classdoc saying that these 2 classes will be replaced by concrete plans during analysis.
The value of this class is that it identifies the set of logical plans that correspond directly to what was parsed from SQL. When someone working on a plan sees `ParsedLogicalPlan` as an ancestor in Scaladoc, it signals what is explained here: that the parser produces `ParsedLogicalPlan` nodes without translating what was parsed, and then those plans get translated into real plans in the analyzer.

With that information, it is easy to see what changes need to be made. If the parsed plan doesn't include an option, then the parser and parsed plan need to be updated. If it does include an option, then the analyzer and downstream plans need to be updated.

Also keep in mind that these are the first two subclasses of `ParsedLogicalPlan`. To implement v2 alongside v1, we are going to be adding more of them. So it is valuable that we don't need to remember to set `resolved` to false in every plan.
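The design being argued for here can be sketched in a few lines of standalone Scala (simplified, hypothetical names — not the actual Spark classes): a shared parent pins `resolved` to `false`, so every parsed plan is guaranteed to be rewritten by the analyzer before it can execute.

```scala
// Simplified sketch, not the real Spark classes: a parsed-plan parent
// that fixes `resolved` to false for every subclass.
abstract class ParsedPlan {
  // `final` means no subclass can forget (or accidentally override) this.
  final def resolved: Boolean = false
}

// A parsed CREATE TABLE carries exactly what the parser saw, untranslated.
case class ParsedCreateTable(name: Seq[String], provider: String) extends ParsedPlan

val plan = ParsedCreateTable(Seq("db", "t"), "parquet")
assert(!plan.resolved) // analysis must replace this node with a concrete plan
```

The trade-off discussed above is visible here: without the parent class, each new parsed plan would need its own `override lazy val resolved = false`.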
@cloud-fan, I've responded to the review comments and implemented fixes. Also, I moved the new resolution rules into […].

At a minimum, these needed to be separated into different classes, but I think it is also more correct for resolution rules to run in the resolution batch so that other resolution rules can run on the plans produced by these rules. This is the same reason why we added the […]. I think that the resolution rules in […]
Retest this please.

Test build #103578 has finished for PR 24029 at commit […].

Test build #103579 has finished for PR 24029 at commit […].

retest this please

LGTM if tests pass
    import org.apache.spark.sql.sources.v2.TableProvider
    import org.apache.spark.sql.types.StructType

    case class DataSourceResolution(conf: SQLConf) extends Rule[LogicalPlan] with CastSupport {
Shall we call it `DDLResolution`? It's not very related to data source.
This resolves parsed plans to `execution.datasources` plans. This isn't just for DDL. We are starting with `CreateTable` and `CreateTableAsSelect`, but there will be more parsed plans that get converted to data source plans in this rule. That's why I think `DataSourceResolution` is an appropriate name.
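As a rough illustration of the conversion this rule performs (heavily simplified, hypothetical types — the real rule pattern-matches Catalyst `LogicalPlan` nodes), a resolution rule is just a rewrite that turns parsed nodes into execution-side nodes and passes everything else through unchanged:

```scala
sealed trait Plan
case class ParsedCreateTable(name: Seq[String], provider: String) extends Plan
case class DataSourceV1CreateTable(name: String, provider: String) extends Plan

// Sketch of the rule's shape: match parsed plans, emit concrete plans,
// and leave unrelated plans untouched so other resolution rules can still fire.
object DataSourceResolutionSketch {
  def apply(plan: Plan): Plan = plan match {
    case ParsedCreateTable(ident, provider) =>
      DataSourceV1CreateTable(ident.mkString("."), provider)
    case other => other
  }
}

assert(DataSourceResolutionSketch(ParsedCreateTable(Seq("db", "t"), "csv")) ==
  DataSourceV1CreateTable("db.t", "csv"))
```

The pass-through case is what lets the rule run inside the resolution batch alongside other rules, as discussed above.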
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/CreateTable.scala
* Make ParsedLogicalPlan.resolved final.
* Add docs to CreateTable to be clear that it is metadata-only.
* Move CreateTable and CreateTableAsSelect tests to catalyst
* Add PlanResolutionSuite to test parsing and resolution
@mccheah, @cloud-fan, I've rebased on master, fixed a minor conflict, and added the […]
Test build #103831 has finished for PR 24029 at commit […].
thanks, merging to master!

Thanks @cloud-fan! And thanks to all the reviewers also!
Closes apache#24029 from rdblue/SPARK-27108-add-parsed-create-logical-plans. Authored-by: Ryan Blue <blue@apache.org> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
This moves parsing logic for `ALTER TABLE` into Catalyst and adds parsed logical plans for alter table changes that use multi-part identifiers. This PR is similar to SPARK-27108, PR apache#24029, that created parsed logical plans for create and CTAS.

* Create parsed logical plans
* Move parsing logic into Catalyst's AstBuilder
* Convert to DataSource plans in DataSourceResolution
* Parse `ALTER TABLE ... SET LOCATION ...` separately from the partition variant
* Parse `ALTER TABLE ... ALTER COLUMN ... [TYPE dataType] [COMMENT comment]` [as discussed on the dev list](http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Syntax-for-table-DDL-td25197.html#a25270)
* Parse `ALTER TABLE ... RENAME COLUMN ... TO ...`
* Parse `ALTER TABLE ... DROP COLUMNS ...`

Tested by:

* Added new tests in Catalyst's `DDLParserSuite`
* Moved converted plan tests from SQL `DDLParserSuite` to `PlanResolutionSuite`
* Existing tests for regressions

Closes apache#24723 from rdblue/SPARK-27857-add-alter-table-statements-in-catalyst. Authored-by: Ryan Blue <blue@apache.org> Signed-off-by: gatorsmile <gatorsmile@gmail.com>
What changes were proposed in this pull request?

This moves parsing `CREATE TABLE ... USING` statements into catalyst. Catalyst produces logical plans with the parsed information, and those plans are converted to v1 `DataSource` plans in `DataSourceAnalysis`.

This prepares for adding v2 create plans that should receive the information parsed from SQL without being translated to v1 plans first.

This also makes it possible to parse in catalyst instead of breaking the parser across the abstract `AstBuilder` in catalyst and `SparkSqlParser` in core.

For more information, see the mailing list thread.

How was this patch tested?

This uses existing tests to catch regressions. This introduces no behavior changes.
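The overall flow the description lays out — the parser emits an untranslated parsed plan, and analysis is the only place it becomes a v1 plan — can be sketched end to end in standalone Scala (a hypothetical toy parser, not Spark's real grammar or classes):

```scala
// Toy stand-ins for the parsed plan and the v1 concrete plan.
case class ParsedCreateTable(name: String, provider: String) { def resolved = false }
case class V1CreateTable(name: String, provider: String)     { def resolved = true }

// Toy parser for exactly: CREATE TABLE <name> USING <provider>
def parse(sql: String): ParsedCreateTable = {
  val t = sql.trim.split("\\s+")
  require(t.length == 5 && t(0).equalsIgnoreCase("create") && t(3).equalsIgnoreCase("using"))
  ParsedCreateTable(t(2), t(4))
}

// "Analysis": the single point where the parsed plan becomes a v1 plan.
def analyze(p: ParsedCreateTable): V1CreateTable = V1CreateTable(p.name, p.provider)

val parsed = parse("CREATE TABLE t USING parquet")
assert(!parsed.resolved)                       // parser output is never resolved
assert(analyze(parsed) == V1CreateTable("t", "parquet"))
```

Keeping the parser output untranslated is what lets a future v2 path consume the same `ParsedCreateTable` without going through the v1 conversion first.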