[SPARK-27108][SQL] Add parsed SQL plans for create, CTAS. #24029

rdblue · 2019-03-08T20:34:07Z

What changes were proposed in this pull request?

This moves parsing CREATE TABLE ... USING statements into catalyst. Catalyst produces logical plans with the parsed information and those plans are converted to v1 DataSource plans in DataSourceAnalysis.

This prepares for adding v2 create plans that should receive the information parsed from SQL without being translated to v1 plans first.

This also makes it possible to parse in catalyst instead of breaking the parser across the abstract AstBuilder in catalyst and SparkSqlParser in core.

For more information, see the mailing list thread.

How was this patch tested?

This uses existing tests to catch regressions. This introduces no behavior changes.

rdblue · 2019-03-08T20:49:43Z

@cloud-fan, this is needed to add the v2 create and CTAS plans. We can get a start while waiting for the catalog identifiers to be committed.

rdblue · 2019-03-08T20:50:11Z

cc @jzhuge, @mccheah, @gatorsmile

rxin · 2019-03-11T23:15:45Z

...alyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/ParsedLogicalPlan.scala

+ * Parsed logical plans are located in Catalyst so that as much SQL parsing logic as possible is be
+ * kept in a [[org.apache.spark.sql.catalyst.parser.AbstractSqlParser]].
+ */
+private[sql] abstract class ParsedLogicalPlan extends LogicalPlan {


not sure if this is a useful hierarchy, but if it is, we should document more clearly that these plans should not survive analysis.

might be useful to add a special check for this, rather than relying on resolved only.

+1 - @rdblue these should only be inputs to the analyzer, not outputs. Would be helpful to write specific JavaDoc on this.

That is included above: "Parsed logical plans are not resolved because they must be converted to concrete logical plans."

Do you think that should be rephrased to be more clear?

Defer to @rxin but I'm ok with merging with the current docs. We can rephrase in a follow-up if our contributors have trouble with this wording.
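
The "special check" suggested in this thread could look roughly like the sketch below. It runs over toy plan classes, not Spark's real `LogicalPlan` or its analysis checks, and the names `ParsedPlan`, `CreateV1Table`, `leftoverParsedPlans`, and `checkNoParsedPlans` are hypothetical. The idea is only that a tree walk after analysis can fail explicitly when a parser-only node is still present, instead of relying on the `resolved` flag alone.

```scala
object ParsedPlanCheckSketch {
  // Toy stand-ins for Catalyst plan nodes; these are NOT Spark's real classes.
  sealed trait Plan { def children: Seq[Plan] }

  // Marker for nodes that come straight from the parser.
  sealed trait ParsedPlan extends Plan

  case class CreateTableStatement(name: String) extends ParsedPlan {
    val children: Seq[Plan] = Nil
  }

  // Hypothetical concrete plan that analysis is expected to produce.
  case class CreateV1Table(name: String) extends Plan {
    val children: Seq[Plan] = Nil
  }

  case class Project(child: Plan) extends Plan {
    val children: Seq[Plan] = Seq(child)
  }

  // Collect every parsed node that is still present anywhere in the tree.
  def leftoverParsedPlans(plan: Plan): Seq[ParsedPlan] = {
    val here: Seq[ParsedPlan] = plan match {
      case p: ParsedPlan => Seq(p)
      case _ => Nil
    }
    here ++ plan.children.flatMap(leftoverParsedPlans)
  }

  // Hypothetical post-analysis check: report parsed nodes that were never converted.
  def checkNoParsedPlans(plan: Plan): Either[String, Plan] = {
    val leftovers = leftoverParsedPlans(plan)
    if (leftovers.isEmpty) Right(plan)
    else Left(s"Parsed plans survived analysis: ${leftovers.mkString(", ")}")
  }
}
```

In this sketch, `checkNoParsedPlans(Project(CreateTableStatement("t")))` returns a `Left` carrying an error message, while a tree containing only `CreateV1Table` passes through as a `Right`.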

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/CreateTable.scala

cloud-fan · 2019-03-12T03:46:00Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

-   * }}}
+   * TODO: Remove this. It is used because CreateTempViewUsing is not a Catalyst plan.
+   * Either move CreateTempViewUsing into catalyst as a parsed logical plan, or remove it because
+   * it is deprecated.


just to double check, when did we deprecate CreateTempViewUsing?

Looks like it happened in 5effc01, 3 years ago.

We can always create a parsed plan for CreateTempViewUsing so that it can move to catalyst as well, but I thought that we can do it later, and only need to if this isn't going to be removed in 3.0.

ah, so we deprecated CREATE TABLE USING, but not CREATE TEMP VIEW USING

cloud-fan · 2019-03-12T03:47:00Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala

@@ -1860,4 +1861,193 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
    val structField = StructField(identifier.getText, typedVisit(dataType), nullable = true)
    if (STRING == null) structField else structField.withComment(string(STRING))
  }
+
+  /**


I didn't review this file carefully, assuming it's just copy-paste code from SparkSqlAstBuilder

Yes, this is moving what is needed from SparkSqlAstBuilder. The only real changes are in the visitCreateTable method.

These rules should have already been in the abstract class. I'm not sure why they were in SparkSqlAstBuilder other than that was the easiest place to put them when they were added.

cloud-fan · 2019-03-12T03:55:15Z

...alyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/ParsedLogicalPlan.scala

+ * converting that v1 metadata to the v2 equivalent, the sql [[CreateTable]] plan is produced by
+ * the parser and converted once into both implementations.
+ *
+ * Parsed logical plans are not resolved because they must be converted to concrete logical plans.


is it really necessary to have this parent class just to set the resolved bit? I think we can just put override lazy val resolved = false in the new CreateTable and CreateTableAsSelect classes, with classdoc saying which concrete plans these 2 classes will be replaced by during analysis.

The value of this class is that it identifies the set of logical plans that correspond directly to what was parsed from SQL. When someone working on a plan sees ParsedLogicalPlan as an ancestor in Scaladoc, it signals what is explained here: that the parser produces ParsedLogicalPlan nodes without translating what was parsed, then those plans get translated into real plans in the analyzer.

With that information, it is easy to see what changes need to be made. If the parsed plan doesn't include an option, then the parser and parsed plan need to be updated. If it does include an option, then the analyzer and downstream plans need to be updated.

Also keep in mind that these are the first two subclasses of ParsedLogicalPlan. To implement v2 alongside v1, we are going to be adding more of them. So it is valuable that we don't need to remember to set resolved to false in every plan.
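
The trade-off discussed in this thread can be illustrated with a small, self-contained sketch. The `LogicalPlan` below is a simplified stand-in rather than Catalyst's class, and the constructor fields on the two statement classes are hypothetical. It shows how a shared parent can pin `resolved` to `false` once, so that individual parsed plans cannot forget to set it or override it by accident.

```scala
object ParsedLogicalPlanSketch {
  // Simplified stand-in for Catalyst's LogicalPlan; the real class has many more members.
  abstract class LogicalPlan {
    def children: Seq[LogicalPlan]
    // Stand-in default: a node counts as resolved once all of its children are.
    lazy val resolved: Boolean = children.forall(_.resolved)
  }

  // Parent for plans produced directly by the parser. Marking `resolved` final
  // means no subclass can report itself as resolved.
  abstract class ParsedLogicalPlan extends LogicalPlan {
    final override lazy val resolved: Boolean = false
  }

  // Hypothetical fields; the real statement classes carry more parsed information.
  case class CreateTableStatement(tableName: Seq[String], provider: String)
      extends ParsedLogicalPlan {
    override def children: Seq[LogicalPlan] = Nil
  }

  case class CreateTableAsSelectStatement(
      tableName: Seq[String],
      provider: String,
      query: LogicalPlan) extends ParsedLogicalPlan {
    override def children: Seq[LogicalPlan] = Seq(query)
  }

  // A leaf with no children, so the stand-in default marks it as resolved.
  case object OneRowRelation extends LogicalPlan {
    override def children: Seq[LogicalPlan] = Nil
  }
}
```

With these definitions, a CTAS statement stays unresolved even when its query is fully resolved, which is what forces an analyzer rule to replace it with a concrete plan.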

rdblue · 2019-03-13T22:59:27Z

@cloud-fan, I've responded to the review comments and implemented fixes.

Also, I moved the new resolution rules into DataSourceResolution to fix the test failures. The new rules needed to be added to extendedResolutionRules instead of postHocResolutionRules because the post-hoc rules are only run once. This resolution needs to be done before the post-hoc rules defined in DataSourceAnalysis run.

At a minimum, these needed to be separated into different classes, but I think it is also more correct for resolution rules to run in the resolution batch so that other resolution rules can run on the plans produced by these rules. This is the same reason why we added the ResolveOutputRelation rule to the resolution batch.

I think that the resolution rules in DataSourceAnalysis should also move into the resolution batch, but that should be done as a separate change.
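
The batch-placement argument in this comment can be shown with a toy rule executor. This is a sketch under stated assumptions, not Spark's `RuleExecutor`: the plan classes, the two rules, and the `Batch` type are all hypothetical. It models one batch that runs to a fixed point and one that runs once, and shows that a conversion rule placed in the run-once batch leaves behind a plan that no later rule resolves.

```scala
object BatchOrderSketch {
  // Toy plan states; hypothetical names, not Spark classes.
  sealed trait Plan
  case class Statement(table: String) extends Plan        // produced by the parser
  case class UnresolvedCreate(table: String) extends Plan // converted, still needs resolution
  case class ResolvedCreate(table: String) extends Plan   // ready for planning

  type Rule = Plan => Plan

  // Converts the parsed statement; its output still needs another rule.
  val convertStatement: Rule = {
    case Statement(t) => UnresolvedCreate(t)
    case other => other
  }

  // An existing resolution rule that only understands the converted plan.
  val resolveCreate: Rule = {
    case UnresolvedCreate(t) => ResolvedCreate(t)
    case other => other
  }

  // A batch applies its rules in order, either once or until the plan stops changing.
  case class Batch(rules: Seq[Rule], fixedPoint: Boolean)

  def runBatch(batch: Batch, plan: Plan): Plan = {
    val next = batch.rules.foldLeft(plan)((p, rule) => rule(p))
    if (batch.fixedPoint && next != plan) runBatch(batch, next) else next
  }

  def execute(batches: Seq[Batch], plan: Plan): Plan =
    batches.foldLeft(plan)((p, batch) => runBatch(batch, p))

  // Conversion placed in a run-once batch that follows the resolution batch.
  val postHocPlacement: Seq[Batch] = Seq(
    Batch(Seq(resolveCreate), fixedPoint = true),
    Batch(Seq(convertStatement), fixedPoint = false))

  // Conversion placed inside the fixed-point resolution batch.
  val resolutionPlacement: Seq[Batch] = Seq(
    Batch(Seq(resolveCreate, convertStatement), fixedPoint = true))
}
```

In this toy model, `execute(postHocPlacement, Statement("t"))` stops at `UnresolvedCreate("t")`, while `execute(resolutionPlacement, Statement("t"))` reaches `ResolvedCreate("t")` because the fixed-point batch gives `resolveCreate` another pass over the converted plan.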

rdblue · 2019-03-16T23:46:06Z

Retest this please.

SparkQA · 2019-03-17T01:54:31Z

Test build #103578 has finished for PR 24029 at commit 651e5b5.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-03-17T03:27:18Z

Test build #103579 has finished for PR 24029 at commit 7beff42.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-03-18T10:45:30Z

retest this please

cloud-fan · 2019-03-18T10:46:21Z

LGTM if tests pass

cloud-fan · 2019-03-19T06:45:01Z

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceResolution.scala

+import org.apache.spark.sql.sources.v2.TableProvider
+import org.apache.spark.sql.types.StructType
+
+case class DataSourceResolution(conf: SQLConf) extends Rule[LogicalPlan] with CastSupport  {


shall we call it DDLResolution? It's not very related to data source.

This resolves parsed plans to execution.datasources plans. This isn't just for DDL. We are starting with CreateTable and CreateTableAsSelect, but there will be more parsed plans that get converted to datasource plans in this rule. That's why I think DataSourceResolution is an appropriate name.
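
As a rough illustration of what "resolves parsed plans to execution.datasources plans" means, the sketch below converts toy statement nodes into a toy v1-style create plan. None of these classes are Spark's: `TableDesc`, `CreateV1Table`, and the statement fields are hypothetical stand-ins chosen to keep the example self-contained.

```scala
object DataSourceResolutionSketch {
  sealed trait Plan
  case class Query(sql: String) extends Plan

  // Parsed statements: they hold what the SQL said, with no translation applied.
  case class CreateTableStatement(
      table: String,
      provider: String,
      options: Map[String, String],
      ifNotExists: Boolean) extends Plan

  case class CreateTableAsSelectStatement(
      table: String,
      provider: String,
      options: Map[String, String],
      ifNotExists: Boolean,
      asSelect: Plan) extends Plan

  // Hypothetical v1-style metadata and plan produced by the rule.
  case class TableDesc(table: String, provider: String, storageOptions: Map[String, String])
  case class CreateV1Table(desc: TableDesc, ignoreIfExists: Boolean, query: Option[Plan])
      extends Plan

  // The rule pattern-matches on parsed statements and leaves every other plan untouched.
  def resolve(plan: Plan): Plan = plan match {
    case CreateTableStatement(table, provider, options, ifNotExists) =>
      CreateV1Table(TableDesc(table, provider, options), ifNotExists, None)
    case CreateTableAsSelectStatement(table, provider, options, ifNotExists, query) =>
      CreateV1Table(TableDesc(table, provider, options), ifNotExists, Some(query))
    case other => other
  }
}
```

Because the rule falls through on plans it does not recognize, further parsed statements can be handled by adding cases without touching the existing ones.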

rdblue · 2019-03-22T16:15:58Z

@mccheah, @cloud-fan, I've rebased on master, fixed a minor conflict, and added the Statement suffix to the new plans. I think this should be good to go once tests pass.

SparkQA · 2019-03-22T20:50:21Z

Test build #103831 has finished for PR 24029 at commit 6c9b9dc.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-03-22T20:59:46Z

thanks, merging to master!

rdblue · 2019-03-22T21:33:56Z

Thanks @cloud-fan! And thanks to all the reviewers also!

This moves parsing `CREATE TABLE ... USING` statements into catalyst. Catalyst produces logical plans with the parsed information and those plans are converted to v1 `DataSource` plans in `DataSourceAnalysis`.

This prepares for adding v2 create plans that should receive the information parsed from SQL without being translated to v1 plans first.

This also makes it possible to parse in catalyst instead of breaking the parser across the abstract `AstBuilder` in catalyst and `SparkSqlParser` in core.

For more information, see the [mailing list thread](https://lists.apache.org/thread.html/54f4e1929ceb9a2b0cac7cb058000feb8de5d6c667b2e0950804c613%3Cdev.spark.apache.org%3E).

This uses existing tests to catch regressions. This introduces no behavior changes.

Closes apache#24029 from rdblue/SPARK-27108-add-parsed-create-logical-plans.

Authored-by: Ryan Blue <blue@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

This moves parsing logic for `ALTER TABLE` into Catalyst and adds parsed logical plans for alter table changes that use multi-part identifiers. This PR is similar to SPARK-27108, PR apache#24029, that created parsed logical plans for create and CTAS.

* Create parsed logical plans
* Move parsing logic into Catalyst's AstBuilder
* Convert to DataSource plans in DataSourceResolution
* Parse `ALTER TABLE ... SET LOCATION ...` separately from the partition variant
* Parse `ALTER TABLE ... ALTER COLUMN ... [TYPE dataType] [COMMENT comment]` [as discussed on the dev list](http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Syntax-for-table-DDL-td25197.html#a25270)
* Parse `ALTER TABLE ... RENAME COLUMN ... TO ...`
* Parse `ALTER TABLE ... DROP COLUMNS ...`
* Added new tests in Catalyst's `DDLParserSuite`
* Moved converted plan tests from SQL `DDLParserSuite` to `PlanResolutionSuite`
* Existing tests for regressions

Closes apache#24723 from rdblue/SPARK-27857-add-alter-table-statements-in-catalyst.

Authored-by: Ryan Blue <blue@apache.org>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>

rdblue force-pushed the SPARK-27108-add-parsed-create-logical-plans branch from 3a77141 to 9bb101f Compare March 8, 2019 20:51

rdblue force-pushed the SPARK-27108-add-parsed-create-logical-plans branch from 9bb101f to 9e0913f Compare March 9, 2019 00:00

rxin reviewed Mar 11, 2019

...alyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/ParsedLogicalPlan.scala

rxin reviewed Mar 11, 2019

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/CreateTable.scala

cloud-fan reviewed Mar 12, 2019

rdblue mentioned this pull request Mar 17, 2019

[SPARK-27181][SQL]: Add public transform API #24117

Closed

HyukjinKwon changed the title ~~SPARK-27108: Add parsed SQL plans for create, CTAS.~~ [SPARK-27108][SQL] Add parsed SQL plans for create, CTAS. Mar 18, 2019

cloud-fan reviewed Mar 19, 2019

mccheah reviewed Mar 21, 2019

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/CreateTable.scala

rdblue added 11 commits March 22, 2019 09:05

SPARK-27108: Add parsed SQL plans for create, CTAS.

aedce48

Update for review comments.

4a89dd7

* Make ParsedLogicalPlan.resolved final.
* Add docs to CreateTable to be clear that it is metadata-only.

Move new DataSource resolution rules into DataSourceResolution.

69f9a29

Fix CreateTableAsSelect conversion to CatalogTable.

e676095

Apply the v1 source writer list.

32551be

Fix V1WriteProvider extraction for Hive sources.

64c1f01

Fix v1 override check to handle case sensitivity.

f3fab1c

Fix DDLParserSuite.

62ccd4d

* Move CreateTable and CreateTableAsSelect tests to catalyst
* Add PlanResolutionSuite to test parsing and resolution

Fix password redaction tests.

e3444d8

Move another test into catalyst DDLParserSuite.

0594a82

Fix redaction matcher for type erasure.

e0eebd0

rdblue force-pushed the SPARK-27108-add-parsed-create-logical-plans branch from 65c5cd5 to 5054334 Compare March 22, 2019 16:13

mccheah approved these changes Mar 22, 2019

Add Statement suffix to new parsed plans.

6c9b9dc

rdblue force-pushed the SPARK-27108-add-parsed-create-logical-plans branch from 5054334 to 6c9b9dc Compare March 22, 2019 16:20

cloud-fan approved these changes Mar 22, 2019

cloud-fan closed this in 34e3cc7 Mar 22, 2019

This was referenced May 27, 2019

[SPARK-27813][SQL] DataSourceV2: Add DropTable logical operation #24686

Closed

[SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst #24723

Closed
