
[SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources #41448

Closed · wants to merge 2 commits into master from aokolnychyi:spark-43885

Conversation

aokolnychyi (Contributor):

What changes were proposed in this pull request?

This PR adds RewriteMergeIntoTable, similar to RewriteUpdateTable and RewriteDeleteFromTable, to handle MERGE commands for delta-based sources. Support for group-based sources will come in a follow-up PR.

Implementation notes:

  • RewriteMergeIntoTable is an analyzer rule that acts similarly to the existing rules for deletes and updates.
  • MergeRows is a new logical node that holds a set of instructions to apply to the joined target and source datasets (see the sketch after this list).
    • Instruction is a parent trait for all instructions.
    • Keep means a joined row is part of the result of this MERGE operation.
    • Split means a joined row is part of the result but must be split into two rows (update into delete and insert).
  • MergeRowsExec is a new physical node that is responsible for merging rows and producing a delta of rows. It also performs the MERGE cardinality check if needed.
  • NO_BROADCAST_AND_REPLICATION is a new internal join hint to prohibit broadcasting and replicating the target table to perform the cardinality check in MERGE operations, as required by the SQL standard.
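A minimal sketch of the instruction model above, assuming simplified signatures; the actual classes in the PR may differ:

```scala
import org.apache.spark.sql.catalyst.expressions.Expression

// Parent trait for all MERGE instructions; `condition` is evaluated against
// each joined row to decide whether the instruction applies.
sealed trait Instruction {
  def condition: Expression
}

// Keep: the joined row contributes exactly one row to the MERGE result.
case class Keep(condition: Expression, output: Seq[Expression]) extends Instruction

// Split: an update is emitted as two rows, a delete followed by an insert.
case class Split(
    condition: Expression,
    deleteOutput: Seq[Expression],
    insertOutput: Seq[Expression]) extends Instruction
```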

Why are the changes needed?

These changes are needed per SPIP SPARK-35801.

Does this PR introduce any user-facing change?

This PR adds a new SQL config to enable/disable the MERGE cardinality check required by the SQL standard.

How was this patch tested?

This PR comes with tests. There are more tests in AlignMergeAssignmentsSuite, which was merged earlier.

"The ON search condition of the MERGE statement matched a single row from the target table with multiple rows of the source table.",
"This could result in the target row being operated on more than once with an update or delete operation and is not allowed."
],
"sqlState" : "23000"
aokolnychyi (Contributor Author) commented on Jun 4, 2023:

I used class 23 as it is a constraint violation, but I wasn't sure about the subclass. It is not defined in the SQL standard, so I used 000, meaning no subclass. I am not sure how Spark assigns subclasses in these cases.

Here is an example of this error in SAP docs:
https://dcx.sap.com/sqla170/en/html/80ca9fd06ce21014bc30ac05c444ee4d.html

Here is the original JIRA for this error in Hive:
https://issues.apache.org/jira/browse/HIVE-14949

Member:

Yeah, it sounds tricky. As you know, we currently use 23505 for two error cases, DUPLICATED_MAP_KEY and DUPLICATE_KEY. Just a question: if there is no other reference, do you want to use 23509 like SAP instead?

aokolnychyi (Contributor Author) commented on Jun 7, 2023:

If I remember correctly, the SQL standard reserves only a few subclasses and all 5XX subclasses are custom so we are probably free to pick either one. In that case, let's use 23509 like in SAP. I'll also check more systems.

Member:

Sorry, I missed that we have README.md, where 23509 is already listed as a different error from DB2:

|23509 |23 |Constraint Violation |509 |The owner of the package has constrained its use to environments which do not include that of the application process.|DB2 |N |DB2 |

dongjoon-hyun (Member) commented on Jun 8, 2023:

Since we cannot choose 23509 because of the conflict between DB2 and SAP, let's add a new one as Spark's own error code. According to the above README.md, we can claim the 'K**' subclass range.

So, let's use 23K01 here and add a new line to README.md like the following.

|23K01 |23 |Merge Cardinality Violation |K01 |your full description   |Spark   |N   |Spark     

aokolnychyi (Contributor Author) commented on Jun 8, 2023:

That's my thinking as well. Let's use 23K01. I did not find a SQLSTATE we can borrow from other systems.
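For reference, the README.md row registering the new state would look like the following (the description column is assumed to mirror the error message):

|23K01 |23 |Constraint Violation |K01 |The ON search condition of the MERGE statement matched a single row from the target table with multiple rows of the source table.|Spark |N |Spark |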

```scala
private final val ROW_FROM_TARGET = "__row_from_target"

override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
  case m @ MergeIntoTable(aliasedTable, source, cond, matchedActions, notMatchedActions,
```
aokolnychyi (Contributor Author):

This is a special case where there is only one NOT MATCHED action; see the sketch below.
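A hedged sketch of the rewrite for this case, assuming simplified shapes; `toOutputExprs` and the exact plan wiring are hypothetical:

```scala
import org.apache.spark.sql.catalyst.plans.LeftAnti
import org.apache.spark.sql.catalyst.plans.logical.{Join, JoinHint, Project}

// Rows that exist only in the source are selected with a left anti join and
// projected through the single INSERT action, so no MergeRows node is needed.
val joinPlan = Join(source, aliasedTable, LeftAnti, Some(cond), JoinHint.NONE)
val insertRows = Project(toOutputExprs(insertAction.assignments), joinPlan) // toOutputExprs is hypothetical
// insertRows is then appended to the target table
```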

```scala
  m
}
```

```scala
case m @ MergeIntoTable(aliasedTable, source, cond, matchedActions, notMatchedActions,
```
aokolnychyi (Contributor Author) commented on Jun 4, 2023:

This is a special case where there are only NOT MATCHED actions (having just one such action is handled above).

```scala
}

// build a rewrite plan for sources that support row deltas
private def buildWriteDeltaPlan(
```
aokolnychyi (Contributor Author):

Similar to buildWriteDeltaPlan in RewriteUpdateTable.

```scala
  (readRelation, cond)
}

val checkCardinality = shouldCheckCardinality(matchedActions)
```
aokolnychyi (Contributor Author):

More details on the cardinality check are in MergeRowsExec; a sketch follows.
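A hedged sketch of the check, under the assumption that the join keeps rows for the same target row consecutive (which is what the NO_BROADCAST_AND_REPLICATION hint helps guarantee); the real MergeRowsExec code differs:

```scala
import org.apache.spark.sql.catalyst.InternalRow

// Seeing the same target row ID twice in a row means the ON condition matched
// one target row against multiple source rows, violating the SQL standard.
private var lastMatchedRowId: Long = Long.MinValue

private def checkCardinality(row: InternalRow, rowIdOrdinal: Int): Unit = {
  val currentRowId = row.getLong(rowIdOrdinal)
  if (currentRowId == lastMatchedRowId) {
    throw new IllegalStateException(
      "The ON search condition of the MERGE statement matched a single row " +
        "from the target table with multiple rows of the source table.")
  }
  lastMatchedRowId = currentRowId
}
```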

```scala
@@ -167,4 +183,36 @@ trait RewriteRowLevelCommand extends Rule[LogicalPlan] {
  private def findColOrdinal(plan: LogicalPlan, name: String): Int = {
    plan.output.indexWhere(attr => conf.resolver(attr.name, name))
  }

  protected def buildOriginalRowIdValues(
```
aokolnychyi (Contributor Author):

Copied from RewriteUpdateTable to reuse in both rules.
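A hedged sketch of what this helper is for, with an assumed signature and alias prefix (not the exact Spark code):

```scala
import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, NamedExpression}

// Preserve the pre-update row ID values under stable aliases so the delta
// writer can locate the target rows that an UPDATE or DELETE refers to.
def buildOriginalRowIdValues(rowIdAttrs: Seq[Attribute]): Seq[NamedExpression] = {
  rowIdAttrs.map { attr =>
    Alias(attr, s"__original_${attr.name}")() // alias prefix is hypothetical
  }
}
```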

```scala
    instructions: Seq[InstructionExec]): InternalRow = {

  for (instruction <- instructions) {
    if (instruction.condition.eval(row)) {
```
aokolnychyi (Contributor Author) commented on Jun 4, 2023:

Not using `find` to avoid extra calls, since this runs for every row; using a simple for loop instead. Not using `Option` for the same reason. A sketch of the pattern follows.
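A hedged completion of the fragment above (the method name and the null convention are assumptions): a plain loop avoids the Option and closure allocations that `find` would incur per row.

```scala
import org.apache.spark.sql.catalyst.InternalRow

private def applyInstructions(
    row: InternalRow,
    instructions: Seq[InstructionExec]): InternalRow = {
  // evaluate instructions in order; the first matching condition wins
  for (instruction <- instructions) {
    if (instruction.condition.eval(row)) {
      return instruction.apply(row) // hypothetical projection method
    }
  }
  null // no instruction matched; the caller drops such rows
}
```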

Member:

Regarding https://github.com/databricks/scala-style-guide#traversal-and-zipwithindex: `while` is preferred there, but I'm not sure whether that is still suitable for the current Scala compiler.

aokolnychyi (Contributor Author) commented on Jun 9, 2023:

Let me dig into the generated bytecode while adding a benchmark as part of SPARK-44013.
I'll cc you on that PR.

Member:

OK, thanks, @aokolnychyi

dongjoon-hyun (Member) left a comment:

```scala
@@ -92,6 +93,67 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
    operators.head
  }

  test("NO_BROADCAST_AND_REPLICATION hint is respected in cross joins") {
```
aokolnychyi (Contributor Author):

These three tests cover scenarios where it is not safe to broadcast or replicate the target table while performing the cardinality check. The newly added internal hint handles this; see the sketch below. There are MERGE tests for this as well.
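A hedged sketch of how the internal hint is attached (assumed construction; the hint has no SQL or DataFrame alias, so it can only be set programmatically on a logical Join):

```scala
import org.apache.spark.sql.catalyst.plans.Cross
import org.apache.spark.sql.catalyst.plans.logical.{HintInfo, Join, JoinHint}

// Prohibit broadcasting/replicating the left (target) side of the join.
val noBroadcast = Some(HintInfo(strategy = Some(NO_BROADCAST_AND_REPLICATION)))
val join = Join(target, source, Cross, None, JoinHint(leftHint = noBroadcast, rightHint = None))
// The planner must then neither broadcast the target side nor replicate it
// when it falls back to a cartesian product.
```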

aokolnychyi (Contributor Author):

The test failures don't seem related. I'll need to take a closer look at what happened in the `sql - other tests` job, though.

@aokolnychyi aokolnychyi force-pushed the spark-43885 branch 2 times, most recently from 51b23cc to 1a2211d Compare June 5, 2023 23:53
aokolnychyi (Contributor Author):

The test job seems to just hang; I've seen the same in other PRs. No relevant test failures.

```
2023-06-06T08:21:31.5118813Z [info] Test run test.org.apache.spark.sql.JavaRowSuite finished: 0 failed, 0 ignored, 2 total, 0.002s
2023-06-06T12:05:36.0713172Z Session terminated, killing shell... ...killed.
2023-06-06T12:05:36.1056661Z ##[error]The operation was canceled.
```

This change is ready for a detailed review round, @dongjoon-hyun @viirya @huaxingao @cloud-fan @sunchao.

```scala
@@ -4280,6 +4280,20 @@ object SQLConf {
      .checkValue(_ >= 0, "The threshold of cached local relations must not be negative")
      .createWithDefault(64 * 1024 * 1024)

  val MERGE_CARDINALITY_CHECK_ENABLED =
    buildConf("spark.sql.merge.cardinalityCheck.enabled")
```
Member:

In general, if there are no other confs in this namespace, spark.sql.merge.*, we had better avoid introducing a new namespace, i.e. prefer the second of the following:

```scala
buildConf("spark.sql.merge.cardinalityCheck.enabled")
buildConf("spark.sql.mergeCardinalityCheck.enabled")
```

"operations to ensure the integrity and accuracy of data. The ON search condition in " +
"MERGE must match a single row from the target table with at most one incoming row. " +
"If this assumption is violated, Spark must throw an exception. Otherwise, this may " +
"lead to data corruption as Spark could operate on the same target row more than once. " +
Member:

If this is a correctness issue, I'd not add this config.

"lead to data corruption as Spark could operate on the same target row more than once. " +
"The cardinality check can be disabled to avoid the computational overhead, " +
"but doing so is highly discouraged and can corrupt the underlying table. In most cases, " +
"the overhead should be negligible.")
Member:

Given this analysis, could you remove this config from this PR (if you don't mind)?

aokolnychyi (Contributor Author):

I was following what Hive did, as it offers the hive.merge.cardinality.check flag. The way we do the cardinality check in Spark should be cheaper than in Hive, so it is less critical. Snowflake also seems to have a similar parameter that can be disabled.

That said, I am OK with removing it for now to be safe and considering adding it in the future, if necessary. I'll make the change.

```scala
/**
 * An internal hint to prohibit broadcasting and replicating one side of a join. It is used
 * by some rules where broadcasting or replicating a particular side of the join is not permitted,
 * such as the MERGE cardinality check.
 */
case object NO_BROADCAST_AND_REPLICATION extends JoinStrategyHint {
```
Member:

Shall we spin off NO_BROADCAST_AND_REPLICATION as an independent PR? The feature itself looks standalone, although this PR uses it.

aokolnychyi (Contributor Author):

Let me do that. I initially included it here to show how it is used. Will create a separate PR in a bit.

aokolnychyi (Contributor Author):

Created PR #41499.

dongjoon-hyun (Member) left a comment:

I finished my first round of review. Mostly, it looks reasonable. Here are my two main comments:

  • It would be better to avoid the correctness issue by not providing the new configuration.
  • It would be great if we can spin off the new HINT PR from this one.

dongjoon-hyun pushed a commit that referenced this pull request Jun 8, 2023
…ne side of join

### What changes were proposed in this pull request?

This PR adds a new internal join hint to disable broadcasting and replicating one side of join.

### Why are the changes needed?

These changes are needed to disable broadcasting and replicating one side of join when it is not permitted, such as the cardinality check in MERGE operations in PR #41448.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This PR comes with tests. More tests are in #41448.

Closes #41499 from aokolnychyi/spark-44000.

Authored-by: aokolnychyi <aokolnychyi@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun (Member):

Hi, @aokolnychyi. #41499 is merged. Could you rebase this PR onto the master branch?

"The ON search condition of the MERGE statement matched a single row from the target table with multiple rows of the source table.",
"This could result in the target row being operated on more than once with an update or delete operation and is not allowed."
],
"sqlState" : "23K01"
Member:

Thank you for the updates. We can update README.md later. (#41448 (comment))

aokolnychyi (Contributor Author):

Done.

dongjoon-hyun (Member) left a comment:

+1, LGTM from my side.

dongjoon-hyun (Member):

Also, cc @cloud-fan, @viirya, @huaxingao, @sunchao once more.

dongjoon-hyun (Member) commented on Jun 8, 2023:

Could you fix this test failure by adding the new error code to README.md, @aokolnychyi?

```
[info] SparkThrowableSuite:
[info] - No duplicate error classes (30 milliseconds)
[info] - Error classes are correctly formatted (29 milliseconds)
[info] - SQLSTATE invariants *** FAILED *** (25 milliseconds)
```

The github-actions bot added the DOCS label on Jun 9, 2023.
aokolnychyi (Contributor Author):

Fixed and tested SparkThrowableSuite locally.

aokolnychyi (Contributor Author):

Also created SPARK-44013 to add a benchmark. It will be used to measure the impact of adding codegen later.

aokolnychyi (Contributor Author):

Tests failed because of the `sql - other tests` job:

```
2023-06-09T08:18:09.3443155Z Session terminated, killing shell...
2023-06-09T08:18:09.5476503Z ##[error]The operation was canceled.
```

Triggered again.

dongjoon-hyun (Member) left a comment:

+1, LGTM again with the updated README.md. In the master branch, we are trying to stabilize the `sql - others` pipeline (933dfd9); it's still flaky.

Thank you for triggering it multiple times and for your patience. I'll merge this for Apache Spark 3.5.0.

aokolnychyi (Contributor Author):

Thanks for reviewing, @dongjoon-hyun @pan3793!

dongjoon-hyun pushed a commit that referenced this pull request Jun 14, 2023
…ed sources

### What changes were proposed in this pull request?

This PR adds support for group-based data sources in `RewriteMergeIntoTable`. This PR builds on top of PR #41448 and earlier PRs that added `RewriteDeleteFromTable`.

### Why are the changes needed?

These changes are needed per SPIP SPARK-35801.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This PR comes with tests. There are more tests in `AlignMergeAssignmentsSuite`, which was merged earlier.

Closes #41577 from aokolnychyi/spark-43963.

Authored-by: aokolnychyi <aokolnychyi@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
```scala
import org.apache.spark.sql.catalyst.util.truncatedString
import org.apache.spark.sql.types.DataType

case class MergeRows(
```
cloud-fan (Contributor):

Can we add a classdoc to explain what this new logical plan does? The name MergeRows seems a bit confusing as it does not merge anything. Looking at the physical implementation, it turns one input row into one or two output rows if a condition is met.

aokolnychyi (Contributor Author) commented on Jun 22, 2023:

Sure, let me add it. I called it MergeRows because it takes a plan where the target and source rows are joined and then applies MATCHED / NOT MATCHED / NOT MATCHED BY SOURCE instructions to derive an output row. I call that merging because the output row is derived using values from both the target and source relations, so matched rows are, in a sense, merged into one; also, this node is only used in MERGE commands.

I am open to other names. Any suggestions, @cloud-fan?
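A hedged sketch of the requested classdoc (wording assumed, not the text that was ultimately committed):

```scala
/**
 * A logical node that reads the join of the target and source relations,
 * evaluates the MATCHED / NOT MATCHED / NOT MATCHED BY SOURCE instructions
 * against each joined row, and emits zero, one, or two output rows per
 * input row.
 */
```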

cloud-fan added a commit that referenced this pull request Aug 15, 2023
### What changes were proposed in this pull request?

This is a followup of #41448. As an optimizer rule, the produced plan should be resolved, and resolved expressions should be able to report a data type. The `Instruction` expression fails to report a data type and may break external optimizer rules. This PR makes it return a dummy NullType.

### Why are the changes needed?

to not break external optimizer rules.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #42482 from cloud-fan/merge.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>