Commit d701292 (parent e0c7e55), committed by jaceklaskowski on Feb 17, 2024.

# Except Logical Operator

`Except` is a `SetOperation` binary logical operator that represents the following high-level operators (in a logical plan):

* `EXCEPT [ DISTINCT | ALL ]` and `MINUS [ DISTINCT | ALL ]` SQL statements (cf. [AstBuilder](../sql/AstBuilder.md#visitSetOperation))
* [Dataset.except](../Dataset.md#except) and [Dataset.exceptAll](../Dataset.md#exceptAll)

## Creating Instance


`Except` takes the following to be created:

* <span id="left"> Left [logical operator](LogicalPlan.md)
* <span id="right"> Right [logical operator](LogicalPlan.md)
* <span id="isAll"> `isAll` flag for `DISTINCT` (`false`) or `ALL` (`true`)
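The `isAll` flag mirrors the difference between set difference (`EXCEPT DISTINCT`) and multiset difference (`EXCEPT ALL`). A plain-Scala sketch of the two contracts on collections (not Spark code; the helper names are made up for illustration):

```scala
import scala.collection.mutable

// EXCEPT DISTINCT (isAll = false): set difference, duplicates collapsed.
def exceptDistinct[T](left: Seq[T], right: Seq[T]): Seq[T] =
  left.distinct.filterNot(right.toSet)

// EXCEPT ALL (isAll = true): multiset difference, every right row
// cancels at most one matching left row.
def exceptAll[T](left: Seq[T], right: Seq[T]): Seq[T] = {
  val counts = mutable.Map.empty[T, Int].withDefaultValue(0)
  right.foreach(t => counts(t) += 1)
  left.filter { t =>
    if (counts(t) > 0) { counts(t) -= 1; false } else true
  }
}

assert(exceptDistinct(Seq(1, 1, 2, 3), Seq(1, 2, 2)) == Seq(3))
assert(exceptAll(Seq(1, 1, 2, 3), Seq(1, 2, 2)) == Seq(1, 3))
```

`Dataset.except` and `Dataset.exceptAll` follow the same two contracts, modulo column resolution and null-safe row comparison.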

`Except` is created when:

* `AstBuilder` is requested to [visit a SetOperation](../sql/AstBuilder.md#visitSetOperation) (`EXCEPT` and `MINUS` operators)
* [Dataset.except](../Dataset.md#except) and [Dataset.exceptAll](../Dataset.md#exceptAll) operators are used
* Catalyst DSL's [except](../catalyst-dsl/DslLogicalPlan.md#except) operator is used

The types of the [left](#left) and [right](#right) logical (sub)operators can be widened by the `WidenSetOperationTypes` logical analysis type-coercion rule.
## Logical Optimization

`Except` is supposed to be resolved (_optimized_) to other logical operators during the logical optimization phase, i.e. `Except` should not be part of a logical plan after logical optimization.

[BasicOperators](../execution-planning-strategies/BasicOperators.md) execution planning strategy throws an `IllegalStateException` if the conversion did not happen.

Target Logical Operators | Optimization Rules and Demos
-------------------------|-----------------------------
Left-Anti [Join](Join.md) | `Except` (DISTINCT) in [ReplaceExceptWithAntiJoin](../logical-optimizations/ReplaceExceptWithAntiJoin.md) logical optimization rule<p>Demo: [Except Operator Replaced with Left-Anti Join](#demo-left-anti-join)
`Filter` | `Except` (DISTINCT) in [ReplaceExceptWithFilter](../logical-optimizations/ReplaceExceptWithFilter.md) logical optimization rule<p>Demo: [Except Operator Replaced with Filter Operator](#demo-except-filter)
`Union`, [Aggregate](Aggregate.md) and [Generate](Generate.md) | `Except` (ALL) in [RewriteExceptAll](../logical-optimizations/RewriteExceptAll.md) logical optimization rule<p>Demo: [Except (All) Operator Replaced with Union, Aggregate and Generate Operators](#demo-except-all)
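Of the three rewrites, `RewriteExceptAll` is the least obvious: it tags every left row with `+1` and every right row with `-1` (`Union`), sums the tags per row (`Aggregate`), keeps rows with a positive sum (`Filter`), and replicates each survivor sum-many times (`Generate`). A plain-Scala sketch of that count-based rewrite on collections (the function name is made up; this is not the rule's actual implementation):

```scala
// Count-based rewrite behind RewriteExceptAll, on plain Scala collections.
def rewriteExceptAll[T](left: Seq[T], right: Seq[T]): Seq[T] = {
  val tagged = left.map(t => (t, 1L)) ++ right.map(t => (t, -1L)) // Union with +1/-1 tags
  tagged
    .groupBy(_._1).toSeq                                          // Aggregate: sum(tag) per row
    .map { case (row, pairs) => (row, pairs.map(_._2).sum) }
    .filter { case (_, sum) => sum > 0 }                          // Filter: keep positive sums
    .flatMap { case (row, sum) => Seq.fill(sum.toInt)(row) }      // Generate: replicate sum times
}

assert(rewriteExceptAll(Seq(1, 1, 2, 3), Seq(1, 2, 2)).sorted == Seq(1, 3))
```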

## Catalyst DSL

```scala
except(
  otherPlan: LogicalPlan,
  isAll: Boolean): LogicalPlan
```

[Catalyst DSL](../catalyst-dsl/index.md) defines [except](../catalyst-dsl/index.md#except) extension method to create an `Except` logical operator (e.g. for testing or Spark SQL internals exploration).

```text
import org.apache.spark.sql.catalyst.dsl.plans._
val plan = table("a").except(table("b"), isAll = false)
scala> println(plan.numberedTreeString)
...
import org.apache.spark.sql.catalyst.plans.logical.Except
val op = plan.p(0)
assert(op.isInstanceOf[Except])
```

## Except Only on Relations with Same Number of Columns { #CheckAnalysis }

`Except` logical operator can only be performed on [tables with the same number of columns](../CheckAnalysis.md#checkAnalysis).

```text
scala> left.except(right)
org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the same number of columns, but the first table has 3 columns and the second table has 4 columns;;
'Except false
...
```

## Demo

### Except Operator Replaced with Left-Anti Join { #demo-left-anti-join }

```text
Seq((0, "zero", "000"), (1, "one", "111"))
  .toDF("id", "name", "triple")
  .write
...

scala> println(q.queryExecution.optimizedPlan.numberedTreeString)
...
03 +- Relation[id#209,name#210,triple#211] parquet
```
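The optimized plan above boils down to: keep the distinct left rows that match no right row, where the join condition is null-safe equality (`<=>`) on every column pair. A minimal plain-Scala sketch of that semantics on collections (not Spark code; Scala's `==` treats `null == null` as true, standing in for `<=>`):

```scala
// Distinct + left-anti join shape behind ReplaceExceptWithAntiJoin,
// sketched on Scala collections (rows are Seq[Any]).
def exceptAsAntiJoin(left: Seq[Seq[Any]], right: Seq[Seq[Any]]): Seq[Seq[Any]] = {
  val rightRows = right.toSet
  left.distinct.filterNot(rightRows.contains) // LeftAnti join, then Distinct
}

val leftRows  = Seq(Seq(0, "zero"), Seq(1, "one"), Seq(1, "one"), Seq(2, null))
val rightRows = Seq(Seq(1, "one"), Seq(2, null))
assert(exceptAsAntiJoin(leftRows, rightRows) == Seq(Seq(0, "zero")))
```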

### Except Operator Replaced with Filter Operator { #demo-except-filter }

```text
Seq((0, "zero", "000"), (1, "one", "111"))
  .toDF("id", "name", "triple")
  .write
...

scala> println(q.queryExecution.optimizedPlan.numberedTreeString)
...
02 +- Relation[id#16,name#17,triple#18] parquet
```
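`ReplaceExceptWithFilter` applies when both sides of an `Except` (DISTINCT) are filters over the same relation: `Except(Filter(p, t), Filter(q, t))` becomes, roughly, `Distinct(Filter(p AND NOT coalesce(q, false), t))`, so a single scan of `t` remains. A plain-Scala sketch of the predicate rewrite, with SQL's three-valued logic modelled as `Option[Boolean]` (names made up for illustration):

```scala
type Row  = Map[String, Int]
type Pred = Row => Option[Boolean]             // None models SQL NULL

// Single-scan rewrite: p(row) AND NOT coalesce(q(row), false), then Distinct.
def exceptWithFilter(table: Seq[Row], p: Pred, q: Pred): Seq[Row] =
  table
    .filter(row => p(row).contains(true) && !q(row).getOrElse(false))
    .distinct

val t = Seq(Map("id" -> 0), Map("id" -> 1), Map("id" -> 2), Map("id" -> 2))
val p: Pred = row => Some(row("id") <= 1)      // left side:  id <= 1
val q: Pred = row => Some(row("id") >= 1)      // right side: id >= 1
assert(exceptWithFilter(t, p, q) == Seq(Map("id" -> 0)))
```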

### Except (All) Operator Replaced with Union, Aggregate and Generate Operators { #demo-except-all }

```text
Seq((0, "zero", "000"), (1, "one", "111"))
  .toDF("id", "name", "triple")
  .write
...
```
