[SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
### What changes were proposed in this pull request?

This PR adds code-gen support for LEFT OUTER / RIGHT OUTER sort merge join. Currently, sort merge join code-gen only supports the inner join type (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala#L374 ). There is no fundamental reason why code-gen cannot support other join types, so this PR adds it for LEFT OUTER / RIGHT OUTER join. Follow-up PRs will add LEFT SEMI, LEFT ANTI and FULL OUTER code-gen separately.

The change extends the current sort merge join code-gen logic to work with LEFT OUTER and RIGHT OUTER joins (it should work with LEFT SEMI/ANTI as well, but FULL OUTER join needs further code changes). References to left/right are replaced with streamed/buffered so that the code can be extended to join types other than inner join.
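
As a rough illustration of the streamed/buffered naming, the following standalone Java sketch models which input plays which role for each supported join type and in which order the output columns are assembled. It mirrors the `streamedPlan`/`bufferedPlan` and `resultVars` pattern matches in the diff of this commit, but the class `StreamedBufferedSides`, its `Sides` holder and the string labels are hypothetical names invented for this example, not Spark APIs.

```java
/** Simplified, hypothetical model of the streamed/buffered side selection (not Spark code). */
final class StreamedBufferedSides {
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    /** Which input is streamed, which is buffered, and the column order of the output row. */
    static final class Sides {
        final String streamed;
        final String buffered;
        final String outputOrder;

        Sides(String streamed, String buffered, String outputOrder) {
            this.streamed = streamed;
            this.buffered = buffered;
            this.outputOrder = outputOrder;
        }

        @Override
        public String toString() {
            return "streamed=" + streamed + ", buffered=" + buffered + ", output=" + outputOrder;
        }
    }

    static Sides sidesFor(JoinType joinType) {
        switch (joinType) {
            case INNER:
            case LEFT_OUTER:
                // The left side is streamed, so streamed columns come first in the output.
                return new Sides("left", "right", "streamed ++ buffered");
            case RIGHT_OUTER:
                // The right side is streamed, but the output still lists left (buffered) columns first.
                return new Sides("right", "left", "buffered ++ streamed");
            default:
                // Join types without code-gen support in this commit are rejected.
                throw new IllegalArgumentException("Unsupported join type: " + joinType);
        }
    }

    public static void main(String[] args) {
        JoinType[] supported = {JoinType.INNER, JoinType.LEFT_OUTER, JoinType.RIGHT_OUTER};
        for (JoinType joinType : supported) {
            System.out.println(joinType + ": " + sidesFor(joinType));
        }
    }
}
```

Keeping the role names independent of left/right means the scanning logic only has to be written once; only the side selection and the output column order depend on the join type.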

Example query:

```
val df1 = spark.range(10).select($"id".as("k1"), $"id".as("k3"))
val df2 = spark.range(4).select($"id".as("k2"), $"id".as("k4"))
df1.join(df2.hint("SHUFFLE_MERGE"), $"k1" === $"k2" && $"k3" + 1 < $"k4", "left_outer").explain("codegen")
```

Example generated code:

```
== Subtree 5 / 5 (maxMethodCodeSize:396; maxConstantPoolSize:159(0.24% used); numInnerClasses:0) ==
*(5) SortMergeJoin [k1#2L], [k2#8L], LeftOuter, ((k3#3L + 1) < k4#9L)
:- *(2) Sort [k1#2L ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(k1#2L, 5), ENSURE_REQUIREMENTS, [id=#26]
:     +- *(1) Project [id#0L AS k1#2L, id#0L AS k3#3L]
:        +- *(1) Range (0, 10, step=1, splits=2)
+- *(4) Sort [k2#8L ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(k2#8L, 5), ENSURE_REQUIREMENTS, [id=#32]
      +- *(3) Project [id#6L AS k2#8L, id#6L AS k4#9L]
         +- *(3) Range (0, 4, step=1, splits=2)

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIteratorForCodegenStage5(references);
/* 003 */ }
/* 004 */
/* 005 */ // codegenStageId=5
/* 006 */ final class GeneratedIteratorForCodegenStage5 extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 007 */   private Object[] references;
/* 008 */   private scala.collection.Iterator[] inputs;
/* 009 */   private scala.collection.Iterator smj_streamedInput_0;
/* 010 */   private scala.collection.Iterator smj_bufferedInput_0;
/* 011 */   private InternalRow smj_streamedRow_0;
/* 012 */   private InternalRow smj_bufferedRow_0;
/* 013 */   private long smj_value_2;
/* 014 */   private org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray smj_matches_0;
/* 015 */   private long smj_value_3;
/* 016 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[] smj_mutableStateArray_0 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[1];
/* 017 */
/* 018 */   public GeneratedIteratorForCodegenStage5(Object[] references) {
/* 019 */     this.references = references;
/* 020 */   }
/* 021 */
/* 022 */   public void init(int index, scala.collection.Iterator[] inputs) {
/* 023 */     partitionIndex = index;
/* 024 */     this.inputs = inputs;
/* 025 */     smj_streamedInput_0 = inputs[0];
/* 026 */     smj_bufferedInput_0 = inputs[1];
/* 027 */
/* 028 */     smj_matches_0 = new org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray(2147483632, 2147483647);
/* 029 */     smj_mutableStateArray_0[0] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(4, 0);
/* 030 */
/* 031 */   }
/* 032 */
/* 033 */   private boolean findNextJoinRows(
/* 034 */     scala.collection.Iterator streamedIter,
/* 035 */     scala.collection.Iterator bufferedIter) {
/* 036 */     smj_streamedRow_0 = null;
/* 037 */     int comp = 0;
/* 038 */     while (smj_streamedRow_0 == null) {
/* 039 */       if (!streamedIter.hasNext()) return false;
/* 040 */       smj_streamedRow_0 = (InternalRow) streamedIter.next();
/* 041 */       long smj_value_0 = smj_streamedRow_0.getLong(0);
/* 042 */       if (false) {
/* 043 */         if (!smj_matches_0.isEmpty()) {
/* 044 */           smj_matches_0.clear();
/* 045 */         }
/* 046 */         return false;
/* 047 */
/* 048 */       }
/* 049 */       if (!smj_matches_0.isEmpty()) {
/* 050 */         comp = 0;
/* 051 */         if (comp == 0) {
/* 052 */           comp = (smj_value_0 > smj_value_3 ? 1 : smj_value_0 < smj_value_3 ? -1 : 0);
/* 053 */         }
/* 054 */
/* 055 */         if (comp == 0) {
/* 056 */           return true;
/* 057 */         }
/* 058 */         smj_matches_0.clear();
/* 059 */       }
/* 060 */
/* 061 */       do {
/* 062 */         if (smj_bufferedRow_0 == null) {
/* 063 */           if (!bufferedIter.hasNext()) {
/* 064 */             smj_value_3 = smj_value_0;
/* 065 */             return !smj_matches_0.isEmpty();
/* 066 */           }
/* 067 */           smj_bufferedRow_0 = (InternalRow) bufferedIter.next();
/* 068 */           long smj_value_1 = smj_bufferedRow_0.getLong(0);
/* 069 */           if (false) {
/* 070 */             smj_bufferedRow_0 = null;
/* 071 */             continue;
/* 072 */           }
/* 073 */           smj_value_2 = smj_value_1;
/* 074 */         }
/* 075 */
/* 076 */         comp = 0;
/* 077 */         if (comp == 0) {
/* 078 */           comp = (smj_value_0 > smj_value_2 ? 1 : smj_value_0 < smj_value_2 ? -1 : 0);
/* 079 */         }
/* 080 */
/* 081 */         if (comp > 0) {
/* 082 */           smj_bufferedRow_0 = null;
/* 083 */         } else if (comp < 0) {
/* 084 */           if (!smj_matches_0.isEmpty()) {
/* 085 */             smj_value_3 = smj_value_0;
/* 086 */             return true;
/* 087 */           } else {
/* 088 */             return false;
/* 089 */           }
/* 090 */         } else {
/* 091 */           smj_matches_0.add((UnsafeRow) smj_bufferedRow_0);
/* 092 */           smj_bufferedRow_0 = null;
/* 093 */         }
/* 094 */       } while (smj_streamedRow_0 != null);
/* 095 */     }
/* 096 */     return false; // unreachable
/* 097 */   }
/* 098 */
/* 099 */   protected void processNext() throws java.io.IOException {
/* 100 */     while (smj_streamedInput_0.hasNext()) {
/* 101 */       findNextJoinRows(smj_streamedInput_0, smj_bufferedInput_0);
/* 102 */       long smj_value_4 = -1L;
/* 103 */       long smj_value_5 = -1L;
/* 104 */       boolean smj_loaded_0 = false;
/* 105 */       smj_value_5 = smj_streamedRow_0.getLong(1);
/* 106 */       scala.collection.Iterator<UnsafeRow> smj_iterator_0 = smj_matches_0.generateIterator();
/* 107 */       boolean smj_foundMatch_0 = false;
/* 108 */
/* 109 */       // the last iteration of this loop is to emit an empty row if there is no matched rows.
/* 110 */       while (smj_iterator_0.hasNext() || !smj_foundMatch_0) {
/* 111 */         InternalRow smj_bufferedRow_1 = smj_iterator_0.hasNext() ?
/* 112 */         (InternalRow) smj_iterator_0.next() : null;
/* 113 */         boolean smj_isNull_5 = true;
/* 114 */         long smj_value_9 = -1L;
/* 115 */         if (smj_bufferedRow_1 != null) {
/* 116 */           long smj_value_8 = smj_bufferedRow_1.getLong(1);
/* 117 */           smj_isNull_5 = false;
/* 118 */           smj_value_9 = smj_value_8;
/* 119 */         }
/* 120 */         if (smj_bufferedRow_1 != null) {
/* 121 */           boolean smj_isNull_6 = true;
/* 122 */           boolean smj_value_10 = false;
/* 123 */           long smj_value_11 = -1L;
/* 124 */
/* 125 */           smj_value_11 = smj_value_5 + 1L;
/* 126 */
/* 127 */           if (!smj_isNull_5) {
/* 128 */             smj_isNull_6 = false; // resultCode could change nullability.
/* 129 */             smj_value_10 = smj_value_11 < smj_value_9;
/* 130 */
/* 131 */           }
/* 132 */           if (smj_isNull_6 || !smj_value_10) {
/* 133 */             continue;
/* 134 */           }
/* 135 */         }
/* 136 */         if (!smj_loaded_0) {
/* 137 */           smj_loaded_0 = true;
/* 138 */           smj_value_4 = smj_streamedRow_0.getLong(0);
/* 139 */         }
/* 140 */         boolean smj_isNull_3 = true;
/* 141 */         long smj_value_7 = -1L;
/* 142 */         if (smj_bufferedRow_1 != null) {
/* 143 */           long smj_value_6 = smj_bufferedRow_1.getLong(0);
/* 144 */           smj_isNull_3 = false;
/* 145 */           smj_value_7 = smj_value_6;
/* 146 */         }
/* 147 */         smj_foundMatch_0 = true;
/* 148 */         ((org.apache.spark.sql.execution.metric.SQLMetric) references[0] /* numOutputRows */).add(1);
/* 149 */
/* 150 */         smj_mutableStateArray_0[0].reset();
/* 151 */
/* 152 */         smj_mutableStateArray_0[0].zeroOutNullBytes();
/* 153 */
/* 154 */         smj_mutableStateArray_0[0].write(0, smj_value_4);
/* 155 */
/* 156 */         smj_mutableStateArray_0[0].write(1, smj_value_5);
/* 157 */
/* 158 */         if (smj_isNull_3) {
/* 159 */           smj_mutableStateArray_0[0].setNullAt(2);
/* 160 */         } else {
/* 161 */           smj_mutableStateArray_0[0].write(2, smj_value_7);
/* 162 */         }
/* 163 */
/* 164 */         if (smj_isNull_5) {
/* 165 */           smj_mutableStateArray_0[0].setNullAt(3);
/* 166 */         } else {
/* 167 */           smj_mutableStateArray_0[0].write(3, smj_value_9);
/* 168 */         }
/* 169 */         append((smj_mutableStateArray_0[0].getRow()).copy());
/* 170 */
/* 171 */       }
/* 172 */       if (shouldStop()) return;
/* 173 */     }
/* 174 */     ((org.apache.spark.sql.execution.joins.SortMergeJoinExec) references[1] /* plan */).cleanupResources();
/* 175 */   }
/* 176 */
/* 177 */ }
```
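
The control flow of the generated code above can be condensed into a much smaller standalone sketch. The Java below is an illustration written for this description, not Spark code: the class `LeftOuterMergeJoinSketch` and its `join` method are hypothetical, rows are plain `long[] {key, value}` arrays, and null join keys (which the real code handles separately) are not modeled. It keeps the two ideas that matter: `matches` is reused for consecutive streamed rows with the same key, and a streamed row that produces no output row is emitted once with a null buffered side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Simplified sketch of a left outer sort merge join over rows of the form {key, value}. */
final class LeftOuterMergeJoinSketch {

    /**
     * Both inputs must be sorted by key (index 0). Each output pair is {streamedRow, bufferedRow};
     * bufferedRow is null when no buffered row with the same key passes the condition.
     */
    static List<long[][]> join(
            List<long[]> streamed,
            List<long[]> buffered,
            BiPredicate<long[], long[]> condition) {
        List<long[][]> output = new ArrayList<>();
        List<long[]> matches = new ArrayList<>(); // buffered rows sharing the current key
        long matchedKey = 0L;                     // key that `matches` was collected for
        int next = 0;                             // cursor into the buffered side

        for (long[] streamedRow : streamed) {
            long key = streamedRow[0];
            // Reuse `matches` for consecutive streamed rows with the same key.
            if (matches.isEmpty() || matchedKey != key) {
                matches.clear();
                // Skip buffered rows with smaller keys, then collect all rows with an equal key.
                while (next < buffered.size() && buffered.get(next)[0] < key) next++;
                while (next < buffered.size() && buffered.get(next)[0] == key) {
                    matches.add(buffered.get(next++));
                }
                matchedKey = key;
            }
            boolean hasOutputRow = false;
            for (long[] bufferedRow : matches) {
                if (condition.test(streamedRow, bufferedRow)) {
                    output.add(new long[][] {streamedRow, bufferedRow});
                    hasOutputRow = true;
                }
            }
            // Outer join: emit the streamed row with a null buffered side if nothing matched.
            if (!hasOutputRow) output.add(new long[][] {streamedRow, null});
        }
        return output;
    }

    public static void main(String[] args) {
        // Same shape as the example query: k1 = k3 = id in [0, 10), k2 = k4 = id in [0, 4).
        List<long[]> left = new ArrayList<>();
        for (long id = 0; id < 10; id++) left.add(new long[] {id, id});
        List<long[]> right = new ArrayList<>();
        for (long id = 0; id < 4; id++) right.add(new long[] {id, id});
        List<long[][]> rows = join(left, right, (s, b) -> s[1] + 1 < b[1]);
        System.out.println(rows.size() + " output rows");
    }
}
```

With the inputs from the example query, every key match has `k3 == k4`, so the condition `k3 + 1 < k4` never holds and all ten streamed rows are emitted with a null buffered side.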

### Why are the changes needed?

Improve query CPU performance. The example micro-benchmark below shows about a 10% run-time improvement (average time drops from 1348 ms to 1212 ms).

```
def sortMergeJoinWithDuplicates(): Unit = {
  val N = 2 << 20
  codegenBenchmark("sort merge join with duplicates", N) {
    val df1 = spark.range(N)
      .selectExpr(s"(id * 15485863) % ${N*10} as k1", "id as k3")
    val df2 = spark.range(N)
      .selectExpr(s"(id * 15485867) % ${N*10} as k2", "id as k4")
    val df = df1.join(df2, col("k1") === col("k2") && col("k3") * 3 < col("k4"), "left_outer")
    assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortMergeJoinExec]).isDefined)
    df.noop()
  }
}
```

```
Running benchmark: sort merge join with duplicates
  Running case: sort merge join with duplicates outer-smj-codegen off
  Stopped after 2 iterations, 2696 ms
  Running case: sort merge join with duplicates outer-smj-codegen on
  Stopped after 5 iterations, 6058 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
sort merge join with duplicates:                       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------------------------------------------------
sort merge join with duplicates outer-smj-codegen off           1333           1348          21          1.6         635.7       1.0X
sort merge join with duplicates outer-smj-codegen on            1169           1212          47          1.8         557.4       1.1X
```
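
The relative improvement can be recomputed from the table above. The small Java helper below is only an arithmetic sketch (the class and method names are made up for this example); it applies `(baseline - new) / baseline` to the best and average times reported by the benchmark.

```java
/** Arithmetic sketch: relative run-time improvement from the benchmark table. */
final class BenchmarkSpeedup {

    /** Percentage by which newMs improves on baselineMs. */
    static double improvementPercent(double baselineMs, double newMs) {
        return (baselineMs - newMs) / baselineMs * 100.0;
    }

    public static void main(String[] args) {
        // Best time: 1333 ms (codegen off) vs. 1169 ms (codegen on).
        System.out.printf("best time improvement: %.1f%%%n", improvementPercent(1333, 1169));
        // Average time: 1348 ms (codegen off) vs. 1212 ms (codegen on).
        System.out.printf("avg time improvement:  %.1f%%%n", improvementPercent(1348, 1212));
    }
}
```

This gives roughly 12% on best time and roughly 10% on average time, in line with the 1.1X relative figure in the table.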

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests in `WholeStageCodegenSuite.scala`.

Closes #32476 from c21/smj-outer-codegen.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
c21 authored and cloud-fan committed May 12, 2021
1 parent b52d47a commit 7bcaded
Showing 40 changed files with 452 additions and 253 deletions.
@@ -354,7 +354,8 @@ case class SortMergeJoinExec(
   }

   private lazy val ((streamedPlan, streamedKeys), (bufferedPlan, bufferedKeys)) = joinType match {
-    case _: InnerLike => ((left, leftKeys), (right, rightKeys))
+    case _: InnerLike | LeftOuter => ((left, leftKeys), (right, rightKeys))
+    case RightOuter => ((right, rightKeys), (left, leftKeys))
     case x =>
       throw new IllegalArgumentException(
         s"SortMergeJoin.streamedPlan/bufferedPlan should not take $x as the JoinType")
@@ -363,8 +364,9 @@ case class SortMergeJoinExec(
   private lazy val streamedOutput = streamedPlan.output
   private lazy val bufferedOutput = bufferedPlan.output

-  override def supportCodegen: Boolean = {
-    joinType.isInstanceOf[InnerLike]
+  override def supportCodegen: Boolean = joinType match {
+    case _: InnerLike | LeftOuter | RightOuter => true
+    case _ => false
   }

   override def inputRDDs(): Seq[RDD[InternalRow]] = {
@@ -431,6 +433,69 @@ case class SortMergeJoinExec(
     // Copy the streamed keys as class members so they could be used in next function call.
     val matchedKeyVars = copyKeys(ctx, streamedKeyVars)

+    // Handle the case when streamed rows has any NULL keys.
+    val handleStreamedAnyNull = joinType match {
+      case _: InnerLike =>
+        // Skip streamed row.
+        s"""
+           |$streamedRow = null;
+           |continue;
+         """.stripMargin
+      case LeftOuter | RightOuter =>
+        // Eagerly return streamed row. Only call `matches.clear()` when `matches.isEmpty()` is
+        // false, to reduce unnecessary computation.
+        s"""
+           |if (!$matches.isEmpty()) {
+           |  $matches.clear();
+           |}
+           |return false;
+         """.stripMargin
+      case x =>
+        throw new IllegalArgumentException(
+          s"SortMergeJoin.genScanner should not take $x as the JoinType")
+    }
+
+    // Handle the case when streamed keys has no match with buffered side.
+    val handleStreamedWithoutMatch = joinType match {
+      case _: InnerLike =>
+        // Skip streamed row.
+        s"$streamedRow = null;"
+      case LeftOuter | RightOuter =>
+        // Eagerly return with streamed row.
+        "return false;"
+      case x =>
+        throw new IllegalArgumentException(
+          s"SortMergeJoin.genScanner should not take $x as the JoinType")
+    }
+
+    // Generate a function to scan both streamed and buffered sides to find a match.
+    // Return whether a match is found.
+    //
+    // `streamedIter`: the iterator for streamed side.
+    // `bufferedIter`: the iterator for buffered side.
+    // `streamedRow`: the current row from streamed side.
+    //                When `streamedIter` is empty, `streamedRow` is null.
+    // `matches`: the rows from buffered side already matched with `streamedRow`.
+    //            `matches` is buffered and reused for all `streamedRow`s having same join keys.
+    //            If there is no match with `streamedRow`, `matches` is empty.
+    // `bufferedRow`: the current matched row from buffered side.
+    //
+    // The function has the following step:
+    //  - Step 1: Find the next `streamedRow` with non-null join keys.
+    //            For `streamedRow` with null join keys (`handleStreamedAnyNull`):
+    //            1. Inner join: skip the row. `matches` will be cleared later when hitting the
+    //               next `streamedRow` with non-null join keys.
+    //            2. Left/Right Outer join: clear the previous `matches` if needed, keep the row,
+    //               and return false.
+    //
+    //  - Step 2: Find the `matches` from buffered side having same join keys with `streamedRow`.
+    //            Clear `matches` if we hit a new `streamedRow`, as we need to find new matches.
+    //            Use `bufferedRow` to iterate buffered side to put all matched rows into
+    //            `matches`. Return true when getting all matched rows.
+    //            For `streamedRow` without `matches` (`handleStreamedWithoutMatch`):
+    //            1. Inner join: skip the row.
+    //            2. Left/Right Outer join: keep the row and return false (with `matches` being
+    //               empty).
     ctx.addNewFunction("findNextJoinRows",
       s"""
          |private boolean findNextJoinRows(
@@ -443,8 +508,7 @@ case class SortMergeJoinExec(
          |    $streamedRow = (InternalRow) streamedIter.next();
          |    ${streamedKeyVars.map(_.code).mkString("\n")}
          |    if ($streamedAnyNull) {
-         |      $streamedRow = null;
-         |      continue;
+         |      $handleStreamedAnyNull
          |    }
          |    if (!$matches.isEmpty()) {
          |      ${genComparison(ctx, streamedKeyVars, matchedKeyVars)}
@@ -475,8 +539,9 @@ case class SortMergeJoinExec(
          |        if (!$matches.isEmpty()) {
          |          ${matchedKeyVars.map(_.code).mkString("\n")}
          |          return true;
+         |        } else {
+         |          $handleStreamedWithoutMatch
          |        }
-         |        $streamedRow = null;
          |      } else {
          |        $matches.add((UnsafeRow) $bufferedRow);
          |        $bufferedRow = null;
@@ -501,7 +566,7 @@ case class SortMergeJoinExec(
       ctx: CodegenContext,
       streamedRow: String): (Seq[ExprCode], Seq[String]) = {
     ctx.INPUT_ROW = streamedRow
-    left.output.zipWithIndex.map { case (a, i) =>
+    streamedPlan.output.zipWithIndex.map { case (a, i) =>
       val value = ctx.freshName("value")
       val valueCode = CodeGenerator.getValue(streamedRow, a.dataType, i.toString)
       val javaType = CodeGenerator.javaType(a.dataType)
@@ -569,7 +634,15 @@ case class SortMergeJoinExec(

     val iterator = ctx.freshName("iterator")
     val numOutput = metricTerm(ctx, "numOutputRows")
-    val resultVars = streamedVars ++ bufferedVars
+    val resultVars = joinType match {
+      case _: InnerLike | LeftOuter =>
+        streamedVars ++ bufferedVars
+      case RightOuter =>
+        bufferedVars ++ streamedVars
+      case x =>
+        throw new IllegalArgumentException(
+          s"SortMergeJoin.doProduce should not take $x as the JoinType")
+    }

     val (beforeLoop, condCheck) = if (condition.isDefined) {
       // Split the code of creating variables based on whether it's used by condition or not.
@@ -580,21 +653,27 @@ case class SortMergeJoinExec(
       ctx.currentVars = resultVars
       val cond = BindReferences.bindReference(condition.get, output).genCode(ctx)
       // evaluate the columns those used by condition before loop
-      val before = s"""
+      val before =
+        s"""
            |boolean $loaded = false;
            |$streamedBefore
          """.stripMargin

-      val checking = s"""
-         |$bufferedBefore
-         |${cond.code}
-         |if (${cond.isNull} || !${cond.value}) continue;
-         |if (!$loaded) {
-         |  $loaded = true;
-         |  $streamedAfter
-         |}
-         |$bufferedAfter
-       """.stripMargin
+      val checking =
+        s"""
+           |$bufferedBefore
+           |if ($bufferedRow != null) {
+           |  ${cond.code}
+           |  if (${cond.isNull} || !${cond.value}) {
+           |    continue;
+           |  }
+           |}
+           |if (!$loaded) {
+           |  $loaded = true;
+           |  $streamedAfter
+           |}
+           |$bufferedAfter
+         """.stripMargin
       (before, checking)
     } else {
       (evaluateVariables(streamedVars), "")
@@ -603,21 +682,55 @@ case class SortMergeJoinExec(
     val thisPlan = ctx.addReferenceObj("plan", this)
     val eagerCleanup = s"$thisPlan.cleanupResources();"

-    s"""
-       |while (findNextJoinRows($streamedInput, $bufferedInput)) {
-       |  ${streamedVarDecl.mkString("\n")}
-       |  ${beforeLoop.trim}
-       |  scala.collection.Iterator<UnsafeRow> $iterator = $matches.generateIterator();
-       |  while ($iterator.hasNext()) {
-       |    InternalRow $bufferedRow = (InternalRow) $iterator.next();
-       |    ${condCheck.trim}
-       |    $numOutput.add(1);
-       |    ${consume(ctx, resultVars)}
-       |  }
-       |  if (shouldStop()) return;
-       |}
-       |$eagerCleanup
+    lazy val innerJoin =
+      s"""
+         |while (findNextJoinRows($streamedInput, $bufferedInput)) {
+         |  ${streamedVarDecl.mkString("\n")}
+         |  ${beforeLoop.trim}
+         |  scala.collection.Iterator<UnsafeRow> $iterator = $matches.generateIterator();
+         |  while ($iterator.hasNext()) {
+         |    InternalRow $bufferedRow = (InternalRow) $iterator.next();
+         |    ${condCheck.trim}
+         |    $numOutput.add(1);
+         |    ${consume(ctx, resultVars)}
+         |  }
+         |  if (shouldStop()) return;
+         |}
+         |$eagerCleanup
      """.stripMargin
+
+    lazy val outerJoin = {
+      val hasOutputRow = ctx.freshName("hasOutputRow")
+      s"""
+         |while ($streamedInput.hasNext()) {
+         |  findNextJoinRows($streamedInput, $bufferedInput);
+         |  ${streamedVarDecl.mkString("\n")}
+         |  ${beforeLoop.trim}
+         |  scala.collection.Iterator<UnsafeRow> $iterator = $matches.generateIterator();
+         |  boolean $hasOutputRow = false;
+         |
+         |  // the last iteration of this loop is to emit an empty row if there is no matched rows.
+         |  while ($iterator.hasNext() || !$hasOutputRow) {
+         |    InternalRow $bufferedRow = $iterator.hasNext() ?
+         |      (InternalRow) $iterator.next() : null;
+         |    ${condCheck.trim}
+         |    $hasOutputRow = true;
+         |    $numOutput.add(1);
+         |    ${consume(ctx, resultVars)}
+         |  }
+         |  if (shouldStop()) return;
+         |}
+         |$eagerCleanup
+       """.stripMargin
+    }
+
+    joinType match {
+      case _: InnerLike => innerJoin
+      case LeftOuter | RightOuter => outerJoin
+      case x =>
+        throw new IllegalArgumentException(
+          s"SortMergeJoin.doProduce should not take $x as the JoinType")
+    }
   }

   override protected def withNewChildrenInternal(

@@ -10,7 +10,7 @@ TakeOrderedAndProject (36)
 : :- * Project (20)
 : : +- * BroadcastHashJoin Inner BuildRight (19)
 : : :- * Project (13)
-: : : +- SortMergeJoin LeftOuter (12)
+: : : +- * SortMergeJoin LeftOuter (12)
 : : : :- * Sort (5)
 : : : : +- Exchange (4)
 : : : : +- * Filter (3)
@@ -86,7 +86,7 @@ Arguments: hashpartitioning(cr_order_number#9, cr_item_sk#8, 5), ENSURE_REQUIREM
 Input [3]: [cr_item_sk#8, cr_order_number#9, cr_refunded_cash#10]
 Arguments: [cr_order_number#9 ASC NULLS FIRST, cr_item_sk#8 ASC NULLS FIRST], false, 0

-(12) SortMergeJoin
+(12) SortMergeJoin [codegen id : 8]
 Left keys [2]: [cs_order_number#3, cs_item_sk#2]
 Right keys [2]: [cr_order_number#9, cr_item_sk#8]
 Join condition: None

@@ -12,8 +12,8 @@ TakeOrderedAndProject [w_state,i_item_id,sales_before,sales_after]
 Project [cs_warehouse_sk,cs_sales_price,cs_sold_date_sk,cr_refunded_cash,i_item_id]
 BroadcastHashJoin [cs_item_sk,i_item_sk]
 Project [cs_warehouse_sk,cs_item_sk,cs_sales_price,cs_sold_date_sk,cr_refunded_cash]
-InputAdapter
-SortMergeJoin [cs_order_number,cs_item_sk,cr_order_number,cr_item_sk]
+SortMergeJoin [cs_order_number,cs_item_sk,cr_order_number,cr_item_sk]
+InputAdapter
 WholeStageCodegen (2)
 Sort [cs_order_number,cs_item_sk]
 InputAdapter
@@ -25,6 +25,7 @@ TakeOrderedAndProject [w_state,i_item_id,sales_before,sales_after]
 Scan parquet default.catalog_sales [cs_warehouse_sk,cs_item_sk,cs_order_number,cs_sales_price,cs_sold_date_sk]
 SubqueryBroadcast [d_date_sk] #1
 ReusedExchange [d_date_sk,d_date] #3
+InputAdapter
 WholeStageCodegen (4)
 Sort [cr_order_number,cr_item_sk]
 InputAdapter

@@ -10,7 +10,7 @@ TakeOrderedAndProject (36)
 : :- * Project (19)
 : : +- * BroadcastHashJoin Inner BuildRight (18)
 : : :- * Project (13)
-: : : +- SortMergeJoin LeftOuter (12)
+: : : +- * SortMergeJoin LeftOuter (12)
 : : : :- * Sort (5)
 : : : : +- Exchange (4)
 : : : : +- * Filter (3)
@@ -86,7 +86,7 @@ Arguments: hashpartitioning(cr_order_number#9, cr_item_sk#8, 5), ENSURE_REQUIREM
 Input [3]: [cr_item_sk#8, cr_order_number#9, cr_refunded_cash#10]
 Arguments: [cr_order_number#9 ASC NULLS FIRST, cr_item_sk#8 ASC NULLS FIRST], false, 0

-(12) SortMergeJoin
+(12) SortMergeJoin [codegen id : 8]
 Left keys [2]: [cs_order_number#3, cs_item_sk#2]
 Right keys [2]: [cr_order_number#9, cr_item_sk#8]
 Join condition: None

@@ -12,8 +12,8 @@ TakeOrderedAndProject [w_state,i_item_id,sales_before,sales_after]
 Project [cs_item_sk,cs_sales_price,cs_sold_date_sk,cr_refunded_cash,w_state]
 BroadcastHashJoin [cs_warehouse_sk,w_warehouse_sk]
 Project [cs_warehouse_sk,cs_item_sk,cs_sales_price,cs_sold_date_sk,cr_refunded_cash]
-InputAdapter
-SortMergeJoin [cs_order_number,cs_item_sk,cr_order_number,cr_item_sk]
+SortMergeJoin [cs_order_number,cs_item_sk,cr_order_number,cr_item_sk]
+InputAdapter
 WholeStageCodegen (2)
 Sort [cs_order_number,cs_item_sk]
 InputAdapter
@@ -25,6 +25,7 @@ TakeOrderedAndProject [w_state,i_item_id,sales_before,sales_after]
 Scan parquet default.catalog_sales [cs_warehouse_sk,cs_item_sk,cs_order_number,cs_sales_price,cs_sold_date_sk]
 SubqueryBroadcast [d_date_sk] #1
 ReusedExchange [d_date_sk,d_date] #3
+InputAdapter
 WholeStageCodegen (4)
 Sort [cr_order_number,cr_item_sk]
 InputAdapter

@@ -4,7 +4,7 @@ TakeOrderedAndProject (80)
 +- Exchange (78)
 +- * HashAggregate (77)
 +- * Project (76)
-+- SortMergeJoin LeftOuter (75)
++- * SortMergeJoin LeftOuter (75)
 :- * Sort (68)
 : +- Exchange (67)
 : +- * Project (66)
@@ -410,7 +410,7 @@ Arguments: hashpartitioning(cr_item_sk#43, cr_order_number#44, 5), ENSURE_REQUIR
 Input [2]: [cr_item_sk#43, cr_order_number#44]
 Arguments: [cr_item_sk#43 ASC NULLS FIRST, cr_order_number#44 ASC NULLS FIRST], false, 0

-(75) SortMergeJoin
+(75) SortMergeJoin [codegen id : 20]
 Left keys [2]: [cs_item_sk#4, cs_order_number#6]
 Right keys [2]: [cr_item_sk#43, cr_order_number#44]
 Join condition: None

@@ -6,8 +6,8 @@ TakeOrderedAndProject [total_cnt,i_item_desc,w_warehouse_name,d_week_seq,no_prom
 WholeStageCodegen (20)
 HashAggregate [i_item_desc,w_warehouse_name,d_week_seq] [count,count]
 Project [w_warehouse_name,i_item_desc,d_week_seq]
-InputAdapter
-SortMergeJoin [cs_item_sk,cs_order_number,cr_item_sk,cr_order_number]
+SortMergeJoin [cs_item_sk,cs_order_number,cr_item_sk,cr_order_number]
+InputAdapter
 WholeStageCodegen (17)
 Sort [cs_item_sk,cs_order_number]
 InputAdapter
@@ -121,6 +121,7 @@ TakeOrderedAndProject [total_cnt,i_item_desc,w_warehouse_name,d_week_seq,no_prom
 ColumnarToRow
 InputAdapter
 Scan parquet default.promotion [p_promo_sk]
+InputAdapter
 WholeStageCodegen (19)
 Sort [cr_item_sk,cr_order_number]
 InputAdapter

@@ -4,7 +4,7 @@ TakeOrderedAndProject (74)
 +- Exchange (72)
 +- * HashAggregate (71)
 +- * Project (70)
-+- SortMergeJoin LeftOuter (69)
++- * SortMergeJoin LeftOuter (69)
 :- * Sort (62)
 : +- Exchange (61)
 : +- * Project (60)
@@ -380,7 +380,7 @@ Arguments: hashpartitioning(cr_item_sk#41, cr_order_number#42, 5), ENSURE_REQUIR
 Input [2]: [cr_item_sk#41, cr_order_number#42]
 Arguments: [cr_item_sk#41 ASC NULLS FIRST, cr_order_number#42 ASC NULLS FIRST], false, 0

-(69) SortMergeJoin
+(69) SortMergeJoin [codegen id : 14]
 Left keys [2]: [cs_item_sk#4, cs_order_number#6]
 Right keys [2]: [cr_item_sk#41, cr_order_number#42]
 Join condition: None

@@ -6,8 +6,8 @@ TakeOrderedAndProject [total_cnt,i_item_desc,w_warehouse_name,d_week_seq,no_prom
 WholeStageCodegen (14)
 HashAggregate [i_item_desc,w_warehouse_name,d_week_seq] [count,count]
 Project [w_warehouse_name,i_item_desc,d_week_seq]
-InputAdapter
-SortMergeJoin [cs_item_sk,cs_order_number,cr_item_sk,cr_order_number]
+SortMergeJoin [cs_item_sk,cs_order_number,cr_item_sk,cr_order_number]
+InputAdapter
 WholeStageCodegen (11)
 Sort [cs_item_sk,cs_order_number]
 InputAdapter
@@ -103,6 +103,7 @@ TakeOrderedAndProject [total_cnt,i_item_desc,w_warehouse_name,d_week_seq,no_prom
 ColumnarToRow
 InputAdapter
 Scan parquet default.promotion [p_promo_sk]
+InputAdapter
 WholeStageCodegen (13)
 Sort [cr_item_sk,cr_order_number]
 InputAdapter