
"INSERT OVERWRITE x SELECT /*+ REPARTITION(2) */ * FROM y LIMIT 2" drains 4 rows into table x using Arrow write extension #909

Closed
zhztheplayer opened this issue May 10, 2022 · 1 comment
Labels: bug (Something isn't working)

Comments

zhztheplayer (Collaborator):

No description provided.

zhztheplayer added the bug label on May 10, 2022
zhztheplayer (Collaborator, Author):

I was not able to reproduce the bug.

The code I am using:

  test("TEST") {
    spark.sql("CREATE TABLE test(a string, b string) using parquet;")
    spark.sql("INSERT INTO test VALUES (\"0\", \"0\"), (\"1\", \"1\"), (\"2\", \"2\"), (\"3\", " +
        "\"3\"), (\"4\", \"4\"), (\"5\", \"5\");")
    spark.sql("CREATE TABLE test1(a string, b string) using arrow;")
    val frame = spark.sql("INSERT OVERWRITE test1 SELECT /*+ REPARTITION(2) */ * FROM test LIMIT " +
        "2;")
    frame.explain()
    frame.show()
    spark.sql("SELECT COUNT(*) FROM test1").show()
  }

Output:

== Physical Plan ==
Execute InsertIntoHadoopFsRelationCommand file:/opt/code/native-sql-engine/spark-warehouse/com.intel.oap.spark.sql.execution.datasources.arrow.ArrowDataSourceTest/test1, false, com.intel.oap.spark.sql.execution.datasources.arrow.ArrowFileFormat@af96ac9, Map(path -> file:/opt/code/native-sql-engine/spark-warehouse/com.intel.oap.spark.sql.execution.datasources.arrow.ArrowDataSourceTest/test1), Overwrite, CatalogTable(
Database: default
Table: test1
Created Time: Mon May 09 19:44:54 PDT 2022
Last Access: UNKNOWN
Created By: Spark 3.1.1
Type: MANAGED
Provider: arrow
Location: file:/opt/code/native-sql-engine/spark-warehouse/com.intel.oap.spark.sql.execution.datasources.arrow.ArrowDataSourceTest/test1
Schema: root
 |-- a: string (nullable = true)
 |-- b: string (nullable = true)
), org.apache.spark.sql.execution.datasources.InMemoryFileIndex@804d92de, [a, b]
+- ColumnarToFakeRowAdaptor
   +- RowToArrowColumnar
      +- GlobalLimit 2
         +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#178]
            +- LocalLimit 2
               +- Exchange RoundRobinPartitioning(2), REPARTITION_WITH_NUM, [id=#174]
                  +- FileScan parquet default.test[a#86,b#87] Batched: false, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/opt/code/native-sql-engine/spark-warehouse/com.intel.oap.spark.sql.execut..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:string,b:string>
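
Reading this plan, the LIMIT looks correct: GlobalLimit 2 sits above an Exchange SinglePartition, so the limit is applied once, on a single partition, after the RoundRobinPartitioning(2) shuffle introduced by the hint, and only then do RowToArrowColumnar and the write run. A minimal sketch of a programmatic check, assuming stock Spark 3.1 plan nodes (a columnar extension could substitute its own limit operator, so this is a best-effort assertion, not part of the original test):

    import org.apache.spark.sql.execution.GlobalLimitExec

    // Walk the executed plan and confirm the limit is global (applied once,
    // after the shuffle to a single partition) rather than per partition.
    val globalLimits = frame.queryExecution.executedPlan.collect {
      case g: GlobalLimitExec => g
    }
    assert(globalLimits.nonEmpty, "expected a GlobalLimit above the Arrow write")

The empty table below is just frame.show() on the INSERT command's zero-column result; the final table is the row count.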


++
||
++
++

+--------+
|count(1)|
+--------+
|       2|
+--------+
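
The count matches the limit, so the overwrite drained 2 rows here, not 4. For what it's worth, a quick way to tell a per-partition limit from a global one is to count the rows in each partition of the written table. A minimal sketch, assuming the same Spark session and the test1 table from the test above (this check is not part of the original report):

    // Count rows in each partition of test1. If the LIMIT were applied per
    // partition -- the behavior the issue title describes -- the counts would
    // sum to 4; on a correct run they sum to exactly 2.
    val perPartition = spark.table("test1").rdd
      .mapPartitionsWithIndex { (idx, rows) => Iterator((idx, rows.size)) }
      .collect()
    perPartition.foreach { case (idx, n) => println(s"partition $idx: $n rows") }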
