
"INSERT OVERWRITE x SELECT /*+ REPARTITION(2) */ * FROM y LIMIT 2" drains 4 rows into table x using Arrow write extension #909

Closed
zhztheplayer opened this issue May 10, 2022 · 1 comment
Labels: bug (Something isn't working)

Comments

zhztheplayer (Collaborator):

No description provided.

zhztheplayer added the bug label on May 10, 2022
zhztheplayer (Collaborator, Author):

I was not able to reproduce the bug.

The code I am using:

  test("TEST") {
    spark.sql("CREATE TABLE test(a string, b string) using parquet;")
    spark.sql("INSERT INTO test VALUES (\"0\", \"0\"), (\"1\", \"1\"), (\"2\", \"2\"), (\"3\", " +
        "\"3\"), (\"4\", \"4\"), (\"5\", \"5\");")
    spark.sql("CREATE TABLE test1(a string, b string) using arrow;")
    val frame = spark.sql("INSERT OVERWRITE test1 SELECT /*+ REPARTITION(2) */ * FROM test LIMIT " +
        "2;")
    frame.explain()
    frame.show()
    spark.sql("SELECT COUNT(*) FROM test1").show()
  }

Output:

== Physical Plan ==
Execute InsertIntoHadoopFsRelationCommand file:/opt/code/native-sql-engine/spark-warehouse/com.intel.oap.spark.sql.execution.datasources.arrow.ArrowDataSourceTest/test1, false, com.intel.oap.spark.sql.execution.datasources.arrow.ArrowFileFormat@af96ac9, Map(path -> file:/opt/code/native-sql-engine/spark-warehouse/com.intel.oap.spark.sql.execution.datasources.arrow.ArrowDataSourceTest/test1), Overwrite, CatalogTable(
Database: default
Table: test1
Created Time: Mon May 09 19:44:54 PDT 2022
Last Access: UNKNOWN
Created By: Spark 3.1.1
Type: MANAGED
Provider: arrow
Location: file:/opt/code/native-sql-engine/spark-warehouse/com.intel.oap.spark.sql.execution.datasources.arrow.ArrowDataSourceTest/test1
Schema: root
 |-- a: string (nullable = true)
 |-- b: string (nullable = true)
), org.apache.spark.sql.execution.datasources.InMemoryFileIndex@804d92de, [a, b]
+- ColumnarToFakeRowAdaptor
   +- RowToArrowColumnar
      +- GlobalLimit 2
         +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#178]
            +- LocalLimit 2
               +- Exchange RoundRobinPartitioning(2), REPARTITION_WITH_NUM, [id=#174]
                  +- FileScan parquet default.test[a#86,b#87] Batched: false, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/opt/code/native-sql-engine/spark-warehouse/com.intel.oap.spark.sql.execut..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:string,b:string>
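
Reading this plan, the LIMIT looks correct: GlobalLimit 2 sits above an Exchange SinglePartition, so the limit is applied once, on a single partition, after the RoundRobinPartitioning(2) shuffle introduced by the hint, and only then do RowToArrowColumnar and the write run. A minimal sketch of a programmatic check, assuming stock Spark 3.1 plan nodes (a columnar extension could substitute its own limit operator, so this is a best-effort assertion, not part of the original test):

    import org.apache.spark.sql.execution.GlobalLimitExec

    // Walk the executed plan and confirm the limit is global (applied once,
    // after the shuffle to a single partition) rather than per partition.
    val globalLimits = frame.queryExecution.executedPlan.collect {
      case g: GlobalLimitExec => g
    }
    assert(globalLimits.nonEmpty, "expected a GlobalLimit above the Arrow write")

The empty table below is just frame.show() on the INSERT command's zero-column result; the final table is the row count.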


++
||
++
++

+--------+
|count(1)|
+--------+
|       2|
+--------+
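
The count matches the limit, so the overwrite drained 2 rows here, not 4. For what it's worth, a quick way to tell a per-partition limit from a global one is to count the rows in each partition of the written table. A minimal sketch, assuming the same Spark session and the test1 table from the test above (this check is not part of the original report):

    // Count rows in each partition of test1. If the LIMIT were applied per
    // partition -- the behavior the issue title describes -- the counts would
    // sum to 4; on a correct run they sum to exactly 2.
    val perPartition = spark.table("test1").rdd
      .mapPartitionsWithIndex { (idx, rows) => Iterator((idx, rows.size)) }
      .collect()
    perPartition.foreach { case (idx, n) => println(s"partition $idx: $n rows") }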
