[SPARK-49193][SQL] Improve the performance of RowSetUtils.toColumnBasedSet

### What changes were proposed in this pull request?

Replace the index-based `while` loop with `foreach` in `RowSetUtils.toTColumn`.
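
A minimal sketch of the two loop shapes, not the actual `RowSetUtils` code. It assumes `rows` is a linked `List` (as in the manual test in this description) and uses a plain `Option[String]` in place of a Spark `Row`; `LoopShapes`, `Cell`, `fillIndexed` and `fillForeach` are hypothetical names. On a `List`, `rows(i)` walks `i` links from the head, so the indexed loop does quadratic work overall, while `foreach` visits each element once.

```scala
import java.util.{ArrayList => JArrayList, BitSet => JBitSet}

object LoopShapes {
  // Hypothetical stand-in for one nullable string cell of a row.
  type Cell = Option[String]

  // Indexed form: rows(i) costs O(i) on a List, so the loop is O(n^2) in total.
  def fillIndexed(rows: Seq[Cell]): (JBitSet, JArrayList[String]) = {
    val nulls = new JBitSet()
    val rowSize = rows.length
    val values = new JArrayList[String](rowSize)
    var i = 0
    while (i < rowSize) {
      val row = rows(i)
      nulls.set(i, row.isEmpty)
      values.add(row.getOrElse(""))
      i += 1
    }
    (nulls, values)
  }

  // foreach form: a single pass; the counter is kept only for the null bitmap.
  def fillForeach(rows: Seq[Cell]): (JBitSet, JArrayList[String]) = {
    val nulls = new JBitSet()
    val values = new JArrayList[String](rows.length)
    var i = 0
    rows.foreach { row =>
      nulls.set(i, row.isEmpty)
      values.add(row.getOrElse(""))
      i += 1
    }
    (nulls, values)
  }
}
```

Both functions build the same bitmap and value list; only the traversal cost differs, and only when the input `Seq` lacks constant-time indexing.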

### Why are the changes needed?

Improve the performance of `RowSetUtils.toColumnBasedSet`:
<img width="1196" alt="image" src="https://github.com/user-attachments/assets/f481de39-e0bf-41c5-8fee-09dc1a93c4e1">

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.
```scala
import org.apache.hive.service.rpc.thrift.TProtocolVersion
import org.apache.spark.sql.execution.HiveResult
// Assumed package of RowSetUtils; the object is package-private, so run this from a scope that can access it.
import org.apache.spark.sql.hive.thriftserver.RowSetUtils

val df = spark.sql("select id, cast(id as string), cast(id as timestamp) from range(200000)")
val dataTypes = df.schema.fields.map(_.dataType)
// toList yields a linked List, so rows(i) is linear in i.
val rows = df.collect().toList
val start1 = System.currentTimeMillis()
// Protocol V11 exercises the column-based path (toColumnBasedSet).
RowSetUtils.toTRowSet(1, rows, dataTypes, TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V11, HiveResult.getTimeFormatters)
val start2 = System.currentTimeMillis()
// Protocol V5 exercises the row-based path (toRowBasedSet).
RowSetUtils.toTRowSet(1, rows, dataTypes, TProtocolVersion.HIVE_CLI_SERVICE_PROTOCOL_V5, HiveResult.getTimeFormatters)
val start3 = System.currentTimeMillis()
println(s"toColumnBasedSet time: ${start2 - start1}, toRowBasedSet time: ${start3 - start2}")
```

Before this PR (times in milliseconds):
```
toColumnBasedSet time: 17307, toRowBasedSet time: 71
```

After this PR:
```
toColumnBasedSet time: 128, toRowBasedSet time: 70
```
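
The gap between the two column-based timings is consistent with indexed access on a linked list. A rough, Spark-free sketch of the same effect follows, assuming a plain `List[Int]`; `ListAccessTiming`, `timeMs`, `sumIndexed` and `sumForeach` are hypothetical names, and absolute timings depend on the machine and JVM, so no expected output is shown.

```scala
object ListAccessTiming {
  // Hypothetical helper: evaluates `body` once and returns (result, elapsed milliseconds).
  def timeMs[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // Indexed traversal: xs(i) is O(i) when xs is a List.
  def sumIndexed(xs: Seq[Int]): Long = {
    var i = 0
    var acc = 0L
    val n = xs.length
    while (i < n) {
      acc += xs(i)
      i += 1
    }
    acc
  }

  // Single-pass traversal: each element is visited exactly once.
  def sumForeach(xs: Seq[Int]): Long = {
    var acc = 0L
    xs.foreach(x => acc += x)
    acc
  }

  def main(args: Array[String]): Unit = {
    val xs = (0 until 50000).toList
    val (a, tIndexed) = timeMs(sumIndexed(xs))
    val (b, tForeach) = timeMs(sumForeach(xs))
    assert(a == b) // same result, different traversal cost
    println(s"indexed: $tIndexed ms, foreach: $tForeach ms")
  }
}
```

Passing a `Vector` instead of a `List` should bring the two timings close together, since `Vector` supports effectively constant-time indexing.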

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47699 from wangyum/toColumnBasedSet.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 567d58c)
Signed-off-by: Kent Yao <yao@apache.org>
wangyum authored and yaooqinn committed Aug 12, 2024
1 parent 917c45e commit e22fa27
Showing 1 changed file with 1 addition and 2 deletions.
@@ -131,8 +131,7 @@ object RowSetUtils {
         var i = 0
         val rowSize = rows.length
         val values = new java.util.ArrayList[String](rowSize)
-        while (i < rowSize) {
-          val row = rows(i)
+        rows.foreach { row =>
           nulls.set(i, row.isNullAt(ordinal))
           val value = if (row.isNullAt(ordinal)) {
             ""
