Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48025][SQL][TESTS] Fix org.apache.spark.sql.execution.benchmark.DateTimeBenchmark #46261

Closed
wants to merge 2 commits into from

Conversation

yaooqinn
Copy link
Member

What changes were proposed in this pull request?

This PR fixes several issues in org.apache.spark.sql.execution.benchmark.DateTimeBenchmark

  • Misuse trunc function, a.k.a, the parameter order is wrong in reverse order
  • Some benchmarks not compatible with ANSI by default

Why are the changes needed?

restore benchmark cases

Does this PR introduce any user-facing change?

no

How was this patch tested?

benchmark

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions github-actions bot added the SQL label Apr 28, 2024
@@ -75,7 +81,7 @@ object DateTimeBenchmark extends SqlBasedBenchmark {
doBenchmark(N, s"$dt + interval 1 month 2 day")
}
benchmark.addCase("date + interval(m, d, ms)") { _ =>
doBenchmark(N, s"$dt + interval 1 month 2 day 5 hour")
doBenchmarkAnsiOff(N, s"$dt + interval 1 month 2 day 5 hour")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

illegal hour portion for ansi date add

@@ -161,7 +167,7 @@ object DateTimeBenchmark extends SqlBasedBenchmark {
}
val dateExpr = "cast(timestamp_seconds(id) as date)"
Seq("year", "yyyy", "yy", "mon", "month", "mm").foreach { level =>
run(N, s"trunc $level", s"trunc('$level', $dateExpr)")
run(N, s"trunc $level", s"trunc($dateExpr, '$level')")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the parameter order is wrong

@@ -171,7 +177,7 @@ object DateTimeBenchmark extends SqlBasedBenchmark {
run(n, "to timestamp str", timestampStrExpr)
run(n, "to_timestamp", s"to_timestamp($timestampStrExpr, $pattern)")
run(n, "to_unix_timestamp", s"to_unix_timestamp($timestampStrExpr, $pattern)")
val dateStrExpr = "concat('2019-01-', lpad(mod(id, 25), 2, '0'))"
val dateStrExpr = "concat('2019-01-', lpad(mod(id, 25) + 1, 2, '0'))"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid invalid date value 2019-01-00

Comment on lines +360 to +396
trunc year wholestage off 866 867 2 11.5 86.6 1.0X
trunc year wholestage on 886 888 2 11.3 88.6 1.0X

OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
AMD EPYC 7763 64-Core Processor
trunc yyyy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
trunc yyyy wholestage off 301 305 5 33.2 30.1 1.0X
trunc yyyy wholestage on 271 275 3 36.9 27.1 1.1X
trunc yyyy wholestage off 882 883 1 11.3 88.2 1.0X
trunc yyyy wholestage on 881 887 5 11.4 88.1 1.0X

OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
AMD EPYC 7763 64-Core Processor
trunc yy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
trunc yy wholestage off 295 295 0 33.9 29.5 1.0X
trunc yy wholestage on 270 277 4 37.0 27.0 1.1X
trunc yy wholestage off 875 877 3 11.4 87.5 1.0X
trunc yy wholestage on 881 890 9 11.3 88.1 1.0X

OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
AMD EPYC 7763 64-Core Processor
trunc mon: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
trunc mon wholestage off 296 297 1 33.8 29.6 1.0X
trunc mon wholestage on 275 278 2 36.4 27.5 1.1X
trunc mon wholestage off 838 839 2 11.9 83.8 1.0X
trunc mon wholestage on 833 839 5 12.0 83.3 1.0X

OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
AMD EPYC 7763 64-Core Processor
trunc month: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
trunc month wholestage off 304 308 6 32.9 30.4 1.0X
trunc month wholestage on 276 280 4 36.2 27.6 1.1X
trunc month wholestage off 839 841 4 11.9 83.9 1.0X
trunc month wholestage on 837 839 2 12.0 83.7 1.0X

OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
AMD EPYC 7763 64-Core Processor
trunc mm: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
trunc mm wholestage off 297 298 1 33.6 29.7 1.0X
trunc mm wholestage on 274 276 2 36.5 27.4 1.1X
trunc mm wholestage off 846 847 1 11.8 84.6 1.0X
trunc mm wholestage on 834 845 11 12.0 83.4 1.0X
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The results are corrected

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine but do we have this issue in other banchmarks too?

@yaooqinn
Copy link
Member Author

Thank you @HyukjinKwon. I have dispatched a CI task to audit, https://github.com/yaooqinn/spark/actions/runs/8866497258

@yaooqinn
Copy link
Member Author

Merged to master

@yaooqinn yaooqinn closed this in 0bf3945 Apr 28, 2024
@yaooqinn yaooqinn deleted the SPARK-48025 branch April 28, 2024 12:02
JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
…k.DateTimeBenchmark

### What changes were proposed in this pull request?

This PR fixes several issues in org.apache.spark.sql.execution.benchmark.DateTimeBenchmark
- Misuse `trunc` function, a.k.a, the parameter order is wrong in reverse order
- Some benchmarks not compatible with ANSI by default

### Why are the changes needed?

restore benchmark cases

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

benchmark

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#46261 from yaooqinn/SPARK-48025.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Kent Yao <yao@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants