Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-26337][SQL][TEST] Add benchmark for LongToUnsafeRowMap #23284

Closed
wants to merge 4 commits into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Dec 11, 2018

What changes were proposed in this pull request?

Regarding the performance issue of SPARK-26155, it reports the issue on TPC-DS. I think it is better to add a benchmark for LongToUnsafeRowMap which is the root cause of performance regression.

It can be easier to show performance difference between different metric implementations in LongToUnsafeRowMap.

How was this patch tested?

Manually run added benchmark.

@viirya
Copy link
Member Author

viirya commented Dec 11, 2018

Without metrics (master after PR 23269):

[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.13.6
[info] Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
[info] LongToUnsafeRowMap metrics:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------
[info] LongToUnsafeRowMap                             243 /  347          2.1         485.0       1.0X

Using LongAdder to compute metrics (by applying PR 23214):

[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.13.6
[info] Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
[info] LongToUnsafeRowMap metrics:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------
[info] LongToUnsafeRowMap                             401 /  460          1.2         802.2       1.0X

Using Long variables (master before PR 23269):

[info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.13.6
[info] Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
[info] LongToUnsafeRowMap metrics:              Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------
[info] LongToUnsafeRowMap                            1220 / 1233          0.4        2439.1       1.0X

@viirya
Copy link
Member Author

viirya commented Dec 11, 2018

cc @cloud-fan @dongjoon-hyun

@SparkQA
Copy link

SparkQA commented Dec 11, 2018

Test build #99964 has finished for PR 23284 at commit fa70205.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 11, 2018

Test build #99974 has finished for PR 23284 at commit cdbae0a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

can you update #23284 (comment) ?

The revert PR is already merged, so we should revert the revert PR and run the benchmark again, and post the results in the comment.

@cloud-fan
Copy link
Contributor

BTW I think this proves that, in Java if long is accessed by multiple threads, it will cause perf problems even without lock. Maybe it's related to memory barrier.

cc @kiszk @dongjoon-hyun @JkSelf

@SparkQA
Copy link

SparkQA commented Dec 11, 2018

Test build #99973 has finished for PR 23284 at commit ccb6fed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Thank you for ping me, @viirya and @cloud-fan .

@SparkQA
Copy link

SparkQA commented Dec 11, 2018

Test build #99979 has finished for PR 23284 at commit 723b27c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented Dec 11, 2018

retest this please

@kiszk
Copy link
Member

kiszk commented Dec 11, 2018

To be honest, I cannot understand why the original performance degradation occurred.

I think that read/write of long value does not require any sychronization or memory barrier without declaring volatile.
At this PR, numKeyLookups and numProbes are non-volatile long variables. I confirmed it by decompling a class file. However, I have no time to disassemble the generated code today and tomorrow.

Here is an article that addresses the similar topic regarding static long.

cc @rednaxelafx

@srowen
Copy link
Member

srowen commented Dec 11, 2018

There's no good reason why 64-bit reads/writes shouldn't be atomic on a 64-bit machine, and I assume everything we're testing on is 64-bit these days. It was an issue in the past, and yes as you note, the JLS seems to allow for it to be implementation-specific. No idea...

@SparkQA
Copy link

SparkQA commented Dec 11, 2018

Test build #99985 has finished for PR 23284 at commit 723b27c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -6,6 +6,6 @@ Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.13.6
Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
LongToUnsafeRowMap metrics: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
LongToUnsafeRowMap 1265 / 1336 0.4 2530.5 1.0X
LongToUnsafeRowMap 234 / 315 2.1 467.3 1.0X
Copy link
Member

@dongjoon-hyun dongjoon-hyun Dec 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. It's a clear reason to have this benchmark (723b27c)

map.optimize()

val threads = (0 to 100).map { _ =>
val thread = new Thread {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, is this the real difference from AggregateBenchmark.LongToUnsafeRowMap benchmark case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think multi-thread is the key here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, here we focus on multi-thread reading the same LongToUnsafeRowMap .

* Results will be written to "benchmarks/HashedRelationMetricsBenchmark-results.txt".
* }}}
*/
object HashedRelationMetricsBenchmark extends SqlBasedBenchmark {
Copy link
Contributor

@cloud-fan cloud-fan Dec 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to match the real case, shall we benchmark BytesToBytesMap instead of LongToUnsafeRowMap? The real case is, we have one LongToUnsafeRowMap for each thread, but they share the same BytesToBytesMap

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnsafeHashedRelation uses BytesToBytesMap. LongHashedRelation uses LongToUnsafeRowMap. The real case is we have one LongHashedRelation for each thread and they share the same LongToUnsafeRowMap.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yes I got messed up :P

@viirya
Copy link
Member Author

viirya commented Dec 12, 2018

The revert PR is already merged, so we should revert the revert PR and run the benchmark again, and post the results in the comment.

I've posted the benchmarks for original master before the revert PR in previous comment. I think that is what you asked?

@cloud-fan
Copy link
Contributor

The future reviewers may not know the context, when they see Using Long variables (master):, they will be confused as the master branch before your PR is not that.

@viirya
Copy link
Member Author

viirya commented Dec 12, 2018

The future reviewers may not know the context, when they see Using Long variables (master):, they will be confused as the master branch before your PR is not that.

If so, let me update the previous comment.

@viirya
Copy link
Member Author

viirya commented Dec 14, 2018

@cloud-fan Is this ready to merge?

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@asfgit asfgit closed this in 93139af Dec 14, 2018
holdenk pushed a commit to holdenk/spark that referenced this pull request Jan 5, 2019
## What changes were proposed in this pull request?

Regarding the performance issue of SPARK-26155, it reports the issue on TPC-DS. I think it is better to add a benchmark for `LongToUnsafeRowMap` which is the root cause of performance regression.

It can be easier to show performance difference between different metric implementations in `LongToUnsafeRowMap`.

## How was this patch tested?

Manually run added benchmark.

Closes apache#23284 from viirya/SPARK-26337.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

Regarding the performance issue of SPARK-26155, it reports the issue on TPC-DS. I think it is better to add a benchmark for `LongToUnsafeRowMap` which is the root cause of performance regression.

It can be easier to show performance difference between different metric implementations in `LongToUnsafeRowMap`.

## How was this patch tested?

Manually run added benchmark.

Closes apache#23284 from viirya/SPARK-26337.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@viirya viirya deleted the SPARK-26337 branch December 27, 2023 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants