
Qualification tool output recommendations on a per sql query basis #6092

Merged · 7 commits · Jul 27, 2022

Conversation

tgravescs (Collaborator) commented:

fixes #5502

This adds an option that lets a user get recommendations at the per-SQL-query level in addition to the application level. If an application has a lot of SQL queries, this narrows things down further so you can figure out which queries might benefit the most from running on the GPU.

This adds 2 new output files, rapids_4_spark_qualification_output_persql.log (text format) and rapids_4_spark_qualification_output_persql.csv (CSV format); their contents are the same for now.

The output looks like:

==========================================================================================================================================================================================================================================================
|                              App Name|             App ID|SQL ID|                                            SQL Description|SQL DF Duration|GPU Opportunity|Estimated GPU Duration|Estimated GPU Speedup|Estimated GPU Time Saved|      Recommendation|
==========================================================================================================================================================================================================================================================
|Rapids Spark Profiling Tool Unit Tests|local-1622043423018|     1|                   count at QualificationInfoUtils.scala:94|           7143|           6719|               2716.79|                 2.62|                  4426.2|Strongly Recommended|
|Rapids Spark Profiling Tool Unit Tests|local-1622043423018|     3|                   count at QualificationInfoUtils.scala:94|           2052|           1660|                958.28|                 2.14|                 1093.71|         Recommended|
|Rapids Spark Profiling Tool Unit Tests|local-1622043423018|     2|                   count at QualificationInfoUtils.scala:94|           1933|           1551|                 911.3|                 2.12|                 1021.69|         Recommended|
|                           Spark shell|local-1651187225439|     0|                                       show at <console>:26|            498|            249|                 373.5|                 1.33|                   124.5|         Recommended|

2 new options were added: one to turn this output on and one to control the length of the SQL description printed (the default there is 100 characters). The normal qualification tool options, such as ordering and limiting the number of rows, apply to this table as well.
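To make the description-length cap and the fixed-width table concrete, here is a minimal sketch (in Python, not the tool's actual Scala code) of how such formatting might work; the function names and the default of 100 characters mirror the description above but are otherwise hypothetical:

```python
# Sketch only: a configurable cap on the SQL description plus right-aligned,
# '|'-separated columns like the table shown above.
MAX_SQL_DESC_LENGTH = 100  # hypothetical default, mirroring the option described

def truncate_desc(desc: str, max_len: int = MAX_SQL_DESC_LENGTH) -> str:
    """Trim a long SQL description so it cannot blow up the table layout."""
    return desc if len(desc) <= max_len else desc[: max_len - 3] + "..."

def format_row(app_name, app_id, sql_id, desc, widths=(38, 19, 6, 59)):
    """Right-align each field to its column width, '|'-separated."""
    fields = [app_name, app_id, str(sql_id), truncate_desc(desc, widths[3])]
    return "|" + "|".join(f.rjust(w) for f, w in zip(fields, widths)) + "|"

print(format_row("Spark shell", "local-1651187225439", 0, "show at <console>:26"))
```

The column widths above are just illustrative; the real tool computes them from its own report schema.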

This does not contain the UI changes.

@tgravescs tgravescs added this to the July 22 - Aug 5 milestone Jul 26, 2022
@tgravescs tgravescs self-assigned this Jul 26, 2022
@tgravescs (Collaborator, Author) commented:

build

@tgravescs (Collaborator, Author) commented:

looks like I didn't upmerge to latest, will update shortly

@tgravescs (Collaborator, Author) commented:

build

@nartal1 (Collaborator) left a comment:

Overall looks good. Just a question on perSqlInfo

@tgravescs (Collaborator, Author) commented:

build

nartal1 previously approved these changes Jul 26, 2022
@amahussein (Collaborator) left a comment:

The CSV file format may get corrupted by the SQL description.
The SQL descriptions seem to include ","

Spark shell,app-20210509200722-0001,105,Execution: ss_max-v2.4, iteration: 1, StandardRun=true,335,0,335.0,1.0,0.0,Not Recommended

The above row has two extra columns compared to the other rows.
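The failure mode amahussein describes, and one standard way to avoid it, can be reproduced with any CSV writer that quotes fields containing the delimiter. A small Python sketch for illustration (the row values are taken from the example above; this is not the tool's actual writer):

```python
import csv
import io

# A SQL description that contains commas, like the row quoted above.
desc = "Execution: ss_max-v2.4, iteration: 1, StandardRun=true"
row = ["Spark shell", "app-20210509200722-0001", "105", desc,
       "335", "Not Recommended"]

# Naive join: the two commas inside the description become field
# separators, so the row parses back with two extra columns.
naive = ",".join(row)
assert len(naive.split(",")) == len(row) + 2

# A CSV writer that quotes fields containing the delimiter round-trips
# cleanly: the parsed row matches the original exactly.
buf = io.StringIO()
csv.writer(buf).writerow(row)
parsed = next(csv.reader(io.StringIO(buf.getvalue())))
assert parsed == row
```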

@tgravescs (Collaborator, Author) commented:

ah yep, needs to replace those
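The fix suggested here is to replace the embedded commas rather than quote them. A minimal sketch of that approach (the choice of ";" as the replacement character is an assumption for illustration, not necessarily what the PR uses):

```python
def sanitize_csv_field(value: str, delimiter: str = ",",
                       replacement: str = ";") -> str:
    """Replace the CSV delimiter inside a free-text field (e.g. the SQL
    description) so the column count of each row stays stable.
    The ';' replacement is an assumption, not necessarily the PR's choice."""
    return value.replace(delimiter, replacement)

desc = "Execution: ss_max-v2.4, iteration: 1, StandardRun=true"
print(sanitize_csv_field(desc))  # the embedded commas are replaced
```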

Commit: with CSV files — Signed-off-by: Thomas Graves <tgraves@nvidia.com>
@tgravescs (Collaborator, Author) commented:

build

@pxLi (Collaborator) commented Jul 27, 2022:

Unrelated: "Unable to find py4j" failed the CI intermittently. Let me retrigger.

@pxLi (Collaborator) commented Jul 27, 2022:

build

@amahussein (Collaborator) left a comment:

This fixes the bug. It is concerning that the fix did not require any change in the unit tests.
Later, we may consider covering this in the unit tests.

LGTM.

@tgravescs (Collaborator, Author) commented:

Sorry, I didn't hit return on my comment last night: I still want to come up with a test to check this, but that could be a followup. I didn't want to block the UI part.

@tgravescs (Collaborator, Author) commented:

I'm going to merge this and follow up with a test in a separate PR.


Successfully merging this pull request may close these issues.

[FEA] Qualification tool should use SQL ID of each Application ID like profiling tool
4 participants