
Qualification tool output recommendations on a per sql query basis #6092

Merged · 7 commits · Jul 27, 2022

Conversation

tgravescs (Collaborator) commented:

fixes #5502

This adds an option that lets a user get recommendations at the per-SQL-query level in addition to the application level. If an application has a lot of SQL queries, this narrows things down further so you can figure out which queries might benefit the most from running on the GPU.

This adds 2 new output files, rapids_4_spark_qualification_output_persql.log (text format) and rapids_4_spark_qualification_output_persql.csv (CSV format); their contents are the same for now.

The output looks like:

==========================================================================================================================================================================================================================================================
|                              App Name|             App ID|SQL ID|                                            SQL Description|SQL DF Duration|GPU Opportunity|Estimated GPU Duration|Estimated GPU Speedup|Estimated GPU Time Saved|      Recommendation|
==========================================================================================================================================================================================================================================================
|Rapids Spark Profiling Tool Unit Tests|local-1622043423018|     1|                   count at QualificationInfoUtils.scala:94|           7143|           6719|               2716.79|                 2.62|                  4426.2|Strongly Recommended|
|Rapids Spark Profiling Tool Unit Tests|local-1622043423018|     3|                   count at QualificationInfoUtils.scala:94|           2052|           1660|                958.28|                 2.14|                 1093.71|         Recommended|
|Rapids Spark Profiling Tool Unit Tests|local-1622043423018|     2|                   count at QualificationInfoUtils.scala:94|           1933|           1551|                 911.3|                 2.12|                 1021.69|         Recommended|
|                           Spark shell|local-1651187225439|     0|                                       show at <console>:26|            498|            249|                 373.5|                 1.33|                   124.5|         Recommended|

2 new options were added: one to turn this output on and one to control the length of the SQL description printed (the default there is 100 characters). The normal qualification tool options, such as ordering and limiting the number of rows, apply to this table as well.
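To make the description-length cap and the fixed-width table concrete, here is a minimal sketch (in Python, not the tool's actual Scala code) of how such formatting might work; the function names and the default of 100 characters mirror the description above but are otherwise hypothetical:

```python
# Sketch only: a configurable cap on the SQL description plus right-aligned,
# '|'-separated columns like the table shown above.
MAX_SQL_DESC_LENGTH = 100  # hypothetical default, mirroring the option described

def truncate_desc(desc: str, max_len: int = MAX_SQL_DESC_LENGTH) -> str:
    """Trim a long SQL description so it cannot blow up the table layout."""
    return desc if len(desc) <= max_len else desc[: max_len - 3] + "..."

def format_row(app_name, app_id, sql_id, desc, widths=(38, 19, 6, 59)):
    """Right-align each field to its column width, '|'-separated."""
    fields = [app_name, app_id, str(sql_id), truncate_desc(desc, widths[3])]
    return "|" + "|".join(f.rjust(w) for f, w in zip(fields, widths)) + "|"

print(format_row("Spark shell", "local-1651187225439", 0, "show at <console>:26"))
```

The column widths above are just illustrative; the real tool computes them from its own report schema.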

This does not contain the UI changes.

@tgravescs tgravescs added this to the July 22 - Aug 5 milestone Jul 26, 2022
@tgravescs tgravescs self-assigned this Jul 26, 2022
@tgravescs (Collaborator, Author) commented:

build

@tgravescs (Collaborator, Author) commented:

looks like I didn't upmerge to latest, will update shortly

@tgravescs (Collaborator, Author) commented:

build

@nartal1 (Collaborator) left a comment:

Overall looks good. Just a question on perSqlInfo

@tgravescs (Collaborator, Author) commented:

build

nartal1 previously approved these changes Jul 26, 2022
@amahussein (Collaborator) left a comment:

The CSV file format may get corrupted by the SQL description.
The SQL descriptions seem to include ","

Spark shell,app-20210509200722-0001,105,Execution: ss_max-v2.4, iteration: 1, StandardRun=true,335,0,335.0,1.0,0.0,Not Recommended

The above row has two extra columns compared to the other rows.
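The failure mode amahussein describes, and one standard way to avoid it, can be reproduced with any CSV writer that quotes fields containing the delimiter. A small Python sketch for illustration (the row values are taken from the example above; this is not the tool's actual writer):

```python
import csv
import io

# A SQL description that contains commas, like the row quoted above.
desc = "Execution: ss_max-v2.4, iteration: 1, StandardRun=true"
row = ["Spark shell", "app-20210509200722-0001", "105", desc,
       "335", "Not Recommended"]

# Naive join: the two commas inside the description become field
# separators, so the row parses back with two extra columns.
naive = ",".join(row)
assert len(naive.split(",")) == len(row) + 2

# A CSV writer that quotes fields containing the delimiter round-trips
# cleanly: the parsed row matches the original exactly.
buf = io.StringIO()
csv.writer(buf).writerow(row)
parsed = next(csv.reader(io.StringIO(buf.getvalue())))
assert parsed == row
```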

@tgravescs (Collaborator, Author) commented:

ah yep, needs to replace those
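The fix suggested here is to replace the embedded commas rather than quote them. A minimal sketch of that approach (the choice of ";" as the replacement character is an assumption for illustration, not necessarily what the PR uses):

```python
def sanitize_csv_field(value: str, delimiter: str = ",",
                       replacement: str = ";") -> str:
    """Replace the CSV delimiter inside a free-text field (e.g. the SQL
    description) so the column count of each row stays stable.
    The ';' replacement is an assumption, not necessarily the PR's choice."""
    return value.replace(delimiter, replacement)

desc = "Execution: ss_max-v2.4, iteration: 1, StandardRun=true"
print(sanitize_csv_field(desc))  # the embedded commas are replaced
```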

Commit: with CSV files — Signed-off-by: Thomas Graves <tgraves@nvidia.com>
@tgravescs (Collaborator, Author) commented:

build

@pxLi (Collaborator) commented Jul 27, 2022:

Unrelated: "Unable to find py4j" failed the CI intermittently. Let me retrigger.

@pxLi (Collaborator) commented Jul 27, 2022:

build

@amahussein (Collaborator) left a comment:

This fixes the bug. It is concerning that the fix did not require any change in the unit tests.
Later, we may consider covering this in the unit tests.

LGTM.

@tgravescs (Collaborator, Author) commented:

Sorry, I didn't hit return on my comment last night: I still want to come up with a test to check this, but that could be a followup. I didn't want to block the UI part.

@tgravescs (Collaborator, Author) commented:

I'm going to merge this and follow up with a test in a separate PR.


Successfully merging this pull request may close these issues.

[FEA] Qualification tool should use SQL ID of each Application ID like profiling tool
4 participants