Qualification tool: output recommendations on a per-SQL-query basis #6092
… basis Signed-off-by: Thomas Graves <tgraves@apache.org>
build
Looks like I didn't upmerge to the latest; will update shortly.
build
Overall looks good. Just one question about perSqlInfo.
tools/src/main/scala/org/apache/spark/sql/rapids/tool/qualification/QualificationAppInfo.scala
build
The CSV file format may get corrupted by the SQL description.
The SQL descriptions seem to include commas (","), for example:
Spark shell,app-20210509200722-0001,105,Execution: ss_max-v2.4, iteration: 1, StandardRun=true,335,0,335.0,1.0,0.0,Not Recommended
The above row has two extra columns compared to other ones.
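The reply below says the commas in the description need to be replaced. A minimal sketch of that idea follows; the object and method names are hypothetical, and the choice of a semicolon as the replacement character is an assumption (the thread does not say what the fix substitutes). Newlines are also replaced, on the assumption that a multi-line description would break rows in the same way.

```scala
object CsvSanitize {
  // Hypothetical helper: replaces characters that would split or break a CSV row.
  // Using ";" as the replacement for "," is an assumption, not the tool's actual choice.
  def sanitizeField(s: String): String =
    s.replace(",", ";").replace("\n", " ").replace("\r", " ")

  // Joins already-split fields into one CSV row, sanitizing each field first.
  def toCsvRow(fields: Seq[String]): String =
    fields.map(sanitizeField).mkString(",")
}
```

Applied to the row quoted above, the description becomes "Execution: ss_max-v2.4; iteration: 1; StandardRun=true" and the row splits into the same number of columns as the others. Quoting fields per RFC 4180 would be an alternative that preserves the original text, at the cost of requiring CSV consumers to handle quoted fields.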
Ah yep, those need to be replaced.
with CSV files Signed-off-by: Thomas Graves <tgraves@nvidia.com>
build
unrelated
build
This fixes the bug. It is concerning that the fix did not require any change in the unit tests.
Later, we may consider covering this in the unit tests.
LGTM.
Sorry, I didn't hit return on my comment last night: I want to come up with a test that checks this, but that could be a follow-up. I didn't want to block the UI part.
I'm going to merge this and follow up with a test in a separate PR.
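The follow-up test discussed above could check that every data row in the per-SQL CSV has the same number of fields as the header. A minimal sketch, assuming the CSV is read as a sequence of lines and no fields are quoted; the object and method names are hypothetical, and a real test in the repository would likely wrap this in the project's existing test framework.

```scala
object CsvShapeCheck {
  // Returns the 1-based line numbers (header = line 1) of rows whose
  // field count differs from the header's field count.
  def malformedLines(lines: Seq[String]): Seq[Int] = lines match {
    case Seq() => Seq.empty
    case header +: rows =>
      // limit -1 keeps trailing empty fields so "a,b," counts as 3 fields
      val expected = header.split(",", -1).length
      rows.zipWithIndex.collect {
        case (row, i) if row.split(",", -1).length != expected => i + 2
      }
  }
}
```

A test would then assert that `malformedLines` returns an empty result for an output file generated from an event log whose SQL descriptions contain commas, which is the case that slipped through here.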
Fixes #5502.
This adds an option that lets users get recommendations at the per-SQL-query level in addition to the application level. If an application has many SQL queries, this helps narrow down which queries would benefit most from running on the GPU.
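To illustrate how per-query results help narrow things down within one application, here is a small sketch that keeps the top N queries per application. The record type, its fields, and the use of estimated GPU time saved as the measure of "benefit" are all assumptions for illustration; they are not the tool's actual classes or ranking logic.

```scala
object PerSqlRanking {
  // Hypothetical per-query record; field names are illustrative only.
  final case class PerSqlEstimate(
      appId: String,
      sqlId: Long,
      description: String,
      estimatedGpuTimeSavedMs: Double)

  // For each application, keeps the `n` queries with the largest
  // estimated GPU time savings, largest first.
  def topQueriesPerApp(rows: Seq[PerSqlEstimate], n: Int): Map[String, Seq[PerSqlEstimate]] =
    rows.groupBy(_.appId).map { case (app, queries) =>
      app -> queries.sortBy(q => -q.estimatedGpuTimeSavedMs).take(n)
    }
}
```

With an application-level recommendation alone, an app with hundreds of queries gives one verdict; grouping and ranking per query as above points at the specific queries to investigate first.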
Two new output files are added: rapids_4_spark_qualification_output_persql.log (text format) and rapids_4_spark_qualification_output_persql.csv (CSV format). For now, both contain the same information.
Output looks like: (sample output not preserved)
Two new options are added: one to turn this output on and one to control the maximum length of the SQL description printed (default 100). The qualification tool's regular options also apply to this table, such as the sort order and limiting the number of rows.
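A sketch of how the description-length limit, sort order, and row limit described above might combine when rendering the table. Everything here is hypothetical: the names, the pipe-separated line format, truncation by keeping a prefix, and sorting on estimated time saved are assumptions; only the default length of 100 comes from the description above.

```scala
object PerSqlReport {
  // Default taken from the PR description; the constant name is hypothetical.
  val DefaultMaxSqlDescLength = 100

  // Hypothetical row type for illustration.
  final case class Row(sqlId: Long, description: String, estimatedGpuTimeSavedMs: Double)

  // Keeps at most maxLen characters of the description (prefix truncation is an assumption).
  def truncateDesc(desc: String, maxLen: Int = DefaultMaxSqlDescLength): String =
    if (desc.length <= maxLen) desc else desc.substring(0, maxLen)

  // Sorts by estimated savings (descending if requested), applies an optional
  // row limit, then formats each row as "sqlId|description|savings".
  def render(rows: Seq[Row], maxLen: Int, descending: Boolean, limit: Option[Int]): Seq[String] = {
    val sorted = rows.sortBy(_.estimatedGpuTimeSavedMs)
    val ordered = if (descending) sorted.reverse else sorted
    val limited = limit.fold(ordered)(n => ordered.take(n))
    limited.map(r => s"${r.sqlId}|${truncateDesc(r.description, maxLen)}|${r.estimatedGpuTimeSavedMs}")
  }
}
```

For example, rendering three rows with savings 10.0, 30.0, and 20.0 in descending order with a limit of 2 yields only the 30.0 and 20.0 rows. Truncating only at render time keeps the full description available for other outputs.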
This does not contain the UI changes.