Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-44394][CONNECT][WEBUI] Add a Spark UI page for Spark Connect #41964

Closed
wants to merge 4 commits into from

Conversation

jasonli-db
Copy link
Contributor

@jasonli-db jasonli-db commented Jul 12, 2023

What changes were proposed in this pull request?

Add a new Spark UI page to display session and execution information for Spark Connect. This builds of the work in SPARK-43923 (#41443) that adds the relevant SparkListenerEvents and mirrors the ThriftServerPage in the Spark UI for JDBC/ODBC.

Screenshot 2023-07-27 at 11 29 22 PM Screenshot 2023-07-27 at 11 29 15 PM

Why are the changes needed?

This gives users a way to access session and execution information for Spark Connect via the UI and provides the frontend interface for the related SparkListenerEvents.

Does this PR introduce any user-facing change?

Yes, it will add a new tab/page in the Spark UI

How was this patch tested?

Unit tests

@jasonli-db jasonli-db force-pushed the spark-connect-ui branch 4 times, most recently from 8caf083 to b54d2b8 Compare July 14, 2023 08:38
@jasonli-db jasonli-db changed the title [WIP][SPARK-44394] Add a Spark UI page for Spark Connect [SPARK-44394] Add a Spark UI page for Spark Connect Jul 14, 2023
@jasonli-db jasonli-db changed the title [SPARK-44394] Add a Spark UI page for Spark Connect [SPARK-44394][CONNECT][WEBUI] Add a Spark UI page for Spark Connect Jul 14, 2023
@jasonli-db jasonli-db marked this pull request as ready for review July 14, 2023 08:55
@jasonli-db jasonli-db force-pushed the spark-connect-ui branch 2 times, most recently from 5c70890 to 6718961 Compare July 14, 2023 20:45
@jasonli-db
Copy link
Contributor Author

@juliuszsompolski @grundprinzip Can you help review this PR? This builds off of #41443, so the commit to look at is fcf0ca6.

Also cc @rednaxelafx @gengliangwang

@gengliangwang
Copy link
Member

Since the Spark connect is more about SQL, shall we show the SQL execution links on the UI?

@jasonli-db jasonli-db force-pushed the spark-connect-ui branch 2 times, most recently from 5842eb0 to 7118923 Compare July 18, 2023 07:44
Copy link
Contributor

@juliuszsompolski juliuszsompolski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewed connect service parts, didn't review ui package.

@gengliangwang
Copy link
Member

Since the Spark connect is more about SQL, shall we show the SQL execution links on the UI?

@jasonli-db What is your opinion of this one?

@jasonli-db
Copy link
Contributor Author

Since the Spark connect is more about SQL, shall we show the SQL execution links on the UI?

@jasonli-db What is your opinion of this one?
I agree that adding the SQL execution links would be ideal and useful for the user. @juliuszsompolski Can you give your input on this and confirm whether or not this is currently possible. Do we have a link between SQL execution IDs and Connect requests?

Copy link
Contributor

@juliuszsompolski juliuszsompolski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed and LGTM on the connect part; didn't review details of the UI and state store part.

Comment on lines 67 to 68
val executionIdOpt: Option[String] = Option(jobStart.properties)
.flatMap { p => Option(p.getProperty(SQLExecution.EXECUTION_ID_KEY)) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline with @jasonli-db and @jdesjean that for executions that don't start a Spark Job (e.g. SELECT 1), they will not be recorded. For that reason, as a followup, it would be best if jobTags could also be added to the SparkListenerSQLExecutionStart event.

@juliuszsompolski
Copy link
Contributor

In the screenshot above, the close time and finish time column seems duplicated. And the execution time and duration also seems quite similar. Why do we have so many columns about the timings?

@gengliangwang Finish time is finish execution, close time is when all the results were transferred; execution time is the time query was executing, while duration is end to end. It is meaningful for queries with large results, or e.g. complex queries that spend a lot of time in optimizer. It's the same with Thriftserver UI.

@jasonli-db jasonli-db force-pushed the spark-connect-ui branch 2 times, most recently from ff8f815 to 59b415c Compare July 28, 2023 00:22
@jasonli-db
Copy link
Contributor Author

@jasonli-db Please try more tests. I did a quick test and find the SQL IDs need deduplication. image

Thanks for catching this. It's been fixed

@jasonli-db
Copy link
Contributor Author

Nit: we can have follow-up to improve the page when a session doesn't exist(e.g. evicted from KV store): image

For example, id not existed is handled in the SQL page: image

Addressed to match the behavior for the SQL page.

@jasonli-db
Copy link
Contributor Author

OperationId could indeed maybe be more useful instead. If a user uses connect api like SparkSession.interruptOperation, they pass just operationId. The only time a user would want to use jobTag is if they wanted to directly call SparkContext.interruptJobsWithTag, which is a rather internal use. We could consider displaying SparkListenerConnectOperationStarted.sparkSessionTags, because these are uses settable (via connect SparkSession.addTag / removeTag / clearTags), and they are used by users to call connect SparkSession.interruptTag.

@juliuszsompolski @gengliangwang I've updated the screenshots. I added the operationID and sparkSessionTags. For now, I kept the jobTag since it seems it could still be used if only internally. I'll remove it if/when we come to a final consensus to do so. Please take a look!

@gengliangwang
Copy link
Member

@jasonli-db I have to say to the current table is quite wide.
image

Can we at least put the operationID column behind the column "State" since it is not commonly useful?

A good way to resolve this is to have checkboxes and make operationID/sparkSessionTags/jobTag as additional metrics:
image
This can be done in a followup

</span>
</td>
<td>
{if (info.isExecutionActive) "RUNNING" else info.state}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think it makes more sense, thanks!

@gengliangwang
Copy link
Member

@jasonli-db Thanks for the work. I will merge the PR once the tests are passed.

@gengliangwang
Copy link
Member

gengliangwang commented Jul 29, 2023

Thanks, merging to master

@gengliangwang
Copy link
Member

@jasonli-db This PR has conflict with the branch-3.5
Please cherry-pick it before the RC1 on Aug 1

juliuszsompolski pushed a commit to juliuszsompolski/apache-spark that referenced this pull request Jul 29, 2023
## What changes were proposed in this pull request?
Add a new Spark UI page to display session and execution information for Spark Connect. This builds of the work in SPARK-43923 (apache#41443) that adds the relevant SparkListenerEvents and mirrors the ThriftServerPage in the Spark UI for JDBC/ODBC.

<img width="1709" alt="Screenshot 2023-07-27 at 11 29 22 PM" src="https://github.com/apache/spark/assets/65624911/934b7c69-3b44-460b-8fbb-36a9eb3f0798">

<img width="1716" alt="Screenshot 2023-07-27 at 11 29 15 PM" src="https://github.com/apache/spark/assets/65624911/33dbe6ab-44bf-49a5-ad4c-5ba4a476a1f0">

### Why are the changes needed?
This gives users a way to access session and execution information for Spark Connect via the UI and provides the frontend interface for the related SparkListenerEvents.

### Does this PR introduce _any_ user-facing change?
Yes, it will add a new tab/page in the Spark UI

### How was this patch tested?
Unit tests

Closes apache#41964 from jasonli-db/spark-connect-ui.

Authored-by: Jason Li <jason.li@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit f8786f0)
@juliuszsompolski
Copy link
Contributor

Thanks @gengliangwang . Since the conflict was trivial (imports), I resolved and raised #42224 for branch-3.5.

gengliangwang pushed a commit that referenced this pull request Jul 29, 2023
## What changes were proposed in this pull request?

Add a new Spark UI page to display session and execution information for Spark Connect. This builds of the work in SPARK-43923 (#41443) that adds the relevant SparkListenerEvents and mirrors the ThriftServerPage in the Spark UI for JDBC/ODBC.

<img width="1709" alt="Screenshot 2023-07-27 at 11 29 22 PM" src="https://github.com/apache/spark/assets/65624911/934b7c69-3b44-460b-8fbb-36a9eb3f0798">

<img width="1716" alt="Screenshot 2023-07-27 at 11 29 15 PM" src="https://github.com/apache/spark/assets/65624911/33dbe6ab-44bf-49a5-ad4c-5ba4a476a1f0">

### Why are the changes needed?
This gives users a way to access session and execution information for Spark Connect via the UI and provides the frontend interface for the related SparkListenerEvents.

### Does this PR introduce _any_ user-facing change? Yes, it will add a new tab/page in the Spark UI

### How was this patch tested?
Unit tests

Closes #41964 from jasonli-db/spark-connect-ui.

Authored-by: Jason Li <jason.lidatabricks.com>
Signed-off-by: Gengliang Wang <gengliangapache.org>
(cherry picked from commit f8786f0)

Closes #42224 from juliuszsompolski/SPARK-44394-3.5.

Authored-by: Jason Li <jason.li@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
ragnarok56 pushed a commit to ragnarok56/spark that referenced this pull request Mar 2, 2024
## What changes were proposed in this pull request?
Add a new Spark UI page to display session and execution information for Spark Connect. This builds of the work in SPARK-43923 (apache#41443) that adds the relevant SparkListenerEvents and mirrors the ThriftServerPage in the Spark UI for JDBC/ODBC.

<img width="1709" alt="Screenshot 2023-07-27 at 11 29 22 PM" src="https://github.com/apache/spark/assets/65624911/934b7c69-3b44-460b-8fbb-36a9eb3f0798">

<img width="1716" alt="Screenshot 2023-07-27 at 11 29 15 PM" src="https://github.com/apache/spark/assets/65624911/33dbe6ab-44bf-49a5-ad4c-5ba4a476a1f0">

### Why are the changes needed?
This gives users a way to access session and execution information for Spark Connect via the UI and provides the frontend interface for the related SparkListenerEvents.

### Does this PR introduce _any_ user-facing change?
Yes, it will add a new tab/page in the Spark UI

### How was this patch tested?
Unit tests

Closes apache#41964 from jasonli-db/spark-connect-ui.

Authored-by: Jason Li <jason.li@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants