[SPARK-50385][CORE] Use class name prefix for REST Submission API thread pool #48924

Closed

Member

@dongjoon-hyun dongjoon-hyun commented Nov 21, 2024

What changes were proposed in this pull request?

This PR aims to use a meaningful class name prefix for the REST Submission API thread pool instead of the default name of Jetty's QueuedThreadPool, "qtp" + super.hashCode().

https://github.com/dekellum/jetty/blob/3dc0120d573816de7d6a83e2d6a97035288bdd4a/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java#L64
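To make the naming scheme concrete, here is a minimal sketch using only the JDK. `PoolNameSketch` is a hypothetical stand-in, not Jetty code: it assumes the pool keeps a name that defaults to "qtp" plus its identity hash code, and that each thread is named `<pool name>-<thread id>`, which is the pattern visible in the thread dumps in this description.

```java
import java.util.concurrent.ThreadFactory;

/** Hypothetical stand-in for a pool's naming logic; this is NOT Jetty's implementation. */
public class PoolNameSketch implements ThreadFactory {
    // Assumed default, mirroring the "qtp" + hashCode() scheme described in the PR.
    private String name = "qtp" + hashCode();

    public void setName(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r);
        t.setDaemon(true);
        // Assumed pattern "<pool name>-<thread id>", as seen in the jstack output.
        // Thread.getId() still works on newer JDKs but is deprecated in favor of threadId().
        t.setName(name + "-" + t.getId());
        return t;
    }

    public static void main(String[] args) {
        PoolNameSketch pool = new PoolNameSketch();
        // Default: an opaque prefix such as qtp<hash>-<id>.
        System.out.println(pool.newThread(() -> { }).getName());

        // After naming the pool, new threads carry the descriptive prefix.
        pool.setName("StandaloneRestServer");
        System.out.println(pool.newThread(() -> { }).getName());
    }
}
```

The point of the sketch is only that the prefix is decided once, at pool level, so a single `setName` call is enough to rename every thread the pool creates afterwards.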

Why are the changes needed?

A descriptive prefix makes the REST Submission API threads easy to identify in thread dumps during JVM investigation.

BEFORE (4.0.0-preview2)

$ SPARK_MASTER_OPTS='-Dspark.master.rest.enabled=true' sbin/start-master.sh
$ jstack 28217 | grep qtp
"qtp1925630411-52" #52 daemon prio=5 os_prio=31 cpu=0.07ms elapsed=19.06s tid=0x0000000134906c10 nid=0xde03 runnable  [0x0000000314592000]
"qtp1925630411-53" #53 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=19.06s tid=0x0000000134ac6810 nid=0xc603 runnable  [0x000000031479e000]
"qtp1925630411-54" #54 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=19.06s tid=0x000000013491ae10 nid=0xdc03 runnable  [0x00000003149aa000]
"qtp1925630411-55" #55 daemon prio=5 os_prio=31 cpu=0.08ms elapsed=19.06s tid=0x0000000134ac9810 nid=0xc803 runnable  [0x0000000314bb6000]
"qtp1925630411-56" #56 daemon prio=5 os_prio=31 cpu=0.04ms elapsed=19.06s tid=0x0000000134ac9e10 nid=0xda03 runnable  [0x0000000314dc2000]
"qtp1925630411-57" #57 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=19.06s tid=0x0000000134aca410 nid=0xca03 runnable  [0x0000000314fce000]
"qtp1925630411-58" #58 daemon prio=5 os_prio=31 cpu=0.04ms elapsed=19.06s tid=0x0000000134acaa10 nid=0xcb03 runnable  [0x00000003151da000]
"qtp1925630411-59" #59 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=19.06s tid=0x0000000134acb010 nid=0xcc03 runnable  [0x00000003153e6000]
"qtp1925630411-60-acceptor-0@108e9815-ServerConnector@1e497474{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #60 daemon prio=3 os_prio=31 cpu=0.11ms elapsed=19.06s tid=0x00000001317ffa10 nid=0xcd03 runnable  [0x00000003155f2000]
"qtp1925630411-61-acceptor-1@1d90f2aa-ServerConnector@1e497474{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #61 daemon prio=3 os_prio=31 cpu=0.10ms elapsed=19.06s tid=0x00000001314ed610 nid=0xcf03 waiting on condition  [0x00000003157fe000]

AFTER

$ SPARK_MASTER_OPTS='-Dspark.master.rest.enabled=true' sbin/start-master.sh
$ jstack 28317 | grep StandaloneRestServer
"StandaloneRestServer-52" #52 daemon prio=5 os_prio=31 cpu=0.09ms elapsed=60.06s tid=0x00000001284a8e10 nid=0xdb03 runnable  [0x000000032cfce000]
"StandaloneRestServer-53" #53 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284acc10 nid=0xda03 runnable  [0x000000032d1da000]
"StandaloneRestServer-54" #54 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284ae610 nid=0xd803 runnable  [0x000000032d3e6000]
"StandaloneRestServer-55" #55 daemon prio=5 os_prio=31 cpu=0.09ms elapsed=60.06s tid=0x00000001284aec10 nid=0xd703 runnable  [0x000000032d5f2000]
"StandaloneRestServer-56" #56 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284af210 nid=0xc803 runnable  [0x000000032d7fe000]
"StandaloneRestServer-57" #57 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284af810 nid=0xc903 runnable  [0x000000032da0a000]
"StandaloneRestServer-58" #58 daemon prio=5 os_prio=31 cpu=0.06ms elapsed=60.06s tid=0x00000001284afe10 nid=0xcb03 runnable  [0x000000032dc16000]
"StandaloneRestServer-59" #59 daemon prio=5 os_prio=31 cpu=0.05ms elapsed=60.06s tid=0x00000001284b0410 nid=0xcc03 runnable  [0x000000032de22000]
"StandaloneRestServer-60-acceptor-0@4aefbaa8-ServerConnector@44284d85{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #60 daemon prio=3 os_prio=31 cpu=0.13ms elapsed=60.05s tid=0x000000015cda1a10 nid=0xcd03 runnable  [0x000000032e02e000]
"StandaloneRestServer-61-acceptor-1@48976251-ServerConnector@44284d85{HTTP/1.1, (http/1.1)}{M3-Max.local:6066}" #61 daemon prio=3 os_prio=31 cpu=0.12ms elapsed=60.05s tid=0x000000015cd1c810 nid=0xce03 waiting on condition  [0x000000032e23a000]
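The same filtering can be done from inside a JVM, which may be handy where `jstack` is not available. The following is a rough sketch, assuming only the JDK: `ThreadNameFilter` and the demo thread name are hypothetical, and the code inspects its own JVM rather than a running Spark master.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ThreadNameFilter {
    /** Returns the sorted names of live threads in this JVM that start with the given prefix. */
    public static List<String> namesWithPrefix(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
            .map(Thread::getName)
            .filter(n -> n.startsWith(prefix))
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical worker standing in for a REST server thread.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // Interrupted by main below; just exit.
            }
        }, "StandaloneRestServer-demo");
        worker.setDaemon(true);
        worker.start();

        // In-process counterpart of `jstack <pid> | grep StandaloneRestServer`.
        System.out.println(namesWithPrefix("StandaloneRestServer"));
        worker.interrupt();
    }
}
```

With the default "qtp" prefix, a filter like this cannot tell one Jetty pool from another; with the class name prefix it can.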

Does this PR introduce any user-facing change?

No. Thread names are only visible while debugging, for example in thread dumps.

How was this patch tested?

Manual review.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the CORE label Nov 21, 2024
@dongjoon-hyun
Member Author

Could you review this PR, @panbingkun?

@panbingkun
Contributor

LGTM, +1

@@ -93,6 +93,7 @@ private[spark] abstract class RestSubmissionServer(
    */
   private def doStart(startPort: Int): (Server, Int) = {
     val threadPool = new QueuedThreadPool(masterConf.get(MASTER_REST_SERVER_MAX_THREADS))
+    threadPool.setName(getClass().getSimpleName())
Contributor

  • Default value: [image]
  • We manually set a value here: [image]
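The added line calls `getClass().getSimpleName()` inside the abstract `RestSubmissionServer`, yet the dump shows `StandaloneRestServer`. That is because `getClass()` returns the runtime class of the instance. A small Java sketch of the behavior follows; the class names are hypothetical stand-ins, not Spark's actual classes.

```java
public class SimpleNameDemo {
    /** Hypothetical stand-in for the abstract base server class. */
    public abstract static class RestServerBase {
        public String poolName() {
            // getClass() is the runtime class, so this yields the concrete subclass name.
            return getClass().getSimpleName();
        }
    }

    /** Hypothetical stand-in for the concrete standalone server. */
    public static class StandaloneRestServer extends RestServerBase { }

    public static void main(String[] args) {
        // prints "StandaloneRestServer"
        System.out.println(new StandaloneRestServer().poolName());

        // In Java, an anonymous subclass has an empty simple name, so this prints "true".
        System.out.println(new RestServerBase() { }.poolName().isEmpty());
    }
}
```

One consequence of this design is that any other subclass of the base server would get its own prefix without further changes, while an anonymous subclass would fall back to an empty prefix in plain Java.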

@dongjoon-hyun
Member Author

Thank you so much, @panbingkun.

@dongjoon-hyun
Member Author

All tests passed already in the first commit, 6c9bdd1.

I just resolved the conflicts.

Since this is a one-liner PR, could you try to merge this PR via the following script, @panbingkun?

Member

@yaooqinn yaooqinn left a comment

+1, LGTM, thank you @dongjoon-hyun

@panbingkun
Contributor

Okay, let me try. Thank you very much for providing me with such detailed guidance ❤️!

@panbingkun
Contributor

Merged to master.
Thank you, @dongjoon-hyun and @yaooqinn.

@panbingkun
Contributor

Thank you again to @dongjoon-hyun for guiding me step by step towards progress ❤️ !

@dongjoon-hyun
Member Author

Thank you, @panbingkun and @yaooqinn.

To @panbingkun, I'm happy we are in the same Spark community! 😄

@dongjoon-hyun dongjoon-hyun deleted the SPARK-50385 branch November 22, 2024 14:04