[SPARK-43952][CORE][CONNECT][SQL] Add SparkContext APIs for query cancellation by tag #41440
Conversation
If we don't want to add public APIs like that, I'm also fine with having all that as
I am fine with adding them as a public API.
cc @gengliangwang - I also added the job tags to AppStatusTracker.
@gengliangwang it does complain about the incompatible constructor to JobData in the MiMa check: is this a part of the API that needs a compatible constructor to be added, or is it something to filter and ignore?
Resolved the problem after the merge.
@gengliangwang @rednaxelafx I updated the SHS tests' expected-output files. I claim it's fine because it's an additive change that should be backwards compatible (older versions ignoring this field) and forwards compatible (new versions reading old data where this field is missing will get an empty value).
LGTM on the UI-related changes.
https://github.com/juliuszsompolski/apache-spark/actions/runs/5278491450/jobs/9547955928
flaky again, retrigger again
https://github.com/juliuszsompolski/apache-spark/actions/runs/5280624891/jobs/9553060736 @HyukjinKwon @hvanhovell I have restarted CI on this PR 4 times, and each time I got a failed run due to a different flake. Do I need to keep restarting until I get a clean run?
It should be deflaked now.
 *
 * @param interruptOnCancel If true, then job cancellation will result in `Thread.interrupt()`
 * being called on the job's executor threads. This is useful to help ensure that the tasks
 * are actually stopped in a timely manner, but is off by default due to HDFS-1208, where HDFS
Someone should check if this is still a thing :)
The HDFS-1208 bug is still open... but multiple places in Spark core have by now elected to just pass `true` here, so it likely doesn't make sense for the user to set it to `false`, as these places would generate interrupts anyway... But removing it completely would be orthogonal to this PR.
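For context, a minimal self-contained sketch of the pre-existing job-group mechanism this thread is discussing; the app name, group name, and timings are placeholders, not code from the PR:

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkException}

object InterruptOnCancelSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local[2]"))

    // With interruptOnCancel = true, cancelling the group calls
    // Thread.interrupt() on the executor threads running its tasks,
    // so blocking calls such as Thread.sleep abort promptly.
    sc.setJobGroup("demo-group", "interruptible demo", interruptOnCancel = true)

    // Cancel the group from another thread after a short delay.
    new Thread(() => { Thread.sleep(2000); sc.cancelJobGroup("demo-group") }).start()

    try {
      sc.parallelize(1 to 4, 4).foreach(_ => Thread.sleep(60000)) // long-running tasks
    } catch {
      case e: SparkException => println(s"cancelled: ${e.getMessage}")
    } finally {
      sc.stop()
    }
  }
}
```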
LGTM
Just "nitting" 😉
https://github.com/juliuszsompolski/apache-spark/actions/runs/5321069534/jobs/9636800137
Unrelated failure.
I guess https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/47350391 is just having issues at the moment.
AppVeyor is fine. Merged to master.
### What changes were proposed in this pull request?

Add the APIs from #41440 to SparkR:
* addJobTag(tag)
* removeJobTag(tag)
* getJobTags()
* clearJobTags()
* cancelJobsWithTag(tag)
* setInterruptOnCancel(interruptOnCancel)

Additionally:
* fix a bug in removeJobTag when the last tag is removed (the result should be empty tags, not an empty-string tag)
* fix the comments on cancelJobsWithTag
* add a few defensive reinforcements against an empty-string tag resulting from a missing property or from removing the last tag

### Why are the changes needed?

SparkR parity.

### Does this PR introduce _any_ user-facing change?

Yes, it introduces in SparkR the APIs introduced in Scala in #41440.

### How was this patch tested?

Added a test.

Closes #41742 from juliuszsompolski/SPARK-44195.

Authored-by: Juliusz Sompolski <julek@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
interruptOnCancel = true)
// Setup a job tag here so later it may get cancelled by tag if necessary.
sparkContext.addJobTag(jobTag)
sparkContext.setInterruptOnCancel(true)
cc @ulysses-you FYI. Now we can cancel the jobs in broadcast like this.
Glad to see we support cancelling jobs with a tag!
Hi @juliuszsompolski, I got a local build failure related to the master branch (5d840eb). For example, when I try to run...
@shuwang21 these functions come from code generated by https://github.com/apache/spark/pull/41440/files#diff-28490b011ea75c4442d8ac95a3d8599abaffcd050672ed3416d7d259a22057c7R50
I see, thank you @juliuszsompolski. Solved; I needed to recompile the project.
What changes were proposed in this pull request?
Currently, the only way to cancel running Spark jobs is by using `SparkContext.cancelJobGroup`, using a job group name that was previously set using `SparkContext.setJobGroup`. This is problematic if multiple different parts of the system want to do cancellation and set their own ids.

For example, `BroadcastExchangeExec` sets its own job group, which may override the job group set by the user. This way, if the user cancels the job group they set in the "parent" execution, it will not cancel the broadcast jobs launched from within their jobs. It would also be useful in e.g. Spark Connect to be able to cancel jobs without overriding the `jobGroupId`, which may be used and needed for other purposes.
As a solution, this PR adds APIs to set tags on jobs, and to cancel jobs using tags (a usage sketch follows the list):

* `SparkContext.addJobTag(tag: String): Unit`
* `SparkContext.removeJobTag(tag: String): Unit`
* `SparkContext.getJobTags(): Set[String]`
* `SparkContext.clearJobTags(): Unit`
* `SparkContext.cancelJobsWithTag(tag: String): Unit`
* `DAGScheduler.cancelJobsWithTag(tag: String): Unit`
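A minimal sketch of how these tag APIs might be combined, assuming a local master; the tag name and timings are placeholders, not code from the PR:

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkException}

object JobTagSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("tag-demo").setMaster("local[2]"))

    // Tags are attached to the calling thread and apply to every job
    // it submits until they are removed.
    sc.addJobTag("etl-stage-1")
    sc.setInterruptOnCancel(true)

    // Another part of the system can cancel by tag, without knowing
    // (or clobbering) whatever job group the submitting thread uses.
    new Thread(() => { Thread.sleep(2000); sc.cancelJobsWithTag("etl-stage-1") }).start()

    try {
      sc.parallelize(1 to 4, 4).foreach(_ => Thread.sleep(60000)) // long-running job
    } catch {
      case e: SparkException => println(s"cancelled by tag: ${e.getMessage}")
    } finally {
      sc.removeJobTag("etl-stage-1") // or sc.clearJobTags()
      sc.stop()
    }
  }
}
```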
Also added `SparkContext.setInterruptOnCancel(interruptOnCancel: Boolean): Unit`, which previously could only be set in `setJobGroup`.

The tags are also added to `JobData` and `AppStatusTracker`. A new API is added to `SparkStatusTracker`: `SparkStatusTracker.getJobIdsForTag(jobTag: String): Array[Int]`.
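A short follow-on sketch (reusing the `sc` from the sketch above; the tag name is again a placeholder) of querying the new status-tracker API:

```scala
sc.addJobTag("reporting")
sc.parallelize(1 to 100).count() // runs as a tagged job

// Look up the ids of jobs carrying the tag. Note the status store is
// fed asynchronously from the listener bus, so a very recent job may
// take a moment to become visible.
val jobIds: Array[Int] = sc.statusTracker.getJobIdsForTag("reporting")
jobIds.foreach(id => println(s"job $id carries tag 'reporting'"))
sc.removeJobTag("reporting")
```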
The new API is used internally in `BroadcastExchangeExec` instead of cancellation via job group, to fix the issue of these broadcast jobs not being cancelled by a user-set `jobGroupId`. Now the user-set `jobGroupId` should propagate into the broadcast execution.

Also, cancellation in Spark Connect is switched to use a tag instead of a job group.
Why are the changes needed?
Currently, there may be multiple places that want to cancel a set of jobs, with different scopes.
Does this PR introduce any user-facing change?
The APIs described above are added.
How was this patch tested?
Added test to JobCancellationSuite.