[WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference #27053
Conversation
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #116664 has finished for PR 27053 at commit
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
This is all the code for the stage level scheduling feature, except for documentation.
This is meant as a reference when reviewing, since I'm splitting it into multiple PRs with the intention that it's easier to review. Note that only YARN currently supports this, and it requires dynamic allocation to be enabled, because we currently acquire new executors that match the profile exactly. We do not try to fit tasks into executors that were acquired for a different profile.
At a high level, supporting different stages with different ResourceProfiles requires changes throughout the scheduler, executor allocation, and cluster manager code.
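Since the feature requires dynamic allocation, a job using it would be launched with dynamic allocation enabled on YARN. A minimal sketch of such a launch (the shuffle service setting follows Spark's usual dynamic-allocation setup; `MyApp` and `my-app.jar` are hypothetical placeholders):

```shell
# Dynamic allocation must be on for stage level scheduling (YARN only at this point).
# MyApp and my-app.jar are placeholders for your application.
spark-submit \
  --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --class MyApp my-app.jar
```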
The end-user API looks like this:
```scala
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

val rpBuilder = new ResourceProfileBuilder()

// Executor-level requirements for stages using this profile
val ereq = new ExecutorResourceRequests()
ereq.cores(2).memory("6g").memoryOverhead("2g").pysparkMemory("2g")
  .resource("gpu", 2, "/home/tgraves/getGpus")

// Task-level requirements
val treq = new TaskResourceRequests()
treq.cpus(2).resource("gpu", 2)

// Build the profile and attach it to an RDD
val resourceProfile = rpBuilder.require(ereq).require(treq).build
val rdd = sc.parallelize(1 to 1000, 6).withResources(resourceProfile).map(x => (x, x))
```
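Tasks launched from an RDD carrying such a profile can then inspect the resources assigned to them via TaskContext. A minimal sketch assuming the profile above (the "gpu" addresses are whatever the discovery script reported):

```scala
import org.apache.spark.TaskContext

// Each task can look up the resource addresses it was assigned; here we
// read the "gpu" entry requested via TaskResourceRequests above.
val withGpus = rdd.mapPartitions { iter =>
  val gpuAddrs = TaskContext.get().resources()("gpu").addresses
  iter.map { case (k, v) => (k, v, gpuAddrs.mkString(",")) }
}
```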
Why are the changes needed?
Allow for different stages to use different executor/task resources
Does this PR introduce any user-facing change?
Yes: the RDD.withResources API and the ResourceProfile, ExecutorResourceRequest, and TaskResourceRequest APIs.
How was this patch tested?
Unit tests and manually.