-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Broadcast task sql test [DO NOT MERGE] #1654
Closed
Closed
Commits on Jul 30, 2014
-
[SPARK-2521] Broadcast RDD object (instead of sending it along with e…
…very task). Currently (as of Spark 1.0.1), Spark sends RDD object (which contains closures) using Akka along with the task itself to the executors. This is inefficient because all tasks in the same stage use the same RDD object, but we have to send RDD object multiple times to the executors. This is especially bad when a closure references some variable that is very large. The current design led to users having to explicitly broadcast large variables. The patch uses broadcast to send RDD objects and the closures to executors, and use Akka to only send a reference to the broadcast RDD/closure along with the partition specific information for the task. For those of you who know more about the internals, Spark already relies on broadcast to send the Hadoop JobConf every time it uses the Hadoop input, because the JobConf is large. The user-facing impact of the change include: 1. Users won't need to decide what to broadcast anymore, unless they would want to use a large object multiple times in different operations 2. Task size will get smaller, resulting in faster scheduling and higher task dispatch throughput. In addition, the change will simplify some internals of Spark, eliminating the need to maintain task caches and the complex logic to broadcast JobConf (which also led to a deadlock recently). A simple way to test this: ```scala val a = new Array[Byte](1000*1000); scala.util.Random.nextBytes(a); sc.parallelize(1 to 1000, 1000).map { x => a; x }.groupBy { x => a; x }.count ``` Numbers on 3 r3.8xlarge instances on EC2 ``` master branch: 5.648436068 s, 4.715361895 s, 5.360161877 s with this change: 3.416348793 s, 1.477846558 s, 1.553432156 s ``` Author: Reynold Xin <rxin@apache.org> Closes apache#1452 from rxin/broadcast-task and squashes the following commits: 762e0be [Reynold Xin] Warn large broadcasts. ade6eac [Reynold Xin] Log broadcast size. c3b6f11 [Reynold Xin] Added a unit test for clean up. 754085f [Reynold Xin] Explain why broadcasting serialized copy of the task. 04b17f0 [Reynold Xin] [SPARK-2521] Broadcast RDD object once per TaskSet (instead of sending it for every task). (cherry picked from commit 7b8cd17) Signed-off-by: Reynold Xin <rxin@apache.org>
Configuration menu - View commit details
-
Copy full SHA for 102bc8f - Browse repository at this point
Copy the full SHA 102bc8fView commit details -
Configuration menu - View commit details
-
Copy full SHA for bd11028 - Browse repository at this point
Copy the full SHA bd11028View commit details -
Configuration menu - View commit details
-
Copy full SHA for c1e03a8 - Browse repository at this point
Copy the full SHA c1e03a8View commit details -
Configuration menu - View commit details
-
Copy full SHA for fc64e01 - Browse repository at this point
Copy the full SHA fc64e01View commit details -
Configuration menu - View commit details
-
Copy full SHA for 87948d5 - Browse repository at this point
Copy the full SHA 87948d5View commit details -
Configuration menu - View commit details
-
Copy full SHA for 5980457 - Browse repository at this point
Copy the full SHA 5980457View commit details -
Configuration menu - View commit details
-
Copy full SHA for 023719d - Browse repository at this point
Copy the full SHA 023719dView commit details -
Configuration menu - View commit details
-
Copy full SHA for d057b27 - Browse repository at this point
Copy the full SHA d057b27View commit details -
Configuration menu - View commit details
-
Copy full SHA for 1f861b2 - Browse repository at this point
Copy the full SHA 1f861b2View commit details -
Configuration menu - View commit details
-
Copy full SHA for 3c8f496 - Browse repository at this point
Copy the full SHA 3c8f496View commit details -
Configuration menu - View commit details
-
Copy full SHA for f670886 - Browse repository at this point
Copy the full SHA f670886View commit details -
Configuration menu - View commit details
-
Copy full SHA for f0fefdf - Browse repository at this point
Copy the full SHA f0fefdfView commit details -
Configuration menu - View commit details
-
Copy full SHA for aab05e5 - Browse repository at this point
Copy the full SHA aab05e5View commit details -
Configuration menu - View commit details
-
Copy full SHA for 6810286 - Browse repository at this point
Copy the full SHA 6810286View commit details -
Configuration menu - View commit details
-
Copy full SHA for 40857a1 - Browse repository at this point
Copy the full SHA 40857a1View commit details -
Configuration menu - View commit details
-
Copy full SHA for e40a9ac - Browse repository at this point
Copy the full SHA e40a9acView commit details -
Configuration menu - View commit details
-
Copy full SHA for 7127096 - Browse repository at this point
Copy the full SHA 7127096View commit details
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.