[SPARK-2025] Unpersist edges of previous graph in Pregel
Due to a bug introduced by #497, Pregel does not unpersist replicated vertices from previous iterations. As a result, they stay cached until memory is full, wasting GC time.

This PR corrects the problem by unpersisting both the edges and the replicated vertices of previous iterations. This is safe because the edges and replicated vertices of the current iteration are cached by the call to `g.cache()` and then materialized by the call to `messages.count()`. Therefore no unmaterialized RDDs depend on `prevG.edges`. I verified that no recomputation occurs by running PageRank with a custom patch to Spark that warns when a partition is recomputed.
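The ordering argument above (cache the new state, materialize it with an action, and only then unpersist the old state) can be sketched with plain RDDs. This is a minimal illustration, not the Pregel implementation: it assumes a local Spark installation, uses an `RDD[Int]` as a stand-in for the graph, and the names `UnpersistSketch` and `iterate` are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the Pregel loop: an RDD[Int] plays the role of the graph.
object UnpersistSketch {
  def iterate(sc: SparkContext, iterations: Int): RDD[Int] = {
    // Initial state, cached and materialized before the loop starts.
    var prev: RDD[Int] = sc.parallelize(1 to 1000).cache()
    prev.count()

    var i = 0
    while (i < iterations) {
      // Derive the next state from the previous one and mark it for caching.
      val next = prev.map(_ + 1).cache()
      // The action materializes `next`, so its cached partitions no longer
      // need `prev` to be recomputed.
      next.count()
      // Only now is it safe to drop the previous iteration's cached data.
      prev.unpersist(blocking = false)
      prev = next
      i += 1
    }
    prev
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("unpersist-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val result = iterate(sc, 3)
      println(s"final count = ${result.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

If `prev.unpersist` were called before `next.count()`, computing `next` would have to rebuild `prev` from its lineage, which is the kind of recomputation the PR description says was checked for.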

Thanks to Tim Weninger for reporting this bug.

Author: Ankur Dave <ankurdave@gmail.com>

Closes #972 from ankurdave/SPARK-2025 and squashes the following commits:

13d5b07 [Ankur Dave] Unpersist edges of previous graph in Pregel

(cherry picked from commit 9bad0b7)
Signed-off-by: Reynold Xin <rxin@apache.org>
ankurdave authored and rxin committed Jun 6, 2014
1 parent 4ac8135 commit 715fbfa
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
@@ -150,6 +150,7 @@ object Pregel extends Logging {
       oldMessages.unpersist(blocking=false)
       newVerts.unpersist(blocking=false)
       prevG.unpersistVertices(blocking=false)
+      prevG.edges.unpersist(blocking=false)
       // count the iteration
       i += 1
     }