[SPARK-1931] Reconstruct routing tables in Graph.partitionBy
905173d introduced a bug in partitionBy where, after repartitioning the edges, it reuses the VertexRDD without updating the routing tables to reflect the new edge layout. Subsequent accesses of the triplets contain nulls for many vertex properties.

This commit adds a test for this bug and fixes it by introducing `VertexRDD#withEdges` and calling it in `partitionBy`.

Author: Ankur Dave <ankurdave@gmail.com>

Closes #885 from ankurdave/SPARK-1931 and squashes the following commits:

3930cdd [Ankur Dave] Note how to set up VertexRDD for efficient joins
9bdbaa4 [Ankur Dave] [SPARK-1931] Reconstruct routing tables in Graph.partitionBy
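
To illustrate why a stale routing table leaves nulls in the triplets, here is a toy model in plain Scala. It is not GraphX's implementation: ordinary collections stand in for the distributed structures, and every name in it (`RoutingTableSketch`, `buildRoutingTable`, `shipAttrs`, `countMissing`) is hypothetical. The assumption is only that a routing table records which edge partitions reference each vertex, so that vertex attributes can be shipped there before triplets are formed.

```scala
// Toy model only: plain collections stand in for GraphX's distributed
// structures. All names here are hypothetical, not GraphX API.
object RoutingTableSketch {
  type VertexId = Long
  final case class Edge(src: VertexId, dst: VertexId)

  // For each vertex, the set of edge partitions whose edges reference it.
  def buildRoutingTable(edgePartitions: Vector[Vector[Edge]]): Map[VertexId, Set[Int]] =
    edgePartitions.zipWithIndex
      .flatMap { case (edges, pid) => edges.flatMap(e => Seq(e.src -> pid, e.dst -> pid)) }
      .groupBy(_._1)
      .map { case (vid, pairs) => vid -> pairs.map(_._2).toSet }

  // Ship each vertex attribute to the edge partitions named by the routing table.
  def shipAttrs(
      attrs: Map[VertexId, String],
      routing: Map[VertexId, Set[Int]],
      numParts: Int): Vector[Map[VertexId, String]] =
    Vector.tabulate(numParts) { pid =>
      attrs.filter { case (vid, _) => routing.getOrElse(vid, Set.empty[Int]).contains(pid) }
    }

  // Count edges whose source or destination attribute did not arrive locally.
  def countMissing(
      edgePartitions: Vector[Vector[Edge]],
      shipped: Vector[Map[VertexId, String]]): Int =
    edgePartitions.zipWithIndex.map { case (edges, pid) =>
      edges.count(e => !shipped(pid).contains(e.src) || !shipped(pid).contains(e.dst))
    }.sum

  def main(args: Array[String]): Unit = {
    val attrs = Map(1L -> "a", 2L -> "b", 3L -> "c", 4L -> "d")
    val oldLayout = Vector(Vector(Edge(1L, 2L)), Vector(Edge(3L, 4L)))
    // Repartitioning moves each edge to the other partition.
    val newLayout = Vector(Vector(Edge(3L, 4L)), Vector(Edge(1L, 2L)))

    // Reusing the routing table built for the old layout (the bug).
    val stale = shipAttrs(attrs, buildRoutingTable(oldLayout), 2)
    // Rebuilding the routing table from the new layout (the fix, in spirit).
    val fresh = shipAttrs(attrs, buildRoutingTable(newLayout), 2)

    println(s"missing with stale routing table: ${countMissing(newLayout, stale)}")
    println(s"missing with rebuilt routing table: ${countMissing(newLayout, fresh)}")
  }
}
```

In this model the stale table sends every attribute to the partition where its edge used to live, so both edges end up with missing endpoints, while rebuilding the table from the new layout leaves none missing. That mirrors, in miniature, what calling `VertexRDD#withEdges` from `partitionBy` is described as restoring.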
Showing 3 changed files with 31 additions and 4 deletions.
56c771c
Hi Ankur, I am seeing something strange with outerJoinVertices (and with triangle count, which relies on this API), and I was wondering whether it could be related to the bug you are patching, i.e. whether your patch is likely to fix it.
Here is what I am doing:
I ran this program in a loop, changing minEdgePartitions in each iteration. When minEdgePartitions == 1 I see the correct number of edges. When minEdgePartitions == 2 the result is ~1/2 the number of edges, when minEdgePartitions == 3 it is ~1/3 the number of edges, and so on.
It seems that outerJoinVertices is returning srcAttr/dstAttr = null for many attributes, and from the numbers it looks as though it might be returning null for vertices residing on other partitions?
Environment: I am using RC5 and 22 executors.
BUT I get the correct number of edges in each iteration when I repeat the experiment keeping the vertex attribute type as Int in step 2 (i.e. keeping just the number of vertices instead of an array of vertices), which is the same as the vertex attribute type in the graph before the join.