
PG: join new vertex data by vertex ids #2796

Merged: 9 commits into rapidsai:branch-22.12 on Nov 9, 2022

Conversation

eriknw (Contributor) commented Oct 12, 2022

Fixes #2793
Fixes #2794

This relies on the assumption that there is a single row for each vertex id.

This does not yet handle MG. Let's figure out how we want SG to behave first.
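
For reference, here is a minimal sketch of the kind of per-vertex-id join this describes, with hypothetical column names and toy data (not the actual PropertyGraph code):

import cudf

# Existing vertex property data, keyed by vertex id, with one row per id.
existing = cudf.DataFrame(
    {"vertex_id": [0, 1, 2], "prop_a": [10, 11, 12]}
).set_index("vertex_id")

# New vertex data arrives for a (possibly overlapping) set of vertex ids.
new_data = cudf.DataFrame(
    {"vertex_id": [1, 2, 3], "prop_b": [1.5, 2.5, 3.5]}
).set_index("vertex_id")

# Joining on the vertex-id index keeps exactly one row per vertex id and simply
# adds the new columns, rather than appending rows and de-duplicating later.
combined = existing.merge(new_data, how="outer", left_index=True, right_index=True)
print(combined)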

codecov-commenter commented Oct 12, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.12@e4ed302).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-22.12    #2796   +/-   ##
===============================================
  Coverage                ?   62.60%           
===============================================
  Files                   ?      118           
  Lines                   ?     6570           
  Branches                ?        0           
===============================================
  Hits                    ?     4113           
  Misses                  ?     2457           
  Partials                ?        0           


☔ View full report at Codecov.

eriknw marked this pull request as a draft October 13, 2022 00:27
alexbarghi-nv added the non-breaking (Non-breaking change) and Fix labels Oct 13, 2022
alexbarghi-nv added this to the 22.12 milestone Oct 13, 2022
alexbarghi-nv added the improvement (Improvement / enhancement to an existing function) label Oct 13, 2022
rapids-bot bot pushed a commit that referenced this pull request Oct 24, 2022
This should fix the quadratic scaling we're seeing when adding new data.

CC @VibhuJawa. I'm still trying to improve the merges for MG to be like #2796, but I'm encountering issues.

Authors:
  - Erik Welch (https://github.com/eriknw)

Approvers:
  - Vibhu Jawa (https://github.com/VibhuJawa)
  - Rick Ratzel (https://github.com/rlratzel)
  - Alex Barghi (https://github.com/alexbarghi-nv)

URL: #2805
eriknw (Contributor, Author) commented Oct 26, 2022

This PR needs rapidsai/cudf#11998 to pass tests.

I think we ought to try to benchmark this before merging.

Here is the mini-benchmark bench_add_edges_cyber for the "branch-22.12" branch (I added the [50] case for good measure):

------------------------------------------------------------- benchmark: 5 tests ------------------------------------------------------------
Name (time in s, mem in bytes)        Mean                GPU mem            GPU Leaked mem            Rounds            GPU Rounds
---------------------------------------------------------------------------------------------------------------------------------------------
bench_add_edges_cyber[1]            1.3833 (1.0)      273,342,072 (1.0)                 144 (1.0)           1           1
bench_add_edges_cyber[3]            2.8139 (2.03)     273,342,968 (1.00)              1,032 (7.17)          1           1
bench_add_edges_cyber[10]           6.7457 (4.88)     273,343,656 (1.00)              1,712 (11.89)         1           1
bench_add_edges_cyber[30]          18.8487 (13.63)    273,344,440 (1.00)              2,528 (17.56)         1           1
bench_add_edges_cyber[50]          30.2863 (21.89)    273,346,992 (1.00)              5,048 (35.06)         1           1
---------------------------------------------------------------------------------------------------------------------------------------------

and here is the benchmark for the current branch:

--------------------------------------------------------------- benchmark: 5 tests ---------------------------------------------------------------
Name (time in ms, mem in bytes)            Mean                GPU mem            GPU Leaked mem            Rounds            GPU Rounds
--------------------------------------------------------------------------------------------------------------------------------------------------
bench_add_edges_cyber[1]               801.3284 (1.0)      273,342,224 (1.0)                 280 (1.0)           1           1
bench_add_edges_cyber[3]             1,723.6213 (2.15)     273,342,672 (1.00)                496 (1.77)          1           1
bench_add_edges_cyber[10]            4,731.0946 (5.90)     273,343,672 (1.00)              1,360 (4.86)          1           1
bench_add_edges_cyber[30]           13,608.0636 (16.98)    273,343,448 (1.00)              1,256 (4.49)          1           1
bench_add_edges_cyber[50]           22,819.4191 (28.48)    273,348,912 (1.00)              6,488 (23.17)         1           1
--------------------------------------------------------------------------------------------------------------------------------------------------

We shouldn't read too much into these numbers yet, but this PR isn't as fast as I was expecting. Note that the first table reports time in seconds and the second in milliseconds, so the [50] case goes from roughly 30.3 s to 22.8 s. I'll investigate further and experiment with the MAG240 dataset.

@rlratzel, this reworks the merge in add_vertex_data and add_edge_data, which seemed to be a particularly sensitive part of the code base. I think this PR can begin to be reviewed even though we're blocked on the cudf PR for the tests.

eriknw marked this pull request as ready for review November 3, 2022 04:53
VibhuJawa (Member) left a comment

Minor clarification. Looks good otherwise.

Comment on lines 456 to 458
if df.npartitions > 2 * self.__num_workers:
# TODO: better understand behavior of npartitions argument in join
df = df.repartition(npartitions=self.__num_workers).persist()
VibhuJawa (Member) commented Nov 7, 2022
With dask_cudf merges/operations, I think we should not decrease the number of partitions to less than or equal to _num_workers unless we actually need to, because it reduces parallelism and worker starvation can become a problem.

The shuffle cost of the merge is n log n, but if the number of input partitions is less than the number of available workers, worker starvation can become a problem.

Maybe let's pick 2 * _num_workers or something?
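
A minimal standalone sketch of one way to read this suggestion (hypothetical num_workers value and toy data; not the actual cuGraph diff):

import cudf
import dask_cudf

# Assume we know how many Dask workers are available; in cuGraph this would
# come from the Dask client/cluster rather than a hard-coded value.
num_workers = 4

ddf = dask_cudf.from_cudf(
    cudf.DataFrame({"vertex_id": list(range(1000)), "prop": list(range(1000))}),
    npartitions=32,
)

# Only coalesce once the partition count has grown well past the worker count,
# and repartition down to 2 * num_workers rather than num_workers so that no
# worker is left idle (starved) during subsequent merges.
if ddf.npartitions > 2 * num_workers:
    ddf = ddf.repartition(npartitions=2 * num_workers).persist()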

eriknw (Contributor, Author) replied

That sounds reasonable. Done. I think we may want to explore these values as we scale out.

I know it's bad, though, if we don't repartition and the number of partitions grows too large.

VibhuJawa (Member) replied

Agreed. Yup, it becomes really slow as you increase partitions. In case you are curious how the shuffle works under the hood, below is my favorite explanation by rjzamora, which I think still holds true.

rapidsai/cudf#4308 (comment)

VibhuJawa (Member) left a comment

LGTM

BradReesWork (Member) commented

@gpucibot merge

rapids-bot (bot) merged commit 7387fbc into rapidsai:branch-22.12 on Nov 9, 2022
Labels: improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)
Projects: None yet
5 participants