
PG: join new vertex data by vertex ids #2796

Merged: 9 commits into rapidsai:branch-22.12 on Nov 9, 2022

Conversation

eriknw (Contributor) commented Oct 12, 2022

Fixes #2793
Fixes #2794

This relies on the assumption that there is a single row for each vertex id.

This does not yet handle MG. Let's figure out how we want SG to behave first.
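
For reference, here is a minimal sketch of the kind of per-vertex-id join this describes, with hypothetical column names and toy data (not the actual PropertyGraph code):

import cudf

# Existing vertex property data, keyed by vertex id, with one row per id.
existing = cudf.DataFrame(
    {"vertex_id": [0, 1, 2], "prop_a": [10, 11, 12]}
).set_index("vertex_id")

# New vertex data arrives for a (possibly overlapping) set of vertex ids.
new_data = cudf.DataFrame(
    {"vertex_id": [1, 2, 3], "prop_b": [1.5, 2.5, 3.5]}
).set_index("vertex_id")

# Joining on the vertex-id index keeps exactly one row per vertex id and simply
# adds the new columns, rather than appending rows and de-duplicating later.
combined = existing.merge(new_data, how="outer", left_index=True, right_index=True)
print(combined)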

codecov-commenter commented Oct 12, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.12@e4ed302).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-22.12    #2796   +/-   ##
===============================================
  Coverage                ?   62.60%           
===============================================
  Files                   ?      118           
  Lines                   ?     6570           
  Branches                ?        0           
===============================================
  Hits                    ?     4113           
  Misses                  ?     2457           
  Partials                ?        0           


☔ View full report at Codecov.

eriknw marked this pull request as a draft October 13, 2022 00:27
alexbarghi-nv added the non-breaking (Non-breaking change) and Fix labels Oct 13, 2022
alexbarghi-nv added this to the 22.12 milestone Oct 13, 2022
alexbarghi-nv added the improvement (Improvement / enhancement to an existing function) label Oct 13, 2022
rapids-bot bot pushed a commit that referenced this pull request Oct 24, 2022
This should fix the quadratic scaling we're seeing when adding new data.

CC @VibhuJawa. I'm still trying to improve the merges for MG to be like #2796, but I'm encountering issues.

Authors:
  - Erik Welch (https://github.com/eriknw)

Approvers:
  - Vibhu Jawa (https://github.com/VibhuJawa)
  - Rick Ratzel (https://github.com/rlratzel)
  - Alex Barghi (https://github.com/alexbarghi-nv)

URL: #2805
eriknw (Contributor, Author) commented Oct 26, 2022

This PR needs rapidsai/cudf#11998 to pass tests.

I think we ought to try to benchmark this before merging.

Here is the mini-benchmark bench_add_edges_cyber for the "branch-22.12" branch (I added the [50] case for good measure):

------------------------------------------------------------- benchmark: 5 tests ------------------------------------------------------------
Name (time in s, mem in bytes)        Mean                GPU mem            GPU Leaked mem            Rounds            GPU Rounds
---------------------------------------------------------------------------------------------------------------------------------------------
bench_add_edges_cyber[1]            1.3833 (1.0)      273,342,072 (1.0)                 144 (1.0)           1           1
bench_add_edges_cyber[3]            2.8139 (2.03)     273,342,968 (1.00)              1,032 (7.17)          1           1
bench_add_edges_cyber[10]           6.7457 (4.88)     273,343,656 (1.00)              1,712 (11.89)         1           1
bench_add_edges_cyber[30]          18.8487 (13.63)    273,344,440 (1.00)              2,528 (17.56)         1           1
bench_add_edges_cyber[50]          30.2863 (21.89)    273,346,992 (1.00)              5,048 (35.06)         1           1
---------------------------------------------------------------------------------------------------------------------------------------------

and here is the benchmark for the current branch:

--------------------------------------------------------------- benchmark: 5 tests ---------------------------------------------------------------
Name (time in ms, mem in bytes)            Mean                GPU mem            GPU Leaked mem            Rounds            GPU Rounds
--------------------------------------------------------------------------------------------------------------------------------------------------
bench_add_edges_cyber[1]               801.3284 (1.0)      273,342,224 (1.0)                 280 (1.0)           1           1
bench_add_edges_cyber[3]             1,723.6213 (2.15)     273,342,672 (1.00)                496 (1.77)          1           1
bench_add_edges_cyber[10]            4,731.0946 (5.90)     273,343,672 (1.00)              1,360 (4.86)          1           1
bench_add_edges_cyber[30]           13,608.0636 (16.98)    273,343,448 (1.00)              1,256 (4.49)          1           1
bench_add_edges_cyber[50]           22,819.4191 (28.48)    273,348,912 (1.00)              6,488 (23.17)         1           1
--------------------------------------------------------------------------------------------------------------------------------------------------

We shouldn't read too much into these numbers yet, but this PR isn't as fast as I was expecting. Note that the first table reports time in seconds and the second in milliseconds, so the [50] case goes from roughly 30.3 s to 22.8 s. I'll investigate further and experiment with the MAG240 dataset.

@rlratzel, this reworks the merge in add_vertex_data and add_edge_data, which seemed to be a particularly sensitive part of the code base. I think this PR can begin to be reviewed even though we're blocked on the cudf PR for the tests.

eriknw marked this pull request as ready for review November 3, 2022 04:53
VibhuJawa (Member) left a comment

Minor clarification. Looks good otherwise.

Comment on lines 456 to 458
if df.npartitions > 2 * self.__num_workers:
# TODO: better understand behavior of npartitions argument in join
df = df.repartition(npartitions=self.__num_workers).persist()
VibhuJawa (Member) commented Nov 7, 2022
With dask_cudf merges/operations, I think we should not decrease the number of partitions to less than or equal to _num_workers unless we actually need to, because it reduces parallelism and worker starvation can become a problem.

The shuffle cost of the merge is n log n, but if the number of input partitions is less than the number of available workers, worker starvation can become a problem.

Maybe let's pick 2 * _num_workers or something?
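
A minimal standalone sketch of one way to read this suggestion (hypothetical num_workers value and toy data; not the actual cuGraph diff):

import cudf
import dask_cudf

# Assume we know how many Dask workers are available; in cuGraph this would
# come from the Dask client/cluster rather than a hard-coded value.
num_workers = 4

ddf = dask_cudf.from_cudf(
    cudf.DataFrame({"vertex_id": list(range(1000)), "prop": list(range(1000))}),
    npartitions=32,
)

# Only coalesce once the partition count has grown well past the worker count,
# and repartition down to 2 * num_workers rather than num_workers so that no
# worker is left idle (starved) during subsequent merges.
if ddf.npartitions > 2 * num_workers:
    ddf = ddf.repartition(npartitions=2 * num_workers).persist()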

eriknw (Contributor, Author) replied

That sounds reasonable. Done. I think we may want to explore these values as we scale out.

I know it's bad, though, if we don't repartition and the number of partitions grows too large.

VibhuJawa (Member) replied

Agreed. Yup, it becomes really slow as you increase partitions. In case you are curious how the shuffle works under the hood, below is my favorite explanation by rjzamora, which I think still holds true.

rapidsai/cudf#4308 (comment)

VibhuJawa (Member) left a comment

LGTM

BradReesWork (Member) commented

@gpucibot merge

rapids-bot (bot) merged commit 7387fbc into rapidsai:branch-22.12 on Nov 9, 2022
Labels: improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)
Projects: None yet
5 participants