
Add distributed feature info for distributed training #7678

Closed

Conversation

ZhengHongming888
Contributor

This code is part of the overall distributed training support for PyG.

This PR was originally designed around a separate DistFeature class, which has now been merged into LocalFeatureStore:

  1. Add partition/RPC info into LocalFeatureStore, such as num_partition, partition_idx, feature_pb (the feature partition book), partition_meta, and RpcRouter.
  2. Add a new class (RpcCallFeatureLookup) to perform the actual remote RPC feature lookup.
  3. Add an API (.lookup_features()) to look up features on the local node and on remote nodes, given sampled global node/edge IDs, based on the torch RPC APIs (see the usage sketch after this list).
  4. Add a unit test under the test/distributed/ folder to verify local and remote feature lookup.
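
Here is a minimal usage sketch of the lookup entry point. Only the method name .lookup_features() comes from this PR; the store setup, variable names, and the future-based return value are assumptions for illustration, not the definitive interface.

```python
import torch

from torch_geometric.distributed import LocalFeatureStore

# Assumed setup: each worker's LocalFeatureStore holds one partition of
# the features; partition info and the RPC router are configured during
# distributed initialization (not shown here).
feat_store = LocalFeatureStore()

# Global node IDs produced by a distributed sampler; some of them may
# live on remote partitions and are fetched over torch RPC.
node_ids = torch.tensor([0, 42, 1337])

# Assumed behavior: the call returns a torch future that resolves once
# both local reads and remote RPC responses have been gathered.
future = feat_store.lookup_features(node_ids)
feats = future.wait()
```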

We have now combined the local feature store and the distributed feature properties (partition info and remote RPC access APIs) into one FeatureStore. In a follow-up PR we will rename the class from LocalFeatureStore to PartitionFeatureStore. A sketch of the routing idea follows.
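
To make the routing concrete, here is a small sketch of how a feature partition book could steer lookups, assuming feature_pb maps each global node ID to its owning partition (as the name "feature partition book" suggests). The tensor contents and variable names are illustrative, not this PR's actual data structures.

```python
import torch

num_partitions = 2
partition_idx = 0  # this worker's partition

# Assumption: the feature partition book maps each global node ID to
# the partition that owns its features.
feature_pb = torch.tensor([0, 0, 1, 1, 0, 1])

ids = torch.tensor([1, 2, 5])  # sampled global node IDs
owners = feature_pb[ids]       # owning partition for each ID

local_ids = ids[owners == partition_idx]   # served from local storage
remote_ids = ids[owners != partition_idx]  # fetched over torch RPC via
                                           # RpcRouter/RpcCallFeatureLookup
print(local_ids, remote_ids)
```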

Please let us know if you have any comments. Thanks.


codecov bot commented Jul 2, 2023

Codecov Report

Merging #7678 (179fd0e) into master (9bc7017) will decrease coverage by 0.54%.
The diff coverage is 36.54%.

```diff
@@            Coverage Diff             @@
##           master    #7678      +/-   ##
==========================================
- Coverage   91.58%   91.05%   -0.54%     
==========================================
  Files         452      454       +2     
  Lines       25534    25783     +249     
==========================================
+ Hits        23385    23476      +91     
- Misses       2149     2307     +158     
```
| Impacted Files | Coverage Δ |
|---|---|
| torch_geometric/distributed/local_feature_store.py | 55.73% <18.80%> (-34.03%) ⬇️ |
| torch_geometric/distributed/rpc.py | 50.92% <50.92%> (ø) |
| torch_geometric/distributed/dist_context.py | 56.52% <56.52%> (ø) |
| torch_geometric/distributed/__init__.py | 100.00% <100.00%> (ø) |


@mananshah99 (Contributor) commented Jul 6, 2023

Can we rebase this PR on the other two PRs to avoid a duplicated diff? Otherwise, this one is hard to review.

rusty1s added a commit that referenced this pull request Aug 7, 2023
(This PR is to replace #7678)


Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>
Co-authored-by: root <root@skyocean.sh.intel.com>