[Feature] Uniform layer-wise sampler #362
Conversation
Conflicts: src/graph/immutable_graph.cc
I also need to test my implementation with a multi-layer GCN.
Could we refactor the sampler code into a separate source file? Also, I suggest you break this PR into two: one with only the layer sampler code in C++, and another with the model-related code in Python.
examples/pytorch/gcn/gcn_ls.py
Outdated
                          subgraph.map_to_subgraph_nid(dst).asnumpy())
                         for src, dst in parent_uv_edges_per_hop]
#print(subg_uv_edges_per_hop)
return subgraph, subg_uv_edges_per_hop
You don't need to call LayerSampler like this. Previously, neighbor sampling couldn't handle resampled nodes correctly; that's why @ZiyueHuang implemented neighbor sampling like this. If your layer sampler can handle resampled nodes, you can use it directly.
BTW, this implementation generates only a single batch per epoch. This isn't desired; we need to enable mini-batch training.
examples/pytorch/gcn/gcn_ls.py
Outdated
print(args)

main(args)
I think this file should eventually merge with @ZiyueHuang's neighbor sampling. Most of the code should be the same.
 * \return a subgraph
 */
/* virtual SampledSubgraph LayerUniformSample(IdArray seeds, const std::string &neigh_type,
                                              int n_layers, size_t layer_size) const = 0; */
Why is this commented out?
if not prefetch:
    return loader
else:
    return _PrefetchingLoader(loader, num_prefetch=num_workers*2)
What is the difference between the layer sampler and the neighbor sampler in terms of API? If they are the same, should we merge them?
@@ -976,6 +977,138 @@ SampledSubgraph ImmutableGraph::SampleSubgraph(IdArray seed_arr,
  return subg;
}

SampledSubgraph ImmutableGraph::LayerSample(IdArray seed_array,
                                            const float* probability,
do we need to pass probability?
src/graph/immutable_graph.cc
Outdated
size_t n_seeds = seed_array->shape[0];
const dgl_id_t* seed_data = static_cast<dgl_id_t*>(seed_array->data);
candidate_set.insert(seed_data, seed_data + n_seeds);
std::copy(candidate_set.begin(), candidate_set.end(), nodes.begin());
You don't need to allocate memory for nodes first?
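For what it's worth, a minimal sketch of the kind of change the reviewer may have in mind (names here are illustrative, not the PR's actual code): the vector can be range-constructed or assign()-ed straight from the set, so there is no need to size it up front and then std::copy into it.
#include <cstdint>
#include <unordered_set>
#include <vector>

using dgl_id_t = uint64_t;  // hypothetical stand-in for the real typedef

std::vector<dgl_id_t> BuildNodeList(const std::unordered_set<dgl_id_t>& candidate_set) {
  // Range-construct directly from the set; the vector allocates for itself.
  std::vector<dgl_id_t> nodes(candidate_set.begin(), candidate_set.end());
  // For an already-declared vector, nodes.assign(candidate_set.begin(), candidate_set.end())
  // does the same thing.
  return nodes;
}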
src/graph/immutable_graph.cc
Outdated
const dgl_id_t* seed_data = static_cast<dgl_id_t*>(seed_array->data);
candidate_set.insert(seed_data, seed_data + n_seeds);
std::copy(candidate_set.begin(), candidate_set.end(), nodes.begin());
layer_ids.insert(layer_ids.end(), nodes.size(), 0);
use push_back?
src/graph/immutable_graph.cc
Outdated
std::vector<size_t> positions {0, nodes.size()};
for (int i = 0; i != n_layers; i++) {
  candidate_set.clear();
  for (auto j = positions.end()[-2]; j != positions.back(); ++j) {
positions.end()[-2] looks weird. It seems you want to use positions.front().
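Reading the snippet, positions.end()[-2] appears to mark where the previous layer's nodes begin (which differs from positions.front() after the first iteration). One way to make that intent explicit, sketched below with assumed names rather than the PR's actual code, is to carry the previous layer's range in named variables:
#include <cstddef>
#include <cstdint>
#include <vector>

using dgl_id_t = uint64_t;  // hypothetical stand-in for the real typedef

void SampleLayers(std::vector<dgl_id_t>& nodes, int n_layers) {
  size_t prev_begin = 0;           // first node of the previous layer
  size_t prev_end = nodes.size();  // one past the previous layer's last node
  for (int i = 0; i != n_layers; ++i) {
    for (size_t j = prev_begin; j != prev_end; ++j) {
      // ... sample neighbors of nodes[j] and push_back the new nodes ...
    }
    prev_begin = prev_end;         // the nodes appended above become the
    prev_end = nodes.size();       // "previous layer" for the next pass
  }
}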
src/graph/immutable_graph.cc
Outdated
for (auto const &pair : n_times) {
  nodes.push_back(pair.first);
  layer_ids.push_back(i + 1);
  probabilities.push_back(pair.second / n_nodes);
Why is this the sampling probability? Also, is it over the number of all nodes in the original graph?
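For context on why a per-node probability might be stored at all (this is the general layer-wise importance-sampling recipe, e.g. FastGCN, and not necessarily what this PR computes): an unbiased layer estimate reweights each sampled node by the probability with which it was drawn,
$$
h^{(l+1)}(v) \;\approx\; \frac{1}{n}\sum_{u \in S^{(l)}} \frac{\hat{A}(v,u)\,h^{(l)}(u)}{q(u)},
$$
where $S^{(l)}$ is the set of $n$ nodes sampled for layer $l$ and $q(u)$ is the probability of drawing $u$. If sample_prob is meant to play the role of $q(u)$, dividing a per-node count by the number of all nodes in the original graph is only correct under a particular sampling scheme, which is presumably what this question is getting at.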
src/graph/immutable_graph.cc
Outdated
  subg.layer_ids[i] = node_info[i].first;
  subg.sample_prob[i] = node_info[i].second;
}
*/
What are these?
examples/pytorch/gcn/gcn_ls.py
Outdated
                fn.sum(msg='m', out='h'))
h = subg.ndata.pop('h')
# same as TODO above
h = h / math.sqrt(3.)
The normalization doesn't seem right. Could you try normalizing with vertex degree?
You can try the random walk normalized Laplacian, like in https://github.com/ZiyueHuang/dgl/blob/4aa22e63d42fba3a27709a02c55aba680e30dd33/examples/mxnet/gcn/gcn_ns.py, for easier implementation (for uniform sampling, mean is OK, no need for the norm):

def gcn_reduce(node):
    accum = mx.nd.mean(node.mailbox['m'], 1)
    return {'h': accum}

instead of the symmetric normalized Laplacian.
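For reference, the two normalizations being contrasted here (standard GCN definitions, nothing specific to this PR):
$$
\hat{A}_{\mathrm{sym}} = D^{-1/2}\,\tilde{A}\,D^{-1/2},
\qquad
\hat{A}_{\mathrm{rw}} = D^{-1}\,\tilde{A},
$$
where $\tilde{A}$ is the adjacency matrix (with self-loops, in the usual GCN setup) and $D$ its degree matrix. The random-walk form simply averages a node's incoming messages, which is why the plain mean over the mailbox in the snippet above is sufficient under uniform sampling.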
Description
The implementation of a uniform layer-wise sampler, tested with SSE on the PubMed dataset.
Checklist
Please feel free to remove inapplicable items for your PR.
Changes