
Update negative sampling to work directly on GPU #9608

Open
wants to merge 19 commits into base: master
Conversation

danielecastellana22

@rusty1s I re-implemented the negative edge sampling so that it works directly on GPUs.
In the following, I summarise the main idea of my implementation. Then, I raise some questions that I hope you can help me answer.

Negative Sampling

The idea of negative edge sampling is to obtain a list of candidate edges and then discard all the ones that already appear in the input graph.
To perform the existence check, I use torch.searchsorted. The input edge_index should be sorted, but this is usually the case.
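
To make the check concrete, here is a minimal sketch of the idea (the helper name and its exact form are mine, not code from this PR; it assumes the graph has at least one edge):

import torch

def is_positive_edge(cand_id, pos_id_sorted):
    # cand_id: linearised candidate edge ids (row * num_nodes + col)
    # pos_id_sorted: sorted linearised ids of the edges already in the graph
    idx = torch.searchsorted(pos_id_sorted, cand_id)
    idx = idx.clamp(max=pos_id_sorted.numel() - 1)
    return pos_id_sorted[idx] == cand_id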

The initial guess of negative edges can be obtained in a dense or a sparse way. The function also supports an automatic mode that lets the code choose the best method for the input graph.

Dense Method

The dense method is exact (it always returns the requested number of negative edges whenever that number exists), but it is costly since it enumerates all possible edges (the cost is quadratic in the number of nodes). The samples are drawn via torch.randperm to make the process stochastic.
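
A minimal sketch of this strategy, reusing the hypothetical is_positive_edge helper from above (the real implementation in the PR may differ):

def dense_negative_guess(pos_id_sorted, num_nodes, num_neg_samples, device):
    # enumerate all possible edge ids in random order: O(N^2) space and time
    all_id = torch.randperm(num_nodes * num_nodes, device=device)
    # drop the ids of edges that already exist in the graph
    neg_id = all_id[~is_positive_edge(all_id, pos_id_sorted)][:num_neg_samples]
    # de-linearise back to an edge_index of shape [2, num_neg_samples]
    return torch.stack([neg_id // num_nodes, neg_id % num_nodes], dim=0)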

Sparse Method

The sparse method is not exact (you may obtain fewer samples than requested), but it is more efficient since it does not enumerate all possible edges.
To obtain the guess, we simply sample k candidate edges using torch.multinomial. The choice of k is crucial for obtaining the desired number of negative edges, and it depends on the probability of sampling a negative edge at random.
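
A sketch of the sparse guess under the same assumptions; the multinomial call mirrors the one visible later in the diff, and the choice of k is left to the caller:

def sparse_negative_guess(pos_id_sorted, num_nodes, num_neg_samples, k, device):
    # uniform weights: expand() only creates a view of the one-element tensor
    p = torch.tensor([1.], device=device).expand(num_nodes * num_nodes)
    # draw k candidate edge ids instead of enumerating all N^2 of them
    cand_id = torch.multinomial(p, min(k, num_nodes * num_nodes), replacement=False)
    neg_id = cand_id[~is_positive_edge(cand_id, pos_id_sorted)][:num_neg_samples]
    return torch.stack([neg_id // num_nodes, neg_id % num_nodes], dim=0)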

Structured Negative Sampling

I implemented structured negative sampling in a similar way. The main difference is that here we want to sample a negative edge $(u,w)$ for each edge $(u,v)$ in the graph.

Dense Method

For each node $u$, we obtain a random permutation of all the nodes. Then, we select the first $deg(u)$ nodes that are not linked with $u$.
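
A naive per-node sketch of this idea (purely illustrative; the actual code is presumably vectorised, and the sketch assumes the sampling is feasible for every node):

def structured_dense_sketch(edge_index, num_nodes):
    src, dst = edge_index
    neg_dst = torch.empty_like(dst)
    for u in src.unique().tolist():
        mask = src == u
        # random permutation of all nodes, keeping the ones not linked to u
        perm = torch.randperm(num_nodes, device=edge_index.device)
        candidates = perm[~torch.isin(perm, dst[mask])]
        # one negative destination per positive edge of u
        neg_dst[mask] = candidates[:int(mask.sum())]
    return src, dst, neg_dst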

Sparse Method

We sample $k \cdot E$ candidate edges, where $E$ is the number of edges in the input graph. Here the choice of $k$ is trickier since it depends on the degree of each node. When the method is not able to obtain a negative sample for an edge, it returns $-1$ for that edge.
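
A rough sketch of this variant, again using the hypothetical is_positive_edge helper (self-loop handling omitted; $-1$ marks the edges for which all $k$ guesses failed):

def structured_sparse_sketch(edge_index, num_nodes, pos_id_sorted, k=10):
    src, dst = edge_index
    num_edges = src.numel()
    device = edge_index.device
    # k random destination guesses for every positive edge
    cand = torch.randint(num_nodes, (num_edges, k), device=device)
    cand_id = src.view(-1, 1) * num_nodes + cand
    ok = ~is_positive_edge(cand_id.view(-1), pos_id_sorted).view(num_edges, k)
    # position of the first valid guess per edge (value k means "no valid guess")
    col = torch.arange(k, device=device).expand(num_edges, k)
    first = torch.where(ok, col, torch.full_like(col, k)).min(dim=1).values
    neg_dst = torch.full_like(src, -1)
    valid = first < k
    neg_dst[valid] = cand[valid, first[valid]]
    return src, dst, neg_dst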

Open Questions

  1. Sorting the edges
    I use the PyG function index_sort to sort the input edge_index. However, I believe that in most cases the input is already sorted. Hence, another option could be to assume that the input edge_index is already sorted; in this way, sorting becomes the user's responsibility.

  2. How to manage the warning
    It would be nice to raise a warning when the sparse method might fail. I currently raise a warning when the probability of sampling a negative edge is low, but it is probably better to raise it only when we are sure that the method has failed (e.g. the number of sampled edges is more than 10% below the requested number).

  3. Determining the number of samples in the structured negative sampling
    I am struggling to find a way to determine the number $k$ in the sparse structured negative sampling. My approach is probably not the best one since it is "global", which makes it difficult to find the right $k$. For now, I just set $k=10$.

  4. Feasibility of structured negative sampling
    If I understood the code of structured_negative_sampling_feasible correctly, it returns True if no node is connected to all the others. I think this is wrong, since structured negative sampling should be feasible if and only if $deg(u) < N/2$ for all nodes in the graph.

Let me know what you think about this!

@denadai2
Contributor

wow this would be awesome!!

@rusty1s
Member

rusty1s commented Aug 22, 2024

Wow, this is pretty cool. Can we fix the tests so that we are Python 3.8 compatible (no Python 3.10 type hints such as str | int)?

@danielecastellana22
Author

Thank you, I am happy to hear this is a desired feature!

I updated the code to support Python 3.8 and made small changes to the negative sampling test functions.
On my laptop, all the test cases affected by the changes pass; the failing tests appear to be unrelated ones.

pre-commit-ci bot and others added 9 commits September 5, 2024 09:10
… retrieve the requested number of edges, it raises an exception.
…have no guarantee that there were no repeated edges. Now it is based on randperm. Also, the number of nodes is doubled to ensure that structured sampling is (almost) always feasible.
@danielecastellana22
Author

Hello, I updated the code so that all the pytests pass.
However, pre-commit.ci raises a PEP 8 error that I cannot solve (it is the only required check that fails).
@wsad1 can you help me?

@@ -153,16 +159,72 @@ def test_structured_negative_sampling():
assert (adj & neg_adj).sum() == 0

# Test with no self-loops:
edge_index = torch.LongTensor([[0, 0, 1, 1, 2], [1, 2, 0, 2, 1]])
#edge_index = torch.LongTensor([[0, 0, 1, 1, 2], [1, 2, 0, 2, 1]])
Member

Suggested change
#edge_index = torch.LongTensor([[0, 0, 1, 1, 2], [1, 2, 0, 2, 1]])

Member

@danielecastellana22 this line was causing the lint issue. But why was it commented out? Was that just a temporary change?

Author

No, it can be removed.
I think that in the previous test there were two different edge_index tensors to make sure that structured negative sampling was feasible both with and without negative self-loops.

Since I changed the structured negative sampling, it is now feasible also with the first definition of edge_index.

Member
@wsad1 left a comment

Great work. I'll make a few more passes with reviews.

Comment on lines -72 to -74
neg_edge_index = negative_sampling(edge_index, method='dense')
assert neg_edge_index.size(1) == edge_index.size(1)
assert is_negative(edge_index, neg_edge_index, (4, 4), bipartite=False)
Member

Why drop this test?

Author

Because the method is now inferred automatically based on the graph size. Since the graph used in the test is small, the chosen method is always dense. This reflects the idea that a sparse method (which is based on randomly guessing the negative edges) is reasonable only when the graph is sparse ($E \ll N^2$).

To test both the sparse and the dense method, I added a new test called test_negative_sampling_with_different_edge_density. Actually, I think the whole function test_negative_sampling can be removed, but I left it there to make sure that the old test still works.
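
As an illustration, the kind of rule I have in mind looks roughly like this (the threshold is a placeholder, not the value used in the code):

def choose_method(num_nodes, dense_budget=10_000_000):
    if num_nodes * num_nodes <= dense_budget:
        return 'dense'   # enumerating all N^2 candidates is still affordable
    return 'sparse'      # random guessing; works well when E << N^2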

Comment on lines +364 to +365
# structured sample is feasible if, for each node, deg > max_neigh/2
return bool(torch.all(2 * deg <= max_num_neighbors))
Member
@wsad1 Sep 7, 2024

Suggested change
# structured sample is feasible if, for each node, deg > max_neigh/2
return bool(torch.all(2 * deg <= max_num_neighbors))

The assumption here is that we don't want the negative edges to repeat. I like the assumption, but it might be too tight; also, we don't guarantee that the negative edges are unique. I'd suggest keeping the old condition. Or we could make it explicit by adding an argument unique_neg, and use this new condition when it is True and the old condition when it is False.

Author

I understand your point, and it makes me wonder about the following:

  1. Should the negative sampling return duplicate edges?
    In this new implementation, structured_negative_sampling with method=dense guarantees that the negative edges are unique. When method=sparse, this is not explicitly guaranteed (but duplicates should be very unlikely, especially if the graph is sparse). However, this could easily be resolved by removing the duplicates after the sampling. It is also worth highlighting that negative_sampling never returns duplicates.

  2. What is the purpose of structured_negative_sampling_feasible?
    In my understanding, this function checks whether structured negative sampling is feasible for a given graph. Note that this does not mean that structured_negative_sampling will return all the requested negative edges. In fact, when the sparse method is selected, it can fail (i.e. it is not able to sample a negative edge for every true edge) even if the problem is feasible, due to the randomness of the sampling. Vice versa, when the dense method is used, it will always return the correct set of negative edges (if feasible) since it enumerates all edges.

vector_id = torch.arange(N1 * N2)
edge_index3 = torch.stack(vector_id_to_edge_index(vector_id, (N1, N2)),
dim=0)
assert edge_index.tolist() == edge_index3.tolist()


def test_negative_sampling():
Member

Decorate with @withCUDA to test on cpu and gpu.
See this for an example.

@wsad1 wsad1 self-requested a review September 8, 2024 18:48
Comment on lines +399 to +404
assert k is not None
k = 2 * k if force_undirected else k
p = torch.tensor([1.], device=device).expand(N1 * N2)
if k > N1 * N2:
k = N1 * N2
new_edge_id = torch.multinomial(p, k, replacement=False)
Member

Suggested change
assert k is not None
k = 2 * k if force_undirected else k
p = torch.tensor([1.], device=device).expand(N1 * N2)
if k > N1 * N2:
k = N1 * N2
new_edge_id = torch.multinomial(p, k, replacement=False)
assert k is not None
k = 2 * k if force_undirected else k
new_edge_id = torch.randperm(N1*N2, device = device)[:min(k, N1*N2)]

Why not just do this? Also, given this, are we saying dense and sparse are not different anymore? Previously the difference between dense and sparse was in how negatives were identified; now we don't have that difference.

Author

The methods differ in terms of complexity:

  • the dense method enumerates all the possible edges and then filters out the positive ones. This requires $O(N^2)$ space and time.
  • the sparse method samples $k$ edges and then filters out the positive ones. The complexity now scales with $O(k)$.

The idea is that for a big graph the dense method is prohibitive. However, in most cases big graphs are not dense (i.e. $E \ll N^2$). Thus, we can still obtain negative edges by sampling $k$ edges at random, since the probability of hitting a positive edge is low (it is $E/N^2 \ll 1$). Thanks to the sparse method, we can therefore obtain the negative edges without enumerating all the possible edges (as the dense method does).
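
To give a rough sense of the scale (illustrative numbers, not measurements from this PR): with $N = 10^5$ nodes and $E = 10^6$ edges, the dense method has to materialise all $N^2 = 10^{10}$ candidate ids (about 80 GB as int64), whereas the sparse method only touches $O(k)$ ids and each random guess hits a positive edge with probability $E/N^2 = 10^{-4}$.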

In your suggestion, both sparse and dense would use torch.randperm(N1*N2), which returns a vector of size N1*N2, i.e. $O(N^2)$.

Member

But in the above method, aren't you creating p = torch.tensor([1.], device=device).expand(N1 * N2), which is of shape N^2?

Author

torch.expand does not allocate new memory, as you can see in the PyTorch documentation.
Actually, I am not sure about the complexity of torch.multinomial when replacement=False, and I didn't find anything about it online. However, it could be replaced by torch.randint followed by handling of repeated negative edges.
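
For reference, a minimal sketch of that alternative (not what the PR currently does; N1, N2, k and device are as in the snippet above):

# draw k candidate edge ids with replacement, then drop repeated guesses;
# on average only a small fraction is lost when k << N1 * N2
cand_id = torch.randint(N1 * N2, (k,), device=device)
cand_id = torch.unique(cand_id)  # returns the ids sorted as well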
