
Update negative sampling to work directly on GPU #9608

Open
wants to merge 19 commits into base: master
Conversation

danielecastellana22

@rusty1s I re-implemented the negative edge sampling so that it works directly on GPUs.
In the following, I summarise the main idea of my implementation. Then, I raise some questions that I hope you can help me answer.

Negative Sampling

The idea of negative edge sampling is to obtain a list of candidate edges and then discard all the ones that already appear in the input graph.
To perform the existence check, I use torch.searchsorted. The input edge_index should be sorted, but this is usually the case.
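
To make the check concrete, here is a minimal sketch of the idea (the helper name and its exact form are mine, not code from this PR; it assumes the graph has at least one edge):

import torch

def is_positive_edge(cand_id, pos_id_sorted):
    # cand_id: linearised candidate edge ids (row * num_nodes + col)
    # pos_id_sorted: sorted linearised ids of the edges already in the graph
    idx = torch.searchsorted(pos_id_sorted, cand_id)
    idx = idx.clamp(max=pos_id_sorted.numel() - 1)
    return pos_id_sorted[idx] == cand_id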

The initial guess of negative edges can be obtained in a dense or a sparse way. The function also supports an automatic mode that lets the code choose the best method for the input graph.

Dense Method

The dense method is exact (it always returns the requested number of negative edges whenever that number exists), but it is costly since it enumerates all possible edges (the cost is quadratic in the number of nodes). The samples are drawn via torch.randperm to make the process stochastic.
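
A minimal sketch of this strategy, reusing the hypothetical is_positive_edge helper from above (the real implementation in the PR may differ):

def dense_negative_guess(pos_id_sorted, num_nodes, num_neg_samples, device):
    # enumerate all possible edge ids in random order: O(N^2) space and time
    all_id = torch.randperm(num_nodes * num_nodes, device=device)
    # drop the ids of edges that already exist in the graph
    neg_id = all_id[~is_positive_edge(all_id, pos_id_sorted)][:num_neg_samples]
    # de-linearise back to an edge_index of shape [2, num_neg_samples]
    return torch.stack([neg_id // num_nodes, neg_id % num_nodes], dim=0)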

Sparse Method

The sparse method is not exact (you may obtain fewer samples than requested), but it is more efficient since it does not enumerate all possible edges.
To obtain the guess, we simply sample k candidate edges using torch.multinomial. The choice of k is crucial for obtaining the desired number of negative edges, and it depends on the probability of sampling a negative edge at random.
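
A sketch of the sparse guess under the same assumptions; the multinomial call mirrors the one visible later in the diff, and the choice of k is left to the caller:

def sparse_negative_guess(pos_id_sorted, num_nodes, num_neg_samples, k, device):
    # uniform weights: expand() only creates a view of the one-element tensor
    p = torch.tensor([1.], device=device).expand(num_nodes * num_nodes)
    # draw k candidate edge ids instead of enumerating all N^2 of them
    cand_id = torch.multinomial(p, min(k, num_nodes * num_nodes), replacement=False)
    neg_id = cand_id[~is_positive_edge(cand_id, pos_id_sorted)][:num_neg_samples]
    return torch.stack([neg_id // num_nodes, neg_id % num_nodes], dim=0)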

Structured Negative Sampling

I implemented structured negative sampling in a similar way. The main difference is that here we want to sample a negative edge $(u,w)$ for each edge $(u,v)$ in the graph.

Dense Method

For each node $u$, we obtain a random permutation of all the nodes. Then, we select the first $deg(u)$ nodes that are not linked with $u$.
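
A naive per-node sketch of this idea (purely illustrative; the actual code is presumably vectorised, and the sketch assumes the sampling is feasible for every node):

def structured_dense_sketch(edge_index, num_nodes):
    src, dst = edge_index
    neg_dst = torch.empty_like(dst)
    for u in src.unique().tolist():
        mask = src == u
        # random permutation of all nodes, keeping the ones not linked to u
        perm = torch.randperm(num_nodes, device=edge_index.device)
        candidates = perm[~torch.isin(perm, dst[mask])]
        # one negative destination per positive edge of u
        neg_dst[mask] = candidates[:int(mask.sum())]
    return src, dst, neg_dst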

Sparse Method

We sample $k \cdot E$ candidate edges, where $E$ is the number of edges in the input graph. Here the choice of $k$ is trickier since it depends on the degree of each node. When the method is not able to obtain a negative sample for an edge, it returns $-1$ for that edge.
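
A rough sketch of this variant, again using the hypothetical is_positive_edge helper (self-loop handling omitted; $-1$ marks the edges for which all $k$ guesses failed):

def structured_sparse_sketch(edge_index, num_nodes, pos_id_sorted, k=10):
    src, dst = edge_index
    num_edges = src.numel()
    device = edge_index.device
    # k random destination guesses for every positive edge
    cand = torch.randint(num_nodes, (num_edges, k), device=device)
    cand_id = src.view(-1, 1) * num_nodes + cand
    ok = ~is_positive_edge(cand_id.view(-1), pos_id_sorted).view(num_edges, k)
    # position of the first valid guess per edge (value k means "no valid guess")
    col = torch.arange(k, device=device).expand(num_edges, k)
    first = torch.where(ok, col, torch.full_like(col, k)).min(dim=1).values
    neg_dst = torch.full_like(src, -1)
    valid = first < k
    neg_dst[valid] = cand[valid, first[valid]]
    return src, dst, neg_dst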

Open Questions

  1. Sorting the edges
    I use the PyG function index_sort to sort the input edge_index. However, I believe that in most cases the input is already sorted. Hence, another option could be to assume that the input edge_index is already sorted; in this way, sorting becomes the user's responsibility.

  2. How to manage the warning
    It would be nice to raise a warning when the sparse method might fail. I currently raise a warning when the probability of sampling a negative edge is low, but it is probably better to raise it only when we are sure that the method has failed (e.g. the number of sampled edges is more than 10% below the requested number).

  3. Determining the number of samples in the structured negative sampling
    I am struggling to find a way to determine the number $k$ in the sparse structured negative sampling. My approach is probably not the best one since it is "global", which makes it difficult to find the right $k$. For now, I just set $k=10$.

  4. Feasibility of structured negative sampling
    If I understood the code of structured_negative_sampling_feasible correctly, it returns True if no node is connected to all the others. I think this is wrong, since structured negative sampling should be feasible if and only if $deg(u) < N/2$ for all nodes in the graph.

Let me know what you think about this!

@denadai2
Contributor

wow this would be awesome!!

@rusty1s
Member

rusty1s commented Aug 22, 2024

Wow, this is pretty cool. Can we fix the tests so that we are Python 3.8 compatible (no Python 3.10 type hints such as str | int)?

@danielecastellana22
Author

Thank you, I am happy to hear this is a desired feature!

I updated the code to support Python 3.8 and made small changes to the negative sampling test functions.
On my laptop, all the test cases affected by the changes pass; the failing tests appear to be unrelated ones.

pre-commit-ci bot and others added 9 commits September 5, 2024 09:10
… retrieve the requested number of edges, it raises an exception.
…have no guarantee that there were no repeated edges. Now it is based on randperm. Also, the number of nodes is doubled to ensure that structured sampling is (almost) always feasible.
@danielecastellana22
Author

Hello, I updated the code so that all the pytests pass.
However, pre-commit.ci raises a PEP 8 error that I cannot solve (it is the only required check that fails).
@wsad1 can you help me?

@@ -153,16 +159,72 @@ def test_structured_negative_sampling():
assert (adj & neg_adj).sum() == 0

# Test with no self-loops:
edge_index = torch.LongTensor([[0, 0, 1, 1, 2], [1, 2, 0, 2, 1]])
#edge_index = torch.LongTensor([[0, 0, 1, 1, 2], [1, 2, 0, 2, 1]])
Member

Suggested change
#edge_index = torch.LongTensor([[0, 0, 1, 1, 2], [1, 2, 0, 2, 1]])

Member

@danielecastellana22 this line was causing the lint issue. But why was it commented out? Was that just a temporary change?

Author

No, it can be removed.
I think that in the previous test there were two different edge_index tensors to make sure that structured negative sampling was feasible both with and without negative self-loops.

Since I changed the structured negative sampling, it is now feasible also with the first definition of edge_index.

Member
@wsad1 left a comment

Great work. I'll make a few more passes with reviews.

Comment on lines -72 to -74
neg_edge_index = negative_sampling(edge_index, method='dense')
assert neg_edge_index.size(1) == edge_index.size(1)
assert is_negative(edge_index, neg_edge_index, (4, 4), bipartite=False)
Member

Why drop this test?

Author

Because the method is now inferred automatically based on the graph size. Since the graph used in the test is small, the chosen method is always dense. This reflects the idea that a sparse method (which is based on randomly guessing the negative edges) is reasonable only when the graph is sparse ($E \ll N^2$).

To test both the sparse and the dense method, I added a new test called test_negative_sampling_with_different_edge_density. Actually, I think the whole function test_negative_sampling can be removed, but I left it there to make sure that the old test still works.
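
As an illustration, the kind of rule I have in mind looks roughly like this (the threshold is a placeholder, not the value used in the code):

def choose_method(num_nodes, dense_budget=10_000_000):
    if num_nodes * num_nodes <= dense_budget:
        return 'dense'   # enumerating all N^2 candidates is still affordable
    return 'sparse'      # random guessing; works well when E << N^2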

Comment on lines +364 to +365
# structured sample is feasible if, for each node, deg > max_neigh/2
return bool(torch.all(2 * deg <= max_num_neighbors))
Member
@wsad1 Sep 7, 2024

Suggested change
# structured sample is feasible if, for each node, deg > max_neigh/2
return bool(torch.all(2 * deg <= max_num_neighbors))

The assumption here is that we don't want the negative edges to repeat. I like the assumption, but it might be too tight; also, we don't guarantee that the negative edges are unique. I'd suggest keeping the old condition. Or we could make it explicit by adding an argument unique_neg, and use this new condition when it is True and the old condition when it is False.

Author

I understand your point, and it makes me wonder about the following:

  1. Should the negative sampling return duplicate edges?
    In this new implementation, structured_negative_sampling with method=dense guarantees that the negative edges are unique. When method=sparse, this is not explicitly guaranteed (but duplicates should be very unlikely, especially if the graph is sparse). However, this could easily be resolved by removing the duplicates after the sampling. It is also worth highlighting that negative_sampling never returns duplicates.

  2. What is the purpose of structured_negative_sampling_feasible?
    In my understanding, this function checks whether structured negative sampling is feasible for a given graph. Note that this does not mean that structured_negative_sampling will return all the requested negative edges. In fact, when the sparse method is selected, it can fail (i.e. it is not able to sample a negative edge for every true edge) even if the problem is feasible, due to the randomness of the sampling. Vice versa, when the dense method is used, it will always return the correct set of negative edges (if feasible) since it enumerates all edges.

vector_id = torch.arange(N1 * N2)
edge_index3 = torch.stack(vector_id_to_edge_index(vector_id, (N1, N2)),
dim=0)
assert edge_index.tolist() == edge_index3.tolist()


def test_negative_sampling():
Member

Decorate with @withCUDA to test on cpu and gpu.
See this for an example.

@wsad1 wsad1 self-requested a review September 8, 2024 18:48
Comment on lines +399 to +404
assert k is not None
k = 2 * k if force_undirected else k
p = torch.tensor([1.], device=device).expand(N1 * N2)
if k > N1 * N2:
k = N1 * N2
new_edge_id = torch.multinomial(p, k, replacement=False)
Member

Suggested change
assert k is not None
k = 2 * k if force_undirected else k
p = torch.tensor([1.], device=device).expand(N1 * N2)
if k > N1 * N2:
k = N1 * N2
new_edge_id = torch.multinomial(p, k, replacement=False)
assert k is not None
k = 2 * k if force_undirected else k
new_edge_id = torch.randperm(N1*N2, device = device)[:min(k, N1*N2)]

Why not just do this? Also, given this, are we saying dense and sparse are not different anymore? Previously the difference between dense and sparse was in how negatives were identified; now we don't have that difference.

Author

The methods differ in terms of complexity:

  • the dense method enumerates all the possible edges and then filters out the positive ones. This requires $O(N^2)$ space and time.
  • the sparse method samples $k$ edges and then filters out the positive ones. The complexity now scales with $O(k)$.

The idea is that for a big graph the dense method is prohibitive. However, in most cases big graphs are not dense (i.e. $E \ll N^2$). Thus, we can still obtain negative edges by sampling $k$ edges at random, since the probability of hitting a positive edge is low (it is $E/N^2 \ll 1$). Thanks to the sparse method, we can therefore obtain the negative edges without enumerating all the possible edges (as the dense method does).
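
To give a rough sense of the scale (illustrative numbers, not measurements from this PR): with $N = 10^5$ nodes and $E = 10^6$ edges, the dense method has to materialise all $N^2 = 10^{10}$ candidate ids (about 80 GB as int64), whereas the sparse method only touches $O(k)$ ids and each random guess hits a positive edge with probability $E/N^2 = 10^{-4}$.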

In your suggestion, both sparse and dense would use torch.randperm(N1*N2), which returns a vector of size N1*N2, i.e. $O(N^2)$.

Member

But in the above method, aren't you creating p = torch.tensor([1.], device=device).expand(N1 * N2), which is of shape N^2?

Author

torch.expand does not allocate new memory, as you can see in the PyTorch documentation.
Actually, I am not sure about the complexity of torch.multinomial when replacement=False, and I didn't find anything about it online. However, it could be replaced by torch.randint followed by handling of repeated negative edges.
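
For reference, a minimal sketch of that alternative (not what the PR currently does; N1, N2, k and device are as in the snippet above):

# draw k candidate edge ids with replacement, then drop repeated guesses;
# on average only a small fraction is lost when k << N1 * N2
cand_id = torch.randint(N1 * N2, (k,), device=device)
cand_id = torch.unique(cand_id)  # returns the ids sorted as well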
