[QST]: Leiden clustering before and after 23.02 #4529

Open · 2 tasks done
wdnlotm opened this issue Jul 9, 2024 · 10 comments
Labels: question (Further information is requested)

wdnlotm commented Jul 9, 2024

What is your question?

Hi,
I use Leiden clustering a lot, and Louvain occasionally. I am seeing big differences in results between 23.02 and newer versions. My dataset has 80 million edges and 6 million nodes. The most notable difference: with 23.02 and older, Leiden produces about 20 clusters, while versions newer than 23.02 sometimes produce 200K clusters for the same dataset and settings.
I now need to move to Dask cuGraph, but this large discrepancy is keeping me from moving to the newer versions; a rough sketch of the multi-GPU setup I am aiming for is included below.
Any idea?
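
The sketch below assumes a single-node LocalCUDACluster; the file name and column names are placeholders, and the exact cugraph.dask signatures for Louvain/Leiden should be checked against the installed version.

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask_cudf
import cugraph
import cugraph.dask as dask_cugraph
import cugraph.dask.comms.comms as Comms

# One Dask worker per local GPU, plus cuGraph's communicator.
cluster = LocalCUDACluster()
client = Client(cluster)
Comms.initialize(p2p=True)

# Edge list as a distributed cuDF DataFrame (file name and column names are placeholders).
edges = dask_cudf.read_csv("knn_edges.csv", names=["src", "dst", "weight"])

G = cugraph.Graph(directed=False)
G.from_dask_cudf_edgelist(edges, source="src", destination="dst", edge_attr="weight")

# Multi-GPU Louvain; recent releases also expose a multi-GPU Leiden
# (cugraph.dask.leiden) with a similar signature.
parts, modularity = dask_cugraph.louvain(G, resolution=0.65)

Comms.destroy()
client.close()
cluster.close()
```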
Thank you.

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open issues and have found no duplicates for this question
wdnlotm added the question (Further information is requested) label on Jul 9, 2024
@alexbarghi-nv (Member)

23.02 is a fairly old version of cuGraph. I would start by moving to a newer version of cuGraph (the latest is 24.06) and seeing if that resolves your issues.

Also @ChuckHastings I think we've had a similar issue reported in the past? What was the resolution there?

@ChuckHastings (Collaborator)

Our original implementation of Leiden had a number of functionality issues (in actuality it was not much more than Louvain) as well as significant scaling issues. Leiden was completely rewritten as of version 23.04, so you have probably observed that major shift. The rewritten version is much better, although some minor issues persisted until earlier this year. As of 24.06 we have no known outstanding issues with Leiden, so I would suggest using 24.06 or later.

If you have examples where our Leiden implementation in 24.06 generates answers that you are concerned about, please provide some details (ideally a sample data set, parameters and what you might expect the result to be) and we will be happy to investigate.

wdnlotm (Author) commented Jul 10, 2024

I am testing a few versions.
I have an edge list from a kNN graph (82 million edges, 5.5 million vertices).
I load it into G = cugraph.MultiGraph(directed=False) and run Leiden or Louvain via
parts, modularity_score = cugraph.louvain(G, max_level=500, resolution=0.65) or
parts, modularity_score = cugraph.leiden(G, max_iter=500, resolution=0.65, random_state=123)
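
For reference, the full workflow looks roughly like the sketch below (the file name and column names are placeholders; the algorithm calls are the ones quoted above):

```python
import cudf
import cugraph

# Edge list from the kNN graph; file name and column names are placeholders.
edges = cudf.read_csv("knn_edges.csv", names=["src", "dst", "weight"])

# Undirected multigraph, as described above.
G = cugraph.MultiGraph(directed=False)
G.from_cudf_edgelist(edges, source="src", destination="dst", edge_attr="weight")

# Louvain and Leiden with the parameters quoted above.
louvain_parts, louvain_mod = cugraph.louvain(G, max_level=500, resolution=0.65)
leiden_parts, leiden_mod = cugraph.leiden(G, max_iter=500, resolution=0.65, random_state=123)

# Count how many clusters each algorithm produced.
print("Louvain clusters:", louvain_parts["partition"].nunique())
print("Leiden clusters:", leiden_parts["partition"].nunique())
```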

I use Apptainer, by the way.
24.06 Leiden produced 1741523 clusters
24.06 Louvain produced 11 (eleven) clusters

24.02 Leiden produced 10 clusters.
Great!! But when I reran the same code, I got 11 clusters. That is also fine, but not consistent. After one more run, I got 141 clusters.
24.02 Louvain produced 11 (eleven) clusters

23.02 Leiden produced 19 clusters.
23.02 Louvain produced 20 clusters

In fact, I like the results from 23.02. They are comparable to what I could get from the leidenalg library, which is s l o w.

Thanks.

@ChuckHastings (Collaborator)

Leiden circa 23.02 was really Louvain with just a tiny bit of extra logic. It didn't actually implement much of the Leiden algorithm at all.

The big issue with 23.06 through 24.02 was inconsistency. We identified and corrected several bugs, which made results more consistent. Because we operate in parallel, and Louvain/Leiden are greedy approximation algorithms, we can still be affected by numerical instability: sums of values are accumulated in parallel in a non-deterministic order, so we can see minor variations in the results on larger graphs. We have seen several cases where the best choice in the greedy algorithm is really a tie, and because of that numerical variability a different member of the tie can appear to be the clear winner on different runs. This is a known source of variability in the results that we haven't tried to address, but we have addressed many of the other inconsistencies.
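
As a toy illustration of that last point (plain Python, not cuGraph code): floating-point addition is not associative, so accumulating the same contributions in a different order yields slightly different totals, which is enough to flip a near-tie in a greedy step.

```python
import random

random.seed(0)

# 100,000 modularity-like contributions spanning many orders of magnitude.
vals = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8) for _ in range(100_000)]

total_one_order = sum(vals)                               # one accumulation order
total_other_order = sum(random.sample(vals, len(vals)))   # a different order

print(total_one_order == total_other_order)     # typically False
print(abs(total_one_order - total_other_order)) # small but nonzero difference
```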

Producing 1.7 million clusters seems like a bug of some kind. Is the graph you are using something you can share? I can try to find comparable size graphs and see if I can get this type of behavior, but starting from your graph will be the easiest path.

wdnlotm (Author) commented Jul 10, 2024

I can share my graph and code. Give me an hour or two.
Thanks.

wdnlotm (Author) commented Jul 10, 2024

I put the files here. Thank you for your help!!
https://drive.google.com/drive/folders/10lziXlXut-ZGcXDlwbk6SxqPB9OIunSS?usp=sharing

@ChuckHastings (Collaborator)

It may take some time to investigate, but we will keep you informed on progress.

ChuckHastings self-assigned this on Jul 10, 2024
@ChuckHastings (Collaborator)

Just added this to our development plan for the 24.10 release. We should at least get some analysis done soon.

wdnlotm (Author) commented Aug 19, 2024

If you don't mind, I can add another, much smaller edge list that shows similar results.
https://drive.google.com/drive/folders/10lziXlXut-ZGcXDlwbk6SxqPB9OIunSS?usp=sharing
Go into the smaller_knn folder.

Results using 24.06:
Leiden: 35,387 clusters
Louvain: 15 clusters.

Thanks.

@ChuckHastings (Collaborator)

Smaller graphs are always easier to debug. Thanks!
