GitHub - puzzlef/louvain-communities: Comparing approaches for community detection using Louvain algorithm.

Comparing approaches for community detection using Louvain algorithm.

Louvain is an algorithm for detecting communities in graphs. Community detection helps us understand the natural divisions in a network in an unsupervised manner. It is used in e-commerce for customer segmentation and advertising, in communication networks for multicast routing and setting up of mobile networks, and in healthcare for epidemic causality, setting up health programmes, and fraud detection is hospitals. Community detection is an NP-hard problem, but heuristics exist to solve it (such as this). Louvain algorithm is an agglomerative-hierarchical community detection method that greedily optimizes for modularity (iteratively).

Modularity is a score that measures relative density of edges inside vs outside communities. Its value lies between −0.5 (non-modular clustering) and 1.0 (fully modular clustering). Optimizing for modularity theoretically results in the best possible grouping of nodes in a graph.

Given an undirected weighted graph, all vertices are first considered to be their own communities. In the first phase, each vertex greedily decides to move to the community of one of its neighbors which gives greatest increase in modularity. If moving to no neighbor's community leads to an increase in modularity, the vertex chooses to stay with its own community. This is done sequentially for all the vertices. If the total change in modularity is more than a certain threshold, this phase is repeated. Once this local-moving phase is complete, all vertices have formed their first hierarchy of communities. The next phase is called the aggregation phase, where all the vertices belonging to a community are collapsed into a single super-vertex, such that edges between communities are represented as edges between respective super-vertices (edge weights are combined), and edges within each community are represented as self-loops in respective super-vertices (again, edge weights are combined). Together, the local-moving and the aggregation phases constitute a pass. This super-vertex graph is then used as input for the next pass. This process continues until the increase in modularity is below a certain threshold. As a result from each pass, we have a hierarchy of community memberships for each vertex as a dendrogram. We generally consider the top-level hierarchy as the final result of community detection process.

Adjusting Tolerance

Louvain algorithm is a hierarchical algorithm, and thus has two different tolerance parameters: tolerance and passTolerance. tolerance defines the minimum amount of increase in modularity expected, until the local-moving phase of the algorithm is considered to have converged. We compare the increase in modularity in each iteration of the local-moving phase to see if it is below tolerance. passTolerance defines the minimum amount of increase in modularity expected, until the entire algorithm is considered to have converged. We compare the increase in modularity across all iterations of the local-moving phase in the current pass to see if it is below passTolerance. passTolerance is normally set to 0 (we want to maximize our modularity gain), but the same thing does not apply for tolerance. Adjusting values of tolerance between each pass have been observed to impact the runtime of the algorithm, without significantly affecting the modularity of obtained communities.

In this experiment (adjust-tolerance), we change the initial value of tolerance (for local-moving phase) from 1e-00 to 1e-6 in steps of 10. For each initial value of tolerance, we use a tolerance-norm of L1, L2, or L∞. We compare the results, both in terms of quality (modularity) of communities obtained, and performance. We choose the remaining Louvain parameters as resolution = 1.0, toleranceDeclineFactor = 10 (the rate at which we reduce tolerance after every pass), and passTolerance = 0.0. In addition we limit the maximum number of iterations in a single local-moving phase with maxIterations = 500, and limit the maximum number of passes with maxPasses = 500. We run the Louvain algorithm until convergence (or until the maximum limits are exceeded), and measure the time taken for the computation (performed 5 times for averaging), the modularity score, the total number of iterations (in the local-moving phase), and the number of passes. This is repeated for seventeen different graphs.

From the results, we make the following observations. A tolerance-norm of L1 converges the fastest, followed by L∞, and then L2 (except for initial tolerance below 10^-2). This could be due to delta-modularity for any vertex being small, so that squiring it (as with L2-norm) reduces the net error significantly. In general we observe that the modularity obtained with all tolerance-norms is the same, but for some graphs, best modularity is achieved with an initial tolerance of 10^-5 with L∞-norm, and an initial tolerance of > 10^-6 with L2-norm. As L∞-norm considers only the max error, we believe it would be a suitable tolerance-norm of choice for dynamic Louvain algorithm. Hence, we ask you to consider an initial tolerance of 10^-5 with L∞-norm to be the suitable choice in the general case.

Adjusting Tolerance iteratively

In this experiment (adjust-tolerance-iteratively), we adjust tolerance in two different ways. First, we change the initial value of tolerance from 1e-00 to 1e-12 in steps of 10. For each initial value of tolerance, we adjust the rate at which we decline tolerance between each pass (toleranceDeclineFactor) from 10 to 10000. We compare the results, both in terms of quality (modularity) of communities obtained, and performance. We choose the remaining Louvain parameters as resolution = 1.0 and passTolerance = 0.0. In addition we limit the maximum number of iterations in a single local-moving phase with maxIterations = 500, and limit the maximum number of passes with maxPasses = 500. We run the Louvain algorithm until convergence (or until the maximum limits are exceeded), and measure the time taken for the computation (performed 5 times for averaging), the modularity score, the total number of iterations (in the local-moving phase), and the number of passes. This is repeated for seventeen different graphs.

From the results, we observe that an initial tolerance of 1e-2 yields communities with the best possible modularity while requiring the least computation time. In addition, increasing the toleranceDeclineFactor increases the computation time (as expected), but does not seem to impact resulting modularity. Therefore choosing a toleranceDeclineFactor of 10 would be a good idea.

Adjusting Accumulator hashtable capacity

An accumulator hashtable is a data structure that is used to obtain the total weight of each community connected to each vertex in the graph. This is then used to find delta modularity of moving to each of its neighboring communities. Its capacity is always a prime number, and the hash key is obtained by simply finding the modulo of the community id with the capacity of the accumulator hashtable. In order to accomodate all communities in the worst case, the accumulator hastable is normally initialized with a capacity of the degree of each vertex in the graph. High degree vertices will thus require large accumulator hashtables. The problem with using a large accumulator hastable is that it is not feasible on a GPU, where we have a large number of threads, but a small amount of working memory (shared memory). For high-degree vertices, this would have to be done in the global memory instead (which is slow). In addition, we would have to use atomicCAS() operations in order to avoid collisions which can further drop performance.

Therefore, here my idea is to look if we can simply do away with large hash tables and collision resolution altogether, by simply considering identical hash keys as identical labels. This may lead to bad communities, but that is what this experiment is for. If they do yield good communities it can be a big win in performance. Note that such a scheme is only likely to cause issues only in the first few iterations, when we have a large number of communities. In addition, as there is potential for multiple communities to be combined, we will consider the new community to be the hash key.

In this experiment (adjust-accumulator-capacity), we adjust the capacity of accumulator hashtable from 2 to 4093 in multiples of 2. This capacity is always set to the highest prime number below a power of 2. We choose the Louvain parameters as resolution = 1.0, tolerance = 0 and passTolerance = 0.5. In addition we limit the maximum number of iterations in a single local-moving phase with maxIterations = 500, and limit the maximum number of passes with maxPasses = 500. We run the Louvain algorithm until convergence (or until the maximum limits are exceeded), and measure the time taken for the computation (performed 5 times for averaging), the modularity score, the total number of iterations (in the local-moving phase), and the number of passes. This is repeated for seventeen different graphs.

We however obtain results that are significantly different from our expectations. We observe that choosing a lower accumulator hastable capacity causes the computation to converge in much longer time, require a larger number of iterations, a smaller number of passes, and worse modularity. Choosing an accumulator hashtable capacity of 4093 also does not seem to provide any advantage over a full-size array based accumulator (except on some graphs, i.e., soc-LiveJournal1, coPapersCiteseer, coPapersDBLP). This is hopefully interesting as we observe that this approach of using a limit capacity accumulator hashtable seems to work well for the LabelRank algorithm [(1)].

However with higher accumulator labelset capacities, the time taken may increase beyond the time required for a full size accumulator labelset. This is because of the additional modulus (%) operator required with a limited capacity accumulator labelset (my expectaction is that this would not significantly affect performance in a GPU). Again, a similar effect is observed with modularity (in the average case), which increases with increasing accumulator labelset capacity. It appears that using an accumulator labelset capacity of 61 / 127 would yield a good enough modularity. In some cases, using a smaller accumulator labelset capacity yeilds an even better modularity than full-size labelsets (but these are exception cases i think). Note that choices might differ if a different labelset capacity is used. It would be interesting to implement this kind of collision-ignoring hash table on a GPU, and observe its impact on LabelRank as well as the Louvain algorithm for community detection.

Adjusting Accumulator hashtable capacity without collisions

Originally, my idea was to look if we can simply do away with large hash tables and collision resolution altogether, by simply considering identical hash keys as identical labels. However, either due to a mistake of my own or due to the limit to the total number of communities enforced by the lack of collision resolution, the quality of communitied (based on modularity) were really bad. Therefore, with this experiment, we put in place proper collision resolution (algorithmically), but still restrict the capacity of the accumulator hashtable. As you will read later, this gives us good results.

In this experiment (adjust-accumulator-capacity-with-collisions), we adjust the capacity of accumulator hashtable from 2 to 4093 in multiples of 2. This capacity is always set to the highest prime number below a power of 2. We choose the Louvain parameters as resolution = 1.0, tolerance = 0 and passTolerance = 0.5. In addition we limit the maximum number of iterations in a single local-moving phase with maxIterations = 500, and limit the maximum number of passes with maxPasses = 500. We run the Louvain algorithm until convergence (or until the maximum limits are exceeded), and measure the time taken for the computation (performed 5 times for averaging), the modularity score, the total number of iterations (in the local-moving phase), and the number of passes. This is repeated for seventeen different graphs.

From the results we observe that we can achieve good modularity with accumulator capacity of 7+. We are able to obtain such good quality communities within the least amount of time with an accumulator capacity of 31+, and with the least number of iterations with an accumulator capacity of 127+. We therefore conclude that using a limited capacity accumulator hashtable of 251 with ~50% occupancy would be a suitable for community detection using the Louvain algorithm on limited memory parallel devices such as the GPU. We may also go down to a hashtable capacity of 61 and still achieve good communities in a small amount of time. If time is not a concern (due to high-degree of parallelism achieved with small hashtables), an accumulator capacity of 13 may also be used. If minimization of collisions is not important (thanks to parallel memory scanning), then an accumulator capacity of 7 may also be attempted.

Adjusting Hash function for Accumulator hashtable

In this experiment (adjust-accumulator-hash-function), we try five different hash functions, and adjust the capacity of accumulator hashtable from 509 to 4093 in multiples of 2. This capacity is always set to the highest prime number below a power of 2. We choose the Louvain parameters as resolution = 1.0, tolerance = 0 and passTolerance = 0.5. In addition we limit the maximum number of iterations in a single local-moving phase with maxIterations = 500, and limit the maximum number of passes with maxPasses = 500. We run the Louvain algorithm until convergence (or until the maximum limits are exceeded), and measure the time taken for the computation (performed 5 times for averaging), the modularity score, the total number of iterations (in the local-moving phase), and the number of passes. This is repeated for seventeen different graphs.

We observe that magic hash function with an accumulator hashtable capacity of 4093 appears to perform the best among the limited capacity hashtable approaches. However it is still worse than the default approach of Louvain algorithm. Therefore it seems using a limited capacity accumulator hashtable for the Louvain algorithm is not useful, i.e., we must stick to full capacity accumulator hashtable.

Simple first move optimization

The first iteration of local-moving phase can be simplified to not require the use of an accumulator hashtable. This is because in the first iteration, each vertex is its own community (no accumulation is needed). However, this must be performed in an unordered fashion, i.e., the community choosing of one vertex would not affect the community choosing of another vertex (as it normally does with local-moving phase, as we are using ordered Louvain algorithm). We anticipate that this may help reducing the time taken for convergence, while still taking almost the same time for convergence. This can be useful on the GPU when using limited capacity accumulator hashtable as it can reduce the number of communities present and thus help limit the impact of the inaccuracies of a limited capacity hashtable. Note that we apply this simiplification only for the first iteration of the first local-moving phase of the algorithm.

In this experiment (optimization-simple-first-move), we compare the standard and simplified first move approaches for the Louvain algorithm, both in terms of quality (modularity) of communities obtained, and performance. We choose the Louvain parameters as resolution = 1.0, tolerance = 1e-2 (for local-moving phase) with tolerance decreasing after every pass by a factor of toleranceDeclineFactor = 10, and a passTolerance = 0.0 (when passes stop). In addition we limit the maximum number of iterations in a single local-moving phase with maxIterations = 500, and limit the maximum number of passes with maxPasses = 500. We run the Louvain algorithm until convergence (or until the maximum limits are exceeded), and measure the time taken for the computation (performed 5 times for averaging), the modularity score, the total number of iterations (in the local-moving phase), and the number of passes. We also track the time, iterations, and modularity obtained from the first local-moving phase on the CPU and the GPU. This is repeated for seventeen different graphs.

From the results, we observe that using a simple first move requires in general somewhat more time (and iterations) to converge than the standard approach, but may also return communities of slightly higher modularity. We therefore conclude that using a simple first move is not useful on the CPU, but it may be useful on the GPU for the reasons mentioned above.

Comparision with Unordered/Synchronous approach

There exist two possible approaches of vertex processing with the Louvain algorithm: ordered and unordered. With the ordered approach (original paper's approach), the local-moving phase is performed sequentially upon each vertex such that the moving of a previous vertex in the graph affects the decision of the current vertex being processed. On the other hand, with the unordered approach the moving of a previous vertex in the graph does not affect the decision of movement for the current vertex. This unordered approach (aka relaxed approach) is made possible by maintaining the previous and the current community membership status of each vertex (along with associated community information), and is the approach followed by parallel Louvain implementation on the CPU as well as the GPU. We are interested in looking at performance/modularity penalty (if any) associated with the unordered approach.

In this experiment (compare-unordered), we compare the ordered and unordered vertex processing approaches for the Louvain algorithm, both in terms of quality (modularity) of communities obtained, and performance. We choose the Louvain parameters as resolution = 1.0, tolerance = 0.0 (for local-moving phase), passTolerance = 0.0 (when passes stop). In addition we limit the maximum number of iterations in a single local-moving phase with maxIterations = 500, and limit the maximum number of passes with maxPasses = 500. We run the Louvain algorithm until convergence (or until the maximum limits are exceeded), and measure the time taken for the computation (performed 5 times for averaging), the modularity score, the total number of iterations (in the local-moving phase), and the number of passes. This is repeated for seventeen different graphs.

From the results, we observe that both the ordered and the unordered vertex processing approaches of the Louvain algorithm are able to provide communities of equivalent quality in terms of modularity, with the ordered approach providing slightly higher quality communities for certain graphs. However, the unordered approach is quite a bit slower than the ordered approach in terms of the total time taken, as well as the total number of iterations of the local-moving phase (which is the most expensive part of the algorithm). We therefore conclude that partially ordered approaches for vertex processing are likely to provide good performance improvements over fully unordered approaches for parallel implementations of the Louvain algorithm. Vertex ordering via graph coloring has been explored by Halappanavar et al.

Other experiments

adjust-iteration-processing

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.cxx		main.cxx
main.py		main.py
main.sh		main.sh
process.js		process.js
system.txt		system.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Adjusting Tolerance

Adjusting Tolerance iteratively

Adjusting Accumulator hashtable capacity

Adjusting Accumulator hashtable capacity without collisions

Adjusting Hash function for Accumulator hashtable

Simple first move optimization

Comparision with Unordered/Synchronous approach

Other experiments

References

About

Languages

License

puzzlef/louvain-communities

Folders and files

Latest commit

History

Repository files navigation

Adjusting Tolerance

Adjusting Tolerance iteratively

Adjusting Accumulator hashtable capacity

Adjusting Accumulator hashtable capacity without collisions

Adjusting Hash function for Accumulator hashtable

Simple first move optimization

Comparision with Unordered/Synchronous approach

Other experiments

References

About

Resources

License

Stars

Watchers

Forks

Languages