Count number of connected components more efficiently than length(connected_components(g))
#407
base: master
Changes from 7 commits
59270de
a8f8d19
2316b0f
05e3b7e
0153724
ce66be5
f440525
da1d31e
21b1854
ead687e
@@ -1,26 +1,32 @@
 # Parts of this code were taken / derived from Graphs.jl. See LICENSE for
 # licensing details.
 """
-    connected_components!(label, g)
+    connected_components!(label, g, [search_queue])

 Fill `label` with the `id` of the connected component in the undirected graph
 `g` to which it belongs. Return a vector representing the component assigned
 to each vertex. The component value is the smallest vertex ID in the component.

+A `search_queue`, an empty `Vector{eltype(edgetype(g))}`, can be provided to reduce
+allocations if `connected_components!` is intended to be called multiple times sequentially.
+If not provided, it is automatically instantiated.
+
 ### Performance
 This algorithm is linear in the number of edges of the graph.
 """
-function connected_components!(label::AbstractVector, g::AbstractGraph{T}) where {T}
+function connected_components!(
+    label::AbstractVector{T}, g::AbstractGraph{T}, search_queue::Vector{T}=Vector{T}()
+) where {T}
+    isempty(search_queue) || error("provided `search_queue` is not empty")
     for u in vertices(g)
         label[u] != zero(T) && continue
         label[u] = u
-        Q = Vector{T}()
-        push!(Q, u)
-        while !isempty(Q)
-            src = popfirst!(Q)
+        push!(search_queue, u)
+        while !isempty(search_queue)
+            src = popfirst!(search_queue)
             for vertex in all_neighbors(g, src)
                 if label[vertex] == zero(T)
-                    push!(Q, vertex)
+                    push!(search_queue, vertex)
                     label[vertex] = u
                 end
             end
@@ -129,9 +135,69 @@ julia> is_connected(g)
 true
 ```
 """
-function is_connected(g::AbstractGraph)
+function is_connected(g::AbstractGraph{T}) where {T}
     mult = is_directed(g) ? 2 : 1
-    return mult * ne(g) + 1 >= nv(g) && length(connected_components(g)) == 1
+    if mult * ne(g) + 1 >= nv(g)
+        label = zeros(T, nv(g))
+        connected_components!(label, g)
+        return allequal(label)
+    else
+        return false
+    end
 end

 """
+    count_connected_components(g, [label, search_queue]; reset_label::Bool=false)
+
+Return the number of connected components in `g`.
+
+Equivalent to `length(connected_components(g))` but uses fewer allocations by not
+materializing the component vectors explicitly. Additionally, mutated work-arrays `label`
+and `search_queue` can be provided to reduce allocations further (see
+[`connected_components!`](@ref)).
+
+## Keyword arguments
+- `reset_label :: Bool` (default, `false`): if `true`, `label` is reset to zero before
+  returning.
+
+## Example
+```
+julia> using Graphs
+
+julia> g = Graph(Edge.([1=>2, 2=>3, 3=>1, 4=>5, 5=>6, 6=>4, 7=>8]));
+
+julia> connected_components(g)
+3-element Vector{Vector{Int64}}:
+ [1, 2, 3]
+ [4, 5, 6]
+ [7, 8]
+
+julia> count_connected_components(g)
+3
+```
+"""
+function count_connected_components(

I am a bit undecided if we should call this Ideally we use the same word everywhere. @gdalle Do you have an opinion on that?

There's also

If I had to pick I'd rather use

Definitely no to I don't mind abbreviation from time to time, but let's go with

+    g::AbstractGraph{T},
+    label::AbstractVector{T}=zeros(T, nv(g)),
+    search_queue::Vector{T}=Vector{T}();
+    reset_label::Bool=false,
+) where {T}
+    connected_components!(label, g, search_queue)
+    c = count_unique(label)
+    reset_label && fill!(label, zero(eltype(label)))
+    return c
+end
+
+function count_unique(label::Vector{T}) where {T}
+    seen = Set{T}()
+    c = 0
+    for l in label
+        if l ∉ seen
+            push!(seen, l)
+            c += 1
+        end
+    end
+    return c

Suggested change

That's less performant than the explicitly looped version though:
julia> label_small = rand(1:3, 20)
julia> @b count_unique($label_small)
150.851 ns (4 allocs: 320 bytes) # loop
174.412 ns (4 allocs: 464 bytes) # length(Set(label))
julia> label_big = rand(1:50, 5000)
julia> @b count_unique($label_big)
23.385 μs (11 allocs: 3.312 KiB) # loop
32.719 μs (6 allocs: 72.172 KiB) # length(Set(label))
julia> label_huge = rand(1:5000, 500000)
julia> @b count_unique($label_huge)
3.499 ms (25 allocs: 192.625 KiB) # loop
4.876 ms (6 allocs: 9.000 MiB, 2.51% gc time) # length(Set(label))

It's indeed not very great that the A related thing is that

Actually, it is not really an "issue" in Base, per se: rather, it seems

That's interesting. I did not know that. Btw, if we try to be really efficient here, would using

+end
+
+"""
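
As a side note on `count_unique`: the docstring of `connected_components!` states that each component is labelled with its smallest vertex ID. If `label` starts out as all zeros and is indexed by vertex ID, every component then contains exactly one vertex whose label equals its own index, so counting those vertices gives the number of components without building a `Set`. The sketch below only illustrates that observation; `count_roots` is a hypothetical name and is not part of the PR.

```julia
# Hypothetical helper, not part of the PR.
# Assumes `label` was filled by `connected_components!` starting from zeros,
# so each entry holds the smallest vertex ID of that vertex's component.
function count_roots(label::AbstractVector{<:Integer})
    c = 0
    for (i, l) in pairs(label)
        # A vertex is the "root" of its component when its label is its own index.
        c += (l == i)
    end
    return c
end

# Labels for the docstring example: components {1,2,3}, {4,5,6}, {7,8}.
label = [1, 1, 1, 4, 4, 4, 7, 7]
println(count_roots(label))    # 3
println(length(Set(label)))    # 3, the same count obtained via a Set
```

This only works for label vectors produced this way; for arbitrary label vectors the `Set`-based loop is still needed.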
I am all for performance improvements, but I am a bit skeptical whether it is worth making the interface more complicated.
Almost all graph algorithms need some kind of work buffer, so we could have something like this in all algorithms, but in the end it should be the job of Julia's allocator to check whether there is some suitable piece of memory lying around. We can help it by using `sizehint!` with a suitable heuristic.

I agree that this will usually not be relevant; in my case it is, though, and it is the main reason I made the changes. I also agree that there is a trade-off between performance improvements and complications of the API. On the other hand, I think passing such work buffers as optional arguments is a good solution to such trade-offs: most users can safely ignore the extra arguments, so it shouldn't complicate their lives much.
As you say, there are potentially many algorithms in Graphs.jl that could take a work buffer; in light of that, maybe this could be more palatable if we settle on a unified name for these kinds of optional buffers, so that standardizing across methods reduces the added complexity.
Maybe just `work_buffer` (and, if there are multiple, `work_buffer1`, `work_buffer2`, etc.)?

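
To make the trade-off concrete, here is a minimal, self-contained sketch of the buffer-reuse pattern under discussion. It deliberately avoids the Graphs.jl API: plain adjacency lists stand in for a graph, and `count_components!` is a hypothetical name that mirrors the idea of the PR rather than its actual signature.

```julia
# Hypothetical sketch: adjacency lists stand in for a Graphs.jl graph.
# `label` and `queue` are caller-owned work buffers that are reused across calls.
function count_components!(label::Vector{Int}, queue::Vector{Int}, adj::Vector{Vector{Int}})
    fill!(label, 0)     # 0 means "not visited yet"
    empty!(queue)       # reset the queue so it can be reused
    c = 0
    for u in eachindex(adj)
        label[u] != 0 && continue
        c += 1          # u starts a new component
        label[u] = u
        push!(queue, u)
        while !isempty(queue)
            src = popfirst!(queue)
            for v in adj[src]
                if label[v] == 0
                    label[v] = u
                    push!(queue, v)
                end
            end
        end
    end
    return c
end

# Two small undirected graphs on 5 vertices, given as symmetric adjacency lists.
g1 = [[2], [1, 3], [2], [5], [4]]    # components {1,2,3} and {4,5}
g2 = [[2], [1], [4], [3], Int[]]     # components {1,2}, {3,4} and {5}

# Allocate the work buffers once, then reuse them for every graph.
label = zeros(Int, 5)
queue = Int[]
sizehint!(queue, 5)

for adj in (g1, g2)
    println(count_components!(label, queue, adj))
end
```

A caller who does not care about allocations could wrap this in a method that creates `label` and `queue` itself, which is essentially what the optional arguments in the diff do.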
If we do this, then all functions should take exactly one `work_buffer` (possibly a tuple) and have an appropriate function to initialize the buffer. I think it is a major change which should be discussed separately.

So I think if this is really important for your use case you can either
`Experimental` submodule. Currently we don't guarantee semantic versioning there; this allows us to remove things in the future without breaking the API.
But just to clarify: your problem is not that you are building graphs by adding edges until they are connected? Because if that is the issue, there is a much better algorithm.
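
The "much better algorithm" is not named in the thread. One standard technique for tracking connectivity while edges are added one at a time is a disjoint-set (union-find) structure, and the sketch below assumes that this is the kind of approach meant. All names are hypothetical and not part of Graphs.jl.

```julia
# Hypothetical sketch of incremental connectivity with union-find.
mutable struct DisjointSets
    parent::Vector{Int}    # parent[x] == x marks a set representative
    size::Vector{Int}      # set sizes, valid at representatives only
    ncomponents::Int       # current number of disjoint sets
end

# Start with n singleton sets.
DisjointSets(n::Integer) = DisjointSets(collect(1:n), ones(Int, n), n)

# Find the representative of x, shortening the path along the way (path halving).
function find_root!(ds::DisjointSets, x::Int)
    p = ds.parent
    while p[x] != x
        p[x] = p[p[x]]
        x = p[x]
    end
    return x
end

# Record an edge u-v; returns true if it merged two previously separate sets.
function add_edge_ds!(ds::DisjointSets, u::Int, v::Int)
    ru, rv = find_root!(ds, u), find_root!(ds, v)
    ru == rv && return false
    if ds.size[ru] < ds.size[rv]
        ru, rv = rv, ru                # attach the smaller set under the larger
    end
    ds.parent[rv] = ru
    ds.size[ru] += ds.size[rv]
    ds.ncomponents -= 1
    return true
end

ds = DisjointSets(4)
for (u, v) in [(1, 2), (3, 4), (2, 1), (2, 3)]
    add_edge_ds!(ds, u, v)
    println("after edge $u-$v: ", ds.ncomponents, " component(s)")
end
```

With such a structure, checking whether the graph has become connected after each added edge is a comparison `ds.ncomponents == 1` rather than a fresh traversal of the whole graph.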