Skip to content
This repository has been archived by the owner on Oct 8, 2021. It is now read-only.

Use matrices in FloydWarshallState #38

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 19 additions & 21 deletions src/floyd-warshall.jl
Original file line number Diff line number Diff line change
Expand Up @@ -23,9 +23,9 @@
# > WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


type FloydWarshallState<:AbstractPathState
dists::Vector{Vector{Float64}}
parents::Vector{Vector{Int}}
type FloydWarshallState{T<:Real} <: AbstractPathState
dists::Matrix{T}
parents::Matrix{Int}
end

# @doc doc"""
Expand All @@ -36,17 +36,17 @@ end
# Note that it is possible to consume large amounts of memory as the
# space required for the FloydWarshallState is O(n^2).
# """ ->
function floyd_warshall_shortest_paths(
function floyd_warshall_shortest_paths{T<:Real}(
g::AbstractGraph;
edge_dists::AbstractArray{Float64, 2} = Array(Float64,(0,0))
edge_dists::AbstractArray{T,2} = zeros(0,0)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be zeros(T,0,0) instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. A method definition with a default argument is exactly equivalent to two separate method definitions, one of which sets the default value.

f{T}(x::T=0) = x

is exactly equivalent to

f{T}(x::T) = x
f() = f(0)

Therefore, the default argument must already specify a concrete type and cannot refer to a typevar. (You cannot have a method signature f{T}().)

Were you to change the method definition to f{T}(x::T=zero(T)), you would get a runtime error because T is not defined at runtime in the scope of the method that runs zero(T).

)

use_dists = issparse(edge_dists)? nnz(edge_dists > 0) : !isempty(edge_dists)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I seeing a bug here? This should certainly be
use_dists = issparse(edge_dists)? nnz(edge_dists) > 0 : !isempty(edge_dists), no? I'll push a hotfix now.

We need to include tests that contains sparse graphs.

ETA: this was a bug. Fixed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed via #42.

n_v = nv(g)
dists = fill(convert(Float64,Inf), (n_v,n_v))
parents = zeros(Int, (n_v,n_v))
S = typeof(one(T) + one(T))
dists = fill(typemax(S), n_v, n_v)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break things down the line. I'd prefer to keep edge weights as Float64 since we have an isinf test for it. Can we get rid of the type parameterization here and keep it Float64?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did think about it, but the only problem I could see is when the longest path length reaches typemax(T), which is only a problem for small integers like UInt8. The advantage of doing T<:Real is that Floyd-Warshall will work transparently on other types like BigFloats and Rationals.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that this will all break horribly if someone decides to use UInt8, and we have no "max test" like isinf for these other types (except for constructing something with typemax and kludging around that way).

I've avoided type parameterization in LightGraphs precisely because of the complexity it introduced with Graphs.jl. LightGraphs is designed to be quick, easy to understand, but not as flexible as Graphs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbromberger does the failure in the case of UInt8 outweigh the benefit of being able to run with Rational edge weights?

I genuinely do not see how the type parameter use in this function leads to any additional user-facing complexity, loss of code readability or loss of performance. Appealing to the complexity of Graph.jl seems like a slippery slope argument to me - the complexity there comes from forcing users to assign types to everything, even to initialize an empty graph for metadata the user doesn't necessarily care about; here, the eltype is very naturally expressed as a tag for the user specified data, which is the entire point of Julia.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For each type, we need a value that represents "no edge exists". For floats, it's Inf, and we have Base.isinf as a test.

If we can do that with a single (preferably Base) function across all these parameterized types, I think there's an argument for this sort of flexibility.

Keep in mind that this will be the first type parameterization in LightGraphs. I am very worried about a slippery slope here. (It might be unfounded.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that this will be the first type parameterization in LightGraphs.

Well, it turns out that this statement is false. There are already polymorphic constructors for Graph and Digraph. Furthermore, the type parameters in those functions probably should not be T<:Number, but rather T<:Real. The entries of adjacency matrices are counts of edges (possibly signed due to directedness), and I do not know of any nonintegral or nonreal generalizations of the notion of adjacency matrix. Distances must be real. So in either case the matrix must have real entries, as opposed to complex or something even more exotic.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it turns out that this statement is false.

Yes, it is - my apologies. This was a recent commit to LightGraphs for @jpfairbanks' work with spectral graph theory. The reason they're Number rather than Integer (and this makes the case unequivocally for the proposed change we're discussing here) is because we wanted to be able to pass distance matrices - as well as adjacency matrices - in to create the graph.

Just out of interest, why can't one have complex distances?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not that the typemax should be a different type from "the number in question", rather, it is that the type of "the number in question" can change under addition

OK, but if we had the proposed isconnected function, then wouldn't we run into trouble if we looked at the typemax of the sum? That is, assume we wanted a distance matrix of Uint8 (humor me for a second). What value would we pick for "no edge between these vectors"? If we used the sum, it would have to be typemax(Uint64), which would be unintuitive to the user and would mean that he couldn't pass in a matrix of Uint8 for any incomplete graph.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For your isconnected function, are you think that this will based on dist[i,j] != inf? We could build isconnected on the output of running connected components and then having the output of connected components support a query of connectedness

g = Graph(somedata)
state = connectedcomponents(g)
for vtx1,vtx2 in someiterator
    if isconnected(state, vtx1, vtx2)
         f(vtx1,vtx2)
    end
end

We should probably support this api regardless of what we do on F-W
Connected components should be much faster to compute than all pairs shortest paths. But once you have computed APSP you have all the information to answer connected components.

Also I agree with @jiahao on the type stuff. I would use rational arithmetic for making an illustrative example in a paper or when giving a class. I think people prefer to do fraction arithmetic over floating point arithmetic in their head.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isconnected is only for determining whether or not a given distance matrix element represents a valid distance when we've determined that values in a distance matrix need to be taken into account when performing shortest-paths calculations. For everything else we have a very fast has_edge (or, if you need something more bare-metal, finclist and binclist, but those I think might change to be adjacency lists instead of incidence lists if I can get the performance a touch better).

Keep in mind that distance matrices are deliberately outside the scope of LightGraphs, and the more code we write to support them the less clean the implementation becomes.

(Ah - and if we're talking about testing whether components (subgraphs) are connected, that's a different issue and one that I'd prefer to keep in a separate discussion thread. We should probably have a way to do this, but we need a "view" into the graph itself that describes subgraphs, and that's pretty complicated - and will break memory and speed efficiencies.)

parents = zeros(Int, n_v, n_v)

fws = FloydWarshallState(Vector{Float64}[], Vector{Int}[])
for v in 1:n_v
dists[v,v] = 0.0
end
Expand All @@ -67,23 +67,21 @@ function floyd_warshall_shortest_paths(
end
end
for w in vertices(g), u in vertices(g), v in vertices(g)
if dists[u,v] > dists[u,w] + dists[w,v]
dists[u,v] = dists[u,w] + dists[w,v]
if dists[u,w] == typemax(S) || dists[w,v] == typemax(S)
continue
end
d = dists[u,w] + dists[w,v]
if dists[u,v] > d
dists[u,v] = d
parents[u,v] = parents[w,v]
end
end
for r in 1:size(parents,1) # row by row
push!(fws.parents, vec(parents[r,:]))
end
for r in 1:size(dists,1)
push!(fws.dists, vec(dists[r,:]))
end

return fws
return FloydWarshallState{S}(dists, parents)
end

function enumerate_paths(s::FloydWarshallState, v::Integer)
pathinfo = s.parents[v]
function enumerate_paths(s::FloydWarshallState, v::Int)
pathinfo = slice(s.parents, v, :)
paths = Vector{Int}[]
for i in 1:length(pathinfo)
if i == v
Expand All @@ -101,5 +99,5 @@ function enumerate_paths(s::FloydWarshallState, v::Integer)
return paths
end

enumerate_paths(s::FloydWarshallState) = [enumerate_paths(s, v) for v in 1:length(s.parents)]
enumerate_paths(st::FloydWarshallState, s::Integer, d::Integer) = enumerate_paths(st, s)[d]
enumerate_paths(s::FloydWarshallState) = [enumerate_paths(s, v) for v in 1:size(s.parents, 1)]
enumerate_paths(st::FloydWarshallState, s::Int, d::Int) = enumerate_paths(st, s)[d]
23 changes: 19 additions & 4 deletions test/floyd-warshall.jl
Original file line number Diff line number Diff line change
@@ -1,7 +1,22 @@
d = float([ 0 1 2 3 4; 5 0 6 7 8; 9 10 0 11 12; 13 14 15 0 16; 17 18 19 20 0])
z = floyd_warshall_shortest_paths(g3; edge_dists=d)
@test z.dists[3] == [7, 6, 0, 11, 27]
@test z.parents[3] == [2, 3, 0, 3, 4]
d = [ 0 1 2 3 4; 5 0 6 7 8; 9 10 0 11 12; 13 14 15 0 16; 17 18 19 20 0]

for T in (Int8, Int16, Int32, Int64, Int128,
Float16, Float32, Float64)

z = floyd_warshall_shortest_paths(g3; edge_dists=convert(Matrix{T}, d))
@test z.dists == [
0 1 7 18 34
1 0 6 17 33
7 6 0 11 27
18 17 11 0 16
34 33 27 16 0]
@test z.parents == [
0 1 2 3 4
2 0 2 3 4
2 3 0 3 4
2 3 4 0 4
2 3 4 5 0]
end

@test enumerate_paths(z)[2][2] == []
@test enumerate_paths(z)[2][4] == enumerate_paths(z,2)[4] == enumerate_paths(z,2,4) == [2,3,4]