-
Notifications
You must be signed in to change notification settings - Fork 81
Graph data structure and algorithm design #16
Comments
@lindahua The Boost Graph library is great, but wouldn't we need multiple inheritance for that? |
I was actually suggesting taking the approach that decouples the graph concepts / types and graph algorithms, such that one can write generic algorithms work on multiple graph types (that implement some concept) or write a specific graph type to fit his need, but have generic algorithms working on it. In terms of concept/type hierarchy, BGL did use a multiple inheritance hierarchy. But we can use Union as work-around. The BGL concept hierarchy as shown in Figure 1 in (http://www.boost.org/doc/libs/1_53_0/libs/graph/doc/graph_concepts.html) can be expressed in Julia, as abstract Graph
abstract IncidenceGraph <: Graph
abstract AdjacencyGraph <: Graph
abstract AdjacencyMatrix <: Graph
abstract BidirectionalGraph <: IncidenceGraph
abstract VertexListOnlyGraph <: Graph # only provides the vertex list
abstract EdgeListOnlyGraph <: Graph # only provides the edge list
abstract VertexAndEdgeListGraph <: Graph # provides both vertex and edge list
typealias VertexListGraph Union(VertexListOnlyGraph, VertexAndEdgeListGraph)
typealias EdgeListGraph Union(EdgeListOnlyGraph, VertexAndEdgeListGraph) I think this nearly reproduces the BGL concept hierarchy. The only issue is the relations between For graph type author, they can choose whether to declare their type as a sub-type of |
BGL provides a plethora of graph algorithms. Implementing such algorithms in an efficient way would take a lot of efforts. If we write the graph types using BGL's interfaces. Those algorithms can be ported to Julia much more easily. |
The union trick is very nice! This indeed comes very close to the BGL hierarchy. With this notation the graph type I had in the Networks.jl package would look something like this (see below), and the graph type in the Graphs.jl package could inherit EdgeListOnlyGraph. I think it would also be nice to incorporate the PropertyMap thing BGL has. module Graphs
import
Base.start,
Base.done,
Base.next
export
Graph,
IncidenceGraph, AdjacencyGraph, AdjacencyMatrix, BidirectionalGraph,
VertexListOnlyGraph, EdgeListOnlyGraph, VertexAndEdgeListGraph,
VertexListGraph, EdgeListGraph,
AdjacencyList,
adj,
has_vertex,
add_vertex!,
add_edge!
abstract Graph
abstract IncidenceGraph <: Graph
abstract AdjacencyGraph <: Graph
abstract AdjacencyMatrix <: Graph
abstract BidirectionalGraph <: IncidenceGraph
abstract VertexListOnlyGraph <: Graph # only provides the vertex list
abstract EdgeListOnlyGraph <: Graph # only provides the edge list
abstract VertexAndEdgeListGraph <: Graph # provides both vertex and edge list
typealias VertexListGraph Union(VertexListOnlyGraph, VertexAndEdgeListGraph)
typealias EdgeListGraph Union(EdgeListOnlyGraph, VertexAndEdgeListGraph)
type AdjacencyList{V, E} <: VertexListOnlyGraph
adj::Dict{V, Dict{V, E}}
end
AdjacencyList(V,E) = AdjacencyList{V,E}(Dict{V,Dict{V,E}}())
adj(g::AdjacencyList) = g.adj
start{V,E}(g::AdjacencyList{V, E}) = start(g.adj)[1]
done{V,E}(g::AdjacencyList{V, E}, state) = done(g.adj, state)
function next{V,E}(g::AdjacencyList{V, E}, state)
n = next(g.adj, state)
n[1][1], n[2]
end
has_vertex{V,E}(g::AdjacencyList{V,E}, u::V) = has(adj(g), u)
function add_vertex!{V,E}(g::AdjacencyList{V,E}, u::V)
if !has_vertex(g, u)
adj(g)[u] = Dict{V, E}()
end
u
end
function add_edge!{V,E}(g::AdjacencyList{V,E}, u::V, v::V, e::E)
add_vertex!(g, u)
add_vertex!(g, v)
adj(g)[u][v] = e
adj(g)[v][u] = e
end
end |
Also cc: @JeffBezanson, @StefanKarpinski |
@daviddelaat What you outlined above is pretty much what I was thinking about. More thoughts about the package structure:
The Actually, there is another package that needs to be ported. The heaps -- this is vital for implementing for many important graph algorithms. For example, Dijkstra shortest path, Prim's algorithm on Minimum spanning tree, etc are relying on a heap. Fibonacci heap is generally recommended for these things, as it has lower amortized cost than say binary heap, but it is not as easy to implement. Perhaps, we may port Boost.Heap to Heap.jl One of the advantage of Julia is that we can implement such data structures and algorithms quite efficiently. (I can't really imagine someone can vectorize such things in MATLAB). Once we have these ready, Julia can clearly demonstrate its advantage over other technical computing languages (e.g. MATLAB and Python). |
See Fast operations on user defined data structures is really where julia shines. I did a bunch of parallel graph algorithms using distributed sparse matrices in Matlab + Star-P for my thesis, but the vectorization only takes you so far. |
It may also be useful to provide |
Most graph algorithms don't actually rely on being able to do random access to pick individual vertices or edges. What those concepts require is that you can provide an iteration interface (use So, basically we can have something like this for v in vertices(g)
# do something on vertex v
end If it happens that |
The ability to do random indexing is good for interactive exploration of a graph, where you may pick a subgraph and examine it further, for example. In any case, I agree that |
Thinking about this a bit more. How do we specify that a function requires the graph to inherit more than one of the above abstract types. The breadth first search algorithm, for instance, requires the graph on which it operates to be both an IncidenceGraph and VertexListGraph. There is no Intersection function, right? |
In development of probabilistic inference stuff, I got to a point that needs the tree & graph data structure. In particular, I need an efficient graph representation that does not explicitly store a vertex set (instead using 1:n by default to refer to vertices). I am going to fork this and create a BGL branch to experiment with the BGL concepts along the development of some more graph types. When it is ready, I will do a pull request. |
Sounds like a great idea to me. |
Sounds good! I'm definitely interested in seeing the BGL concepts modeled in Julia. |
I outlined the interface specification here: https://github.com/lindahua/Graphs.jl/blob/bgl/Interfaces.md I continue to work on several specific graph types, and will wrap John's original graph types under this interface. Let me know if you have any feedback. |
There are some problems that I would like to listen to more suggestions before proceeding: It is very common for a specific graph type to implement multiple concepts -- for example -- an incidence graph can also provide a list of edges / vertices. Or something can be simultaneously a tree and with an incidence graph. In a more abstract level. We can decompose things into several orthogonal aspects: A, B, C, D. A particular type can provide some combination of them. I would like to express
Julia does not allow to express this (neither does C++ actually -- C++ concept has not been accepted to standard). What BGL did is to just document the interface of each concept, and document which concept a specific graph type implements. Instead of using the hierarchical system. What about we do something similar. Only defines a Graph algorithms always take arguments of type |
It seems reasonable to do what BGL has done. The hierarchy allows reuse of code - but in this case, the data structures actually change, requiring much of the interface to be rewritten anyways. I feel that it is best to just have |
Thinking about sequencing the implementation, I would prefer that we have one general purpose but complete implementation for a specific type of graph, rather than partial implementations of lots of graph types. |
Update: Following BGL convention, interfaces have been standardized. Instead of using a complex type hierarchy, the current system only uses one abstract type Concrete graph types should be derived from type MyGraph <: AbstractGraph
... # some definitions
end
# declares that MyGraph implements all interfaces required by
# vertex list, edge list, and adjacency graph
@graph_implements MyGraph vertex_list edge_list adjacency_graph
# on algorithm part
function bfs_traverse(g::AbstractGraph, ... )
# check whether g implements the required concepts
@graph_requires g vertex_list adjacency_list
....
end In BGL, all requirements are only documented, and it does not actually enforce it through function interface. Therefore, when some requirements are violated, the errors will only be raised deep inside the algorithm implementation, usually resulting in miles-long mysterious error messages. The In addition, I implemented two generic graph types
type AdjacencyList{V, VList, AList}
# V: vertex type
# VList: the type of vertex list
# AList: the type of adjacency list.
# Suppose a is an instance of AList, then a[v] is an iterable container or vertices. This definition is very flexible, and can achieve different goals by using different parameters. For example,
is very efficient. Here you simply use In some cases, one might want to use symbols as vertices, and use dictionary to retrieve neighbor list of a specific vertex, then the type can be written as
This representation contains more information, but is much less efficient. Therefore, with one generic type, one can achieve different trade-off between various goals.
A set of friendly constructing functions are provided such that users need not write such long type names in practice. I believe some of the original graph types in this package can be wrapped as special case of these generic types. I will play with the interfaces by implementing several graph algorithms. |
|
It would be nice if Julia had some way of defining mixins, relying on macros to verify if a type implements some interface is definitely a kludge. |
Just wanted to point a few references which might provide a starting/point or point of comparison with sparse matrices. https://github.com/dgleich/gaimc - graph algorithms in Matlab code basically just applies a few graph algorithms on top of the sparse matrix type. For the Matlab code, we have to extract the CSR arrays ourselves (sparse_to_csr), but for Julia, you could use the internal ones directly. At one point in Matlab's JIT history, I got 4-times the speed of the BGL library below... https://github.com/dgleich/matlab-bgl/tree/master/libmbgl - this is just a wrapper around the BGL functions that exports a more "c"-oriented interface on the CSR arrays for a sparse matrix. This has proven really annoying to maintain. If you are looking for a better library to use, checkout igraph. That is a wonderful set of network analysis functions. |
Thanks for all the great references. Under current design, it is very easy to wrap SparseMatrixCSC into a graph by simply wrapping it into a light weight wrapper type, and providing several basic interface. Then all algorithms that work on other graphs would work on sparse matrices. I have used the MATLAB-BGL before, and I agreed @dgleich that there is considerable overhead at the boundary between MATLAB and the underlying BGL library. Here, we are taking an approach that implements the Graph library in Julia rather than interfacing with a C library. This has several benefits:
Having said that, it may took a little bit more time to get this up and working (wrapping a C library is generally easier though). The current plan is to "translate" BGL codes (implement things in Julia, but take BGL and other open source libraries as reference for implementation). I am now on an urgent deadline on mid April (ICCV 2013). And will resume the implementation after that. |
I agree with the general direction. This will also push the development of the language and the compiler. Having a language with first class support for regular and irregular data structures will be really awesome. |
Hi, This looks great. I was wondering if the current plan covers the support Thanks, Glen On Tue, Mar 26, 2013 at 9:43 AM, Viral B. Shah notifications@git.luolix.topwrote:
|
Latest updates: The interface has been stabilized. Classic graph algorithms have been implemented or are being implemented:
Note that all these algorithms are implemented based on generic interfaces, so all of them can be applied to any graph classes that provide required set of interfaces. I will propose to merge this branch once this basic set of algorithms are implemented and properly tested & a brief document is ready. |
That's great news. Thanks so much for your work on this, Dahua! |
This is pretty awesome. That's quite a decent set of algorithms. One of the things I want to be able to do is call some of these algorithms on |
All algorithms listed above have been implemented and tested. Currently, there are three types: I am going to add a special type to represent the original graph type in this package, the one that allows attributes attached to vertices or edges.
|
@johnmyleswhite would you please grant me the commit privilege to this package? I would like to first move the BGL branch here, which would make things easier. I will request comments & review when I think BGL is ready, and will merge it to the master when people think it is good to merge. |
Done. Sorry that I hadn't realized you didn't have commit access before. |
Almost done. I am writing up a brief document. |
Here are some things that could be helpful, from updating the dot serialization stuff right now: First, document vertex_index(v), I re-implemented that before realizing it already existed. I was thinking of adding it to interface.rst, as "Vertex Interface". What do you think? Second, I need to be able to check whether vertices, edges, and the whole graph have attributes, so I know if I need to serialize those as I traverse the graph. Third, do you think it would be good to have slower edges(g) and vertices(g) functions for graphs that don't provide that concept? As in, a vertex iterator that runs across the edges, ignoring repetition, and an edge iterator that runs across the adjacency/incidence lists. I'm going to move the random graph generation stuff out of original now, then go back to attribute serialization. |
What do you think about making the interfaces explicit in code, and implementing @graph_requires by using method_exists! to check whether the required functions actually exist, like BOOST_CONCEPT_ASSERT? I think that would be very helpful in writing graph adapters. |
@rsofaer Checking existence of multiple methods is much more heavier than doing You can easily adapt any class using |
@lindahua I still think that having the requirements of each concept be executable is valuable. As we have more graph interfaces, some of which will never be part of this repo, I think being able to verify that an interface you're building meets the requirements of the concepts you want to implement will create some peace of mind. What if |
Hi all! Not much happening lately in this thread, and that's probably because you are happy with the data structure you have. Still, I think soon I'll start experimenting with wrapping igraph in Julia. I was excited about doing this a long time ago, but then https://groups.google.com/forum/#!searchin/julia-dev/igraph$20julia$20wrapping/julia-dev/LzeQYQTdewM/QqR3vB0urSwJ cooled me down. It seemed you wanted to have your own graph package, which is understandable. I think there are some exciting benefits in wrapping igraph, as opposed to another library, or having your own graph library from scratch.
While igraph is huge, the API that deals with the actual data type is thin, 20 functions. The rest of the library are all using this API. This means that replacing the data type is very easy, a student replaced it with an SQL database, as an experiment, just by rewriting those 20 functions. It is possible to replace the data structure with a Julia object, actually. Essentially, one would need to implement the 20 function API in Julia, and include some default callbacks to it from C. Assuming that calling back to Julia from C is not too costly, this can work just fine, although I am happy with the current data structure igraph has. Of course I can do the wrapping on my own, I just thought I would let you know. Disclaimer: I am a developer of igraph, so I am naturally biased. Best, |
Very welcome news, @gaborcsardi I am certainly willing to help if only in answering questions of the form, "I know how to do X in R, what is a good way of doing of doing X in Julia?" |
There is a discussion on graph data structure and algorithm design that started out at:
JuliaLang/METADATA.jl#91
I am opening this issue to bring that discussion here.
Cc: @daviddelaat @lindahua
The text was updated successfully, but these errors were encountered: