
Working with Graphs

John Casey edited this page Jul 31, 2013 · 1 revision

Working with Graphs and Webs of Relationships

EGraphManager

Most of what you need for working with graphs of relationships (and the workspaces for those graphs) can be done through the org.commonjava.maven.atlas.effective.EGraphManager class. It is designed as a simple entry point into the graph database system, abstracting away any need to interact with the database driver beyond constructing it in the first place.
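The facade idea can be sketched as follows: you construct a driver once, hand it to a manager, and from then on talk only to the manager. Everything here (`GraphDriver`, `InMemoryDriver`, `GraphManager`, `addRelationship`, `dependenciesOf`) is a hypothetical stand-in for illustration and is not the actual EGraphManager API.

```java
import java.util.*;

// Hypothetical driver contract; NOT Atlas's real driver interface.
interface GraphDriver {
    void store(String from, String to);
    Set<String> targetsOf(String from);
}

// Hypothetical in-memory driver backed by a plain adjacency map.
final class InMemoryDriver implements GraphDriver {
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    public void store(String from, String to) {
        adjacency.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    public Set<String> targetsOf(String from) {
        return Collections.unmodifiableSet(adjacency.getOrDefault(from, Set.of()));
    }
}

// Hypothetical facade: callers build the driver once, then use only the manager.
final class GraphManager {
    private final GraphDriver driver;

    GraphManager(GraphDriver driver) {
        this.driver = Objects.requireNonNull(driver);
    }

    void addRelationship(String from, String to) {
        driver.store(from, to);
    }

    Set<String> dependenciesOf(String project) {
        return driver.targetsOf(project);
    }
}
```

Swapping storage then means passing a different `GraphDriver` to the constructor; no calling code changes.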

Selecting Sub-Sets of Relationships

Filtering

Atlas provides an interface and a set of basic implementations for filtering individual ProjectRelationship instances in a graph. Filters are useful when traversing or transforming networks of relationships, or simply for directly constraining the dependency graph instance returned from the database.
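A relationship filter is essentially a per-relationship yes/no vote, and filters compose naturally. The sketch below assumes a simplified relationship record (`Rel`) and a hypothetical `RelFilter` interface; these are illustrative stand-ins, not Atlas's actual ProjectRelationship or filter classes.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-ins; NOT Atlas's ProjectRelationship types.
enum RelType { PARENT, DEPENDENCY, PLUGIN, EXTENSION }

record Rel(String declaring, String target, RelType type, String scope) {}

// A filter votes on one relationship at a time.
interface RelFilter {
    boolean accept(Rel rel);

    // Combine two filters: a relationship must pass both.
    default RelFilter and(RelFilter other) {
        return rel -> accept(rel) && other.accept(rel);
    }
}

// Accept only relationships of the given types.
final class TypeFilter implements RelFilter {
    private final List<RelType> types;
    TypeFilter(RelType... types) { this.types = Arrays.asList(types); }
    public boolean accept(Rel rel) { return types.contains(rel.type()); }
}

// Accept dependencies whose scope is allowed; non-dependencies pass through.
final class ScopeFilter implements RelFilter {
    private final List<String> scopes;
    ScopeFilter(String... scopes) { this.scopes = Arrays.asList(scopes); }
    public boolean accept(Rel rel) {
        return rel.type() != RelType.DEPENDENCY || scopes.contains(rel.scope());
    }
}

final class Filters {
    // Constrain a flat list of relationships, much as a database query might.
    static List<Rel> apply(List<Rel> rels, RelFilter filter) {
        return rels.stream().filter(filter::accept).collect(Collectors.toList());
    }
}
```

Keeping each filter small and combining them with `and(..)` lets the same building blocks serve both traversals and direct queries.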

Traversal

Atlas also provides an interface, two abstract base classes, and some basic implementations for traversing a network of relationships. You instantiate the ProjectNetTraversal in question, then call traverse(..) on the EProjectGraph or EProjectWeb, passing in the traversal instance. Once traverse(..) completes, the traversal instance should provide extra methods for accessing whatever information it was designed to accumulate.
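The instantiate, traverse, then query pattern can be sketched with a tiny graph. `Traversal`, `TinyGraph`, and `ReachableProjects` are hypothetical names chosen for illustration; they only mimic the shape of ProjectNetTraversal and traverse(..), not their real signatures.

```java
import java.util.*;

// Hypothetical traversal contract; NOT Atlas's ProjectNetTraversal API.
interface Traversal {
    void startTraverse();
    void edgeTraversed(String from, String to);
    void endTraverse();
}

// Tiny adjacency-list graph exposing a traverse(..) entry point.
final class TinyGraph {
    private final Map<String, List<String>> edges = new LinkedHashMap<>();
    private final String root;

    TinyGraph(String root) { this.root = root; }

    TinyGraph edge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        return this;
    }

    // Depth-first walk from the root; each node is expanded only once.
    void traverse(Traversal t) {
        t.startTraverse();
        walk(root, t, new HashSet<>());
        t.endTraverse();
    }

    private void walk(String node, Traversal t, Set<String> seen) {
        if (!seen.add(node)) return; // already expanded: avoids cycles
        for (String next : edges.getOrDefault(node, List.of())) {
            t.edgeTraversed(node, next);
            walk(next, t, seen);
        }
    }
}

// Accumulates every project reachable from the root; query it after traverse(..).
final class ReachableProjects implements Traversal {
    private final Set<String> projects = new LinkedHashSet<>();
    private boolean done;

    public void startTraverse() { projects.clear(); done = false; }
    public void edgeTraversed(String from, String to) { projects.add(to); }
    public void endTraverse() { done = true; }

    Set<String> getProjects() {
        if (!done) throw new IllegalStateException("call traverse(..) first");
        return projects;
    }
}
```

The result accessor lives on the traversal, not the graph, so one graph can serve many kinds of traversal.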

The ProjectNetTraversal provides callback methods for starting and ending graph traversal, and for ending edge traversal. Additionally, and most critically, it provides a method that lets the instance veto traversal of any given edge. This is where most filters are used, and the simplest way to filter in a traversal is to subclass AbstractFilteringTraversal. Along with these callbacks and acceptance methods, each traversal can specify how many passes it needs to make over the graph and, for each pass, whether to use depth-first or breadth-first traversal.
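To make the veto and the per-pass ordering concrete, here is a minimal sketch. `FilteringTraversal`, `Order`, `Walker`, and `SkipTestLibs` are hypothetical names, not AbstractFilteringTraversal's real methods. A vetoed edge is never followed, so anything reachable only through it is never visited, and the same graph yields different visit orders depending on the order chosen for each pass.

```java
import java.util.*;

// Hypothetical sketch; NOT Atlas's AbstractFilteringTraversal API.
enum Order { DEPTH_FIRST, BREADTH_FIRST }

abstract class FilteringTraversal {
    // Veto hook: return false to skip this edge.
    abstract boolean accept(String from, String to);
    void edgeTraversed(String from, String to, int pass) {}
    int passes() { return 1; }
    Order orderFor(int pass) { return Order.DEPTH_FIRST; }
}

final class Walker {
    private final Map<String, List<String>> edges = new HashMap<>();

    Walker edge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        return this;
    }

    void traverse(String root, FilteringTraversal t) {
        for (int pass = 0; pass < t.passes(); pass++) {
            boolean dfs = t.orderFor(pass) == Order.DEPTH_FIRST;
            Deque<String[]> pending = new ArrayDeque<>(); // accepted edges awaiting a visit
            Set<String> expanded = new HashSet<>();
            expand(root, t, pending, expanded, dfs);
            while (!pending.isEmpty()) {
                // DFS takes the newest edge (stack); BFS takes the oldest (queue).
                String[] e = dfs ? pending.pollLast() : pending.pollFirst();
                t.edgeTraversed(e[0], e[1], pass);
                expand(e[1], t, pending, expanded, dfs);
            }
        }
    }

    private void expand(String node, FilteringTraversal t, Deque<String[]> pending,
                        Set<String> expanded, boolean dfs) {
        if (!expanded.add(node)) return;
        List<String> next = new ArrayList<>(edges.getOrDefault(node, List.of()));
        if (dfs) Collections.reverse(next); // so the first child is popped first
        for (String to : next) {
            if (t.accept(node, to)) pending.addLast(new String[] {node, to});
        }
    }
}

// Example traversal: veto test libraries, run a DFS pass then a BFS pass.
final class SkipTestLibs extends FilteringTraversal {
    final List<List<String>> visits = List.of(new ArrayList<>(), new ArrayList<>());
    boolean accept(String from, String to) { return !to.startsWith("test-"); }
    void edgeTraversed(String from, String to, int pass) { visits.get(pass).add(to); }
    int passes() { return 2; }
    Order orderFor(int pass) { return pass == 0 ? Order.DEPTH_FIRST : Order.BREADTH_FIRST; }
}
```

With edges app→lib, app→test-lib, app→log, lib→util and test-lib→hamcrest, the depth-first pass visits lib, util, log while the breadth-first pass visits lib, log, util; hamcrest is never reached because its only incoming edge is vetoed.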

Workspaces

Atlas provides durable workspaces to address the ambiguity that can build up in the graph db. These workspaces allow you to constrain results from the database in three dimensions:

  • source URI (the location from which the relationship was discovered)
  • POM location (allowing relationships from profiles with certain names, for instance)
  • version selection (allowing selection of snapshots, ranges, and other variable versions down to a single, concrete version)
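The three dimensions can be pictured as a view over the global relationship data. The sketch below is an in-memory illustration only: `SourcedRel`, `Workspace`, `allowSource`, `allowPomLocation`, `selectVersion`, and `view` are hypothetical names, the POM-location strings are made up, and the real workspaces are durable rather than held in memory.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical sketch; NOT Atlas's workspace API.
record SourcedRel(String declaring, String targetGa, String targetVersion,
                  String sourceUri, String pomLocation) {}

final class Workspace {
    private final Set<String> sources = new HashSet<>();            // dimension 1: source URI
    private final Set<String> pomLocations = new HashSet<>();       // dimension 2: POM location
    private final Map<String, String> selections = new HashMap<>(); // dimension 3: "ga:variable" -> concrete

    Workspace allowSource(String uri) { sources.add(uri); return this; }
    Workspace allowPomLocation(String loc) { pomLocations.add(loc); return this; }

    Workspace selectVersion(String ga, String variable, String concrete) {
        selections.put(ga + ":" + variable, concrete);
        return this;
    }

    // Keep relationships from allowed sources and POM locations, pinning any
    // selected variable version (snapshot, range, ...) to its concrete choice.
    List<SourcedRel> view(List<SourcedRel> all) {
        return all.stream()
            .filter(r -> sources.contains(r.sourceUri()))
            .filter(r -> pomLocations.contains(r.pomLocation()))
            .map(r -> {
                String pinned = selections.get(r.targetGa() + ":" + r.targetVersion());
                return pinned == null ? r
                    : new SourcedRel(r.declaring(), r.targetGa(), pinned, r.sourceUri(), r.pomLocation());
            })
            .collect(Collectors.toList());
    }
}
```

Because each dimension is just another constraint, selections can accumulate over time, which is what makes a durable workspace worth keeping.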

Since the workspace is durable, you have the option of building up a very sophisticated set of controls to tailor the output you want over time. Combined with the right filter, you can answer very detailed questions without even doing an explicit traverse(..) call at all. Atlas provides some basic CRUD support for workspaces; just enough to support durability.

The Graph Database

Atlas stores relationship information in a single, global database that covers all relationships, regardless of where they were discovered and which part of the POM they came from. The source URI and POM location are recorded for each relationship to allow filtering later.

Since project releases in Maven are designed to be immutable, a dependency graph database composed entirely of artifacts released into public repositories should not contain any overlapping information for release-level versions of artifacts. However, when snapshots are included, or the database includes staging repositories and the like, depgraph information may become overlaid in the database, producing different output depending on your choices: restricting the set of source locations, or selecting specific timestamped versions for snapshots, can produce different results.
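One way to see the overlay problem is to group relationships by declaring project and compare what each source says. The sketch below flags declaring coordinates whose relationship sets differ between sources; `Edge`, `Overlay`, and `ambiguous` are hypothetical names, and this check is an illustration, not something Atlas is documented to provide.

```java
import java.util.*;

// Hypothetical sketch; NOT part of Atlas.
record Edge(String declaring, String target, String sourceUri) {}

final class Overlay {
    // Returns declaring coordinates whose relationships differ across source URIs.
    static Set<String> ambiguous(List<Edge> edges) {
        // declaring -> (source URI -> targets declared according to that source)
        Map<String, Map<String, Set<String>>> bySource = new TreeMap<>();
        for (Edge e : edges) {
            bySource.computeIfAbsent(e.declaring(), k -> new HashMap<>())
                    .computeIfAbsent(e.sourceUri(), k -> new TreeSet<>())
                    .add(e.target());
        }
        Set<String> result = new TreeSet<>();
        bySource.forEach((declaring, perSource) -> {
            // Identical sets collapse to one entry; more than one means sources disagree.
            if (new HashSet<>(perSource.values()).size() > 1) result.add(declaring);
        });
        return result;
    }
}
```

A release found in two repositories normally reports the same relationships from both and is not flagged, while a snapshot seen in a snapshot repository and a staging repository may be flagged.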

For this reason, it may not be enough to work with the unconstrained relationship data available in the graph db.

Database Drivers

Atlas currently supports two drivers for its dependency-graph database: Jung, an in-memory implementation, and Neo4j, which is backed by Lucene and persisted to disk. Which driver to choose depends on your specific needs.