Consider dropping Vertex ID #131

clue · 2015-03-08T11:21:15Z

Some thoughts:

Ticket Use Attributes for all mathematical properties (weight, balance etc.) #114 will remove all mathematical properties from the Vertex and Edge classes, such as weight, balance etc. and will use general purpose Attributes instead.
This means that the Vertex ID will be the only property remaining that is not being stored in the Attributes. Arguably, this might cause confusion and limits how this library can be used.
Also, the way the Vertex ID is currently implemented, requires it to be unique within the whole graph. Among others, this means that we can not create an exact clone of the vertex within the same graph (see Support native cloning of Base objects #58).
The Vertex ID is often used as an index for arrays / maps. Because of this, it can only be set during construction time and can not be changed later on.

As an alternative, we should consider if dropping the Vertex ID altogether could be an option.

Most use cases can probably store an arbitrary identifier (or any number thereof) in the Attributes instead
I'm not sure if enforcing a unique key constraint is actually needed anywhere and even if so: is this necessarily part of this library?

The text was updated successfully, but these errors were encountered:

clue · 2015-10-12T10:23:58Z

The lack of feedback appears to indicate the Vertex ID will not be missed much :-)

I'll look into removing this and see how it works out.

clue · 2015-10-12T10:24:43Z

For the reference: Dropping the vertex ID also helps avoiding performance issues when adding a huge number of vertices, see #136 for some more background.

ghost · 2017-01-13T17:29:18Z

here is my feedback: I personally use and depend on the ID field a whole lot. The graph uniqueness enforcement comes in handy sometimes. Cloning a vertex within a graph is rather important. However, there should be a "flag" or a mechanism by which we one can choose to enforce uniqueness. As long as that is the case, I don't think anyone will miss the ID field in particular.

clue · 2018-09-29T15:45:44Z

This ticket hasn't seen any updates in quite a while and I suppose it's about time for a progress update at least: I'm still planning to remove the ID attribute, but will postpone this to a later release simply because a release has been outstanding for quite some time and I would rather not introduce too many BC breaks in a single release. This should also give people plenty of time to plan ahead and be aware that they might be relying on a deprecated feature that will be removed with the following release.

phyrwork · 2018-10-11T09:13:50Z

I strongly oppose this change. Vertex identity is a key concept in basically every graph algorithm.

Even the basics like BFS and DFS rely on identity to know when cycles are encountered.

How else do you suggest doing vertex comparison in way that is portable between graph instances?

Suppose we have an algorithm that decomposes a graph into N subgraphs before doing further processing. Simple cycles (graphp/algorithms#33) for example.

A subgraph with all new vertex instances and no $id would only be identifiable by its object instance, which would be a different object to the equivalent vertex in the original graph.

Sure, you would be able to get the strongly connected components, and then the simple cycles within those strongly connected components, but there would be no way to relating the resulting vertices to the original ones.

Pretty much defeats the point of the exercise, no?

phyrwork · 2018-10-11T09:18:31Z

This means that the Vertex ID will be the only property remaining that is not being stored in the Attributes. Arguably, this might cause confusion and limits how this library can be used.

Identity is a core concept. Any other attributes are labels. We can't remove identity.

Also, the way the Vertex ID is currently implemented, requires it to be unique within the whole graph. Among others, this means that we can not create an exact clone of the vertex within the same graph (see Support native cloning of Base objects #58).

What's the point of having an identical clone of a vertex in a graph? That would just be redundant. This is very different to having a new vertex with new identity and a matching set of in/out edges.

The Vertex ID is often used as an index for arrays / maps. Because of this, it can only be set during construction time and can not be changed later on.

Yes, this is the whole point of identity.

clue · 2018-10-11T10:51:29Z

@phyrwork Thank you for your input, I think you're bringing up some very good points here! 👍

This means that the Vertex ID will be the only property remaining that is not being stored in the Attributes. Arguably, this might cause confusion and limits how this library can be used.

Identity is a core concept. Any other attributes are labels. We can't remove identity.

I think we can agree that identity is indeed a core concept and that all other attributes are indeed just arbitrary attributes.

I think we can discuss the idea of whether identity needs to be a core concept in this library and I would argue that we can indeed remove IDs altogether.

Vertex identity is a key concept in basically every graph algorithm.
Even the basics like BFS and DFS rely on identity to know when cycles are encountered.

Admittedly, I agree that relying on the vertex ID is a convenient feature that is used extensively in our algorithms.

As an alternative, we can also rely on PHP's object instance identity using with the === comparator:

$graph = new Graph();
$a = $graph->createVertex();
$b = $graph->createVertex();

assert($a !== $b);

Similarly, many algorithms rely on marking vertices as "visited" and currently rely on the vertex ID for this. This can similarly be achieved by either using spl_object_hash() or SplObjectStorage:

$graph = new Graph();
$a = $graph->createVertex();
$b = $graph->createVertex();

$visited = [
    spl_object_hash($a) => true
];

assert(isset($visited[spl_object_hash($a)]));
assert(!isset($visited[spl_object_hash($b)]));

Both examples have the advantage that this is entirely built-in and nothing we have to worry about. Additionally, this actually ensures identity of an instance, not just identity within some graph. This is an important distinction because a number of places rely on identity when it comes to multiple independent (or cloned) graphs like this:

$graph = new Graph();
$a = $graph->createVertex();

$graph = new Graph();
$b = $graph->createVertex();

// these are never the same, even if they may have the same vertex ID
assert($a !== $b);

How else do you suggest doing vertex comparison in way that is portable between graph instances?

Suppose we have an algorithm that decomposes a graph into N subgraphs before doing further processing. Simple cycles (graphp/algorithms#33) for example.

A subgraph with all new vertex instances and no $id would only be identifiable by its object instance, which would be a different object to the equivalent vertex in the original graph.

Again a very good point. And arguably one that I suppose may be a bit more tricky to address. If possible, it may be best to avoid this altogether (depending on the algorithm of course). As an alternative, one option would be to map from the original vertex instance to the cloned vertex instance relying on vertex positions:

$graph = new Graph();
$graph->createVertex();
$graph->createVertex();

$clone = $graph->createGraphClone();

$original = $graph->getVertices();
$new = $clone->getVertices();

assert($original[0] !== $new[0]);
assert($original[1] !== $new[1]);

Similarly, one can easily build a map using the instance identifiers as in the previous example. Or as another alternative, we may even attach some arbitrary identifier attributes to the original graph before cloning and then use this to match from the cloned instances.

Admittedly, this may sound like a non-trivial change and one that still has to be evaluated to see if it's reasonable to change our algorithms like this and how much of a big deal this actually is. My gut feeling is that this sounds more challenging than it actually is, as not too many algorithms should rely on this in the first place.

Again, your input is very much appreciated! The suggested change is still under discussion and nothing is set in stone yet. Keep it coming, any input is welcome!

phyrwork · 2018-10-12T02:49:59Z

As an alternative, we can also rely on PHP's object instance identity using with the === comparator:

Yes, this is what I was getting at. But you're suggesting that vertex objects can be shared between graphs.

And how would this vertex ID be represented in a user friendly way when exporting to X format?

Not very user-friendly to export a huge string when you could just export 1,2,3,4...

Singleton vertex objects complicates a whole bunch of stuff.

This can similarly be achieved by either using spl_object_hash() or SplObjectStorage

one option would be to map from the original vertex instance to the cloned vertex instance relying on vertex positions

Similarly, one can easily build a map using the instance identifiers as in the previous example. Or as another alternative, we may even attach some arbitrary identifier attributes to the original graph before cloning and then use this to match from the cloned instances

Essentially you are suggesting maintaining a linked list of objects to their original objects in place of simply retaining vertex ID like every other graph library in existence does.

Vertex ID is fundamental in graph theory.

Look at every diagram that explains graph theory ever. Nodes are displayed with numbers representing their origin.

Traversing a list/maps to maps to maps to... (you get the idea) is wasteful and overly complicated.

I advise against this move. It seems like a simplification but the ramifications are significant.

clue · 2019-11-30T21:15:12Z

@phyrwork I think we agree that identity is an important concept in graph theory.

I still think that it makes sense to rely on language-level identity (object identity) instead of applying arbitrary identifiers on top of the existing objects. I've just merged #185 and have prepared related changes for our algorithms and graph rendering and the changeset looks quite reasonable (imho). I'll file these PRs as soon as possible and will link against this ticket to keep this discussion going.

If we feel that having explicit identifiers in here makes sense in the future (again), we can still add this later on without introducing a BC break.

Thanks for the discussion! 👍

clue added help wanted BC break labels Mar 8, 2015

clue mentioned this issue Mar 8, 2015

Support native cloning of Base objects #58

Closed

clue mentioned this issue Oct 12, 2015

Adding more vertices to a huge graph is very slow #136

Closed

clue added this to the v0.10.0 milestone Oct 12, 2015

clue self-assigned this Oct 12, 2015

clue modified the milestones: v0.10.0, v0.11.0 Sep 29, 2018

clue mentioned this issue Sep 27, 2019

Add a Group entity #168

Closed

clue modified the milestones: v0.11.0, v1.0.0 Sep 29, 2019

clue mentioned this issue Nov 29, 2019

Remove Vertex IDs, related mappings etc. #185

Merged

clue closed this as completed in #185 Nov 30, 2019

sandrokeil mentioned this issue Dec 18, 2019

Add vertex id XML attribute as graph attribute ? graphp/graphml#14

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider dropping Vertex ID #131

Consider dropping Vertex ID #131

clue commented Mar 8, 2015

clue commented Oct 12, 2015

clue commented Oct 12, 2015

ghost commented Jan 13, 2017

clue commented Sep 29, 2018

phyrwork commented Oct 11, 2018

phyrwork commented Oct 11, 2018

clue commented Oct 11, 2018

phyrwork commented Oct 12, 2018

clue commented Nov 30, 2019

Consider dropping Vertex ID #131

Consider dropping Vertex ID #131

Comments

clue commented Mar 8, 2015

clue commented Oct 12, 2015

clue commented Oct 12, 2015

ghost commented Jan 13, 2017

clue commented Sep 29, 2018

phyrwork commented Oct 11, 2018

phyrwork commented Oct 11, 2018

clue commented Oct 11, 2018

phyrwork commented Oct 12, 2018

clue commented Nov 30, 2019