Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider dropping Vertex ID #131

Closed
clue opened this issue Mar 8, 2015 · 9 comments · Fixed by #185
Closed

Consider dropping Vertex ID #131

clue opened this issue Mar 8, 2015 · 9 comments · Fixed by #185
Assignees
Milestone

Comments

@clue
Copy link
Member

clue commented Mar 8, 2015

Some thoughts:

  • Ticket Use Attributes for all mathematical properties (weight, balance etc.) #114 will remove all mathematical properties from the Vertex and Edge classes, such as weight, balance etc. and will use general purpose Attributes instead.
  • This means that the Vertex ID will be the only property remaining that is not being stored in the Attributes. Arguably, this might cause confusion and limits how this library can be used.
  • Also, the way the Vertex ID is currently implemented, requires it to be unique within the whole graph. Among others, this means that we can not create an exact clone of the vertex within the same graph (see Support native cloning of Base objects #58).
  • The Vertex ID is often used as an index for arrays / maps. Because of this, it can only be set during construction time and can not be changed later on.

As an alternative, we should consider if dropping the Vertex ID altogether could be an option.

  • Most use cases can probably store an arbitrary identifier (or any number thereof) in the Attributes instead
  • I'm not sure if enforcing a unique key constraint is actually needed anywhere and even if so: is this necessarily part of this library?
@clue
Copy link
Member Author

clue commented Oct 12, 2015

The lack of feedback appears to indicate the Vertex ID will not be missed much :-)

I'll look into removing this and see how it works out.

@clue
Copy link
Member Author

clue commented Oct 12, 2015

For the reference: Dropping the vertex ID also helps avoiding performance issues when adding a huge number of vertices, see #136 for some more background.

@ghost
Copy link

ghost commented Jan 13, 2017

here is my feedback: I personally use and depend on the ID field a whole lot. The graph uniqueness enforcement comes in handy sometimes. Cloning a vertex within a graph is rather important. However, there should be a "flag" or a mechanism by which we one can choose to enforce uniqueness. As long as that is the case, I don't think anyone will miss the ID field in particular.

@clue clue modified the milestones: v0.10.0, v0.11.0 Sep 29, 2018
@clue
Copy link
Member Author

clue commented Sep 29, 2018

This ticket hasn't seen any updates in quite a while and I suppose it's about time for a progress update at least: I'm still planning to remove the ID attribute, but will postpone this to a later release simply because a release has been outstanding for quite some time and I would rather not introduce too many BC breaks in a single release. This should also give people plenty of time to plan ahead and be aware that they might be relying on a deprecated feature that will be removed with the following release.

@phyrwork
Copy link

I strongly oppose this change. Vertex identity is a key concept in basically every graph algorithm.

Even the basics like BFS and DFS rely on identity to know when cycles are encountered.

How else do you suggest doing vertex comparison in way that is portable between graph instances?

Suppose we have an algorithm that decomposes a graph into N subgraphs before doing further processing. Simple cycles (graphp/algorithms#33) for example.

A subgraph with all new vertex instances and no $id would only be identifiable by its object instance, which would be a different object to the equivalent vertex in the original graph.

Sure, you would be able to get the strongly connected components, and then the simple cycles within those strongly connected components, but there would be no way to relating the resulting vertices to the original ones.

Pretty much defeats the point of the exercise, no?

@phyrwork
Copy link

  • This means that the Vertex ID will be the only property remaining that is not being stored in the Attributes. Arguably, this might cause confusion and limits how this library can be used.

Identity is a core concept. Any other attributes are labels. We can't remove identity.

  • Also, the way the Vertex ID is currently implemented, requires it to be unique within the whole graph. Among others, this means that we can not create an exact clone of the vertex within the same graph (see Support native cloning of Base objects #58).

What's the point of having an identical clone of a vertex in a graph? That would just be redundant. This is very different to having a new vertex with new identity and a matching set of in/out edges.

  • The Vertex ID is often used as an index for arrays / maps. Because of this, it can only be set during construction time and can not be changed later on.

Yes, this is the whole point of identity.

@clue
Copy link
Member Author

clue commented Oct 11, 2018

@phyrwork Thank you for your input, I think you're bringing up some very good points here! 👍

This means that the Vertex ID will be the only property remaining that is not being stored in the Attributes. Arguably, this might cause confusion and limits how this library can be used.

Identity is a core concept. Any other attributes are labels. We can't remove identity.

I think we can agree that identity is indeed a core concept and that all other attributes are indeed just arbitrary attributes.

I think we can discuss the idea of whether identity needs to be a core concept in this library and I would argue that we can indeed remove IDs altogether.

Vertex identity is a key concept in basically every graph algorithm.
Even the basics like BFS and DFS rely on identity to know when cycles are encountered.

Admittedly, I agree that relying on the vertex ID is a convenient feature that is used extensively in our algorithms.

As an alternative, we can also rely on PHP's object instance identity using with the === comparator:

$graph = new Graph();
$a = $graph->createVertex();
$b = $graph->createVertex();

assert($a !== $b);

Similarly, many algorithms rely on marking vertices as "visited" and currently rely on the vertex ID for this. This can similarly be achieved by either using spl_object_hash() or SplObjectStorage:

$graph = new Graph();
$a = $graph->createVertex();
$b = $graph->createVertex();

$visited = [
    spl_object_hash($a) => true
];

assert(isset($visited[spl_object_hash($a)]));
assert(!isset($visited[spl_object_hash($b)]));

Both examples have the advantage that this is entirely built-in and nothing we have to worry about. Additionally, this actually ensures identity of an instance, not just identity within some graph. This is an important distinction because a number of places rely on identity when it comes to multiple independent (or cloned) graphs like this:

$graph = new Graph();
$a = $graph->createVertex();

$graph = new Graph();
$b = $graph->createVertex();

// these are never the same, even if they may have the same vertex ID
assert($a !== $b);

How else do you suggest doing vertex comparison in way that is portable between graph instances?

Suppose we have an algorithm that decomposes a graph into N subgraphs before doing further processing. Simple cycles (graphp/algorithms#33) for example.

A subgraph with all new vertex instances and no $id would only be identifiable by its object instance, which would be a different object to the equivalent vertex in the original graph.

Again a very good point. And arguably one that I suppose may be a bit more tricky to address. If possible, it may be best to avoid this altogether (depending on the algorithm of course). As an alternative, one option would be to map from the original vertex instance to the cloned vertex instance relying on vertex positions:

$graph = new Graph();
$graph->createVertex();
$graph->createVertex();

$clone = $graph->createGraphClone();

$original = $graph->getVertices();
$new = $clone->getVertices();

assert($original[0] !== $new[0]);
assert($original[1] !== $new[1]);

Similarly, one can easily build a map using the instance identifiers as in the previous example. Or as another alternative, we may even attach some arbitrary identifier attributes to the original graph before cloning and then use this to match from the cloned instances.

Admittedly, this may sound like a non-trivial change and one that still has to be evaluated to see if it's reasonable to change our algorithms like this and how much of a big deal this actually is. My gut feeling is that this sounds more challenging than it actually is, as not too many algorithms should rely on this in the first place.

Again, your input is very much appreciated! The suggested change is still under discussion and nothing is set in stone yet. Keep it coming, any input is welcome!

@phyrwork
Copy link

As an alternative, we can also rely on PHP's object instance identity using with the === comparator:

Yes, this is what I was getting at. But you're suggesting that vertex objects can be shared between graphs.

And how would this vertex ID be represented in a user friendly way when exporting to X format?

Not very user-friendly to export a huge string when you could just export 1,2,3,4...

Singleton vertex objects complicates a whole bunch of stuff.

This can similarly be achieved by either using spl_object_hash() or SplObjectStorage

one option would be to map from the original vertex instance to the cloned vertex instance relying on vertex positions

Similarly, one can easily build a map using the instance identifiers as in the previous example. Or as another alternative, we may even attach some arbitrary identifier attributes to the original graph before cloning and then use this to match from the cloned instances

Essentially you are suggesting maintaining a linked list of objects to their original objects in place of simply retaining vertex ID like every other graph library in existence does.

Vertex ID is fundamental in graph theory.

Look at every diagram that explains graph theory ever. Nodes are displayed with numbers representing their origin.

Traversing a list/maps to maps to maps to... (you get the idea) is wasteful and overly complicated.

I advise against this move. It seems like a simplification but the ramifications are significant.

@clue
Copy link
Member Author

clue commented Nov 30, 2019

@phyrwork I think we agree that identity is an important concept in graph theory.

I still think that it makes sense to rely on language-level identity (object identity) instead of applying arbitrary identifiers on top of the existing objects. I've just merged #185 and have prepared related changes for our algorithms and graph rendering and the changeset looks quite reasonable (imho). I'll file these PRs as soon as possible and will link against this ticket to keep this discussion going.

If we feel that having explicit identifiers in here makes sense in the future (again), we can still add this later on without introducing a BC break.

Thanks for the discussion! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants