New behavior of igraph::disjoint_union
#271
I've moved this to a new issue. I am actually not sure whether this is a problem on our side or on the side of igraph. One and a half years ago, I spotted an issue in how [...] What we have to do on our side now: check whether we need to adjust our code to be compatible with igraph's fix, or whether there is a problem in igraph's fix that we need to report to igraph again.
I looked a little deeper into this issue:

```
$artifact.type
$artifact.type[[1]]
[1] "Mail" "Mail"
$artifact.type[[2]]
[1] "Mail" "Mail"
$artifact.type[[3]]
[1] "Mail" "Mail"
$artifact.type[[4]]
[1] "Mail" "Mail"
```

vs.

```
$artifact.type
[1] "Feature" "Feature" "Feature" "Feature" "Feature"
```

The question now is: why do we have different data structures for the artifact type in the different networks? The first one (i.e., the list) originates from the author network, while the second one (i.e., the character vector) originates from the artifact network. @maxloeffler Could you please investigate why we have different data structures for [...]
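To see at a glance which edge attributes are stored as lists and which as atomic vectors, a small diagnostic could help. This is a sketch: `describe.attribute.types` is a hypothetical helper (not part of coronet or igraph), and its input is assumed to be a named list of attribute values such as the one returned by `igraph::edge_attr(network)`. The example uses plain lists mirroring the two outputs above so that it runs without igraph.

```r
# Hypothetical diagnostic helper (not part of coronet or igraph):
# report whether each attribute is stored as a list or as an atomic vector.
describe.attribute.types = function(attributes) {
    vapply(attributes, function(values) {
        if (is.list(values)) {
            # list attribute: report the classes of its elements
            element.classes = unique(vapply(values, function(v) class(v)[1], character(1)))
            paste0("list of ", paste(element.classes, collapse = "/"))
        } else {
            # atomic attribute: report the class of the vector itself
            class(values)[1]
        }
    }, character(1))
}

# Mirrors the outputs above: a list from the author network
# and a character vector from the artifact network.
author.attributes = list(artifact.type = rep(list(c("Mail", "Mail")), 4))
artifact.attributes = list(artifact.type = rep("Feature", 5))

describe.attribute.types(author.attributes)    # c(artifact.type = "list of character")
describe.attribute.types(artifact.attributes)  # c(artifact.type = "character")
```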
@maxloeffler I forgot to mention one additional check in our meeting: could you please also check what happens in our test suite when you comment out the changes of commit a953555? With the new igraph version 2.1.1, the workaround introduced in that commit should not be necessary any more (at least, I'd hope so). However, even if it is no longer necessary, let's keep the workaround for a while so that we don't abruptly drop coronet's support for previous igraph versions.
Reverting the mentioned commit causes 159 additional tests to fail, so I assume it is still necessary. Also, I realized that in order to patch around in igraph's code of [...]
Damn. Did you run it with the edge-attribute-handling fix that solves the list vs. character vector problem? If so, that many failing tests are in stark contrast to my expectations, because igraph's fix should make the mentioned commit unnecessary. Could you please dive deeper into what the problem is in the 159 failing tests? It would be good to know whether there are similar list vs. vector situations that we did not come across yet, or whether something else is completely broken. Thanks!
It depends. If it is just R code in a function that we call directly, you could try to redefine the entire function and call your version instead of the one from the package. If it is nested within other functions or interacts with non-public functions, then you might be right that recompiling could be necessary. But let's first deal with the edge attributes before moving on to new problems...
Okay, it seems the 159 was fake news. Some other change must have affected the outcome without my knowledge. Reverting the patch of a953555 only raises one additional error.
Regarding the edge attributes sometimes being of list type and other times being of character type, I found that calling [...]
I'm afraid you are right. This would require changing numerous tests and adjusting several checks. But there is one part I am not sure about:
Really? This would only happen if the attributes are already vectors of length greater than one beforehand. If there is only one value, it should be combined into a vector instead of a list. So, in general, we should check for each edge attribute (there is a limited number of them) whether this problem can occur at all. And if so, it might be necessary to wrap exactly these attributes into lists.
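The expectation stated above, that merging only needs a list when a merged group holds more than one value, can be modeled in base R. This is a simplified sketch under that assumption, not igraph's actual `simplify()` logic; `combine.by.group` is a hypothetical helper name.

```r
# Hypothetical helper: merge attribute values per group by concatenation.
# The result stays an atomic vector when every group holds exactly one value
# and becomes a list as soon as one group holds several values.
combine.by.group = function(values, groups) {
    combined = unname(split(values, groups))
    if (all(lengths(combined) == 1)) {
        # note: unlist() drops classes such as POSIXct
        return(unlist(combined))
    }
    combined
}

# No edges are merged: the attribute stays a character vector.
combine.by.group(c("Mail", "Feature"), c(1, 2))
# Edges 1 and 2 are merged into one: the attribute becomes a list.
combine.by.group(c("Mail", "Mail", "Feature"), c(1, 1, 2))
```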
A quick update: I dug into the initial setting of edge attributes for all networks and found that the way attributes are obtained and assigned varies. Therefore, I decided that a good point to convert attributes to lists is at the end of the [...]. Here is my preliminary code for reference:

```r
# Listify all edge attributes that are not handled explicitly during simplification
edges = igraph::as_data_frame(network, "edges")
for (attr in igraph::edge_attr_names(network)) {
    if (!attr %in% names(EDGE.ATTR.HANDLING)) {
        network = igraph::set_edge_attr(network, attr, value = as.list(edges[[attr]]))
    }
}
```

With these changes, around 140 tests are now failing. Half of them fail because they expect an edge attribute to be of type [...]. The reason for this is a slightly different behavior of [...]. Let me know when you have further input or questions.
I started fixing the implementation to work with all edge attributes in list format and stumbled upon yet another problem. But first, just a note: our networks are no longer consistent with [...]. Here is the problem I ran into: let's say we want to add bipartite edges to a network via [...]

```
.. ..$ date :List of 5
.. .. ..$ : POSIXct[1:1], format: "2016-07-12 15:58:59"
.. .. ..$ : POSIXct[1:1], format: "2016-07-12 16:00:45"
.. .. ..$ : num 1.47e+09
.. .. ..$ : num 1.47e+09
.. .. ..$ : num 1.47e+09
```

When listifying first, this problem disappears. Either we distinguish in code whether the network already has edges and only then 'listify' (which, in my opinion, yields bad code), or we remove (or adapt) the types of edge attributes for newly created empty networks. Instead of the date attribute being [...]
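The symptom in the output above (POSIXct values turning into plain numbers once the attribute is a list) can be reproduced with base R's `c()` alone: appending a classed atomic vector to a list drops the class of the appended elements, whereas converting the vector with `as.list()` first keeps it. Whether igraph's edge-adding code goes through exactly this path is an assumption; the snippet only illustrates why listifying first helps.

```r
# Dates of existing edges, already stored as a list attribute.
existing = as.list(as.POSIXct(c("2016-07-12 15:58:59", "2016-07-12 16:00:45"), tz = "UTC"))
# Dates of the added edges, still stored as an atomic POSIXct vector.
added = as.POSIXct(c("2016-07-12 16:05:01", "2016-07-12 16:10:12"), tz = "UTC")

# Appending the atomic vector to the list drops its class: the added
# elements become plain numbers (seconds since the epoch).
naive = c(existing, added)
str(naive)

# Listifying first keeps every element a POSIXct value.
listified = c(existing, as.list(added))
str(listified)
```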
Hm, I am not really in favor of this solution. But it seems that it is, unfortunately, quite difficult to fix the tests without a helper function. And just creating a new helper function for the tests does not make sense either. So, we need to make sure that the helper function itself is properly tested. If this is the case, I'd agree to use the helper function within the tests, but only with gritted teeth.
Removing the types of edge attributes would be a step backwards, because then everything would be of type [...]. On the other hand, when we implement the discussed changes, we make every edge attribute (except for those handled separately by the existing constant) of type list anyway, so the type for newly created empty networks would already be wrong if everything should be a list (which then also needs to be the case for empty networks right after network creation). In sum, no matter which solution we choose, we have to give up one of our principles: either we lose concrete types, or we lose type consistency. At the moment, it seems least problematic to adjust case 1) such that empty networks already have listified edge attributes. This would mean omitting type information upon network construction in general (except for those handled separately by the existing constant). In such a case, we would need to make sure that we don't run into any other problems with [...]
I'm kind of sorry to only come with bad news week after week, but while fixing the rest of the issues I stumbled over a bug that is both grave and hard to fix on our side without hacky code. Let me illustrate the issue with some quick quotes and a bit of code. This is from the igraph docs on disjoint_union:
Now, as a quick recap, let's see how igraph implements edge attribute merging in version 2.1. Most importantly, notice that they now use the [...]:

```r
attr <- list()
ec <- sapply(graphs, ecount)
cumec <- c(0, cumsum(ec))
for (i in seq_along(graphs)) {
  ea <- edge.attributes(graphs[[i]])
  exattr <- intersect(names(ea), names(attr)) # existing and present
  noattr <- setdiff(names(attr), names(ea)) # existing and missing
  newattr <- setdiff(names(ea), names(attr)) # new
  for (a in seq_along(exattr)) {
    attr[[exattr[a]]] <- vctrs::vec_c(attr[[exattr[a]]], ea[[exattr[a]]])
  }
  for (a in seq_along(noattr)) {
    attr[[noattr[a]]] <- vctrs::vec_c(attr[[noattr[a]]], vctrs::unspecified(ec[[i]]))
  }
  for (a in seq_along(newattr)) {
    attr[[newattr[a]]] <- vctrs::vec_c(vctrs::unspecified(cumec[[i]]), ea[[newattr[a]]])
  }
}
edge.attributes(res) <- attr
```

Finally, let's see how [...]:

```
Browse[1]> str(vctrs::unspecified(1))
 'vctrs_unspecified' logi NA
Browse[1]> str(x)
List of 4
 $ : chr "<thread-13#8>"
 $ : chr "<thread-13#8>"
 $ : chr "<thread-13#9>"
 $ : chr "<thread-13#9>"
Browse[1]> str(vctrs::vec_c(x, vctrs::unspecified(1)))
List of 5
 $ : chr "<thread-13#8>"
 $ : chr "<thread-13#8>"
 $ : chr "<thread-13#9>"
 $ : chr "<thread-13#9>"
 $ : NULL
```

For whatever reason, we get a [...]. This time I don't see a good option to fix this. Either we post-convert [...]
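The `NULL` entry above is consistent with how base R pads a list, as opposed to an atomic vector, when it is extended: a character vector is padded with `NA`, a list with `NULL`. The snippet shows that contrast and also sketches one conceivable post-processing step, replacing `NULL` entries by `NA`; `replace.null.by.na` is a hypothetical helper for illustration, not necessarily the option the comment refers to and not a recommended fix.

```r
# Extending an atomic vector pads it with NA ...
threads.vector = c("<thread-13#8>", "<thread-13#9>")
length(threads.vector) = 3
threads.vector[[3]]   # NA

# ... whereas extending a list pads it with NULL.
threads.list = list("<thread-13#8>", "<thread-13#9>")
length(threads.list) = 3
threads.list[[3]]     # NULL

# Hypothetical post-processing: replace NULL entries of a list attribute by NA.
replace.null.by.na = function(values) {
    values[vapply(values, is.null, logical(1))] = list(NA)
    values
}
threads.list = replace.null.by.na(threads.list)
threads.list[[3]]     # NA
```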
You don't need to be sorry for that. Everything is just the way it is.
[...]
If this is neither in line with the docs of igraph nor with the docs of vctrs, this may be something that we should report. The question is: where should we report it? Maybe to both? Or just to igraph to let them decide on how to deal with it? And we should have an argument why we need exactly this behavior, to convince them that this is something that should be fixed. But before reporting, we should also find out what would happen with the vertex attributes in coronet if igraph used vctrs for vertex attributes as well, so that we can mention both problems in one issue.
No, I don't want to treat
I also don't like this solution. If this is really a problem of igraph or vctrs, we should ask them to fix these problems on their side. |
It is indeed not addressed in the docs of [...].

```r
# Create network with a duplicate edge.
# Edges have the attr.one edge-attribute.
network.one = igraph::make_empty_graph() +
    igraph::vertices("A", "B") +
    igraph::edges(c("A", "B", "A", "B"), attr.one = "test")
# Simplify both edges into one.
# Use the "concat" strategy for the attr.one edge-attribute.
network.one = igraph::simplify(network.one, edge.attr.comb = list(attr.one = "concat"))
# Create second network without the attr.one edge-attribute.
network.two = igraph::make_empty_graph() +
    igraph::vertices("C", "D") +
    igraph::edges(c("C", "D"), attr.two = "test")
# Join both networks.
union = igraph::disjoint_union(network.one, network.two)
```

After running this code, you can observe how the non-existence of [...]

```
Browse[1]> str(igraph::as_data_frame(union,"edges"))
'data.frame': 2 obs. of 4 variables:
 $ from : chr "A" "C"
 $ to : chr "B" "D"
 $ attr.one:List of 2
  ..$ : chr "test" "test"
  ..$ : NULL
 $ attr.two: chr NA "test"
```

This problem is really niche, so we have to get a bit more creative if it does not suffice to say that it is not consistent with their documentation. Maybe you know of some statistical evaluation where they use [...]. I will now check what happens with the [...]
Using [...]
Since igraph version 2.1, when joining networks using 'igraph::disjoint_union', edge attributes of the joining networks require identical types. As simplifying networks necessarily converts types of edge attributes to list when merging edges, attributes now have to be of type list by default. Edge attributes that are explicitly considered during simplification and, therefore, are not converted to lists are excluded from this rule. This works towards fixing se-sic#271. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Adjust the tests in accordance with converting edge attributes to list type in the implementation. This works towards fixing se-sic#271. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
'plyr::rbind.fill' uses NULL to fill missing values in lists. As we now use lists for most edge attributes, we need to handle this case separately to ensure missing values are filled with NAs instead. To fix this issue, we need to instantiate missing columns in dataframes with NAs before calling 'plyr::rbind.fill'. This operation is constant with respect to the number of rows and should not impact performance too much. This works towards fixing se-sic#271. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
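The workaround described in this commit message, instantiating missing columns with NA before row-binding, might look roughly like the following base-R sketch. `fill.missing.columns` is a hypothetical name and not the code from the commit; the filled data frames could then be passed to `plyr::rbind.fill`, while the example uses base `rbind` on atomic columns to stay self-contained.

```r
# Hypothetical helper: give every data frame all columns that occur in any
# of them, filling the missing ones with NA, so that the subsequent row-bind
# does not have to choose fill values itself (plyr::rbind.fill would use
# NULL for list columns).
fill.missing.columns = function(frames) {
    all.columns = unique(unlist(lapply(frames, names)))
    lapply(frames, function(frame) {
        for (column in setdiff(all.columns, names(frame))) {
            frame[[column]] = rep(NA, nrow(frame))
        }
        # use the same column order in every data frame
        frame[all.columns]
    })
}

edges.one = data.frame(from = "A", to = "B", attr.one = "test", stringsAsFactors = FALSE)
edges.two = data.frame(from = "C", to = "D", attr.two = "test", stringsAsFactors = FALSE)
filled = fill.missing.columns(list(edges.one, edges.two))
do.call(rbind, filled)
```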
This works towards fixing se-sic#271. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
The following comment has been posted by @maxloeffler in #260:
Unfortunately, something about their new version breaks our usage of igraph::disjoint_union (which worked prior to updating igraph). I researched a bit and found that the breaking change was introduced 4 months ago in this commit. I have not yet figured out a way to fix that issue.
Concrete error description
Error Message:
When breaking, the variables are instantiated as follows:
Originally posted by @maxloeffler in #260 (comment)