Combinatorial explosion in heteromeric assemblies with many entities #48
I think we need to use the interface quality information when generating assemblies from the graph, and exploit the basic requisites of relevant assemblies in terms of interfaces to reduce the search space. One requisite I see is that a relevant assembly must have at least one relevant interface per subunit. The question, then, is how to decide whether an interface is relevant. We could, for example, use a score proportional to a combination of the three EPPIC scores. The score threshold should be set very low, so that it rejects very few relevant assemblies (FN) but the majority of irrelevant ones (enough to make the problem computable). We can then build another graph containing only the relevant interfaces and run the same brute-force algorithm on it. The assemblies generated will satisfy the topological requisites already built into the algorithm and, in addition, will have at least one relevant interface for each subunit. At this stage, for completeness, we can add to each assembly all other interfaces that occur between its subunits in the graph, even though they were not relevant.
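A minimal sketch of the proposed filtering, assuming a simplified, non-periodic graph whose nodes are subunits and whose interfaces are hypothetical `(subunit_a, subunit_b, score)` tuples, where the score could be the interface area or some combination of the EPPIC scores. The real EPPIC graph (periodic, with interface clusters and topological checks) is not modelled here. Candidate assemblies are grown only along interfaces that pass the threshold, so subsets that are not connected through relevant interfaces are never visited; each kept assembly then gets back all interfaces between its members.

```python
from collections import defaultdict

# Hypothetical input format: (subunit_a, subunit_b, score). The score stands in
# for interface area or a combination of the EPPIC scores (an assumption).
Interface = tuple[str, str, float]


def enumerate_assemblies(interfaces: list[Interface], threshold: float):
    """Enumerate subunit sets connected through relevant interfaces.

    Sets are grown one neighbour at a time along interfaces whose score passes
    `threshold`, so every subunit in a multi-subunit assembly has at least one
    relevant interface and unconnected subsets are never visited.
    """
    adjacency: dict[str, set[str]] = defaultdict(set)
    subunits: set[str] = set()
    for a, b, score in interfaces:
        subunits.update((a, b))
        if score >= threshold:  # deliberately low cutoff: few false negatives
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Every single subunit is a candidate (monomer); grow from there.
    seen: set[frozenset[str]] = {frozenset([s]) for s in subunits}
    stack = list(seen)
    while stack:
        current = stack.pop()
        for member in current:
            for neighbour in adjacency[member]:
                grown = current | {neighbour}
                if grown not in seen:
                    seen.add(grown)
                    stack.append(grown)

    assemblies = []
    for members in seen:
        # For completeness, re-attach every interface between the members,
        # whether or not it passed the threshold.
        engaged = [(a, b, s) for a, b, s in interfaces if a in members and b in members]
        assemblies.append((tuple(sorted(members)), engaged))
    # Sort by size, then by member names, for deterministic output.
    return sorted(assemblies, key=lambda item: (len(item[0]), item[0]))
```

With hypothetical areas of 850, 120 and 40 Å² for A-B, B-C and A-C and a 200 Å² cutoff, only {A}, {B}, {C} and {A, B} are produced; with no cutoff all seven connected subsets are visited. The number of states explored therefore shrinks with the cutoff, which is the point of the heuristic.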
To illustrate this point with an example, take 2hda: if we apply a cutoff of 200 Å² interface area to the interfaces included in the graph, we obtain 5 assemblies: one monomer, one dimer (C2), two trimers (C3) and one hexamer (D3).
@lafita, you're describing a heuristic for reducing the search space. I agree that applying a cutoff so that we don't bother engaging particularly poor interfaces (judged either by interface area, as you suggest, or by some combination of the geo/cs/cr scores) is likely to be the best solution. There are a few reasons why we have held off on implementing such heuristics until we have a good assembly score (eppic-science#83):
Although the branch-and-bound algorithm returns the optimal solution, it yields only a single solution, whereas the heuristic reduces the search space enough that you can find a set of reasonable solutions (not guaranteed to contain the optimal one, I agree). I responded in more detail in eppic-science#83.
We have a solution for this now. Alternative ideas were discussed above, but let's close this for now and discuss them in new issues when necessary.
With the current exhaustive enumeration of assemblies, we have a combinatorial explosion when there are many entities.
Some examples:
A partial solution that would reduce the number of assemblies would be not to enumerate non-fully-covering assemblies. On the other hand, it is useful to have those non-fully-covering assemblies in cases where there seems to be co-crystallisation and the correct prediction would be the separate monomers.
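A minimal sketch of the partial solution, assuming that "fully covering" means every entity in the crystal appears at least once in the assembly, and that a hypothetical `entity_of` dictionary maps each subunit to its entity id. It is written as a post-enumeration filter, so it reduces the number of assemblies passed on for scoring rather than the enumeration cost itself; a `keep_partial` switch (also hypothetical) preserves the co-crystallisation case mentioned above.

```python
# Assumed definition: an assembly is fully covering if every entity present in
# `entity_of` (subunit -> entity id, hypothetical mapping) occurs in it.


def is_fully_covering(members, entity_of: dict[str, int]) -> bool:
    """True if every entity in `entity_of` occurs at least once in `members`."""
    return {entity_of[m] for m in members} == set(entity_of.values())


def filter_assemblies(candidates, entity_of: dict[str, int], keep_partial: bool = False):
    """Drop non-fully-covering assemblies unless `keep_partial` is set.

    `keep_partial=True` keeps them, e.g. for likely co-crystallisation cases
    where the correct answer may be the separate monomers.
    """
    if keep_partial:
        return list(candidates)
    return [c for c in candidates if is_fully_covering(c, entity_of)]
```

For example, with subunit A belonging to entity 1 and subunits B and C to entity 2, only assemblies containing A together with at least one of B or C are kept.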
A better long-term solution could be graph contraction at an early stage, although there we would also face a combinatorial problem in deciding which edges to contract.
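A minimal sketch of a single edge contraction, using a union-find over subunits in a simplified, non-periodic graph; the function name and data layout are hypothetical and not taken from EPPIC. It takes the set of edges to contract as given. The open question raised above is precisely how to choose that set: trying every subset of k candidate edges would mean 2^k contractions.

```python
def contract(nodes, edges, to_contract):
    """Merge the endpoints of every edge in `to_contract` into super-nodes.

    Returns the set of super-nodes (frozensets of original nodes) and the
    remaining edges between distinct super-nodes (duplicates and self-loops
    created by the contraction are dropped).
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in to_contract:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Group original nodes by their union-find representative.
    groups: dict[str, set] = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    label = {n: frozenset(groups[find(n)]) for n in nodes}

    contracted = set()
    for a, b in edges:
        ga, gb = label[a], label[b]
        if ga != gb:  # edges inside a super-node disappear
            contracted.add(frozenset((ga, gb)))
    return set(label.values()), contracted
```

Contracting A-B in the chain A-B-C-D, for instance, leaves three super-nodes ({A, B}, {C}, {D}) and two edges.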