
Combinatorial explosion in heteromeric assemblies with many entities #48

Closed
josemduarte opened this issue Jul 17, 2015 · 5 comments

@josemduarte
Contributor

With the current exhaustive enumeration of assemblies, we run into a combinatorial explosion when there are many entities.

Some examples:

  • 4d10: 8 entities, 58 interfaces -> out of memory
  • 3ulu: 7 entities, 18 interfaces -> completes, but produces 25194 assemblies!
  • 1eo8: only 4 entities, 15 interfaces, but the list of assemblies is already huge (2649) and breaks the WUI with an out-of-memory error in Jetty
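
For a sense of scale: if the enumeration roughly amounts to considering subsets of the interface types before any validity checks, then 58 interfaces give up to 2^58 ≈ 2.9 × 10^17 candidates, so running out of memory is not surprising.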

A partial solution that would reduce the number of assemblies would be not to enumerate the assemblies that are not fully covering. At the same time, it is useful to keep those non-fully-covering assemblies in cases where there seems to be co-crystallisation and the right prediction would be the separate monomers.

A better long-term solution could be graph contraction at an early stage, although there we would also face a combinatorial issue in deciding which edges to contract.

@lafita
Member

lafita commented Apr 12, 2016

I think we need to use the interface quality information when generating assemblies from the graph. We should exploit the basic requirements that relevant assemblies must satisfy, in terms of their interfaces, to reduce the search space.

One requirement I see is that a relevant assembly must have at least one relevant interface per subunit. The issue, then, is how to determine whether an interface is relevant. We could use a score proportional to a combination of the three EPPIC scores, for example. The score threshold should be set very low, so that we reject very few relevant assemblies (false negatives) while discarding the majority of irrelevant ones (enough to make the problem computable).

We can then build a second graph containing only the relevant interfaces and run the same brute-force algorithm on it. The assemblies generated will satisfy, in addition to the topological requirements already built into the algorithm, the condition of having at least one relevant interface between subunits. At this stage, for each assembly we can add back all the other interfaces that occur between its subunits in the original graph, even though they were not deemed relevant (for completeness).
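
A minimal sketch of the filtering stage (all names here are hypothetical placeholders, not the actual EPPIC API):

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Sketch of the filtering stage of the proposed two-stage enumeration.
 * InterfaceEdge, the score fields and the threshold are hypothetical
 * stand-ins for the real graph types.
 */
public class RelevantInterfaceFilter {

    /** Minimal stand-in for an interface edge in the assembly graph. */
    record InterfaceEdge(int id, double geomScore, double csScore, double crScore) {
        /** Combined relevance from the three EPPIC indicators; the weighting is a free choice. */
        double relevance() {
            return (geomScore + csScore + crScore) / 3.0;
        }
    }

    /**
     * Keep only edges above a deliberately low threshold, so that very few
     * relevant interfaces are lost (few false negatives) while most of the
     * clearly irrelevant ones are dropped before enumeration.
     */
    static List<InterfaceEdge> relevantEdges(List<InterfaceEdge> all, double threshold) {
        return all.stream()
                  .filter(e -> e.relevance() >= threshold)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<InterfaceEdge> edges = List.of(
                new InterfaceEdge(1, 0.9, 0.8, 0.7),
                new InterfaceEdge(2, 0.2, 0.1, 0.1));
        // The brute-force enumeration would then run on the reduced graph only;
        // non-relevant edges between the same subunits can be re-added to each
        // resulting assembly afterwards, for completeness.
        System.out.println(relevantEdges(edges, 0.3));
    }
}
```

The important design choice is that the threshold only needs to be strict enough to make the enumeration tractable; it does not need to classify interfaces correctly on its own.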

@lafita
Member

lafita commented Apr 12, 2016

To illustrate this point with an example:

2hda
A homomeric D3 example with 7 interfaces.
Interfaces 1 to 3 each have more than 300 Å² of interface area, while interfaces 4 to 7 have less than 150 Å².
The current code returns 12 assemblies: one monomer, four dimers (C2), three trimers (C3) and four hexamers (D3).

If we apply an area cutoff of 200 Å² to the interfaces included in the graph, we obtain 5 assemblies: one monomer, one dimer (C2), two trimers (C3) and one hexamer (D3).

@sbliven
Member

sbliven commented Apr 12, 2016

@lafita, you're describing a heuristic for reducing the search space. I agree that applying a cutoff so that we don't bother engaging particularly poor interfaces (judged either by interface area, as you suggest, or by some combination of the geometry/core-surface/core-rim scores) is likely to be the best solution.

There are a few reasons why we have held off on implementing such heuristics until we find a good assembly score (eppic-science#83):

  1. Given a linear score, you can turn this heuristic into a provably correct branch-and-bound algorithm that will always return the best-scoring assembly (see the sketch after this list)
  2. I'm somewhat concerned about the possibility that two borderline interfaces could combine to make a nice assembly. For instance, you sometimes have 3-way "interfaces" where a tail of one chain bridges a weak interface between two other chains (1pmo is an OK example of this).
  3. I still have the sense that we could ameliorate the full-enumeration problem through clever edge reduction, but I need to spend a few mornings at the whiteboard to demonstrate or refute this. If it works out, we could avoid using heuristics in all but the largest cases.
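
To make point 1 concrete, here is a toy branch-and-bound sketch, assuming a purely linear assembly score and eliding all the validity checks the real enumeration would need; the input and all names are hypothetical:

```java
/**
 * Toy branch-and-bound over subsets of interface edges, assuming a linear
 * assembly score (a sum of per-edge contributions, which may be negative).
 * Everything here is a hypothetical stand-in: the real algorithm would also
 * have to enforce the topological validity constraints on assemblies.
 */
public class AssemblyBranchAndBound {

    static double best = Double.NEGATIVE_INFINITY;

    /**
     * @param contrib    per-edge score contributions
     * @param i          index of the next edge to decide on
     * @param score      score of the edges engaged so far
     * @param optimistic score plus the sum of the remaining positive contributions
     */
    static void search(double[] contrib, int i, double score, double optimistic) {
        // Skipping all remaining edges is always allowed, so the current
        // partial score is the score of a complete candidate assembly.
        if (score > best) best = score;
        if (i == contrib.length) return;
        // Bound: even engaging every remaining positive edge cannot beat
        // the best assembly found so far, so this whole branch is pruned.
        if (optimistic <= best) return;
        search(contrib, i + 1, score + contrib[i],
               optimistic + Math.min(contrib[i], 0));    // engage edge i
        search(contrib, i + 1, score,
               optimistic - Math.max(contrib[i], 0));    // skip edge i
    }

    public static void main(String[] args) {
        double[] contrib = {2.5, -1.0, 0.7, -0.2, 1.3};
        double optimistic = 0;
        for (double c : contrib) optimistic += Math.max(c, 0);
        search(contrib, 0, 0, optimistic);
        System.out.println("best score = " + best); // 4.5 for this toy input
    }
}
```

With a linear score the bound is exact, which is what makes the pruning provably safe; a non-linear assembly score would need a different, possibly looser, upper bound.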

@lafita
Member

lafita commented Apr 14, 2016

Although the branch-and-bound algorithm will return the optimal solution, it returns only a single solution, while the heuristic would reduce the search space enough that you can find a set of reasonable solutions (not guaranteed to contain the optimal one, I agree).

I responded more extensively in eppic-science#83.

@josemduarte
Contributor Author

We have a solution for this now. There are alternative ideas discussed above, but let's close this for now and discuss those when necessary in new issues.
