Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

speeding up add function for large compounds #1107

Merged
merged 28 commits into from
Apr 28, 2023

Conversation

chrisiacovella
Copy link
Contributor

@chrisiacovella chrisiacovella commented Apr 21, 2023

PR Summary:

This refers to issue #1104 . This PR aims to improve the performance of the Compound.add function and loading routines that rely upon it.

The basic gist, as outlined in the issue above, is that when constructing a compound using the add function, the performance can degrade as the compound grows in size, due to the repeated merging (i.e., composing) of bond_graphs, specifically, merging a small with a large bond graph over and over again. This PR changes the underlying logic such that if a list of Compounds is passed to the add function, it will use the compose_all function to merge these bond_graphs together, before adding to the root Compound (and merging bond_graphs with the root compound). The compose_all function effectively scales with the number of compounds being merged.

This provides substantial speed improvements, as outlined in the issue.

Other additions, Compound.add now accepts a list for the label argument if compounds are provided in a list.

This is still a WIP, as tests need to be added for adding labels via a list, as well as adding in the updated load functions to stash compounds into lists (mdtraj conversion is basically complete and provides substantial speed up).

PR Checklist


  • Includes appropriate unit test(s)
  • Appropriate docstring(s) are added/updated
  • Code is (approximately) PEP8 compliant
  • Issue(s) raised/addressed?

@chrisiacovella chrisiacovella added WIP Work in Progress, not ready for merging enhancement labels Apr 21, 2023
mbuild/compound.py Fixed Show fixed Hide fixed
mbuild/compound.py Fixed Show fixed Hide fixed
if label is not None:
self.add(
child,
label=label_list[i],

Check failure

Code scanning / CodeQL

Potentially uninitialized local variable

Local variable 'label_list' may be used before it is initialized.
if (
len(temp_bond_graphs) != 0
and not isinstance(self, Port)
and children_bond_graph is not None

Check failure

Code scanning / CodeQL

Potentially uninitialized local variable

Local variable 'children_bond_graph' may be used before it is initialized.
@codecov
Copy link

codecov bot commented Apr 21, 2023

Codecov Report

Patch coverage: 97.56% and project coverage change: +0.10 🎉

Comparison is base (1211260) 89.39% compared to head (76bcdc8) 89.50%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1107      +/-   ##
==========================================
+ Coverage   89.39%   89.50%   +0.10%     
==========================================
  Files          61       61              
  Lines        6235     6305      +70     
==========================================
+ Hits         5574     5643      +69     
- Misses        661      662       +1     
Impacted Files Coverage Δ
mbuild/conversion.py 89.55% <93.54%> (+0.22%) ⬆️
mbuild/compound.py 97.07% <100.00%> (+0.12%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@chrisiacovella
Copy link
Contributor Author

chrisiacovella commented Apr 21, 2023

timings for loading mol2 files of different size:

N waters new time (s) old time (s)
1000 0.347 4.92
5000 1.46 266.1
10000 3.02 too slow

@chrisiacovella
Copy link
Contributor Author

Speed improvements for converting parmed are effectively the same as for mdtraj. Note, converting from gmso will need modification still, but that calls a function in gmso, so it will need to be a separate gmso PR.

@chrisiacovella chrisiacovella removed the WIP Work in Progress, not ready for merging label Apr 22, 2023
@@ -1899,6 +1963,58 @@
overwrite_nglview_default(widget)
return widget

def condense(self, inplace=True):

Check notice

Code scanning / CodeQL

Explicit returns mixed with implicit (fall through) returns

Mixing implicit and explicit returns may indicate an error as implicit returns always return None.
@chrisiacovella
Copy link
Contributor Author

chrisiacovella commented Apr 25, 2023

I added in a condense function to Compound. This is similar to flatten, but it adds an intermediate level in the hierarchy based on connectivity. This refers to issue #1108 .

To reiterate the issue, take a compound that is like this:

Compound, 28 particles, 18 bonds, 4 children
├── [C x 1], 1 particles, 0 bonds, 0 children
├── [Compound x 1], 3 particles, 2 bonds, 1 children
│   └── [tip3p x 1], 3 particles, 2 bonds, 3 children
│       ├── [H2 x 1], 1 particles, 1 bonds, 0 children
│       ├── [H3 x 1], 1 particles, 1 bonds, 0 children
│       └── [O1 x 1], 1 particles, 2 bonds, 0 children
└── [Compound x 2], 12 particles, 8 bonds, 4 children
    └── [Compound x 4], 3 particles, 2 bonds, 1 children
        └── [tip3p x 1], 3 particles, 2 bonds, 3 children
            ├── [H2 x 1], 1 particles, 1 bonds, 0 children
            ├── [H3 x 1], 1 particles, 1 bonds, 0 children
            └── [O1 x 1], 1 particles, 2 bonds, 0 children

And make it this:

Compound, 28 particles, 18 bonds, 10 children
├── [C x 1], 1 particles, 0 bonds, 0 children
└── [tip3p x 9], 3 particles, 2 bonds, 3 children
    ├── [H2 x 1], 1 particles, 1 bonds, 0 children
    ├── [H3 x 1], 1 particles, 1 bonds, 0 children
    └── [O1 x 1], 1 particles, 2 bonds, 0 children

I still need to finish adding in tests for this; that will come in the next push.

Copy link
Contributor

@CalCraven CalCraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I think the only thing minorly missing is to ask if we need any additional tests for the conversions of different files into mBuild objects using this new method. Tbh, I think all of our current tests should be the same and cover anything that might have been breaking, so I think we're good to go.

I also locally ran the GMSO testing suite with this source of mBuild and it looks fine. The to_mbuild function can easily be changed to add the compounds as a list instead of one at a time.

mbuild/compound.py Outdated Show resolved Hide resolved
mbuild/compound.py Outdated Show resolved Hide resolved
mbuild/conversion.py Outdated Show resolved Hide resolved
mbuild/conversion.py Outdated Show resolved Hide resolved
@chrisiacovella
Copy link
Contributor Author

I addressed all the comments, including making one list_flatten helper function (doesn't add any real overhead).

Copy link
Member

@daico007 daico007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants