speeding up add function for large compounds #1107
…composing with main compound
for more information, see https://pre-commit.ci
…conflict to begin with
if label is not None:
    self.add(
        child,
        label=label_list[i],
Check failure (Code scanning / CodeQL): Potentially uninitialized local variable
if (
    len(temp_bond_graphs) != 0
    and not isinstance(self, Port)
    and children_bond_graph is not None
Check failure (Code scanning / CodeQL): Potentially uninitialized local variable
Codecov Report
Patch coverage:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1107      +/-   ##
==========================================
+ Coverage   89.39%   89.50%   +0.10%
==========================================
  Files          61       61
  Lines        6235     6305      +70
==========================================
+ Hits         5574     5643      +69
- Misses        661      662       +1
[Figure: timings for loading mol2 files of different sizes]
Speed improvements for the parmed conversion are effectively the same as for mdtraj. Note that the conversion from gmso still needs modification, but it calls a function in gmso, so that will need to be a separate gmso PR.
I added a condense function to Compound. It is similar to flatten, but it adds an intermediate level to the hierarchy based on connectivity. This refers to issue #1108. To reiterate the issue, take a compound that is like this:
And make it this:
I still need to finish adding tests for this; that will come in the next push.
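To illustrate what "an intermediate level based on connectivity" means, here is a minimal sketch in plain Python. It is not the mBuild implementation: the function name `connected_groups` and the particle and bond names are hypothetical, and the particles are plain strings rather than Compounds. It groups a flat list of particles into connected components, which is the grouping a condense-style operation would then turn into one child per component.

```python
from collections import defaultdict


def connected_groups(particles, bonds):
    """Group particles into connected components (hypothetical helper).

    particles: flat list of particle names, as flatten would produce.
    bonds: iterable of (name, name) pairs.
    Returns one list per connected component, in original particle order.
    """
    neighbours = defaultdict(set)
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    order = {name: i for i, name in enumerate(particles)}
    seen = set()
    groups = []
    for start in particles:
        if start in seen:
            continue
        # Depth-first walk over everything bonded (directly or not) to start.
        seen.add(start)
        stack = [start]
        group = []
        while stack:
            current = stack.pop()
            group.append(current)
            for other in neighbours[current]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group, key=order.__getitem__))
    return groups


# Two water-like molecules stored as one flat list of particles.
particles = ["O1", "H1", "H2", "O2", "H3", "H4"]
bonds = [("O1", "H1"), ("O1", "H2"), ("O2", "H3"), ("O2", "H4")]

groups = connected_groups(particles, bonds)
print(groups)  # [['O1', 'H1', 'H2'], ['O2', 'H3', 'H4']]
```

Each inner list corresponds to one intermediate-level child in the condensed hierarchy; unbonded particles end up in single-element groups.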
LGTM! The only minor thing missing is whether we need additional tests for converting different file types into mBuild objects with this new method. To be honest, I think our current tests already cover anything that might have broken, so I think we're good to go.
I also ran the GMSO test suite locally against this mBuild source and it looks fine. The to_mbuild function can easily be changed to add the compounds as a list instead of one at a time.
I addressed all the comments, including making one list_flatten helper function (doesn't add any real overhead).
…into speedup_add
LGTM!
PR Summary:
This refers to issue #1104. This PR aims to improve the performance of the Compound.add function and the loading routines that rely on it.
The gist, as outlined in the issue above, is that when a compound is constructed with the add function, performance degrades as the compound grows because the bond_graphs are merged (i.e., composed) repeatedly: a small bond graph is merged with a large one over and over again. This PR changes the underlying logic so that if a list of Compounds is passed to the add function, it uses the compose_all function to merge their bond_graphs together first, before adding them to the root Compound (and merging bond_graphs with the root compound). The compose_all function effectively scales with the number of compounds being merged.
This provides substantial speed improvements, as outlined in the issue.
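The cost difference can be sketched with a toy model in plain Python. This is an illustration under stated assumptions, not mBuild's or any graph library's code: a graph is modelled as a dict of neighbour sets, the hypothetical `compose` copies both inputs into a new graph on every call, and the hypothetical `compose_all` visits each input once. A counter records how many node entries each strategy touches.

```python
def compose(g, h, counter):
    """Toy non-mutating merge: copies g and h into a new graph."""
    merged = {node: set(nbrs) for node, nbrs in g.items()}
    for node, nbrs in h.items():
        merged.setdefault(node, set()).update(nbrs)
    counter[0] += len(g) + len(h)  # every node of both inputs is visited
    return merged


def compose_all(graphs, counter):
    """Toy batched merge: each input graph is visited exactly once."""
    merged = {}
    for g in graphs:
        for node, nbrs in g.items():
            merged.setdefault(node, set()).update(nbrs)
        counter[0] += len(g)
    return merged


def make_child(i):
    """A two-particle child with one bond (names are made up)."""
    a, b = f"a{i}", f"b{i}"
    return {a: {b}, b: {a}}


children = [make_child(i) for i in range(200)]

# Strategy 1: add children one at a time, re-copying the growing root.
one_by_one = [0]
root = {}
for child in children:
    root = compose(root, child, one_by_one)

# Strategy 2: merge all children first, then merge with the root once.
batched = [0]
combined = compose_all(children, batched)
root_batched = compose({}, combined, batched)

assert root == root_batched
print(one_by_one[0], batched[0])  # 40200 800
```

In this toy model the one-at-a-time loop grows quadratically with the number of children, while the batched version grows linearly, which is the behaviour the PR description attributes to passing a list to add.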
Other additions: Compound.add now accepts a list for the label argument when compounds are provided as a list.
This is still a WIP: tests need to be added for adding labels via a list, and the load functions still need to be updated to stash compounds into lists (the mdtraj conversion is basically complete and provides a substantial speedup).