-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lammps molecule number #1073
Lammps molecule number #1073
Conversation
Thank you for your PR! I like this new feature/addition but it seems like the logic need to be alter a bit (just looking at the failed unit test, this method failed to detect some the residues info of an existing test case). I will try to play with that method a bit and see what might be the cause of the issue ( |
Hey, I was thinking about it later, and I chose to pull residue names, but with further thought it would probably be better to use object IDs throughout this method ( |
That may worth a shot! |
Is there a more appropriate place to put |
I can see that method lives in |
…s, that uses nparticles and nbonds to discern.
The reason the test failed is because the parent Compound doesn't have a _box definition that the compounds were placed into. I changed the function to find compounds using the number of particles and bonds. |
Hi @ jaclark5, do you mind adding me as a collaborator on your repo and give me push access to this branch so I can push the fixes for the failing test? I am also exploring the |
mbuild/conversion.py
Outdated
@@ -1147,14 +1156,44 @@ def to_parmed( | |||
-------- | |||
parmed.structure.Structure : Details on the ParmEd Structure object | |||
""" | |||
|
|||
def _pull_hierarchical_attrs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the recursive approach that you are doing with this method! My only concern is that I am not sure _box == None
is a good indicator of to refer a residue, since many of us often convert a compound without a box to parmed and assign the box information later (during the writing out step). One method I think might be useful is Compound.is_independent()
, which basically indicate if a Compound has any external bond or not (i.e., if it's a stand-alone molecule), but it will require minor alteration of this method and might make not as versatile as you have it now (i.e., when you traveling down the hierarchy and bump into a Compound
that is not independent, the residue
should be its parent but not itself).
Also, I think I have tried to solve similar issue before (infer relevant residues/molecule information from the hierarchy) and came up with several different approaches:
- Bottom up approach, determine the lowest level
Compound
that form a full molecule
def _infer_molecules(compound):
"""Bottom up approach to infer residues"""
connected_subgraph = compound.bond_graph.connected_components()
residues = IndexedSet()
for molecule in connected_subgraph:
if len(molecule) == 1:
ancestors = [molecule[0]]
else:
ancestors = IndexedSet(molecule[0].ancestors())
for particle in molecule[1:]:
# This works because particle.ancestors traversed, and hence
# the lower level will be in the front.
# The intersection will be left at the end,
# ancestor of the first particle is used as reference.
# Hence, this called will return the lowest-level Compound]
# that is a molecule
ancestors = ancestors.intersection(
IndexedSet(particle.ancestors())
)
residues.add(ancestors[0].name)
return list(residues)
- Top down approach, always take the 2nd level Compound (if they are independent) as residue
def _infer_groups(compound):
"""Top down, second level if any is available"""
residues = IndexedSet()
if all(not child.children for child in compound.children):
residues.add(compound.name)
else:
residues.update([child.name for child in compound.children])
return residues
I believe the 1st approach should work in most cases, but the 2nd may be useful if you want better control of the grouping (e.g., a container make up of water, methanol and ethanol and you want to group methanol and ethanol under an alcohol group/Compound and water by itself).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the failing test, it's the test_resname_parmed
in test_compound.py
, specifically, when we try convert a compound (with not top level box) to parmed, so the infer_hierarchy
step stops short at determining the lower level residues. A quick fix for this will be to add a line to define a box for that compound/system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I actually made many changes to this branch already too but haven't gotten back to it either. I'll wrap up my changes and push them soon. I think it would be easier to brain storm at that point.
…nto lammps_molecule_number
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1073 +/- ##
==========================================
+ Coverage 90.54% 90.64% +0.09%
==========================================
Files 61 61
Lines 6190 6231 +41
==========================================
+ Hits 5605 5648 +43
+ Misses 585 583 -2 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the new pull_residues()
method! Only one minor typo, but the rest LGTM! All tests seem to pass also.
Hey @CalCraven, if you have some time today, can you give this PR a review, we may be to merge it today (if not some time this week). |
Yes, I will get this reviewed today! |
I just updated the tests to complete the code coverage. However, something weird happened with the precommits where I need to update to python 3.10 so that isort can update to 5.12, so that poetry can function properly. |
Looks like they dropped python 3.7 recently. May have to update your environment to 3.8 so pre-commit can do it's thing. |
I was running 3.8.15. After a search for the error, it seems that I need to update to 3.10. There are many conflicts that Conda had been sorting through for awhile now. |
It sounds like we need to stat adding 3.10 test for mbuild (but that can be done in a different PR, I think we can ignore the pre-commit test here, assuming you already have that checks on your local). Regarding conda installation, we did realize it took a bit long to install the package with just |
Is there a way of getting around the precommit locally? This feature isn't letting me commit my most recent changes to the PR to complete the code coverage. |
Actually I was able to get it to run by changing the version of isort in the precommit config to 5.12.0 |
…nto lammps_molecule_number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most important note is the bug related to ring structures. Other minor wording changes in documentation.
I think I have a solution for the ring structures issues noted above. Here would be the changes to def catalog_bondgraph_type(compound, bond_graph=None):
"""Identify type of subgraph found at this stage of the compound.
Returns
-------
str:
"particle_graph" if compound is at the particle level,
"one_graph" if compound is a single molecule piece,
"multiple_graphs" if compound has multiple molecules
"""
if not compound.children:
return "particle_graph"
elif compound.bond_graph:
# check at the top level
multiple_connectionsBool = len(compound.bond_graph.connected_components()) == 1
else:
# at a subgraph level
multiple_connectionsBool = len(bond_graph.subgraph(compound).connected_components()) == 1
if multiple_connectionsBool:
return "one_graph"
else:
return "multiple_graphs"
def pull_residues(
compound,
segment_level=0,
include_base_level=False,
bond_graph=None
):
"""Pull residues from a Compound object.
Search class instance for completed compounds based on the number of
particles and bonds. If for example and peptide chain with three
individual peptides that were connected and hydrated with thirty water
molecules, the list will contain 31 residues. However, if the water and
the individual peptides should be labeled as individual residues, set
``segment_level==1`` to receive a list with 33 residues. Depending on
the method used to assemble the peptides, this procedure may continue to
set ``segment_level=2`` and breakdown the peptide into functional groups.
Setting ``include_base_level==True`` will allow this procedure to
function with coarse-grained systems, while the default behavior will
end a search at the level before atoms are obtained.
Parameters
----------
compound : obj
An instance of :class:`mbuild.compound.Compound`
segment_level : int, optional, default=0
Level of full residue architecture to be identified
include_base_level : bool, optional, default=False
Set whether a search should continue if the list of children are single particles.
bond_graph: obj
An instance of :class:`mbuild.BondGraph`.
The top level bondgraph that contains the compound to get residues from.
Returns
-------
residuesList : list
List of residue ids (str).
"""
residuesList = []
no_grandchildrenList = [not child.children for child in compound.children]
compound_graphtype = catalog_bondgraph_type(compound, bond_graph=bond_graph)
if (
compound_graphtype == "one_graph" and segment_level == 0
): # All atoms are bonded, at chosen tier
residuesList.append(id(compound))
elif compound_graphtype == "particle_graph": # Currently at the particle level
if include_base_level:
residuesList.append(id(compound))
else:
raise ValueError(
"Did not expect to reach single segments with argument "
"`include_base_level=False`. Consider setting this to True, "
"or check to see if the compound hierarchy is as expected."
)
elif (
all(no_grandchildrenList) and not include_base_level
): # One above base level
residuesList.append(id(compound))
else:
if compound_graphtype == "multiple_graphs": # Go to the next tier below
if segment_level > 0:
segment_level -= 1
elif segment_level < 0:
raise ValueError(
"`segment_level` must be greater than zero. "
"Something went wrong when parsing the hierachy of "
"the compound."
)
# Check the next tier of compounds.
for i, child in enumerate(compound.children):
if no_grandchildrenList[i]: # single atom particle
residuesList.append(id(child))
else:
residuesList.extend(
pull_residues(
child,
segment_level=segment_level,
include_base_level=include_base_level,
bond_graph=bond_graph
)
)
return residuesList
########TESTS#############
compound = mb.load("C1=CC=CC=C1", smiles=True)
assert catalog_bondgraph_type(compound) == "one_graph"
compound = mb.Compound([mb.load("C1=CC=CC=C1", smiles=True), mb.load("C1=CC=CC=C1", smiles=True)])
assert catalog_bondgraph_type(compound) == "multiple_graphs"
compound = mb.Compound([mb.load("C1=CC=CC=C1", smiles=True), mb.load("C1=CC=CC=C1", smiles=True)])
assert catalog_bondgraph_type(compound.children[1], compound.bond_graph) == "one_graph"
compound = mb.Compound([mb.load("C1=CC=CC=C1", smiles=True), mb.load("C1=CC=CC=C1", smiles=True)])
assert catalog_bondgraph_type(compound.children[1][0], compound.bond_graph) == "particle_graph"
compound = mb.load("C1=CC=CC=C1", smiles=True)
assert len(pull_residues(compound, compound.bond_graph)) == 1
compound = mb.Compound([mb.load("C1=CC=CC=C1", smiles=True), mb.load("C1=CC=CC=C1", smiles=True)])
assert len(pull_residues(compound, bond_graph=compound.bond_graph)) == 2
compound = mb.Compound([mb.load("C1=CC=CC=C1", smiles=True), mb.load("C1=CC=CC=C1", smiles=True)])
compound.add_bond([compound.children[0][0], compound.children[1][0]])
assert len(pull_residues(compound, bond_graph=compound.bond_graph)) == 1
compound = mb.load("CCC", smiles=True)
assert len(pull_residues(compound, bond_graph=compound.bond_graph)) == 1
compound = mb.Compound([mb.load("CCC", smiles=True), mb.load("CCC", smiles=True)])
assert len(pull_residues(compound, bond_graph=compound.bond_graph)) == 2
compound = mb.Compound([mb.load("CCC", smiles=True), mb.load("CCC", smiles=True)])
compound.add_bond([compound.children[0][0], compound.children[1][0]])
assert len(pull_residues(compound, bond_graph=compound.bond_graph)) == 1
n_waters = 100
compound = mb.fill_box(mb.load("O", smiles=True), n_waters, mb.Box([10, 10, 10]))
assert len(pull_residues(compound, bond_graph=compound.bond_graph)) == n_waters If this looks reasonable @jaclark5, I can take these changes and tests and push them directly to your branch |
Hey @CalCraven, that looks good. I did a quick check of your new function with the existing tests and it results in an error since H2O doesn't have a bond graph. |
I added the ability to edit for maintainers, but after you merge your addition I'd like to look at it again. |
…ng '' name. New logic for parsing graphtype using the number of connected components
…ps_molecule_number
for more information, see https://pre-commit.ci
…mbuild into lammps_molecule_number
It looks good to me! Are there any test for LJ systems? I think it might be good to have a CG system tested. |
I added a set of tests for loading in coarse grained structures. I think it works okay, it can start to get a little strange when you look to use coarse grained and atomistic structures, or you create an uneven hierarchy with molecules located at multiple different levels. Probably rare use cases, but good to have some defined behavior for them. |
Oh great, I thought of one more test I'd to run this afternoon. |
There's an issue with the catalog_bondgraph_type function. Single particle "atoms" i.e., electrolytes or single bead CG solvent won't show up in that list. For an atomistic system with water, the function will return "multigraph" and then identify those electrolytes. However for a single CG polymer in a solvent of single beads, the Compound reads as "one_graph" because none of the solvent molecules are on the bond_graph list. |
Ok, I just pushed a fix, I think this PR is complete :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That change looks good to me. I also think this PR is ready to go @daico007
Great work! Thank you for your PR @jaclark5! |
PR Summary:
Resolves #1072
This PR achieved the following:
Compound.to_parmed()
fromCompound.save()
Compound.to_parmed(infer_residues=True)
the defaultPoint 3 is achieved with a recursive hidden function so that a system could be built with a combination of assembled boxes and the compounds of each would still be found. An example of hierarchy and output could be:
System.children => [Box]
# A final box to be writtenBox.children => [Phase1, Phase2]
# Two boxes representing two phases to be combinedPhase1.children => [Water, Water, Water,.....]
# Water molecules don't have box definitionPhase2.children => [CO2, CO2,...]
residues = [Water, Water, Water,.....] + [CO2, CO2,...]
Additionally, if Phase2 was made from multiple boxes, then the search would continue to each of those tiers to find the compounds.
I've tested this locally with:
I'll add tests after I get comments.
PR Checklist