Lammps molecule number #1073

jaclark5 · 2022-12-11T17:48:54Z

PR Summary:

Resolves #1072

This PR achieved the following:

Allow keyword arguments to be passed to Compound.to_parmed() from Compound.save()
Make Compound.to_parmed(infer_residues=True) the default
Allow infer_residues to trigger a search for Compounds without box definitions, which is likely a complete compound.
Updated .gitignore to ignore .swp files
Update pep8 compliance in unrelated docstrings in compound.py

Point 3 is achieved with a recursive hidden function so that a system could be built with a combination of assembled boxes and the compounds of each would still be found. An example of hierarchy and output could be:

System.children => [Box] # A final box to be written
Box.children => [Phase1, Phase2] # Two boxes representing two phases to be combined
Phase1.children => [Water, Water, Water,.....] # Water molecules don't have box definition
Phase2.children => [CO2, CO2,...]

residues = [Water, Water, Water,.....] + [CO2, CO2,...]
Additionally, if Phase2 was made from multiple boxes, then the search would continue to each of those tiers to find the compounds.

I've tested this locally with:

import mbuild as mb
ch4 = mb.recipes.Alkane(n=1)
c4H10 = mb.recipes.Alkane(n=4)
ch4.name = "Methane"
c4H10.name = "nButane"
box = mb.Box(lengths=[5, 5, 5])
box_of_methane = mb.fill_box(compound=[ch4,c4H10], n_compounds=[1,2], box=box)
box_of_methane.save(
   "methane_ethane_box_1.lmp", 
   overwrite=True, 
   forcefield_files='utils/OPLSaa_alkanes.xml',
   parmed_kwargs={"infer_residues": True}
)

I'll add tests after I get comments.

PR Checklist

Includes appropriate unit test(s)
Appropriate docstring(s) are added/updated
Code is (approximately) PEP8 compliant
Issue(s) raised/addressed?

…finitions

daico007 · 2022-12-12T19:43:12Z

Thank you for your PR! I like this new feature/addition but it seems like the logic need to be alter a bit (just looking at the failed unit test, this method failed to detect some the residues info of an existing test case). I will try to play with that method a bit and see what might be the cause of the issue (_pull_hierarchical_attrs())

jaclark5 · 2022-12-13T16:04:56Z

Hey, I was thinking about it later, and I chose to pull residue names, but with further thought it would probably be better to use object IDs throughout this method (to_parmed), since multiply layers in the ancestry could have the same name and so name is easily ambiguous.

daico007 · 2022-12-13T16:53:21Z

That may worth a shot!

jaclark5 · 2022-12-13T20:22:19Z

Is there a more appropriate place to put _pull_hierarchical_attrs() since I tried to write it generally?

daico007 · 2022-12-13T20:55:36Z

I can see that method lives in compound.py. I believe it would be a very versatile method that can be utilized in other conversions!

…s, that uses nparticles and nbonds to discern.

jaclark5 · 2022-12-15T20:36:00Z

The reason the test failed is because the parent Compound doesn't have a _box definition that the compounds were placed into. I changed the function to find compounds using the number of particles and bonds.

daico007 · 2023-01-18T17:03:10Z

Hi @ jaclark5, do you mind adding me as a collaborator on your repo and give me push access to this branch so I can push the fixes for the failing test? I am also exploring the _pull_hierarchical_attrs method, in term of what attribute should be used as the default, since using box may not always work. (Sorry, I let this sit stale for a bit)

daico007 · 2023-01-18T20:50:23Z

mbuild/conversion.py

@@ -1147,14 +1156,44 @@ def to_parmed(
    --------
    parmed.structure.Structure : Details on the ParmEd Structure object
    """
+
+    def _pull_hierarchical_attrs(


I really like the recursive approach that you are doing with this method! My only concern is that I am not sure _box == None is a good indicator of to refer a residue, since many of us often convert a compound without a box to parmed and assign the box information later (during the writing out step). One method I think might be useful is Compound.is_independent(), which basically indicate if a Compound has any external bond or not (i.e., if it's a stand-alone molecule), but it will require minor alteration of this method and might make not as versatile as you have it now (i.e., when you traveling down the hierarchy and bump into a Compound that is not independent, the residue should be its parent but not itself).

Also, I think I have tried to solve similar issue before (infer relevant residues/molecule information from the hierarchy) and came up with several different approaches:

Bottom up approach, determine the lowest level Compound that form a full molecule

def _infer_molecules(compound): """Bottom up approach to infer residues""" connected_subgraph = compound.bond_graph.connected_components() residues = IndexedSet() for molecule in connected_subgraph: if len(molecule) == 1: ancestors = [molecule[0]] else: ancestors = IndexedSet(molecule[0].ancestors()) for particle in molecule[1:]: # This works because particle.ancestors traversed, and hence # the lower level will be in the front. # The intersection will be left at the end, # ancestor of the first particle is used as reference. # Hence, this called will return the lowest-level Compound] # that is a molecule ancestors = ancestors.intersection( IndexedSet(particle.ancestors()) ) residues.add(ancestors[0].name) return list(residues)

Top down approach, always take the 2nd level Compound (if they are independent) as residue

def _infer_groups(compound): """Top down, second level if any is available""" residues = IndexedSet() if all(not child.children for child in compound.children): residues.add(compound.name) else: residues.update([child.name for child in compound.children]) return residues

I believe the 1st approach should work in most cases, but the 2nd may be useful if you want better control of the grouping (e.g., a container make up of water, methanol and ethanol and you want to group methanol and ethanol under an alcohol group/Compound and water by itself).

Regarding the failing test, it's the test_resname_parmed in test_compound.py, specifically, when we try convert a compound (with not top level box) to parmed, so the infer_hierarchy step stops short at determining the lower level residues. A quick fix for this will be to add a line to define a box for that compound/system.

This looks great! I actually made many changes to this branch already too but haven't gotten back to it either. I'll wrap up my changes and push them soon. I think it would be easier to brain storm at that point.

…nto lammps_molecule_number

codecov · 2023-01-27T18:02:27Z

Codecov Report

Attention: Patch coverage is 95.55556% with 2 lines in your changes missing coverage. Please review.

Project coverage is 90.64%. Comparing base (3d7c93d) to head (5b3b49a).
Report is 101 commits behind head on main.

Files	Patch %	Lines
mbuild/conversion.py	95.55%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1073      +/-   ##
==========================================
+ Coverage   90.54%   90.64%   +0.09%     
==========================================
  Files          61       61              
  Lines        6190     6231      +41     
==========================================
+ Hits         5605     5648      +43     
+ Misses        585      583       -2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

daico007

I like the new pull_residues() method! Only one minor typo, but the rest LGTM! All tests seem to pass also.

mbuild/conversion.py

daico007 · 2023-01-30T14:58:16Z

Hey @CalCraven, if you have some time today, can you give this PR a review, we may be to merge it today (if not some time this week).

CalCraven · 2023-01-30T15:16:08Z

Yes, I will get this reviewed today!

jaclark5 · 2023-01-30T15:24:21Z

I just updated the tests to complete the code coverage. However, something weird happened with the precommits where I need to update to python 3.10 so that isort can update to 5.12, so that poetry can function properly.

CalCraven · 2023-01-30T16:52:55Z

Looks like they dropped python 3.7 recently. May have to update your environment to 3.8 so pre-commit can do it's thing.

jaclark5 · 2023-01-30T16:54:53Z

I was running 3.8.15. After a search for the error, it seems that I need to update to 3.10. There are many conflicts that Conda had been sorting through for awhile now.

daico007 · 2023-01-30T17:38:52Z

It sounds like we need to stat adding 3.10 test for mbuild (but that can be done in a different PR, I think we can ignore the pre-commit test here, assuming you already have that checks on your local). Regarding conda installation, we did realize it took a bit long to install the package with just conda and had actually started using mamba to do most of the installation in our lab, so it's probably make sense for use to update the instruction on mbuild readme.

jaclark5 · 2023-01-30T18:07:09Z

Is there a way of getting around the precommit locally? This feature isn't letting me commit my most recent changes to the PR to complete the code coverage.

jaclark5 · 2023-01-30T18:27:32Z

Actually I was able to get it to run by changing the version of isort in the precommit config to 5.12.0

…nto lammps_molecule_number

mbuild/tests/test_compound.py

CalCraven

The most important note is the bug related to ring structures. Other minor wording changes in documentation.

mbuild/conversion.py

mbuild/compound.py

mbuild/conversion.py

mbuild/tests/test_compound.py

CalCraven · 2023-01-31T18:32:28Z

I think I have a solution for the ring structures issues noted above. Here would be the changes to pull_residues()

def catalog_bondgraph_type(compound, bond_graph=None):
    """Identify type of subgraph found at this stage of the compound.
    
    Returns
    -------
    str:
        "particle_graph" if compound is at the particle level, 
        "one_graph" if compound is a single molecule piece,
        "multiple_graphs" if compound has multiple molecules
    """
    if not compound.children:
        return "particle_graph"
    elif compound.bond_graph:
        # check at the top level
        multiple_connectionsBool = len(compound.bond_graph.connected_components()) == 1
    else:
        # at a subgraph level
        multiple_connectionsBool = len(bond_graph.subgraph(compound).connected_components()) == 1
    if multiple_connectionsBool:
        return "one_graph" 
    else: 
        return "multiple_graphs"

def pull_residues(
    compound,
    segment_level=0,
    include_base_level=False,
    bond_graph=None
):
    """Pull residues from a Compound object.
    Search class instance for completed compounds based on the number of
    particles and bonds. If for example and peptide chain with three
    individual peptides that were connected and hydrated with thirty water
    molecules, the list will contain 31 residues. However, if the water and
    the individual peptides should be labeled as individual residues, set
    ``segment_level==1`` to receive a list with 33 residues. Depending on
    the method used to assemble the peptides, this procedure may continue to
    set ``segment_level=2`` and breakdown the peptide into functional groups.
    Setting ``include_base_level==True`` will allow this procedure to
    function with coarse-grained systems, while the default behavior will
    end a search at the level before atoms are obtained.

    Parameters
    ----------
    compound : obj
        An instance of :class:`mbuild.compound.Compound`
    segment_level : int, optional, default=0
        Level of full residue architecture to be identified
    include_base_level : bool, optional, default=False
        Set whether a search should continue if the list of children are single particles.
    bond_graph: obj
        An instance of :class:`mbuild.BondGraph`. 
        The top level bondgraph that contains the compound to get residues from.

    Returns
    -------
    residuesList : list
        List of residue ids (str).
    """
    residuesList = []
    no_grandchildrenList = [not child.children for child in compound.children]
    
    compound_graphtype = catalog_bondgraph_type(compound, bond_graph=bond_graph)
    
    if (
        compound_graphtype == "one_graph" and segment_level == 0
    ):  # All atoms are bonded, at chosen tier
        residuesList.append(id(compound))
    elif compound_graphtype == "particle_graph":  # Currently at the particle level
        if include_base_level:
            residuesList.append(id(compound))
        else:
            raise ValueError(
                "Did not expect to reach single segments with argument "
                "`include_base_level=False`. Consider setting this to True, "
                "or check to see if the compound hierarchy is as expected."
            )
    elif (
        all(no_grandchildrenList) and not include_base_level
    ):  # One above base level
        residuesList.append(id(compound))
    else:
        if compound_graphtype == "multiple_graphs":  # Go to the next tier below
            if segment_level > 0:
                segment_level -= 1
            elif segment_level < 0:
                raise ValueError(
                    "`segment_level` must be greater than zero. "
                    "Something went wrong when parsing the hierachy of "
                    "the compound."
                )

        # Check the next tier of compounds.
        for i, child in enumerate(compound.children):
            if no_grandchildrenList[i]:  # single atom particle
                residuesList.append(id(child))
            else:
                residuesList.extend(
                    pull_residues(
                        child,
                        segment_level=segment_level,
                        include_base_level=include_base_level,
                        bond_graph=bond_graph
                    )
                )

    return residuesList


########TESTS#############

compound = mb.load("C1=CC=CC=C1", smiles=True)
assert catalog_bondgraph_type(compound) == "one_graph"

compound = mb.Compound([mb.load("C1=CC=CC=C1", smiles=True), mb.load("C1=CC=CC=C1", smiles=True)])
assert catalog_bondgraph_type(compound) == "multiple_graphs"

compound = mb.Compound([mb.load("C1=CC=CC=C1", smiles=True), mb.load("C1=CC=CC=C1", smiles=True)])
assert catalog_bondgraph_type(compound.children[1], compound.bond_graph) == "one_graph"

compound = mb.Compound([mb.load("C1=CC=CC=C1", smiles=True), mb.load("C1=CC=CC=C1", smiles=True)])
assert catalog_bondgraph_type(compound.children[1][0], compound.bond_graph) == "particle_graph"

compound = mb.load("C1=CC=CC=C1", smiles=True)
assert len(pull_residues(compound, compound.bond_graph)) == 1

compound = mb.Compound([mb.load("C1=CC=CC=C1", smiles=True), mb.load("C1=CC=CC=C1", smiles=True)])
assert len(pull_residues(compound, bond_graph=compound.bond_graph)) == 2

compound = mb.Compound([mb.load("C1=CC=CC=C1", smiles=True), mb.load("C1=CC=CC=C1", smiles=True)])
compound.add_bond([compound.children[0][0], compound.children[1][0]])
assert len(pull_residues(compound, bond_graph=compound.bond_graph)) == 1

compound = mb.load("CCC", smiles=True)
assert len(pull_residues(compound, bond_graph=compound.bond_graph)) == 1

compound = mb.Compound([mb.load("CCC", smiles=True), mb.load("CCC", smiles=True)])
assert len(pull_residues(compound, bond_graph=compound.bond_graph)) == 2

compound = mb.Compound([mb.load("CCC", smiles=True), mb.load("CCC", smiles=True)])
compound.add_bond([compound.children[0][0], compound.children[1][0]])
assert len(pull_residues(compound, bond_graph=compound.bond_graph)) == 1

n_waters = 100
compound = mb.fill_box(mb.load("O", smiles=True), n_waters, mb.Box([10, 10, 10]))
assert len(pull_residues(compound, bond_graph=compound.bond_graph)) == n_waters

If this looks reasonable @jaclark5, I can take these changes and tests and push them directly to your branch

jaclark5 · 2023-01-31T19:16:57Z

Hey @CalCraven, that looks good. I did a quick check of your new function with the existing tests and it results in an error since H2O doesn't have a bond graph.

jaclark5 · 2023-01-31T19:20:03Z

I added the ability to edit for maintainers, but after you merge your addition I'd like to look at it again.

…ng '' name. New logic for parsing graphtype using the number of connected components

…ps_molecule_number

for more information, see https://pre-commit.ci

…ion.py

…mbuild into lammps_molecule_number

jaclark5 · 2023-02-01T23:12:12Z

It looks good to me! Are there any test for LJ systems? I think it might be good to have a CG system tested.

…ction

CalCraven · 2023-02-02T01:16:55Z

I added a set of tests for loading in coarse grained structures. I think it works okay, it can start to get a little strange when you look to use coarse grained and atomistic structures, or you create an uneven hierarchy with molecules located at multiple different levels. Probably rare use cases, but good to have some defined behavior for them.

jaclark5 · 2023-02-02T15:42:19Z

Oh great, I thought of one more test I'd to run this afternoon.

jaclark5 · 2023-02-03T15:22:21Z

There's an issue with the catalog_bondgraph_type function. Single particle "atoms" i.e., electrolytes or single bead CG solvent won't show up in that list. For an atomistic system with water, the function will return "multigraph" and then identify those electrolytes. However for a single CG polymer in a solvent of single beads, the Compound reads as "one_graph" because none of the solvent molecules are on the bond_graph list.

jaclark5 · 2023-02-03T15:49:08Z

Ok, I just pushed a fix, I think this PR is complete :)

CalCraven

That change looks good to me. I also think this PR is ready to go @daico007

daico007 · 2023-02-03T16:27:33Z

Great work! Thank you for your PR @jaclark5!

jaclark5 and others added 3 commits December 11, 2022 12:07

Allow kwargs for to_parmed to be passed through Compound.save()

8ad8bbd

infer_hierarchy takes the names of the first compounds without box de…

3c114ad

…finitions

Merge branch 'main' into lammps_molecule_number

4f40f1b

Merge branch 'main' into lammps_molecule_number

b3efabd

Merge branch 'main' into lammps_molecule_number

0fa2abc

jaclark5 added 2 commits December 14, 2022 19:19

Merge branch 'main' into lammps_molecule_number

1427a76

Update pull_hierarchical_attr to be an exposed function, pull_residue…

2a1af5b

…s, that uses nparticles and nbonds to discern.

Merge branch 'main' into lammps_molecule_number

5c3ea3e

Merge branch 'main' into lammps_molecule_number

03a3ad9

daico007 reviewed Jan 18, 2023

View reviewed changes

jaclark5 added 2 commits January 27, 2023 12:24

Use object ids in pull_residues

5d150c7

Merge branch 'lammps_molecule_number' of github.com:jaclark5/mbuild i…

07d0899

…nto lammps_molecule_number

daico007 approved these changes Jan 27, 2023

View reviewed changes

mbuild/conversion.py Outdated Show resolved Hide resolved

Merge branch 'main' into lammps_molecule_number

e3cb97f

code coverage

3b8fcbb

jaclark5 added 2 commits January 30, 2023 13:45

default to infer compounds

5391eff

Merge branch 'lammps_molecule_number' of github.com:jaclark5/mbuild i…

629dfcd

…nto lammps_molecule_number

github-advanced-security bot found potential problems Jan 30, 2023

View reviewed changes

mbuild/tests/test_compound.py Fixed Show resolved Hide resolved

CalCraven requested changes Jan 31, 2023

View reviewed changes

Revise docs and tests

adadb5d

CalCraven and others added 5 commits February 1, 2023 11:34

Fixed a bug with residues being named from parents with an empty stri…

a75aa9c

…ng '' name. New logic for parsing graphtype using the number of connected components

Merge branch 'main' of https://github.com/mosdef-hub/mbuild into lamm…

70fb932

…ps_molecule_number

[pre-commit.ci] auto fixes from pre-commit.com hooks

7b9a41c

for more information, see https://pre-commit.ci

Fixes for pre-commit config in doc string of pull_residues in convers…

1c91a67

…ion.py

Merge branch 'lammps_molecule_number' of https://github.com/jaclark5/…

cf2979e

…mbuild into lammps_molecule_number

Tests for coarse grained residue matching using new pull_residues fun…

41cf89e

…ction

Merge branch 'main' into lammps_molecule_number

cafc60f

Bug fix catalog_bondgraph_type

5b3b49a

jaclark5 requested a review from CalCraven February 3, 2023 15:51

CalCraven approved these changes Feb 3, 2023

View reviewed changes

daico007 merged commit c15fb1a into mosdef-hub:main Feb 3, 2023

jaclark5 deleted the lammps_molecule_number branch February 3, 2023 19:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lammps molecule number #1073

Lammps molecule number #1073

jaclark5 commented Dec 11, 2022 •

edited

Loading

daico007 commented Dec 12, 2022

jaclark5 commented Dec 13, 2022

daico007 commented Dec 13, 2022

jaclark5 commented Dec 13, 2022

daico007 commented Dec 13, 2022

jaclark5 commented Dec 15, 2022

daico007 commented Jan 18, 2023

daico007 Jan 18, 2023

daico007 Jan 18, 2023

jaclark5 Jan 18, 2023

codecov bot commented Jan 27, 2023 •

edited

Loading

daico007 left a comment

daico007 commented Jan 30, 2023

CalCraven commented Jan 30, 2023

jaclark5 commented Jan 30, 2023

CalCraven commented Jan 30, 2023

jaclark5 commented Jan 30, 2023

daico007 commented Jan 30, 2023

jaclark5 commented Jan 30, 2023 •

edited

Loading

jaclark5 commented Jan 30, 2023

CalCraven left a comment

CalCraven commented Jan 31, 2023

jaclark5 commented Jan 31, 2023

jaclark5 commented Jan 31, 2023

jaclark5 commented Feb 1, 2023

CalCraven commented Feb 2, 2023

jaclark5 commented Feb 2, 2023

jaclark5 commented Feb 3, 2023 •

edited

Loading

jaclark5 commented Feb 3, 2023

CalCraven left a comment

daico007 commented Feb 3, 2023

Lammps molecule number #1073

Lammps molecule number #1073

Conversation

jaclark5 commented Dec 11, 2022 • edited Loading

PR Summary:

PR Checklist

daico007 commented Dec 12, 2022

jaclark5 commented Dec 13, 2022

daico007 commented Dec 13, 2022

jaclark5 commented Dec 13, 2022

daico007 commented Dec 13, 2022

jaclark5 commented Dec 15, 2022

daico007 commented Jan 18, 2023

daico007 Jan 18, 2023

Choose a reason for hiding this comment

daico007 Jan 18, 2023

Choose a reason for hiding this comment

jaclark5 Jan 18, 2023

Choose a reason for hiding this comment

codecov bot commented Jan 27, 2023 • edited Loading

Codecov Report

daico007 left a comment

Choose a reason for hiding this comment

daico007 commented Jan 30, 2023

CalCraven commented Jan 30, 2023

jaclark5 commented Jan 30, 2023

CalCraven commented Jan 30, 2023

jaclark5 commented Jan 30, 2023

daico007 commented Jan 30, 2023

jaclark5 commented Jan 30, 2023 • edited Loading

jaclark5 commented Jan 30, 2023

CalCraven left a comment

Choose a reason for hiding this comment

CalCraven commented Jan 31, 2023

jaclark5 commented Jan 31, 2023

jaclark5 commented Jan 31, 2023

jaclark5 commented Feb 1, 2023

CalCraven commented Feb 2, 2023

jaclark5 commented Feb 2, 2023

jaclark5 commented Feb 3, 2023 • edited Loading

jaclark5 commented Feb 3, 2023

CalCraven left a comment

Choose a reason for hiding this comment

daico007 commented Feb 3, 2023

jaclark5 commented Dec 11, 2022 •

edited

Loading

codecov bot commented Jan 27, 2023 •

edited

Loading

jaclark5 commented Jan 30, 2023 •

edited

Loading

jaclark5 commented Feb 3, 2023 •

edited

Loading