Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate Aflow structure prototype labels with moyopy #96

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

janosh
Copy link
Collaborator

@janosh janosh commented Jan 19, 2025

adds new function get_protostructure_label_from_moyopy in aviary/wren/utils.py (following get_protostructure_label_from_spglib signature)

@janosh janosh added the enhancement New feature or request label Jan 19, 2025
CompRhys and others added 5 commits January 19, 2025 22:24
…atch still seems wrong for moyopy. Remove fallback from moyopy as without spglib structure refinement it won't do anything.
- refactor moyopy import handling
- print offending structure in error messages test_wyckoff_ops assert messages
@janosh
Copy link
Collaborator Author

janosh commented Jan 21, 2025

9 out of 11 tests passing after 57372af. of the 2 still failing, the first still seems to have differently ordered Wyckoff letters despite my "fix". the 2nd seems more concerning as it gives an unexpected letter. pinging @lan496 in case you notice some misuse of moyopy in get_protostructure_label_from_moyopy. for context, i'm hoping this function can replace the unique structure prototype analysis of WBM for matbench-discovery here .

    @pytest.mark.parametrize("structure, expected", zip(TEST_STRUCTS, TEST_PROTOSTRUCTURES))
    def test_get_protostructure_label_from_moyopy(structure, expected):
        """Check that moyopy gives correct protostructure label simple cases."""
>       assert get_protostructure_label_from_moyopy(structure) == expected, (
            f"unexpected moyopy protostructure for {structure=}"
        )
E       AssertionError: unexpected moyopy protostructure for structure=Structure Summary
E         Lattice
E             abc : 3.9 3.9 3.9
E          angles : 90.0 90.0 90.0
E          volume : 59.318999999999996
E               A : 3.9 0.0 0.0
E               B : 0.0 3.9 0.0
E               C : 0.0 0.0 3.9
E             pbc : True True True
E         PeriodicSite: Sr (0.0, 0.0, 0.0) [0.0, 0.0, 0.0]
E         PeriodicSite: Ti (1.95, 1.95, 1.95) [0.5, 0.5, 0.5]
E         PeriodicSite: O (1.95, 1.95, 0.0) [0.5, 0.5, 0.0]
E         PeriodicSite: O (1.95, 0.0, 1.95) [0.5, 0.0, 0.5]
E         PeriodicSite: O (0.0, 1.95, 1.95) [0.0, 0.5, 0.5]
E       assert 'A3BC_cP5_221_c_a_b:O-Sr-Ti' == 'AB3C_cP5_221_a_c_b:O-Sr-Ti'
E         
E         - AB3C_cP5_221_a_c_b:O-Sr-Ti
E         ?   -           --
E         + A3BC_cP5_221_c_a_b:O-Sr-Ti
E         ?  +           ++

tests/test_wyckoff_ops.py:370: AssertionError
_ test_moyopy_spglib_consistency[ABC6D2_mC40_15_e_a_3f_f:Ca-Fe-O-Si] _

protostructure = 'ABC6D2_mC40_15_e_a_3f_f:Ca-Fe-O-Si'

    @pytest.mark.parametrize(
        "protostructure",
        PROTOSTRUCTURE_SET,
    )
    def test_moyopy_spglib_consistency(protostructure):
        """Check that moyopy and spglib give consistent results."""
        struct = get_random_structure_for_protostructure(protostructure)
    
        moyopy_label = get_protostructure_label_from_moyopy(struct)
        spglib_label = get_protostructure_label_from_spglib(struct)
    
>       assert moyopy_label == spglib_label, (
            f"spglib moyopy protostructure mismatch for {protostructure}"
        )
E       AssertionError: spglib moyopy protostructure mismatch for ABC6D2_mC40_15_e_a_3f_f:Ca-Fe-O-Si
E       assert 'ABC6D2_mC40_..._f:Ca-Fe-O-Si' == 'ABC6D2_mC40_..._f:Ca-Fe-O-Si'
E         
E         - ABC6D2_mC40_15_e_a_3f_f:Ca-Fe-O-Si
E         ?                  ^
E         + ABC6D2_mC40_15_e_c_3f_f:Ca-Fe-O-Si
E         ?                  ^

tests/test_wyckoff_ops.py:386: AssertionError

@lan496
Copy link

lan496 commented Jan 21, 2025

@janosh Thank you for letting me know! LGTM for moyopy usage.
I'm not familiar with the exact procedure for naming prototypes in Aflow, so I will look into it. But, it seems likely that Wyckoff position assignment in moyopy still has some bugs...

For STO (Pm-3m) case, Wyckoff positions 4a and 4b are equivalent, which may related to the failed test.
https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-normsets?from=wycksets&gnum=221

For monoclinic CaMgO6Si2 (C2/c) case, Wyckoff positions 4a, 4b, 4c, and 4d belong to the same Wyckoff set
https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-normsets?from=wycksets&gnum=15

@CompRhys
Copy link
Owner

CompRhys commented Jan 21, 2025

@janosh Thank you for letting me know! LGTM for moyopy usage. I'm not familiar with the exact procedure for naming prototypes in Aflow, so I will look into it. But, it seems likely that Wyckoff position assignment in moyopy still has some bugs...

The protostructure label here is a superset of the aflow label, it's constructed as f"{aflow_label}:{chemical_system}". The chemical_system is alphabetically sorted element symbols separated by commas. The aflow label has a few parts f"{anonymous_formula}_{pearson_symbol}_{spg_num}_{all_wyckoffs}". The anonymous_formula is the alphabetically sorted formula where the element symbols are replaced with capital letters "A", "B", "C", etc. Pearson symbol has standard definition (would be cool to see this upstreamed as just a property that can be pulled from moyo). Space group number likewise self explanatory. The all_wyckoffs string is a string of the Wyckoff letters for each element, again alphabetical order, if there are multiple non-equivalent sites with the same Wyckoff then we insert the number of that type of site. ABC6D2_mC40_15_e_c_3f_f:Ca-Fe-O-Si so here we have 3 f sites occupied by Oxygen.

For STO (Pm-3m) case, Wyckoff positions 4a and 4b are equivalent, which may related to the failed test.
https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-normsets?from=wycksets&gnum=221

For monoclinic CaMgO6Si2 (C2/c) case, Wyckoff positions 4a, 4b, 4c, and 4d belong to the same Wyckoff set
https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-normsets?from=wycksets&gnum=15

We have this canonicalize_element_wyckoffs function that is supposed to handle the Wyckoff set choices in order to be consistent.

@janosh
Copy link
Collaborator Author

janosh commented Jan 21, 2025

would be cool to see this upstreamed as just a property that can be pulled from moyo

on the topic of upstreaming things into moyopy, another nice attribute to add would be crystal_system on MoyoDataset to avoid the need for helper functions like get_crystal_system in this PR

def get_crystal_system(n: int) -> str:
    """Get the crystal system for the structure, e.g. (triclinic, orthorhombic,
    cubic, etc.).
    Mirrors method of SpacegroupAnalyzer.get_crystal_system().
    Args:
        n (int): Space group number
    Raises:
        ValueError: on invalid space group numbers < 1 or > 230.
    Returns:
        str: Crystal system for structure
    """
    # Not using isinstance(n, int) to allow 0-decimal floats
    if n != int(n) or not 0 < n < 231:
        raise ValueError(f"Received invalid space group {n}")

    if 0 < n < 3:
        return "triclinic"
    if n < 16:
        return "monoclinic"
    if n < 75:
        return "orthorhombic"
    if n < 143:
        return "tetragonal"
    if n < 168:
        return "trigonal"
    if n < 195:
        return "hexagonal"
    return "cubic"

@lan496
Copy link

lan496 commented Jan 22, 2025

I got the same Wyckoff positions from spglib and moyopy for AB3C_cP5_221_a_c_b and ABC6D2_mC40_15_e_e_3f_f 🤔
Here I manually downloaded the cif files from Aflow's encyclopedia.

Your tests seem to use pyxtal to randomly generate structures from Aflow's prototype label. Do you have a generated structure whose results of Wyckoff position assignment change in spglib and moyopy?

from pymatgen.core import Structure
from pymatgen.symmetry.analyzer import SpacegroupAnalyzer
from spglib import get_symmetry_dataset
from moyopy import MoyoDataset
from moyopy.interface import MoyoAdapter

def main(structure: Structure) -> None:
    for symprec in [1e-2, 1e-3, 1e-4, 1e-5]:
        sga = SpacegroupAnalyzer(structure)
        spglib_cell = sga._cell
        spglib_dataset = get_symmetry_dataset(spglib_cell, symprec=symprec)
        print(spglib_dataset.wyckoffs)

        moyo_cell = MoyoAdapter.from_structure(structure)
        moyo_dataset = MoyoDataset(moyo_cell, symprec=symprec)
        print(moyo_dataset.wyckoffs)

        assert spglib_dataset.wyckoffs == moyo_dataset.wyckoffs

if __name__ == '__main__':
    # https://aflowlib.org/prototype-encyclopedia/AB3C_cP5_221_a_c_b-001/
    # Pm-3m (No. 221)
    # Ca (1a)
    # O (3c)
    # Ti (1b)
    # structure = Structure.from_file('AB3C_cP5_221_a_c_b-001.cif')

    # https://aflowlib.org/prototype-encyclopedia/ABC6D2_mC40_15_e_e_3f_f-001/
    # C2/c (No. 15)
    # Ca (4e)
    # Fe (4e)
    # O1 (8f)
    # O2 (8f)
    # O3 (8f)
    # Si (8f)
    structure = Structure.from_file('ABC6D2_mC40_15_e_e_3f_f-001.cif')

    print(structure)
    main(structure)

@janosh
Copy link
Collaborator Author

janosh commented Jan 22, 2025

@lan496 thanks for taking a look. really appreciate your help. 👍

here are 2 CIF files that currently yield different results for spglib and moyopy with symprec = 1e-4

debug_structures.zip

import os
import warnings
from collections import Counter
from glob import glob

import moyopy
import spglib
from moyopy.interface import MoyoAdapter
from pymatgen.core import Structure

from aviary.wren.utils import (
    get_protostructure_label_from_moyopy,
    get_protostructure_label_from_spglib,
)

warnings.filterwarnings("ignore", category=DeprecationWarning, module="spglib")

# Directory containing debug structures
debug_dir = "tests/debug_structures"

# Load and analyze all CIF files
for cif_path in glob(os.path.join(debug_dir, "*.cif")):
    # Load structure
    structure = Structure.from_file(cif_path)

    # Get labels from both methods
    moyopy_label = get_protostructure_label_from_moyopy(structure)
    spglib_label = get_protostructure_label_from_spglib(structure)

    # Get detailed moyopy info
    symprec = 1e-4
    moyo_cell = MoyoAdapter.from_structure(structure)
    moyo_data = moyopy.MoyoDataset(moyo_cell, symprec=symprec)

    spglib_cell = (
        structure.lattice.matrix,
        structure.frac_coords,
        structure.atomic_numbers,
    )
    spglib_data = spglib.get_symmetry_dataset(spglib_cell, symprec=symprec)

    # Print results if they differ, delete file if they agree
    if moyopy_label != spglib_label:
        print(f"Structure: {structure.composition.reduced_formula}")

        spglib_wyckoff_counts = Counter(spglib_data.wyckoffs)
        spglib_wyckoff_str = " ".join(
            f"{letter}{count}" for letter, count in spglib_wyckoff_counts.items()
        )
        print(f"{spglib_wyckoff_str=}")

        moyo_wyckoff_counts = Counter(moyo_data.wyckoffs)
        moyo_wyckoff_str = " ".join(
            f"{letter}{count}" for letter, count in moyo_wyckoff_counts.items()
        )
        print(f"{moyo_wyckoff_str=}")

        diff1 = dict(moyo_wyckoff_counts - spglib_wyckoff_counts)
        diff2 = dict(spglib_wyckoff_counts - moyo_wyckoff_counts)
        print(f"differences: + {diff1} - {diff2}")

        print("Aflow labels:")
        print(f"Moyopy:  {moyopy_label}")
        print(f"Spglib:  {spglib_label}")
        print("-" * 80)
    else:
        os.remove(cif_path)
        print(f"Deleted {os.path.basename(cif_path)} - labels match")

@lan496
Copy link

lan496 commented Jan 23, 2025

Thank you! I've reproduced the mC40 structure gives difference wyckoff positions by spglib and moyopy (although they may still be equivalent up to Wyckoff sets). I'll look into what is happening.

@lan496
Copy link

lan496 commented Jan 27, 2025

Can you try pip install moyopy==0.3.1?
The wrong Wyckoff position assignment for the monoclinic would be partly resolved. It will happen for monoclinic structures with skewed basis vectors. Moyo's current implementation may make the skewed basis vectors during the standardization process. So, I continue to investigate.

@janosh
Copy link
Collaborator Author

janosh commented Jan 27, 2025

i bumped to 0.3.1 in CI 4d885bc

=========================== short test summary info ============================
FAILED tests/test_cgcnn_classification.py::test_cgcnn_clf - RuntimeError: Numpy is not available
FAILED tests/test_cgcnn_regression.py::test_cgcnn_regression - RuntimeError: Numpy is not available
FAILED tests/test_core.py::test_masked_mean - RuntimeError: Numpy is not available
FAILED tests/test_core.py::test_masked_std - RuntimeError: Numpy is not available
FAILED tests/test_roost_classification.py::test_roost_clf - RuntimeError: Numpy is not available
FAILED tests/test_roost_regression.py::test_roost_regression - RuntimeError: Numpy is not available
FAILED tests/test_wren_classification.py::test_wren_clf - RuntimeError: Numpy is not available
FAILED tests/test_wren_regression.py::test_wren_regression - RuntimeError: Numpy is not available
FAILED tests/test_wrenformer.py::test_wrenformer_regression - RuntimeError: Numpy is not available
FAILED tests/test_wrenformer.py::test_wrenformer_classification - RuntimeError: Numpy is not available
FAILED tests/test_wyckoff_ops.py::test_get_protostructure_label_from_moyopy[structure3-AB_hP4_186_b_b:O-Zn] - AssertionError: unexpected moyopy protostructure for structure=Structure Summary
  Lattice
      abc : 3.8227 3.8227 6.2607
   angles : 90.0 90.0 119.99999999999999
   volume : 79.23078495184231
        A : np.float64(3.8227) np.float64(0.0) np.float64(2.340728659550294e-16)
        B : np.float64(-1.911349999999999) np.float64(3.3105553110467745) np.float64(2.340728659550294e-16)
        C : np.float64(0.0) np.float64(0.0) np.float64(6.2607)
      pbc : True True True
  PeriodicSite: Zn (6.661e-16, 2.207, 2.341e-16) [0.3333, 0.6667, 0.0]
  PeriodicSite: O (1.911, 1.104, 2.347) [0.6667, 0.3333, 0.3748]
  PeriodicSite: Zn (1.911, 1.104, 3.13) [0.6667, 0.3333, 0.5]
  PeriodicSite: O (6.661e-16, 2.207, 5.477) [0.3333, 0.6667, 0.8748]
assert 'AB_hP4_186_2b_2b:O-Zn' == 'AB_hP4_186_b_b:O-Zn'
  
  - AB_hP4_186_b_b:O-Zn
  + AB_hP4_186_2b_2b:O-Zn
  ?            +  +
FAILED tests/test_wyckoff_ops.py::test_moyopy_spglib_consistency[ABC6D2_mC40_15_e_a_3f_f:Ca-Fe-O-Si] - AssertionError: spglib moyopy protostructure mismatch for ABC6D2_mC40_15_e_a_3f_f:Ca-Fe-O-Si
assert 'ABC6D2_mC40_..._f:Ca-Fe-O-Si' == 'ABC6D2_mC40_..._f:Ca-Fe-O-Si'
  
  - ABC6D2_mC40_15_e_a_3f_f:Ca-Fe-O-Si
  ?                  ^
  + ABC6D2_mC40_15_e_c_3f_f:Ca-Fe-O-Si
  ?                  ^
= 12 failed, 61 passed, 5 skipped, 3 xfailed, 13 xpassed in 160.64s (0:02:40) ==

@lan496
Copy link

lan496 commented Feb 1, 2025

For Wurtzite ZnO case, get_protostructure_label_from_moyopy seems to return wrong equivalent_wyckoff_labels from orbits [0, 1, 0, 1].

For ABC6D2_mC40_15.cif, spglib and moyopy return the same wyckoff sequences in my environment.

Moyopy:  ABC6D2_mC40_15_e_a_3f_f:Ca-Fe-O-Si
Spglib:  ABC6D2_mC40_15_e_a_3f_f:Ca-Fe-O-Si

Hmm..., moyopy still has some edge cases for monoclinic.

@lan496
Copy link

lan496 commented Feb 1, 2025

I see. Some random structure by pyxtal gives an edge case

Reduced Formula: CaFe
abc   :   7.653881  13.179314  11.741742
angles:  90.000000  38.914249  90.000000
pbc   :       True       True       True
Sites (8)
  #  SP      a        b     c
---  ----  ---  -------  ----
  0  Ca    0    0.57414  0.25
  1  Ca    0    0.42586  0.75
  2  Ca    0.5  0.07414  0.25
  3  Ca    0.5  0.92586  0.75
  4  Fe    0    0        0
  5  Fe    0    0        0.5
  6  Fe    0.5  0.5      0
  7  Fe    0.5  0.5      0.5

this structure results in different labels:

Moyopy:  AB_mC8_15_e_a:Ca-Fe
Spglib:  AB_mC8_15_e_c:Ca-Fe

In this case, moyopy looks to be correct: Ca (4e, y=0.57414) and Fe (4a). Things become complicated...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants