Add new default feature preset and updates for new matminer & pymatgen versions #101

Merged 11 commits on Dec 21, 2022
9 changes: 8 additions & 1 deletion .github/dependabot.yml
@@ -3,8 +3,15 @@ updates:
 - package-ecosystem: pip
   directory: "/"
   schedule:
-    interval: daily
+    interval: monthly
   open-pull-requests-limit: 10
   target-branch: master
   labels:
   - dependency_updates
+- package-ecosystem: github-actions
+  directory: "/"
+  schedule:
+    interval: monthly
+  target-branch: master
+  labels:
+  - CI
13 changes: 9 additions & 4 deletions .github/workflows/ci.yml
@@ -11,17 +11,22 @@ jobs:
     runs-on: ubuntu-latest
     timeout-minutes: 30
     steps:
-    - uses: actions/checkout@v1
+
+    - uses: actions/checkout@v3
+
     - name: Set up Python 3.8
-      uses: actions/setup-python@v1
+      uses: actions/setup-python@v4
       with:
         python-version: 3.8
+        cache: 'pip'
+        cache-dependency-path: |
+          **/requirements*.txt

     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
         pip install -r requirements.txt
-        pip install --ignore-installed .[test,dev]
+        pip install .[test,dev]

     - name: Run pre-commit
       run: |
@@ -44,4 +49,4 @@ jobs:
     - name: Run tests with pytest
       run: |
         # run tests with pytest, reporting coverage and timings
-        py.test -rs -vvv --durations=0 --cov=./modnet/
+        pytest -m "not slow" -rs -vvv --durations=0 --cov=./modnet/
5 changes: 4 additions & 1 deletion .gitignore
@@ -1,9 +1,12 @@
+### Custom
+modnet/data/
+.mypy_cache
+
 ### Python template
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
 *$py.class
-modnet/data/

 # Distribution / packaging
 build/
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -18,7 +18,7 @@ repos:
       - id: check-symlinks
       - id: end-of-file-fixer

-  - repo: https://gitlab.com/pycqa/flake8
-    rev: '3.9.2'
+  - repo: https://github.com/pycqa/flake8
+    rev: '6.0.0'
     hooks:
       - id: flake8
10 changes: 1 addition & 9 deletions README.md
@@ -1,6 +1,6 @@
 # MODNet: Material Optimal Descriptor Network

-[![arXiv](https://img.shields.io/badge/arXiv-2004.14766-brightgreen)](https://arxiv.org/abs/2004.14766) [![Build Status](https://img.shields.io/github/workflow/status/ppdebreuck/modnet/Run%20tests?logo=github)](https://github.com/ppdebreuck/modnet/actions?query=branch%3Amaster+) [![Read the Docs](https://img.shields.io/readthedocs/modnet)](https://modnet.readthedocs.io/en/latest/)
+[![arXiv](https://img.shields.io/badge/arXiv-2004.14766-brightgreen)](https://arxiv.org/abs/2004.14766) [![Build Status](https://img.shields.io/github/actions/workflow/status/ppdebreuck/modnet/ci.yml?logo=github&branch=main)](https://github.com/ppdebreuck/modnet/actions?query=branch%3Amaster+) [![Read the Docs](https://img.shields.io/readthedocs/modnet)](https://modnet.readthedocs.io/en/latest/)

 <a name="introduction"></a>
 ## Introduction
@@ -47,14 +47,6 @@ Activate the environment:
 conda activate modnet
 ```

-Then, install pymatgen v2020.8.13 with conda, which will bundle several pre-built dependencies (e.g., numpy, scipy):
-
-```shell
-conda install -c conda-forge pymatgen=2020.8.13
-```
-
-(you could alternatively do this step with `pip install pymatgen==2020.8.13`).
-
 Finally, install MODNet from PyPI with `pip`:

 ```shell
2 changes: 1 addition & 1 deletion docs/requirements.txt
@@ -1,3 +1,3 @@
-sphinx~=4.4
+sphinx~=5.3
 sphinx-rtd-theme~=1.0
 sphinxcontrib-napoleon~=0.7
2 changes: 1 addition & 1 deletion modnet/__init__.py
@@ -1 +1 @@
-__version__ = "0.1.13"
+__version__ = "0.2.0~develop"
8 changes: 4 additions & 4 deletions modnet/featurizers/featurizers.py
@@ -70,7 +70,7 @@ def featurize(self, df: pd.DataFrame) -> pd.DataFrame:

         Arguments:
             df: the input dataframe with a `"structure"` column
-                containing `pymatgen.Structure` objects.
+                containing pymatgen `Structure` objects.

         Returns:
             The featurized DataFrame.
@@ -137,7 +137,7 @@ def featurize_composition(self, df: pd.DataFrame) -> pd.DataFrame:

         Arguments:
             df: the input dataframe with a `"structure"` column
-                containing `pymatgen.Structure` objects.
+                containing pymatgen `Structure` objects.

         Returns:
             pandas.DataFrame: the decorated DataFrame, or an empty
@@ -184,7 +184,7 @@ def featurize_structure(self, df: pd.DataFrame) -> pd.DataFrame:

         Arguments:
             df: the input dataframe with a `"structure"` column
-                containing `pymatgen.Structure` objects.
+                containing pymatgen `Structure` objects.

         Returns:
             pandas.DataFrame: the decorated DataFrame.
@@ -206,7 +206,7 @@ def featurize_site(

         Arguments:
             df: the input dataframe with a `"structure"` column
-                containing `pymatgen.Structure` objects.
+                containing pymatgen `Structure` objects.
             aliases: optional dictionary to map matminer output column
                 names to new aliases, mostly used for
                 backwards-compatibility.
16 changes: 14 additions & 2 deletions modnet/featurizers/presets/__init__.py
@@ -1,8 +1,20 @@
-__all__ = ("FEATURIZER_PRESETS",)
+__all__ = (
+    "FEATURIZER_PRESETS",
+    "DEFAULT_FEATURIZER",
+    "DEFAULT_COMPOSITION_ONLY_FEATURIZER",
+)

+from typing import Dict, Type
 from .debreuck_2020 import DeBreuck2020Featurizer, CompositionOnlyFeaturizer
+from .matminer_2023 import Matminer2023Featurizer, CompositionOnlyMatminer2023Featurizer
+from modnet.featurizers import MODFeaturizer

-FEATURIZER_PRESETS = {
+DEFAULT_FEATURIZER: str = "Matminer2023"
+DEFAULT_COMPOSITION_ONLY_FEATURIZER: str = "CompositionOnlyMatminer2023"
+
+FEATURIZER_PRESETS: Dict[str, Type[MODFeaturizer]] = {
     "DeBreuck2020": DeBreuck2020Featurizer,
     "CompositionOnly": CompositionOnlyFeaturizer,
+    "Matminer2023": Matminer2023Featurizer,
+    "CompositionOnlyMatminer2023": CompositionOnlyMatminer2023Featurizer,
 }
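The registry above maps preset names to featurizer classes and exposes the default name as a module-level constant. A minimal sketch of how such a registry might be consumed is given below. The stub classes stand in for the real MODNet featurizers, and `get_preset` is a hypothetical helper that does not appear in this PR.

```python
from typing import Dict, Type


class MODFeaturizer:
    """Stand-in stub for modnet.featurizers.MODFeaturizer."""


class DeBreuck2020Featurizer(MODFeaturizer):
    """Stub named after the legacy preset."""


class Matminer2023Featurizer(MODFeaturizer):
    """Stub named after the preset introduced in this PR."""


# Mirrors the shape of the registry in the diff (two entries only, for brevity).
DEFAULT_FEATURIZER: str = "Matminer2023"

FEATURIZER_PRESETS: Dict[str, Type[MODFeaturizer]] = {
    "DeBreuck2020": DeBreuck2020Featurizer,
    "Matminer2023": Matminer2023Featurizer,
}


def get_preset(name: str = DEFAULT_FEATURIZER) -> Type[MODFeaturizer]:
    """Hypothetical helper: resolve a preset name to its class."""
    try:
        return FEATURIZER_PRESETS[name]
    except KeyError:
        raise ValueError(
            f"Unknown preset {name!r}; choose from {sorted(FEATURIZER_PRESETS)}"
        ) from None
```

Keeping the default as a string key rather than a class reference means callers can switch presets by name without importing a preset module directly.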
89 changes: 70 additions & 19 deletions modnet/featurizers/presets/debreuck_2020.py
@@ -1,26 +1,59 @@
 """ This submodule contains the DeBreuck2020Featurizer class. """

 import numpy as np
-from pymatgen.core.periodic_table import Element
-from pymatgen.analysis.local_env import VoronoiNN
 import modnet.featurizers
 import contextlib
+import warnings


 class DeBreuck2020Featurizer(modnet.featurizers.MODFeaturizer):
-    """Featurizer presets used for the paper 'Machine learning
-    materials properties for small datasets' by Pierre-Paul De Breuck,
-    Geoffroy Hautier & Gian-Marco Rignanese, arXiv:2004.14766 (2020).
+    """Featurizer presets used for the paper
+
+    **Materials property prediction for limited datasets enabled
+    by feature selection and joint learning with MODNet**,
+    Pierre-Paul De Breuck, Geoffroy Hautier & Gian-Marco Rignanese
+    npj Comp. Mat. 7(1) 1-8 (2021)
+    10.1038/s41524-021-00552-2

     Uses most of the featurizers implemented by matminer at the time of
     writing with their default hyperparameters and presets.

     """

-    def __init__(self, fast_oxid=False):
-        super().__init__()
+    package_version_requirements = {"matminer": "==0.6.2"}
+
+    def __init__(self, fast_oxid: bool = False):
+        """Creates the featurizer and imports all featurizer functions.
+
+        Parameters:
+            fast_oxid: Whether to use the accelerated oxidation state parameters within
+                pymatgen when constructing features that constrain oxidation states such
+                that all sites with the same species in a structure will have the same
+                oxidation state (recommended if featurizing any structure
+                with large unit cells).
+
+        """
+        import matminer
+
+        if matminer.__version__ != self.package_version_requirements[
+            "matminer"
+        ].replace("==", ""):
+            warnings.warn(
+                f"The {self.__class__.__name__} preset was written for and tested only with matminer{self.package_version_requirements['matminer']}.\n"
+                "Newer versions of matminer will not work, and older versions may not be compatible with newer MODNet versions due to other conflicts.\n"
+                "To use this featurizer robustly, please install `modnet==0.1.13` with its pinned dependencies.\n\n"
+                "This preset will now be initialised without importing matminer featurizers to enable use with existing previously featurized data, "
+                "but attempts to perform further featurization will result in an error."
+            )
+
+        else:
+            super().__init__()
+            self.load_featurizers()
+            self.fast_oxid = fast_oxid

+    def load_featurizers(self):
         with contextlib.redirect_stdout(None):
+            from pymatgen.analysis.local_env import VoronoiNN
             from matminer.featurizers.composition import (
                 AtomicOrbitals,
                 AtomicPackingEfficiency,
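The guard added to `__init__` compares the installed matminer version against an exact `==` pin, warns on a mismatch, and only runs the parent initialiser and loads featurizers when the versions match. A stdlib-only sketch of that pattern follows. The names `check_exact_pin`, `PinnedPreset` and `examplepkg` are hypothetical, and the installed version is passed in as an argument rather than read from a real package.

```python
import warnings


def check_exact_pin(installed: str, requirement: str) -> bool:
    """Return True if `installed` equals the version in an exact pin like "==0.6.2"."""
    if not requirement.startswith("=="):
        raise ValueError(f"Only '==' pins are handled in this sketch: {requirement!r}")
    # Strip the leading "==" and compare the remaining version string verbatim.
    return installed == requirement[2:]


class PinnedPreset:
    """Hypothetical preset that is only fully initialised for its pinned version."""

    package_version_requirements = {"examplepkg": "==0.6.2"}

    def __init__(self, installed_version: str):
        pin = self.package_version_requirements["examplepkg"]
        self.ready = check_exact_pin(installed_version, pin)
        if not self.ready:
            # Warn rather than raise, so the object can still be constructed.
            warnings.warn(
                f"{self.__class__.__name__} was written for examplepkg{pin}; "
                f"found {installed_version}."
            )
```

The diff strips the operator with `.replace("==", "")`; the sketch slices the prefix instead, which behaves the same for a well-formed exact pin.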
@@ -117,13 +150,14 @@ def __init__(self, fast_oxid=False):
                 OPSiteFingerprint(),
                 VoronoiFingerprint(),
             )
-        self.fast_oxid = fast_oxid

     def featurize_composition(self, df):
         """Applies the preset composition featurizers to the input dataframe,
         renames some fields and cleans the output dataframe.

         """
+        from pymatgen.core.periodic_table import Element
+
         df = super().featurize_composition(df)

         _orbitals = {"s": 1, "p": 2, "d": 3, "f": 4}
@@ -151,18 +185,21 @@ def featurize_structure(self, df):

         df = super().featurize_structure(df)

-        dist = df["RadialDistributionFunction|radial distribution function"].iloc[0][
-            "distances"
-        ][:50]
-        for i, d in enumerate(dist):
-            _rdf_key = "RadialDistributionFunction|radial distribution function|d_{:.2f}".format(
-                d
-            )
-            df[_rdf_key] = df[
-                "RadialDistributionFunction|radial distribution function"
-            ].apply(lambda x: x["distribution"][i])
+        if "RadialDistributionFunction|radial distribution function" in df:
+            dist = df["RadialDistributionFunction|radial distribution function"].iloc[
+                0
+            ]["distances"][:50]
+            for i, d in enumerate(dist):
+                _rdf_key = "RadialDistributionFunction|radial distribution function|d_{:.2f}".format(
+                    d
+                )
+                df[_rdf_key] = df[
+                    "RadialDistributionFunction|radial distribution function"
+                ].apply(lambda x: x["distribution"][i])

-        df = df.drop("RadialDistributionFunction|radial distribution function", axis=1)
+            df = df.drop(
+                "RadialDistributionFunction|radial distribution function", axis=1
+            )

         _crystal_system = {
             "cubic": 1,
@@ -210,6 +247,20 @@ def featurize_site(self, df):


 class CompositionOnlyFeaturizer(DeBreuck2020Featurizer):
+    """This subclass simply disables structure and site-level features
+    from the main `DeBreuck2020Featurizer` class.
+
+    **Materials property prediction for limited datasets enabled
+    by feature selection and joint learning with MODNet**
+    Pierre-Paul De Breuck, Geoffroy Hautier & Gian-Marco Rignanese
+    npj Comp. Mat. 7(1) 1-8 (2021)
+    10.1038/s41524-021-00552-2
+
+    Uses most of the featurizers implemented by matminer at the time of
+    writing with their default hyperparameters and presets.
+
+    """
+
     def __init__(self):
         super().__init__()
         self.oxid_composition_featurizers = ()
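The `CompositionOnlyFeaturizer` docstring describes a subclass that switches off whole groups of features inherited from its parent. One way to express that, sketched here with hypothetical classes and placeholder strings instead of real matminer featurizer objects, is to let the parent populate every group and then overwrite the unwanted groups with empty tuples.

```python
class FullPreset:
    """Hypothetical stand-in for a preset with three featurizer groups."""

    def __init__(self):
        # Placeholder labels only; a real preset would hold featurizer instances.
        self.composition_featurizers = ("ElementProperty", "Stoichiometry")
        self.structure_featurizers = ("DensityFeatures",)
        self.site_featurizers = ("CoordinationNumber",)

    def active_featurizers(self):
        """Return every featurizer that would be applied, in group order."""
        return (
            self.composition_featurizers
            + self.structure_featurizers
            + self.site_featurizers
        )


class CompositionOnlyPreset(FullPreset):
    """Keeps the composition group and empties the other two."""

    def __init__(self):
        super().__init__()
        self.structure_featurizers = ()
        self.site_featurizers = ()
```

Which groups the real subclass empties beyond `oxid_composition_featurizers` is not visible in the diff shown here, so the choice of groups above is an assumption based on the docstring.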
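Returning to the `featurize_structure` change in `debreuck_2020.py`: the nested radial distribution function entry is expanded into one scalar column per distance bin, and the PR makes that expansion conditional on the column being present. The sketch below reproduces the idea without pandas, treating each row as a plain dict. `expand_rdf` and its `max_bins` parameter are hypothetical names, and the code is an illustration of the guard-then-flatten pattern, not MODNet's implementation.

```python
RDF_KEY = "RadialDistributionFunction|radial distribution function"


def expand_rdf(rows, max_bins=50):
    """Flatten each row's nested RDF entry into one scalar entry per distance."""
    # Guard: if the RDF entry is absent, return shallow copies unchanged.
    if not rows or RDF_KEY not in rows[0]:
        return [dict(row) for row in rows]

    # Bin distances are taken from the first row, as in the diff.
    distances = rows[0][RDF_KEY]["distances"][:max_bins]
    expanded = []
    for row in rows:
        # Copy everything except the nested entry, which is dropped.
        flat = {key: value for key, value in row.items() if key != RDF_KEY}
        for i, d in enumerate(distances):
            flat[f"{RDF_KEY}|d_{d:.2f}"] = row[RDF_KEY]["distribution"][i]
        expanded.append(flat)
    return expanded
```

With the guard in place, rows that were featurized without the RDF entry pass through untouched instead of raising a `KeyError`.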