update docs with updated matbench info #282

Merged
merged 6 commits into from
May 3, 2020
3 changes: 2 additions & 1 deletion README.md
Expand Up @@ -6,7 +6,8 @@ automatminer is an automatic prediction engine for materials properties.
|:----------:|:-------------:|:------:|:------:|
| [![CircleCI](https://img.shields.io/circleci/project/github/hackingmaterials/automatminer/master.svg)](https://circleci.com/gh/hackingmaterials/automatminer) | [![Codacy Badge](https://img.shields.io/codacy/coverage/aa63dd7aa85e480bbe0e924a02ad1540.svg?colorB=brightgreen)](https://www.codacy.com/app/ardunn/automatminer) | [![Codacy Badge](https://img.shields.io/codacy/grade/aa63dd7aa85e480bbe0e924a02ad1540.svg)](https://www.codacy.com/app/ardunn/automatminer) | [![PyPI version](https://img.shields.io/pypi/v/automatminer.svg?colorB=blue)](https://pypi.org/project/automatminer/) |

- **Website (including documentation):** <http://hackingmaterials.lbl.gov/automatminer/>
- **Matbench benchmark datasets**: <https://hackingmaterials.lbl.gov/automatminer/datasets.html>
- **Website (including documentation):** <https://hackingmaterials.lbl.gov/automatminer/>
- **Help/Support:** <https://discuss.matsci.org/c/matminer>
- **Source:** <https://github.com/hackingmaterials/automatminer>

12 changes: 12 additions & 0 deletions automatminer_dev/matbench/get_info.py
@@ -0,0 +1,12 @@
from matminer.datasets.dataset_retrieval import load_dataset, get_available_datasets, get_all_dataset_info

# Get the names of all datasets available through matminer.
datasets = get_available_datasets(print_format=None)

for dataset in datasets:
    if "matbench_" in dataset:
        df = load_dataset(dataset)

        # The target is the first column that is not an input (structure or composition).
        target_col = [col for col in df.columns if col not in ["structure", "composition"]][0]
        print(f" * - :code:`{dataset}`\n - :code:`{target_col}`\n - {df.shape[0]}")


# print(get_all_dataset_info("matbench_steels"))
158 changes: 148 additions & 10 deletions docs/_sources/datasets.rst.txt
Expand Up @@ -5,7 +5,7 @@ Overview
------------

MatBench is an `ImageNet <http://www.image-net.org>`_ for materials science; a
set of 13 benchmarking ML problems for fair comparison, across a wide domain of
set of 13 supervised ML tasks for benchmarking and fair comparison spanning a wide domain of
inorganic materials science applications.

.. image:: _static/matbench_pie_charts.png
Expand All @@ -20,18 +20,141 @@ For now, you can still access the benchmark datasets. See the "Accessing MatBenc
section for more info.



Accessing MatBench
------------------

We have made the MatBench benchmark publicly available via the `matminer
datasets repository <https://hackingmaterials.lbl.gov/matminer/dataset_summary.html>`_
(and also via `Figshare <https://figshare.com/account/home#/projects/67337>`_).
and through the `Materials Project MPContribs-ML Deployment <https://ml.materialsproject.org>`_.
All the Matbench datasets begin with :code:`matbench_`.
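
A minimal sketch of selecting Matbench datasets by this prefix. The helper name :code:`select_matbench` and the sample list are illustrative assumptions; in practice the names could come from :code:`matminer.datasets.dataset_retrieval.get_available_datasets(print_format=None)`.

```python
def select_matbench(names, prefix="matbench_"):
    """Return the dataset names that start with the Matbench prefix, sorted."""
    return sorted(name for name in names if name.startswith(prefix))


# Illustrative sample only; "some_other_dataset" is a hypothetical non-Matbench name.
sample_names = ["matbench_steels", "some_other_dataset", "matbench_dielectric"]
print(select_matbench(sample_names))
```

Matching on the start of the name, rather than anywhere in it, avoids picking up a dataset that merely contains the string elsewhere.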


Here's a full list of the 13 datasets in Matbench v0.1:

.. list-table::
   :align: left
   :header-rows: 1

   * - dataset name
     - target column
     - number of samples
     - task type
     - download link
   * - :code:`matbench_dielectric`
     - :code:`n`
     - 4764
     - regression
     - `link <https://ml.materialsproject.org/matbench_dielectric.json.gz>`_
   * - :code:`matbench_expt_gap`
     - :code:`gap expt`
     - 4604
     - regression
     - `link <https://ml.materialsproject.org/matbench_expt_gap.json.gz>`_
   * - :code:`matbench_expt_is_metal`
     - :code:`is_metal`
     - 4921
     - classification
     - `link <https://ml.materialsproject.org/matbench_expt_is_metal.json.gz>`_
   * - :code:`matbench_glass`
     - :code:`gfa`
     - 5680
     - classification
     - `link <https://ml.materialsproject.org/matbench_glass.json.gz>`_
   * - :code:`matbench_jdft2d`
     - :code:`exfoliation_en`
     - 636
     - regression
     - `link <https://ml.materialsproject.org/matbench_jdft2d.json.gz>`_
   * - :code:`matbench_log_gvrh`
     - :code:`log10(G_VRH)`
     - 10987
     - regression
     - `link <https://ml.materialsproject.org/matbench_log_gvrh.json.gz>`_
   * - :code:`matbench_log_kvrh`
     - :code:`log10(K_VRH)`
     - 10987
     - regression
     - `link <https://ml.materialsproject.org/matbench_log_kvrh.json.gz>`_
   * - :code:`matbench_mp_e_form`
     - :code:`e_form`
     - 132752
     - regression
     - `link <https://ml.materialsproject.org/matbench_mp_e_form.json.gz>`_
   * - :code:`matbench_mp_gap`
     - :code:`gap pbe`
     - 106113
     - regression
     - `link <https://ml.materialsproject.org/matbench_mp_gap.json.gz>`_
   * - :code:`matbench_mp_is_metal`
     - :code:`is_metal`
     - 106113
     - classification
     - `link <https://ml.materialsproject.org/matbench_mp_is_metal.json.gz>`_
   * - :code:`matbench_perovskites`
     - :code:`e_form`
     - 18928
     - regression
     - `link <https://ml.materialsproject.org/matbench_perovskites.json.gz>`_
   * - :code:`matbench_phonons`
     - :code:`last phdos peak`
     - 1265
     - regression
     - `link <https://ml.materialsproject.org/matbench_phonons.json.gz>`_
   * - :code:`matbench_steels`
     - :code:`yield strength`
     - 312
     - regression
     - `link <https://ml.materialsproject.org/matbench_steels.json.gz>`_
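
The download links in the table all share one pattern, so a link can be derived from a dataset name. A minimal sketch, assuming that pattern continues to hold; :code:`matbench_url` is a hypothetical helper and not part of matminer.

```python
BASE_URL = "https://ml.materialsproject.org"


def matbench_url(dataset_name):
    """Build the download link for a Matbench v0.1 dataset name."""
    if not dataset_name.startswith("matbench_"):
        raise ValueError(f"not a Matbench dataset name: {dataset_name!r}")
    return f"{BASE_URL}/{dataset_name}.json.gz"


print(matbench_url("matbench_steels"))
```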


Getting dataset info
--------------------

You can get more info (such as the meaning of column names, brief cleaning
procedures, etc.) on a dataset with :code:`matminer.datasets.get_all_dataset_info`:

.. code-block:: python

    from matminer.datasets import get_all_dataset_info

    # Get dataset info from matminer
    info = get_all_dataset_info("matbench_steels")

    # Check out the info about the dataset.
    print(info)

You can download the datasets with the :code:`matminer.datasets.load_dataset`
function; the names of the datasets are named :code:`matbench-*` where :code:`*`
is the name of the benchmark problem.

Here's the MatBench benchmark for predicting refractive index (calculated with
.. code-block:: text

    Dataset: matbench_steels
    Description: Matbench v0.1 dataset for predicting steel yield strengths from chemical composition alone. Retrieved from Citrine informatics. Deduplicated.
    Columns:
        composition: Chemical formula.
        yield strength: Target variable. Experimentally measured steel yield strengths, in GPa.
    Num Entries: 312
    Reference: https://citrination.com/datasets/153092/
    Bibtex citations: ['@misc{Citrine Informatics,\ntitle = {Mechanical properties of some steels},\nhowpublished = {\\url{https://citrination.com/datasets/153092/},\n}']
    File type: json.gz
    Figshare URL: https://ml.materialsproject.org/matbench_steels.json.gz


You can also view all the Matbench datasets on the matminer
`Dataset Summary page <https://hackingmaterials.lbl.gov/matminer/dataset_summary.html>`_ (search
for "matbench").


(Down)loading datasets
-----------------------

While you can download the gzipped JSON datasets via the download links above, we
recommend using matminer's tools to load datasets. Matminer intelligently manages the
dataset downloads in its central folder and provides methods for robustly loading dataframes containing
pymatgen primitives such as structures.
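
For a file downloaded directly, the :code:`.json.gz` extension indicates gzip-compressed JSON. A minimal standard-library sketch for decompressing and parsing such a file follows. The helper name :code:`read_json_gz` is hypothetical, and the layout of the parsed object is not described here; converting it into a dataframe of pymatgen objects is the part matminer's loader handles.

```python
import gzip
import json


def read_json_gz(path):
    """Decompress a .json.gz file and parse it into plain Python objects."""
    # "rt" opens the gzip stream in text mode so json can read it directly.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

Calling :code:`read_json_gz("matbench_steels.json.gz")` on a downloaded file would return whatever JSON structure the file contains, without any pymatgen decoding.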

You can load the datasets with the :code:`matminer.datasets.load_dataset`
function; the function accepts the dataset name as an argument.
Here's an example of loading the Matbench task for predicting refractive index (calculated with
DFPT) from crystal structure.

.. code-block:: python
Expand All @@ -46,6 +169,25 @@ DFPT) from crystal structure.
print(df)


.. code-block:: text

                                                  structure         n
    0     [[4.29304147 2.4785886 1.07248561] S, [4.2930... 1.752064
    1     [[3.95051434 4.51121437 0.28035002] K, [4.3099... 1.652859
    2     [[-1.78688104 4.79604117 1.53044621] Rb, [-1... 1.867858
    3     [[4.51438064 4.51438064 0. ] Mn, [0.133... 2.676887
    4     [[-4.36731958 6.8886097 0.50929706] Li, [-2... 1.793232
    ...                                                 ...       ...
    4759  [[ 2.79280881 0.12499663 -1.84045389] Ca, [-2... 2.136837
    4760  [[0. 5.50363806 3.84192106] O, [4.7662... 2.690619
    4761  [[0. 0. 0.] Ba, [ 0.23821924 4.32393487 -0.35... 2.811494
    4762  [[0. 0.18884638 0. ] K, [0. ... 1.832887
    4763  [[0. 0. 0.] Cs, [2.80639641 2.80639641 2.80639... 2.559279

    [4764 rows x 2 columns]


This loads the dataframe in this format:

:code:`df` (:code:`matbench_dielectric`)

.. list-table::
Expand All @@ -64,8 +206,4 @@ DFPT) from crystal structure.
- ...


Find all the MatBench problem names and info
`here <https://hackingmaterials.lbl.gov/matminer/dataset_summary.html>`_ (search
for "matbench").

*Note: Larger datasets will take several minutes to load.*
2 changes: 1 addition & 1 deletion docs/_static/documentation_options.js
@@ -1,6 +1,6 @@
var DOCUMENTATION_OPTIONS = {
URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
VERSION: '1.0.2.20191110',
VERSION: '1.0.3.20191111',
LANGUAGE: 'None',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
2 changes: 1 addition & 1 deletion docs/advanced.html
Expand Up @@ -4,7 +4,7 @@
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>Advanced Usage &#8212; Automatminer 1.0.2.20191110 documentation</title>
<title>Advanced Usage &#8212; Automatminer 1.0.3.20191111 documentation</title>
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
2 changes: 1 addition & 1 deletion docs/automatminer.automl.config.html
Expand Up @@ -4,7 +4,7 @@
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>automatminer.automl.config package &#8212; Automatminer 1.0.2.20191110 documentation</title>
<title>automatminer.automl.config package &#8212; Automatminer 1.0.3.20191111 documentation</title>
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
2 changes: 1 addition & 1 deletion docs/automatminer.automl.html
Expand Up @@ -4,7 +4,7 @@
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>automatminer.automl package &#8212; Automatminer 1.0.2.20191110 documentation</title>
<title>automatminer.automl package &#8212; Automatminer 1.0.3.20191111 documentation</title>
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
2 changes: 1 addition & 1 deletion docs/automatminer.automl.tests.html
Expand Up @@ -4,7 +4,7 @@
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>automatminer.automl.tests package &#8212; Automatminer 1.0.2.20191110 documentation</title>
<title>automatminer.automl.tests package &#8212; Automatminer 1.0.3.20191111 documentation</title>
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
19 changes: 16 additions & 3 deletions docs/automatminer.featurization.html
Expand Up @@ -4,7 +4,7 @@
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>automatminer.featurization package &#8212; Automatminer 1.0.2.20191110 documentation</title>
<title>automatminer.featurization package &#8212; Automatminer 1.0.3.20191111 documentation</title>
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
Expand Down Expand Up @@ -246,8 +246,21 @@ <h2>Submodules<a class="headerlink" href="#submodules" title="Permalink to this
state.</p></li>
<li><p><strong>multiiindex</strong> (<a class="reference external" href="https://docs.python.org/3/library/functions.html#bool" title="(in Python v3.8)"><em>bool</em></a>) – If True, returns a multiindexed dataframe. Not
recommended for use in MatPipe.</p></li>
<li><p><strong>n_jobs</strong> (<a class="reference external" href="https://docs.python.org/3/library/functions.html#int" title="(in Python v3.8)"><em>int</em></a>) – The number of parallel jobs to use during featurization
for each featurizer. Default is n_cores</p></li>
<li><p><strong>do_precheck</strong> (<a class="reference external" href="https://docs.python.org/3/library/functions.html#bool" title="(in Python v3.8)"><em>bool</em></a>) – Execute a precheck on each featurizer before
featurizing with it. See matminer prechecking for more info.</p></li>
<li><p><strong>n_jobs</strong> (<a class="reference external" href="https://docs.python.org/3/library/functions.html#int" title="(in Python v3.8)"><em>int</em></a>) – <p>The number of parallel jobs to use during featurization
for each featurizer. Default is n_cores</p>
<blockquote>
<div><p>composition_col=”composition”,</p>
</div></blockquote>
</p></li>
<li><p><strong>composition_col</strong> (<a class="reference external" href="https://docs.python.org/3/library/stdtypes.html#str" title="(in Python v3.8)"><em>str</em></a>) – Name of the column containing compositions to be
featurized.</p></li>
<li><p><strong>structure_col</strong> (<a class="reference external" href="https://docs.python.org/3/library/stdtypes.html#str" title="(in Python v3.8)"><em>str</em></a>) – Name of the column containing structures to be featurized.</p></li>
<li><p><strong>bandstructure</strong> (<a class="reference external" href="https://docs.python.org/3/library/stdtypes.html#str" title="(in Python v3.8)"><em>str</em></a>) – Name of the column containing bandstructures to
be featurized.</p></li>
<li><p><strong>dos_col</strong> (<a class="reference external" href="https://docs.python.org/3/library/stdtypes.html#str" title="(in Python v3.8)"><em>str</em></a>) – Name of the column containing density of states objects
to be featurized.</p></li>
</ul>
</dd>
</dl>
2 changes: 1 addition & 1 deletion docs/automatminer.featurization.tests.html
Expand Up @@ -4,7 +4,7 @@
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>automatminer.featurization.tests package &#8212; Automatminer 1.0.2.20191110 documentation</title>
<title>automatminer.featurization.tests package &#8212; Automatminer 1.0.3.20191111 documentation</title>
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
2 changes: 1 addition & 1 deletion docs/automatminer.html
Expand Up @@ -4,7 +4,7 @@
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>automatminer package &#8212; Automatminer 1.0.2.20191110 documentation</title>
<title>automatminer package &#8212; Automatminer 1.0.3.20191111 documentation</title>
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
2 changes: 1 addition & 1 deletion docs/automatminer.preprocessing.html
Expand Up @@ -4,7 +4,7 @@
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>automatminer.preprocessing package &#8212; Automatminer 1.0.2.20191110 documentation</title>
<title>automatminer.preprocessing package &#8212; Automatminer 1.0.3.20191111 documentation</title>
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
2 changes: 1 addition & 1 deletion docs/automatminer.preprocessing.tests.html
Expand Up @@ -4,7 +4,7 @@
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>automatminer.preprocessing.tests package &#8212; Automatminer 1.0.2.20191110 documentation</title>
<title>automatminer.preprocessing.tests package &#8212; Automatminer 1.0.3.20191111 documentation</title>
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
2 changes: 1 addition & 1 deletion docs/automatminer.tests.html
Expand Up @@ -4,7 +4,7 @@
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>automatminer.tests package &#8212; Automatminer 1.0.2.20191110 documentation</title>
<title>automatminer.tests package &#8212; Automatminer 1.0.3.20191111 documentation</title>
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
2 changes: 1 addition & 1 deletion docs/automatminer.utils.html
Expand Up @@ -4,7 +4,7 @@
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>automatminer.utils package &#8212; Automatminer 1.0.2.20191110 documentation</title>
<title>automatminer.utils package &#8212; Automatminer 1.0.3.20191111 documentation</title>
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
2 changes: 1 addition & 1 deletion docs/automatminer.utils.tests.html
Expand Up @@ -4,7 +4,7 @@
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>automatminer.utils.tests package &#8212; Automatminer 1.0.2.20191110 documentation</title>
<title>automatminer.utils.tests package &#8212; Automatminer 1.0.3.20191111 documentation</title>
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
2 changes: 1 addition & 1 deletion docs/basic.html
Expand Up @@ -4,7 +4,7 @@
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>Basic Usage &#8212; Automatminer 1.0.2.20191110 documentation</title>
<title>Basic Usage &#8212; Automatminer 1.0.3.20191111 documentation</title>
<link rel="stylesheet" href="_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>