diff --git a/automatminer/examples/MSE_Example.ipynb b/automatminer/examples/MSE_Example.ipynb new file mode 100644 index 00000000..4cfa65ee --- /dev/null +++ b/automatminer/examples/MSE_Example.ipynb @@ -0,0 +1,145 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Mean Squared Error Example\n", + "\n", + "The following example uses the elastic_tensor_2015 dataset and a default config to create a MatPipe. This MatPipe is used to benchmark the target property K_VRH. We use the resulting information to determine the mean squared error.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setting up the Dataframe\n", + "\n", + "Use the load dataset function to get access to the dataset in Automatminer. In this example, we will be loading the elastic_tensor_2015 dataset." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from matminer.datasets.dataset_retrieval import load_dataset\n", + "\n", + "df = load_dataset(\"elastic_tensor_2015\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Use get_preset_config to use different pre-built configurations for a MatPipe. The options include production, default, fast, and debug. Specific details about each config can be seen in presets.py. In this example, we will be using the debug config for a short program runtime. Then, we will pass in the parameter as an argument of MatPipe to get a MatPipe object." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from automatminer.presets import get_preset_config\n", + "from automatminer.pipeline import MatPipe\n", + "\n", + "debug_config = get_preset_config(\"debug\")\n", + "pipe = MatPipe(**debug_config)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The preset automatminer uses pre-defined column names 'composition' and 'structure' to find the composition and structure columns. You can change these by editing your config." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df = df.rename(columns={\"formula\": \"composition\"})[[\"composition\", \"structure\", \"K_VRH\"]]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Benchmarking\n", + "\n", + "In this example, we are performing an ML benchmark using MatPipe in order to see how well AutoML can predict a certain target property. The target property we will be benchmarking in this example is K_VRH. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "predicted = pipe.benchmark(df, \"K_VRH\", test_spec=0.2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Calculating MSE\n", + "\n", + "The predicted variable is a dataframe that contains several columns, including actual property values and predicted property values. In this example, we will use the actual K_VRH data and the predicted K_VRH data in order to see how well the benchmarking went." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y_true = predicted[\"K_VRH\"]\n", + "y_test = predicted[\"K_VRH predicted\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Use the mean_squared_error function from sklearn to calculate the mean squared error of predicted vs. actual K_VRH data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.metrics.regression import mean_squared_error\n", + "\n", + "mse = mean_squared_error(y_true, y_test)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}