Merge pull request #158 from ADA110/example_mse
Added Examples Folder and MSE Example
Showing 1 changed file with 145 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,145 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Mean Squared Error Example\n", | ||
"\n", | ||
"The following example uses the elastic_tensor_2015 dataset and the debug preset config to create a MatPipe. The MatPipe is then used to benchmark predictions of the target property K_VRH, and the benchmark results are used to calculate the mean squared error.\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Setting up the Dataframe\n", | ||
"\n", | ||
"Use the load_dataset function from matminer to load a dataset as a pandas DataFrame. In this example, we load the elastic_tensor_2015 dataset." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from matminer.datasets.dataset_retrieval import load_dataset\n", | ||
"\n", | ||
"df = load_dataset(\"elastic_tensor_2015\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Use get_preset_config to select one of the pre-built configurations for a MatPipe. The options include production, default, fast, and debug; the details of each config can be found in presets.py. In this example, we use the debug config to keep the runtime short. The returned config is then unpacked as keyword arguments to MatPipe to create a MatPipe object." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from automatminer.presets import get_preset_config\n", | ||
"from automatminer.pipeline import MatPipe\n", | ||
"\n", | ||
"debug_config = get_preset_config(\"debug\")\n", | ||
"pipe = MatPipe(**debug_config)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
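"The Automatminer presets use the pre-defined column names 'composition' and 'structure' to find the composition and structure columns. You can change these names by editing your config. Here, we instead rename the 'formula' column to 'composition' and keep only the columns needed for the benchmark." | ||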
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"df = df.rename(columns={\"formula\": \"composition\"})[[\"composition\", \"structure\", \"K_VRH\"]]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Benchmarking\n", | ||
"\n", | ||
"Here we run an ML benchmark with MatPipe to see how well AutoML can predict a target property. The target property benchmarked in this example is K_VRH, the Voigt-Reuss-Hill average bulk modulus." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"predicted = pipe.benchmark(df, \"K_VRH\", test_spec=0.2)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Calculating MSE\n", | ||
"\n", | ||
"The predicted variable is a dataframe that contains several columns, including the actual and the predicted values of the target property. In this example, we compare the actual K_VRH values with the predicted K_VRH values to measure how accurate the benchmark predictions are." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Actual and predicted values of the target property\n", | ||
"y_true = predicted[\"K_VRH\"]\n", | ||
"y_pred = predicted[\"K_VRH predicted\"]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Use the mean_squared_error function from sklearn.metrics to calculate the mean squared error between the actual and predicted K_VRH values." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from sklearn.metrics import mean_squared_error\n", | ||
"\n", | ||
"mse = mean_squared_error(y_true, y_pred)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.6.5" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
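
As a rough illustration of what the final notebook cell computes, the sketch below works out a mean squared error by hand on made-up numbers. The values are hypothetical and are not taken from elastic_tensor_2015 or from any benchmark run, and `mse_by_hand` is an illustrative helper, not part of Automatminer or scikit-learn. For one-dimensional inputs with default arguments, `mean_squared_error` returns this same average of squared differences.

```python
import math


def mse_by_hand(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical actual and predicted values, for illustration only
y_true = [100.0, 150.0, 200.0]
y_pred = [110.0, 140.0, 205.0]

mse = mse_by_hand(y_true, y_pred)  # (100 + 100 + 25) / 3
rmse = math.sqrt(mse)              # back in the units of the target

print(f"MSE:  {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
```

Because the MSE is expressed in squared units of the target, taking its square root (the RMSE) often gives a number that is easier to compare against typical K_VRH values.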