Added Examples Folder and MSE Example #158

Merged: 21 commits, Jan 17, 2019 (changes shown from 19 commits)
6 changes: 3 additions & 3 deletions automatminer/automl/adaptors.py
@@ -118,9 +118,9 @@ def fit(self, df, target, **fit_kwargs):
self._features = df.drop(columns=target).columns.tolist()
self._ml_data = {"X": X, "y": y}
self.fitted_target = target
self.logger.info("TPOT fitting started.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are there these logger changes happening here? In this PR we shouldn't do this

self._logger.info("TPOT fitting started.")
self._backend = self._backend.fit(X, y, **fit_kwargs)
self.logger.info("TPOT fitting finished.")
self._logger.info("TPOT fitting finished.")
return self

@property
@@ -219,7 +219,7 @@ def predict(self, df, target):
X = df[self._features].values # rectify feature order
y_pred = self._backend.predict(X)
df[target + " predicted"] = y_pred
- self.logger.debug("Prediction finished successfully.")
+ self._logger.debug("Prediction finished successfully.")
return df


45 changes: 45 additions & 0 deletions automatminer/examples/mse_example.py
@@ -0,0 +1,45 @@
import unittest
Contributor: Looks good. Also, it's cool that it's in a test, but I'm afraid it will just confuse people who come to use it (i.e., "woah, it is weird they put this test here, I wonder where the example is"). I don't think we will be frequently running this test anyway (especially because you are using the default config, not the debug config, which is much faster).
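
For reference, the config swap this comment alludes to is a one-line change. A minimal sketch, assuming get_preset_config accepts a "debug" preset name in the same way it accepts "default":

from automatminer.pipeline import MatPipe
from automatminer.presets import get_preset_config

# Hedged sketch: the "debug" preset is assumed to run a much smaller AutoML
# search than "default", so it finishes far faster and suits quick smoke tests.
debug_config = get_preset_config("debug")
pipe = MatPipe(**debug_config)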

Contributor: Instead of being in a test, could you write out the file as a simple script (or, even better, a notebook), with perhaps a few more comments? Not everyone is particularly familiar with pandas/automatminer/matminer/machine learning, and there are a few areas that won't make sense to someone unfamiliar with this stack.

For example, when we are renaming the formula column, just add a comment saying "The automatminer presets use the pre-defined column names 'composition' and 'structure' to find the composition and structure columns. You can change these by editing your config."
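
A rough sketch of the standalone-script version being requested here, with the suggested explanatory comments added (the comments and the plain sklearn.metrics import path are illustrative, not part of this PR):

"""Example: benchmarking K_VRH prediction with automatminer's default preset."""
from automatminer.pipeline import MatPipe
from automatminer.presets import get_preset_config
from matminer.datasets.dataset_retrieval import load_dataset
from sklearn.metrics import mean_squared_error

# Load the elastic_tensor_2015 dataset from matminer as a pandas DataFrame.
df = load_dataset("elastic_tensor_2015")

# The automatminer presets use the pre-defined column names "composition"
# and "structure" to find the composition and structure columns. You can
# change these by editing your config.
df = df.rename(columns={"formula": "composition"})
df = df[["composition", "structure", "K_VRH"]]

# Build a pipeline from the default preset and benchmark it on K_VRH,
# holding out 20% of the data as a test set.
pipe = MatPipe(**get_preset_config("default"))
predicted = pipe.benchmark(df, "K_VRH", test_spec=0.2)

# benchmark() returns the held-out fold with a "K_VRH predicted" column added.
mse = mean_squared_error(predicted["K_VRH"], predicted["K_VRH predicted"])
print("MSE: {}".format(mse))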

import os

from automatminer.pipeline import MatPipe
from automatminer.presets import get_preset_config
from matminer.datasets.dataset_retrieval import load_dataset
from sklearn.metrics.regression import mean_squared_error


@unittest.skipIf("CI" in os.environ.keys(), "Test too intensive for CircleCI.")
class MSE_Example(unittest.TestCase):

"""
The following example uses the elastic_tensor_2015 dataset and a
default config to create a MatPipe. This MatPipe is used to benchmark
the target property K_VRH.

The unit tests confirm that the output of the benchmark is not empty.
They also ensure that, based on this specific example, the mean
squared error is between 0 and 500.

For debugging purposes, you can use the debug config instead. If you do,
widen the accepted mean squared error range to 0 - 1000 rather than
0 - 500.

"""

def test_mse_example(self):
df = load_dataset("elastic_tensor_2015")
default_config = get_preset_config("default")
pipe = MatPipe(**default_config)
df = df.rename(columns={"formula": "composition"})[["composition", "structure", "K_VRH"]]
predicted = pipe.benchmark(df, "K_VRH", test_spec=0.2)
self.assertTrue(not predicted.empty)

y_true = predicted["K_VRH"]
y_test = predicted["K_VRH predicted"]
mse = mean_squared_error(y_true, y_test)
print("MSE: " + str(mse))
self.assertTrue(mse < 500)
self.assertTrue(mse > 0)


if __name__ == '__main__':
unittest.main()
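
Following the docstring's note, a debug variant of the test could look like the sketch below: a method that would sit alongside test_mse_example in the same TestCase and reuse the module's imports. The "debug" preset name is an assumption; the widened 0 - 1000 bound comes straight from the docstring.

    def test_mse_example_debug(self):
        # Same flow as above, but with the (assumed) faster debug preset and
        # the relaxed MSE bound suggested in the docstring.
        df = load_dataset("elastic_tensor_2015")
        df = df.rename(columns={"formula": "composition"})[["composition", "structure", "K_VRH"]]
        pipe = MatPipe(**get_preset_config("debug"))
        predicted = pipe.benchmark(df, "K_VRH", test_spec=0.2)
        mse = mean_squared_error(predicted["K_VRH"], predicted["K_VRH predicted"])
        self.assertTrue(0 < mse < 1000)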