Added Examples Folder and MSE Example #158
Changes from 19 commits
567f187
70a9d18
f0c4057
30287fe
931074d
28703cf
775f1e4
c527473
4f55c32
165a9f9
e544cd8
7aa0874
c73be2f
60f78c6
18db1e9
f26ccb3
59b78e0
537a0cf
bb55a09
a8be96a
de77e28
@@ -0,0 +1,45 @@
```python
import os
import unittest

from automatminer.pipeline import MatPipe
from automatminer.presets import get_preset_config
from matminer.datasets.dataset_retrieval import load_dataset
from sklearn.metrics import mean_squared_error


@unittest.skipIf("CI" in os.environ, "Test too intensive for CircleCI.")
class MSE_Example(unittest.TestCase):
    """
    The following example uses the elastic_tensor_2015 dataset and the
    default preset config to create a MatPipe, which is then used to
    benchmark the target property K_VRH.

    The test checks that the benchmark output is not empty and that, for
    this specific example, the mean squared error is between 0 and 500.

    For faster debugging, use the debug config instead and widen the
    accepted mean squared error range to 0 to 1000.
    """

    def test_mse_example(self):
        df = load_dataset("elastic_tensor_2015")
        default_config = get_preset_config("default")
        pipe = MatPipe(**default_config)

        # The preset automatminer config uses the pre-defined column names
        # "composition" and "structure" to find the composition and
        # structure columns, so rename "formula" and keep only the inputs
        # and the target. You can change these names by editing your config.
        df = df.rename(columns={"formula": "composition"})
        df = df[["composition", "structure", "K_VRH"]]

        # Hold out 20% of the rows as a test set and predict K_VRH on them.
        predicted = pipe.benchmark(df, "K_VRH", test_spec=0.2)
        self.assertFalse(predicted.empty)

        y_true = predicted["K_VRH"]
        y_test = predicted["K_VRH predicted"]
        mse = mean_squared_error(y_true, y_test)
        print("MSE:", mse)
        self.assertLess(mse, 500)
        self.assertGreater(mse, 0)


if __name__ == "__main__":
    unittest.main()
```

Looks good. It's also cool that it's in a test, but I'm afraid it will just confuse people who come to use it (i.e., "whoa, it's weird they put this test here; I wonder where the example is"). I don't think we will run this test frequently anyway, especially because you are using the default config rather than the debug config, which is much faster.

Instead of putting it in a test, could you write the file out as a simple script (or, even better, a notebook) with a few more comments? Not everyone is familiar with pandas/automatminer/matminer/machine learning, and a few parts won't make sense to someone new to this stack. For example, where we rename the formula column, just add a comment saying "The preset automatminer uses the pre-defined column names 'composition' and 'structure' to find the composition and structure columns. You can change these by editing your config."
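For readers new to this stack, here is a minimal, dependency-free sketch of what the test above checks: hold out a fraction of the rows (mirroring `test_spec=0.2`), predict the held-out targets, and compute the mean squared error, i.e. the average of the squared differences between true and predicted values (the same quantity scikit-learn's `mean_squared_error` returns for a single unweighted target). The helper names `holdout_split` and `mse`, the toy data, and the mean-of-training-targets baseline are hypothetical illustrations, not automatminer's API; how `MatPipe.benchmark` actually shuffles and splits the data may differ.

```python
import random
from statistics import mean


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and split off `test_fraction` of them as a test set.

    Hypothetical helper; automatminer's own split logic may differ.
    Returns (train, test).
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def mse(y_true, y_pred):
    """Mean squared error: the average of (true - predicted) ** 2."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


# Toy (feature, target) pairs standing in for a featurized dataset;
# the values are made up purely for illustration.
data = [(x, 2.0 * x + 1.0) for x in range(50)]
train, test = holdout_split(data, test_fraction=0.2)

# Baseline "model": always predict the mean training target.
baseline = mean(y for _, y in train)
y_true = [y for _, y in test]
y_pred = [baseline] * len(test)
error = mse(y_true, y_pred)
print("MSE:", error)
```

A trivial baseline like this gives a rough reference point: a trained pipeline would usually be expected to beat it, which can help judge whether a fixed bound such as 500 is meaningful for a given dataset.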
Why are these logger changes happening here? We shouldn't make them in this PR.