Add ability to use built-in pickle for saving AutoMLSearch #2463
Codecov Report
@@ Coverage Diff @@
## main #2463 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 283 283
Lines 25555 25568 +13
=======================================
+ Hits 25453 25466 +13
Misses 102 102
@christopherbunn Looks good! The original issue also mentions doing something similar for pipelines. Is the plan to do that in this PR or in a follow-up?
@@ -1331,7 +1347,7 @@ def load(file_path):
             AutoSearchBase object
         """
         with open(file_path, "rb") as f:
-            return cloudpickle.load(f)
+            return pickle.load(f)
So we can save with cloudpickle and read with pickle?! Wow, I had no idea that works hehe.
I feel like we should accept an argument here for the "pickle type"? It feels weird to offer a choice of library for save but not respect that in load. Not blocking though.
Hmm yeah I agree, it does feel symmetrical to do so. I think it would be a no-op though, since it looks like the cloudpickle docs just recommend using the standard Python pickler for loading.
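For illustration, here is a minimal sketch of the asymmetry discussed above: save picks a library by name, while load always goes through the standard pickle module. The `save` and `load` helpers and the plain dict standing in for the search object are assumptions made for this example, not the PR's actual code. cloudpickle is imported lazily, so it is only needed when it is requested.

```python
import importlib
import os
import pickle
import tempfile


def save(obj, file_path, pickle_type="cloudpickle", pickle_protocol=pickle.HIGHEST_PROTOCOL):
    """Serialize obj with the library named by pickle_type (hypothetical helper)."""
    if pickle_type not in {"pickle", "cloudpickle"}:
        raise ValueError(f"Unsupported pickle_type: {pickle_type!r}")
    # Import lazily so cloudpickle is only required when actually requested.
    pkl_lib = importlib.import_module(pickle_type)
    with open(file_path, "wb") as f:
        pkl_lib.dump(obj, f, protocol=pickle_protocol)


def load(file_path):
    """Read with the standard pickler, whichever library wrote the file."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# A plain dict stands in for the object being saved.
search_state = {"problem_type": "regression", "n_pipelines": 5}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "automl.pkl")
    save(search_state, path, pickle_type="pickle")
    restored = load(path)
```

Under these assumptions a `pickle_type` argument on load would not change behaviour, which matches the "no-op" point above.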
I'm trying to pickle using a jupyter notebook and I'm getting the following, am I doing something wrong?:
automl_ = AutoMLSearch(X, y, problem_type="regression")
automl_.search()
automl_.save("test.pkl", pickle_type="pickle")
I get:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-12-e1254c6fb959> in <module>
----> 1 automl_.save("test.pkl", pickle_type="pickle")
~/Desktop/evalml/evalml/automl/automl_search.py in save(self, file_path, pickle_type, pickle_protocol)
1335
1336 with open(file_path, "wb") as f:
-> 1337 pkl_lib.dump(self, f, protocol=pickle_protocol)
1338
1339 @staticmethod
TypeError: can't pickle module objects
It works if I use cloudpickle 🤔
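For reference, a small standalone sketch of this kind of failure, using only the standard library. The dict and its `plot_backend` key are hypothetical stand-ins for an object whose state references a module; this is not what AutoMLSearch actually holds. The exact wording of the error message varies between Python versions.

```python
import math
import pickle

# Hypothetical state: a plain dict standing in for an object's attributes,
# one of which references a module.
state = {"results": [0.1, 0.2], "plot_backend": math}

try:
    pickle.dumps(state)
    error = None
except TypeError as exc:
    # The standard pickler refuses module objects.
    error = exc

# Dropping the module reference leaves state that pickles normally.
clean_state = {key: value for key, value in state.items() if key != "plot_backend"}
restored = pickle.loads(pickle.dumps(clean_state))
```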
         with open(file_path, "wb") as f:
-            cloudpickle.dump(self, f, protocol=pickle_protocol)
+            pkl_lib.dump(self, f, protocol=pickle_protocol)
What if the user uses a cloudpickle protocol while trying to use pickle? 🤔
Apparently cloudpickles can be opened by the regular pickling library (according to the example in their README.md and the docstring for cloudpickle.py in the attached screenshot)! I didn't know this before so that's pretty neat.
So cool 🤩
@angela97lin Are you running in a jupyter notebook? The following works for me when running from a script:
from evalml.demos import load_diabetes
from evalml.automl import AutoMLSearch
X, y = load_diabetes()
automl = AutoMLSearch(X, y, "regression")
automl.search()
automl.save("automl.pkl", pickle_type="pickle")
automl2 = automl.load("automl.pkl")
Hmm, that's so odd! I am able to generate a pickle with the test and through a script version of my test Jupyter notebook. At the same time, I am also running into the same error as you when I try to run it interactively in a Jupyter notebook. Let me look into this...
@freddyaboulton @christopherbunn Yup, running via jupyter notebook. What I was curious about was whether it was possible to save in one file and load in a completely different file, so I wanted to have two separate notebooks to test, but ran into this, heh.
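One way to check the "save here, load somewhere else" case without two notebooks is to load the file in a fresh interpreter. This is only a sketch using the standard library; the dict is a stand-in for the saved object, and the file name and one-line loader script are made up for the example.

```python
import os
import pickle
import subprocess
import sys
import tempfile

# One-line loader script run in a separate Python process.
loader = "import pickle, sys; print(pickle.load(open(sys.argv[1], 'rb')))"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "results.pkl")
    # Save in this process.
    with open(path, "wb") as f:
        pickle.dump({"best_score": 0.9}, f)
    # Load in a brand-new process that shares no state with this one.
    completed = subprocess.run(
        [sys.executable, "-c", loader, path],
        capture_output=True,
        text=True,
        check=True,
    )
    output = completed.stdout.strip()
```

For objects of custom classes, the loading process also needs to be able to import the defining module, which a plain dict sidesteps here.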
@freddyaboulton @angela97lin I think I figured out the issue: in the notebook environment, the search iteration plot appears, whereas in the test and script environments no search iteration plot is shown. I am able to pickle the search in Jupyter when I set [...]. I'm guessing that for some reason the plot doesn't support pickling? This is a bit odd considering that plotly has support for pickling per this PR on their repo. I'll see if I can find a workaround or a deeper root cause in Plotly.
@christopherbunn Wow, good find! If you can't find a good workaround, I think it could also be fine to say that the plots will not be pickled and set [...]
I think the issue is that the [...] We might want to update the AutoMLSearch User Guide to also cover using pickle to save automl, along with its limitations. That would also give us "coverage" for pickling in a jupyter notebook env, which our unit tests don't cover.
Turns out I was wrong, the Jupyter notebook version uses a [...] I think the best path is to actually clear out the [...]
Resolves #2174