Add ability to use built-in `pickle` for saving AutoMLSearch #2463

christopherbunn · 2021-06-30T20:08:06Z

Resolves #2174

codecov · 2021-06-30T20:11:20Z

Codecov Report

Merging #2463 (833e034) into main (143c83a) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #2463     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        283     283             
  Lines      25555   25568     +13     
=======================================
+ Hits       25453   25466     +13     
  Misses       102     102

Impacted Files	Coverage Δ
evalml/automl/automl_search.py	`99.4% <100.0%> (+0.1%)`	⬆️
evalml/automl/pipeline_search_plots.py	`100.0% <100.0%> (ø)`
evalml/tests/automl_tests/test_automl.py	`99.8% <100.0%> (+0.1%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

freddyaboulton

@christopherbunn Looks good! The original issue also mentions doing something similar for pipelines. Is the plan to do that in this PR or in a follow-up?

freddyaboulton · 2021-06-30T23:01:06Z

evalml/automl/automl_search.py

@@ -1331,7 +1347,7 @@ def load(file_path):
            AutoSearchBase object
        """
        with open(file_path, "rb") as f:
-            return cloudpickle.load(f)
+            return pickle.load(f)


So we can save with cloudpickle and read with pickle?! Wow, I had no idea that works hehe.

I feel like we should accept an argument here for the "pickle type"? Feels weird to offer a choice of library for save but not respect that in load. Not blocking though.

Hmm yeah I agree, it does feel symmetrical to do so. I think it would be a no-op though since it looks like the doc for cloudpickle just recommends using the standard python pickler for loading.
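
For reference, a minimal sketch of the save/load pairing discussed in this thread. It is not the evalml implementation: `save_object` and `load_object` are hypothetical helpers, and the `ValueError` for an unknown `pickle_type` is an assumption based on the commit list. Loading always goes through the standard `pickle` module, since cloudpickle output is read back with the regular unpickler.

```python
import os
import pickle
import tempfile


def save_object(obj, file_path, pickle_type="cloudpickle",
                pickle_protocol=pickle.HIGHEST_PROTOCOL):
    """Hypothetical helper: pick the pickling library by name, then dump."""
    if pickle_type == "cloudpickle":
        # Third-party package, imported lazily so the stdlib path needs nothing extra.
        import cloudpickle
        pkl_lib = cloudpickle
    elif pickle_type == "pickle":
        pkl_lib = pickle
    else:
        # Assumed behavior: reject anything that is not a known library name.
        raise ValueError(f"Unknown pickle_type: {pickle_type!r}")

    with open(file_path, "wb") as f:
        pkl_lib.dump(obj, f, protocol=pickle_protocol)


def load_object(file_path):
    """Hypothetical helper: the standard unpickler reads both kinds of file."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# Round trip through the stdlib path only.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "obj.pkl")
    save_object({"best_score": 0.9}, path, pickle_type="pickle")
    restored = load_object(path)
```

Because both libraries share the same protocol numbers, a `pickle_type` argument on the load side would mostly be there for symmetry.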

angela97lin

I'm trying to pickle using a jupyter notebook and I'm getting the following, am I doing something wrong?:

automl_ = AutoMLSearch(X, y, problem_type="regression")
automl_.search()
automl_.save("test.pkl", pickle_type="pickle")

I get:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-e1254c6fb959> in <module>
----> 1 automl_.save("test.pkl", pickle_type="pickle")

~/Desktop/evalml/evalml/automl/automl_search.py in save(self, file_path, pickle_type, pickle_protocol)
   1335 
   1336         with open(file_path, "wb") as f:
-> 1337             pkl_lib.dump(self, f, protocol=pickle_protocol)
   1338 
   1339     @staticmethod

TypeError: can't pickle module objects

It works if I use cloudpickle 🤔
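
A small reproduction of this class of error using only the standard library. This is a sketch under assumptions: `holder` is a hypothetical stand-in object, and the `json` module takes the place of whatever module reference ends up on the object being saved.

```python
import json
import pickle
from types import SimpleNamespace

# Hypothetical stand-in: an object with one attribute that is a module.
holder = SimpleNamespace(scores=[0.1, 0.2], _go=json)

try:
    pickle.dumps(holder)
    error = None
except TypeError as exc:
    # The standard pickler refuses module objects.
    error = exc

# Dropping the module reference makes the same object picklable.
holder._go = None
restored = pickle.loads(pickle.dumps(holder))
```

The exact wording of the `TypeError` message differs between Python versions, so the sketch only relies on the exception type.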

angela97lin · 2021-07-01T01:55:03Z

evalml/automl/automl_search.py

        with open(file_path, "wb") as f:
-            cloudpickle.dump(self, f, protocol=pickle_protocol)
+            pkl_lib.dump(self, f, protocol=pickle_protocol)


What if the user uses a cloudpickle protocol while trying to use pickle? 🤔

Apparently files written by cloudpickle can be opened by the regular pickle library (according to the example in their README.md and the docstring for cloudpickle.py in the attached screenshot)! I didn't know this before so that's pretty neat.

So cool 🤩

freddyaboulton · 2021-07-01T14:22:36Z

@angela97lin Are you running in a jupyter notebook?

The following works for me when running from a script:

from evalml.demos import load_diabetes
from evalml.automl import AutoMLSearch

X, y = load_diabetes()

automl = AutoMLSearch(X, y, "regression")
automl.search()
automl.save("automl.pkl", pickle_type="pickle")
automl2 = automl.load("automl.pkl")

christopherbunn · 2021-07-01T14:34:13Z

Hmm, that's so odd! I am able to generate a pickle with the test and through a script version of my test Jupyter notebook. At the same time, I am also running into the same error as you when I try to run it interactively in a Jupyter notebook. Let me look into this...

angela97lin · 2021-07-01T14:37:58Z

@freddyaboulton @christopherbunn Yup, running via jupyter notebook. What I was curious about was whether it was possible to save in one file and load in a completely different file, so I wanted to have two separate notebooks to test but ran into this, heh.

christopherbunn · 2021-07-01T14:59:45Z

@freddyaboulton @angela97lin I think I figured out the issue: in the notebook environment, we have the search iteration plot appear. However, in the test and script environment, we aren't showing a search iteration plot. ~~As a result, the jupyter notebook version has the plot stored in automl.search_iteration_plot whereas the script version has None in this variable.~~ Edit: this was incorrect, see below comment.

I am able to pickle the search in Jupyter when I set automl.search_iteration_plot=None.

I'm guessing that for some reason the plot doesn't support pickling? This is a bit odd considering that plotly has support for pickling per this PR on their repo. I'll see if I can find a workaround or a deeper root cause in Plotly.

angela97lin · 2021-07-01T15:34:31Z

@christopherbunn Wow, good find! If you can't find a good workaround, I think it could also be fine to say that the plots will not be pickled and set automl.search_iteration_plot=None before saving, not sure if others have different opinions on that 🤷‍♀️
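
One way to "not pickle the plot" without mutating the live object is a `__getstate__` hook. The sketch below is an assumption about how that could look, not code from this PR: `SearchLike` is a hypothetical class, and the `json` module stands in for the unpicklable plot.

```python
import json
import pickle


class SearchLike:
    """Hypothetical stand-in for an object with one unpicklable attribute."""

    def __init__(self):
        self.results = {"best_score": 0.42}
        # A module object stands in for the plot that cannot be pickled.
        self.search_iteration_plot = json

    def __getstate__(self):
        # Pickle a copy of the instance state with the plot blanked out,
        # so the in-memory object keeps its plot.
        state = self.__dict__.copy()
        state["search_iteration_plot"] = None
        return state


search = SearchLike()
restored = pickle.loads(pickle.dumps(search))
```

The restored object has `search_iteration_plot` set to `None`, while `search` itself is unchanged.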

freddyaboulton · 2021-07-01T15:38:41Z

I think the issue is that the PipelineSearchPlot has a reference to plotly.graph_objects, which is a module. Maybe we can refactor that? But I think not saving the plot is also a valid approach.

We might want to update the AutoMLSearch User Guide to talk about also using pickle to save automl and the limitations. Would also give us "coverage" for pickling in a jupyter notebook env which our unit tests don't cover.

christopherbunn · 2021-07-01T15:49:02Z

Turns out I was wrong, the Jupyter notebook version uses a SearchIterationPlot object whereas the script/test version stores a plotly figure (which is pickleable). The error is raised because the _go class variable in SearchIterationPlot stores the plotly.graph_objects module itself, and modules cannot be pickled.

I think the best path is to actually clear out the _go class variable after the init since this is the only place where it is used. Initial testing shows that pickling works once this is done. I'll test a few edge cases but if it looks good I'll push it up.
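
The "clear it after init" idea can be sketched as follows. This is illustrative only: `IterationPlotLike` is a hypothetical class, `json` stands in for plotly.graph_objects, and the sketch assumes the module reference is stored on the instance and needed only during construction.

```python
import json
import pickle


class IterationPlotLike:
    """Hypothetical stand-in for a plot wrapper that needs a module only in __init__."""

    def __init__(self, values):
        # Stand-in for importing plotly.graph_objects and keeping a reference to it.
        self._go = json
        # Use the module while building the object...
        self.rendered = self._go.dumps({"values": values})
        # ...then drop the reference so the instance holds no module object.
        self._go = None


plot = IterationPlotLike([1, 2, 3])
restored = pickle.loads(pickle.dumps(plot))
```

This only works while nothing after `__init__` needs the module; if later methods did, re-importing inside those methods would be the alternative.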

christopherbunn marked this pull request as ready for review June 30, 2021 22:30

auto-assign bot assigned christopherbunn Jun 30, 2021

christopherbunn requested review from angela97lin, chukarsten and dsherry June 30, 2021 22:30

freddyaboulton approved these changes Jun 30, 2021

angela97lin reviewed Jul 1, 2021

christopherbunn force-pushed the 2174_pickling_search branch 2 times, most recently from b464ece to b1f58a1 Compare July 1, 2021 16:38

christopherbunn requested a review from angela97lin July 1, 2021 17:29

christopherbunn force-pushed the 2174_pickling_search branch from a725486 to e52dd22 Compare July 6, 2021 15:58

christopherbunn added 10 commits July 7, 2021 10:56

Added pickle as option to save AutoML search

c6eef38

Updated release notes

dd4d6d7

Lint fixes

299b140

Lint round 2

962382d

Covered the ValueError test case

c1352e7

Lint fixes

72da20e

Cleared out plotly go import

004b69d

Added pickle type to load

4b0e2cd

Added check for core dep

e0cc73a

Removed import

833e034

christopherbunn force-pushed the 2174_pickling_search branch from e52dd22 to 833e034 Compare July 7, 2021 14:58

christopherbunn merged commit d5b8602 into main Jul 7, 2021

chukarsten mentioned this pull request Jul 22, 2021

v0.29.0 #2536

Merged

christopherbunn mentioned this pull request Sep 15, 2021

Add new ensembler component #2653

Merged

freddyaboulton deleted the 2174_pickling_search branch May 13, 2022 15:02
