Pickling error of config function #441

Closed · SumNeuron opened this issue Mar 11, 2019 · 12 comments

@SumNeuron

In a previous issue I requested support for the multiprocessing library to help launch multiple experiments.

import gc
import multiprocessing
import resource

# this works
def run_ex(print_usage=False, *args, **kwargs):
    # assumes kwargs is a strict subset of your config
    ex.run(config_updates=kwargs)
    if print_usage:
        usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print('resources used: {}'.format(usage))
    gc.collect()

# processes defaults to 1 because, when using a GPU backend, TF will likely crash with more
def pool_run(*args, processes=1, print_usage=False, **kwargs):
    with multiprocessing.Pool(processes=processes) as pool:
        results = pool.apply(run_ex, (print_usage, *args), kwargs)
    gc.collect()
    return results

pool_run(print_usage=True, **options)

If I make the following slight change:

# now I pass the experiment function so that these functions can be re-used

def run_ex(ex, print_usage=False, *args, **kwargs):
    ex.run(config_updates=kwargs)
    if print_usage:
        usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print('resources used: {}'.format(usage))
    gc.collect()

def pool_run(ex, *args, processes=1, print_usage=False, **kwargs):
    with multiprocessing.Pool(processes=processes) as pool:
        results = pool.apply(run_ex, (ex, print_usage, *args), kwargs)
    gc.collect()
    return results

pool_run(ex, print_usage=True, **options)  # raises PicklingError

The error I get is:


PicklingError                             Traceback (most recent call last)
<ipython-input-5-2bab0a1545e1> in <module>

---> 14 pool_run(ex, processes=1, print_usage=True, **opts)


~/Projects/my/module/file.py in pool_run(ex, processes, print_usage, *args, **kwargs)
     10 def pool_run(ex, *args, processes=1, print_usage=False, **kwargs):
     11     with multiprocessing.Pool(processes=processes) as pool:
---> 12         results = pool.apply(run_ex, (ex, print_usage, *args), kwargs)
     13     gc.collect()
     14     return results

/anaconda3/lib/python3.6/multiprocessing/pool.py in apply(self, func, args, kwds)
    257         '''
    258         assert self._state == RUN
--> 259         return self.apply_async(func, args, kwds).get()
    260 
    261     def map(self, func, iterable, chunksize=None):

/anaconda3/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

/anaconda3/lib/python3.6/multiprocessing/pool.py in _handle_tasks(taskqueue, put, outqueue, pool, cache)
    422                         break
    423                     try:
--> 424                         put(task)
    425                     except Exception as e:
    426                         job, idx = task[:2]

/anaconda3/lib/python3.6/multiprocessing/connection.py in send(self, obj)
    204         self._check_closed()
    205         self._check_writable()
--> 206         self._send_bytes(_ForkingPickler.dumps(obj))
    207 
    208     def recv_bytes(self, maxlength=None):

/anaconda3/lib/python3.6/multiprocessing/reduction.py in dumps(cls, obj, protocol)
     49     def dumps(cls, obj, protocol=None):
     50         buf = io.BytesIO()
---> 51         cls(buf, protocol).dump(obj)
     52         return buf.getbuffer()
     53 

PicklingError: Can't pickle <function config at 0x1c35b18ea0>: it's not the same object as my.module.experiment.config

@JarnoRFB (Collaborator)

Could you provide a minimal, complete script of the non-working example, including your ex setup? I thought I had successfully forked processes while passing the experiment object...

@SumNeuron (Author)

@JarnoRFB any ideas?

@Qwlouse (Collaborator) commented Mar 21, 2019

This problem is probably related to the fact that wrapt functions cannot be pickled. Since Sacred uses wrapt internally for captured functions, that makes an experiment (or even the captured functions themselves) unpicklable.
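
For illustration, a minimal sketch of the same failure mode, independent of Sacred: pickle serializes a plain function as a reference to its module and qualified name, then checks that the name still points to the same object, so a decorator that rebinds the module-level name breaks that check.

import pickle

def wrap(func):
    # stand-in for what a wrapping decorator does
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def config():
    return {}

original = config      # keep a reference to the undecorated function
config = wrap(config)  # rebind the name, as a decorator would

try:
    pickle.dumps(original)
except pickle.PicklingError as e:
    print(e)  # Can't pickle <function config ...>: it's not the same object as __main__.config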

@SumNeuron (Author)

@Qwlouse OK... do you have a suggestion for another approach for pool_run, then? That functionality is something I would like to keep separate from writing the experiment itself, so having to have ex in scope is a bit of an annoyance.

@SumNeuron (Author)

@Qwlouse or could Sacred avoid using wrapt functions? I want to combine Ray Tune with Sacred, and that requires a picklable function...

@flukeskywalker

Sacred works very well with Ray Tune. You can import the experiment inside the trainable function, apply the config updates provided by Tune, and then run the experiment.

@SumNeuron (Author)

@flukeskywalker can you provide an example / colab?

@flukeskywalker

Here's an example adapted from code I've used successfully, assuming train.py is a regular Sacred experiment file in the same directory that constructs an experiment ex (a hypothetical sketch of such a file follows the example). I don't use the Tune reporter since reporting is done through Sacred.

Note: the sleep line is a hack I use to prevent multiple experiments from being assigned the same ID. According to the Sacred devs this issue should no longer exist, but I still ran into it, so I use this simple fix.

import ray
import ray.tune as tune
from sacred.observers import MongoObserver


def train(config, reporter):
    import time, random
    # sleep hack: stagger starts so parallel runs don't get assigned the same ID
    time.sleep(random.uniform(0.0, 10.0))
    # import the experiment inside the trainable so the Sacred Experiment object
    # itself never has to be pickled and shipped to the worker
    from train import ex
    ex.observers.append(MongoObserver.create(db_name='my_db'))
    config['verbose'] = False
    ex.run(config_updates=config)
    result = ex.current_run.result
    print(f'Type of result is {type(result)}')
    reporter(result=result, done=True)


if __name__ == '__main__':
    ray.init(num_cpus=64, num_gpus=0)
    tune.register_trainable("train_func", train)
    tune.run_experiments({
        'my_experiment': {
            'run': 'train_func',
            'stop': {'result': 1000},
            'config': {
                'n_layers': tune.grid_search([5, 6]),
                'batch_size': tune.grid_search([256, 1024, 2048]),
            },
            'resources_per_trial': {"cpu": 1, "gpu": 0},
            'num_samples': 10,
        }
    })
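
A hypothetical minimal train.py that this driver assumes could look like the sketch below; the config names mirror the Tune search space above, and the body is a placeholder rather than real training code.

# train.py -- hypothetical minimal Sacred experiment file (illustrative placeholder)
from sacred import Experiment

ex = Experiment('my_experiment')


@ex.config
def config():
    n_layers = 5        # overridden by tune.grid_search in the driver
    batch_size = 256    # overridden by tune.grid_search in the driver
    verbose = True      # the driver sets this to False before running


@ex.automain
def run(n_layers, batch_size, verbose):
    # placeholder "training"; the return value becomes ex.current_run.result
    if verbose:
        print(f'training with {n_layers} layers, batch size {batch_size}')
    return n_layers * batch_size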

@SumNeuron (Author)

@flukeskywalker I made a simple project based on your guidance and it works! Thanks :) However, it only works if everything lives at the same module level.

Here is a more "complete" project setup, and currently Tune isn't running:

https://gitlab.com/SumNeuron/extune

Please take a look.

@flukeskywalker

Sorry, I don't have time to look into the code these days. My guess is that this is related to whether the module is on the Python path for each Ray actor, but that's all I can say at this point (a rough sketch of one way to test that guess follows below).

Personally, I don't organize code the way you do in extune. I usually keep the Sacred experiment and the Tune script(s) in the project's base directory. This is simple and avoids another level of organization. Of course, you may prefer things differently.
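
As a rough sketch of how to test the Python-path guess, one could make the package importable inside the trainable before importing the experiment; the PROJECT_ROOT value and the extune.experiment module path below are placeholders for whatever the actual layout is.

import os
import sys

# placeholder for the project's location on disk; adjust to your layout
PROJECT_ROOT = os.environ.get('PROJECT_ROOT', '/path/to/extune')

def train(config, reporter):
    # make the package importable in this Ray worker before importing the experiment
    if PROJECT_ROOT not in sys.path:
        sys.path.insert(0, PROJECT_ROOT)
    from extune.experiment import ex  # hypothetical module path; adjust to the real package
    ex.run(config_updates=config)
    reporter(result=ex.current_run.result, done=True)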

stale bot commented May 8, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
