Parallel sampling with Theano Op fails on MacOS 10.15 #4085

Closed
gmingas opened this issue Sep 7, 2020 · 9 comments

@gmingas
Contributor

gmingas commented Sep 7, 2020

Hi,

I am running the piece of code shown below on MacOS 10.15.6 with Python 3.6.11 and the latest version of the master branch in the pymc3-dev.

It is a toy example of using a Theano Op to compute a forward model and then use the output as the mean of a multivariate Normal.

I get the error shown under the code. When I sample with cores=1 in sample() the error disappears and sampling happens normally.

I played with it a bit, and when I rebased the code to remove the commits from PR #3991, the error stopped happening. I am not sure which part of those changes breaks this, and it might happen on macOS only (a collaborator who works on Linux does not have the same issue).

This might be connected to #4053, #3844 or #3140, but it was not clear that it is the same issue, so I decided to post separately.

Code:

import numpy as np
import theano.tensor as tt
import pymc3 as pm

if __name__ == '__main__':    
    size = 50
    true_intercept = 1
    true_slope = 2
    sigma = 1
    x = np.linspace(0, 1, size)    
    true_regression_line = true_intercept + true_slope * x    
    # np.random.normal takes the standard deviation (not the variance) as its scale
    y = true_regression_line + np.random.normal(0, sigma, size)
    s = np.identity(y.shape[0])
    np.fill_diagonal(s, sigma ** 2)

    # MCMC parameters
    ndraws = 300
    ntune = 100
    nsub = 5
    nchains = 4
    seed = 98765

    class ForwardModel1(tt.Op):
        itypes = [tt.dvector]
        otypes = [tt.dvector]

        def __init__(self, x):
            self.x = x

        def perform(self, node, inputs, outputs):
            intercept = inputs[0][0]
            x_coeff = inputs[0][1]
            temp = intercept + x_coeff * self.x
            outputs[0][0] = temp

    with pm.Model():
        Sigma_e = pm.Data('Sigma_e', s)
        
        intercept = pm.Normal('Intercept', 0, sigma=20)
        x_coeff = pm.Normal('x', 0, sigma=20)
        theta = tt.as_tensor_variable([intercept, x_coeff])

        f = ForwardModel1(x)
        
        likelihood = pm.MvNormal('y', mu=f(theta), cov=Sigma_e, observed=y)
        
        trace = pm.sample(draws=ndraws, step=pm.Metropolis(),
                          chains=nchains, tune=ntune,
                          discard_tuned_samples=True,
                          random_seed=seed)

Error/traceback:

Traceback (most recent call last):
  File "/Users/gmingas/projects/pymc3/pymc3/parallel_sampling.py", line 114, in _unpickle_step_method
    self._step_method = pickle.loads(self._step_method)
AttributeError: Can't get attribute 'ForwardModel1' on <module '__mp_main__' from '/Users/gmingas/projects/pymc3/pymc3/examples/test.py'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/gmingas/projects/pymc3/pymc3/parallel_sampling.py", line 135, in run
    self._unpickle_step_method()
  File "/Users/gmingas/projects/pymc3/pymc3/parallel_sampling.py", line 116, in _unpickle_step_method
    raise ValueError(unpickle_error)
ValueError: The model could not be unpickled. This is required for sampling with more than one core and multiprocessing context spawn or forkserver.
"""

The above exception was the direct cause of the following exception:

ValueError: The model could not be unpickled. This is required for sampling with more than one core and multiprocessing context spawn or forkserver.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test.py", line 64, in <module>
    random_seed=seed)
  File "/Users/gmingas/projects/pymc3/pymc3/sampling.py", line 540, in sample
    trace = _mp_sample(**sample_args, **parallel_args)
  File "/Users/gmingas/projects/pymc3/pymc3/sampling.py", line 1461, in _mp_sample
    for draw in sampler:
  File "/Users/gmingas/projects/pymc3/pymc3/parallel_sampling.py", line 492, in __iter__
    draw = ProcessAdapter.recv_draw(self._active)
  File "/Users/gmingas/projects/pymc3/pymc3/parallel_sampling.py", line 365, in recv_draw
    raise error from old_error
RuntimeError: Chain 0 failed.
@AlexAndorra
Contributor

Hi @gmingas, and thanks for reporting! This looks like a familiar issue, but I thought it appeared with Python 3.8 🤦
Did you try changing the multiprocessing context with the new mp_ctx kwarg in sample, e.g. mp_ctx='forkserver'? (The default on macOS is now spawn, IIRC.)
I'm also pinging @aseyboldt, as this could be a bug, since the error stops when you remove the commits from PR #3991.
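
For reference, the names that mp_ctx accepts are the start methods of Python's multiprocessing module. A minimal sketch for checking which ones the current platform offers follows. The pm.sample call in the trailing comment only mirrors the kwarg named in this comment; it assumes pymc3 is installed and is not executed here.

```python
import multiprocessing as mp

# Start methods this interpreter supports on the current platform
# (typically fork, spawn and forkserver on Unix-like systems, spawn only on Windows).
available = mp.get_all_start_methods()

# The method used when no context is requested explicitly.
# Note: calling this fixes the default context for the rest of the process.
default = mp.get_start_method()

print("available:", available)
print("default:  ", default)

# Passing one of these names to the sampler would then look like this
# (assumption: `model` is an existing pm.Model):
#     with model:
#         trace = pm.sample(mp_ctx="forkserver")
```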

@aseyboldt
Member

You need to move ForwardModel1 outside of the if __name__ == '__main__' block, so that its definition is available when the model is unpickled in the worker processes.
Unfortunately, this is expected behavior on Windows and macOS, since starting multiprocessing workers with fork is no longer recommended on macOS.
I'm closing this, but if the change I suggested doesn't fix it, we should reopen.
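
The reason the guard matters is that pickle stores an instance's class by module and name, not by value, so the process that unpickles must be able to find the class after re-importing the script. The sketch below imitates that with the standard library only. The module name toy_script, the IS_PARENT flag (standing in for the `__name__ == '__main__'` check) and the plain ForwardModel class are all hypothetical stand-ins, not pymc3 or Theano code.

```python
import pickle
import sys
import types

GUARDED = '''
if IS_PARENT:  # stands in for: if __name__ == "__main__":
    class ForwardModel:
        def __init__(self, x):
            self.x = x
'''

TOP_LEVEL = '''
class ForwardModel:
    def __init__(self, x):
        self.x = x
'''


def load_script(source, is_parent):
    """Execute `source` as a module named toy_script, like a fresh import."""
    module = types.ModuleType("toy_script")
    module.IS_PARENT = is_parent
    exec(source, module.__dict__)
    sys.modules["toy_script"] = module
    return module


def send_to_worker(source):
    """Pickle in the 'parent', then unpickle after the 'worker' re-imports."""
    parent = load_script(source, is_parent=True)
    payload = pickle.dumps(parent.ForwardModel([0.0, 1.0]))
    load_script(source, is_parent=False)  # the worker's own copy of the script
    try:
        return pickle.loads(payload).x
    except AttributeError as err:
        return f"failed: {err}"
    finally:
        sys.modules.pop("toy_script", None)


guarded_result = send_to_worker(GUARDED)
top_level_result = send_to_worker(TOP_LEVEL)
print(guarded_result)
print(top_level_result)
```

With the guarded source, the simulated worker never defines the class and unpickling fails with the same kind of AttributeError as in the traceback above. With the top-level source, the round trip succeeds.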

@gmingas
Contributor Author

gmingas commented Sep 9, 2020

Moving ForwardModel1 does work indeed (while changing mp_ctx does not).

Nevertheless, when I run the same code in a Jupyter notebook, I get the same error (despite having moved ForwardModel1). From looking online it seems like this is a known issue with multiprocessing with a known workaround (see here) - although it is supposed to happen only on Windows and not Mac.

But since it is desirable in the pymc3 notebooks folder to have notebooks that contain all the code without importing from other files, is there a different workaround that would allow me to keep all the code inside my notebook and still run on a Mac?
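
One way to keep everything in the notebook while still giving workers an importable definition is to have the notebook write the class to a module file and then import it. The sketch below shows only the pickling side of that idea with the standard library. The module name forward_model_ops and the plain ForwardModel class are hypothetical, and this has not been checked against pymc3's parallel sampler.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

MODULE_SOURCE = '''
class ForwardModel:
    """Placeholder for a custom Op; a real one would subclass tt.Op."""

    def __init__(self, x):
        self.x = x
'''

# Write the class to a real file so that other processes can import it by name.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "forward_model_ops.py").write_text(MODULE_SOURCE)
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

forward_model_ops = importlib.import_module("forward_model_ops")

# The instance is now pickled as a reference to forward_model_ops.ForwardModel,
# which any process with the same sys.path can resolve.
model = forward_model_ops.ForwardModel([0.0, 0.5, 1.0])
restored = pickle.loads(pickle.dumps(model))
print(type(restored).__module__, restored.x)
```

In a notebook, the analogous step would be to write the file next to the notebook (for example with the %%writefile cell magic) so that worker processes can import it from the working directory.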

@aseyboldt
Member

aseyboldt commented Sep 9, 2020 via email

@gmingas
Contributor Author

gmingas commented Sep 9, 2020

Thanks a lot @aseyboldt and @AlexAndorra for the comments, they were really helpful. I will try what you recommended.

@axiezai

axiezai commented Mar 3, 2021

Hi everyone,

I am running into the same issue on MacOS Catalina, Python version 3.7.9 and pymc3 version 3.11.0:

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
ValueError: The model could not be unpickled. This is required for sampling with more than one core and multiprocessing context spawn or forkserver.

I tried the suggested fixes here, using mp_ctx="fork" or mp_ctx="forkserver", also trying pickle_backend="dill" with the multiprocessing context. However, none of these solves the issue, and I keep getting the same exception message.

Is the only fix at this point to try my script on a Linux machine? Thanks in advance for your time and help.

@gmingas
Contributor Author

gmingas commented Mar 4, 2021

The only thing that worked for me on macOS is described in my post above (i.e. moving ForwardModel1 outside the if __name__ == '__main__' block, in a script rather than in a notebook). Did you try setting mp_ctx="spawn"?

Tagging @mikkelbue

@axiezai

axiezai commented Mar 4, 2021

Ah yes, I transferred everything over to a script, which resolved the error. I guess I should avoid notebooks for stuff that compiles? Thank you for the help.

@mikkelbue
Contributor

Responding to tag from @gmingas
I'm sorry, I haven't seen this error before, and I can't claim that I understand it. Something you define in your Op seems to be sensitive to pickling, but it is hard to tell without more context.
