Bug in parallelized computations involving symbolic functions #27492

Open
egourgoulhon opened this issue Mar 15, 2019 · 22 comments

@egourgoulhon
Member

Sage fails in parallelized computations on tensor fields involving symbolic functions (when the tensor components contain no symbolic function, everything works). Here is an example with Sage 9.3.beta6 (... indicates truncated output; see the attachment for the full log):

sage: Parallelism().set(nproc=2)
sage: M = Manifold(2, 'M')
sage: X.<x,y> = M.chart()
sage: t = M.tensor_field(0, 2)
sage: t[0,0], t[0,1], t[1,1] = function('f')(x), x+y, x-y
sage: v = M.vector_field(1 + x*y, -x^2)
sage: s = t.contract(v)  # parallelized computation occurs here
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/eric/sage/9.3.develop/local/lib/python3.8/site-packages/sage/interfaces/interface.py", line 718, in __init__
    self._name = parent._create(value, name=name)
  File "/home/eric/sage/9.3.develop/local/lib/python3.8/site-packages/sage/interfaces/maxima_lib.py", line 604, in _create
    self.set(name, value)
...
  File "sage/libs/ecl.pyx", line 352, in sage.libs.ecl.ecl_safe_funcall (build/cythonized/sage/libs/ecl.c:5735)
    raise RuntimeError("ECL says: {}".format(
RuntimeError: ECL says: THROW: The catch MACSYMA-QUIT is undefined.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "sage/misc/fpickle.pyx", line 104, in sage.misc.fpickle.call_pickled_function (build/cythonized/sage/misc/fpickle.c:2384)
    res = eval("f(*args, **kwds)",sage.all.__dict__, {'args':args, 'kwds':kwds, 'f':f})
  File "<string>", line 1, in <module>
  File "/home/eric/sage/9.2.develop/local/lib/python3.7/site-packages/sage/tensor/modules/comp.py", line 2329, in make_Contraction
    sm += this[[ind_s]] * other[[ind_o]]
...
 File "/home/eric/sage/9.3.develop/local/lib/python3.8/site-packages/sage/interfaces/interface.py", line 720, in __init__
    raise TypeError(x)
TypeError: ECL says: THROW: The catch MACSYMA-QUIT is undefined.
"""

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
<ipython-input-9-d1a8f1c47ea1> in <module>
----> 1 s = t.contract(v)  # parallelized computation occurs here
...
/usr/lib/python3.8/multiprocessing/pool.py in next(self, timeout)
    866         if success:
    867             return value
--> 868         raise value
    869 
    870     __next__ = next                    # XXX

TypeError: ECL says: THROW: The catch MACSYMA-QUIT is undefined.

The full error message is attached.

This issue has been reported in various places before. With the old Python 2 version of Sage it resulted in a silent error; see this sage-devel post (see also this one).

CC: @man74cio @rwst @mkoeppe @tscrim

Component: symbolics

Keywords: parallelization, symbolic functions

Issue created by migration from https://trac.sagemath.org/ticket/27492

@egourgoulhon
Member Author

comment:1

Attachment: error_message_py3.txt

@embray
Contributor

embray commented Mar 25, 2019

comment:2

Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release, please retarget manually)

@embray embray modified the milestones: sage-8.7, sage-8.8 Mar 25, 2019
@embray
Contributor

embray commented Jun 14, 2019

comment:3

As the Sage-8.8 release milestone is pending, we should delete the sage-8.8 milestone for tickets that are not actively being worked on or that still require significant work to move forward. If you feel that this ticket should be included in the next Sage release at the soonest, please set its milestone to the next release milestone (sage-8.9).

@embray embray removed this from the sage-8.8 milestone Jun 14, 2019
@nbruin
Contributor

nbruin commented Oct 27, 2019

comment:4

To help you along with this: the problem is probably that the existence of the symbolic function 'f' isn't properly coordinated across the inter-process communication, so that at some point a new symbolic function, also called 'f', is created:

sage: a_new=s0.operands()[2]
sage: a
f(x)
sage: a_new
f(x)
sage: a - a
0
sage: a - a_new #this indicates something fishy
-f(x) + f(x)
sage: bool( a==a_new)
True
sage: s0.subs({a_new: a})
-x^3 - (x^2 - x*f(x))*y + f(x)
sage: s0.subs({a_new: a}).coefficient(a,1)
x*y + 1

It may be pickling that's (part of) the problem:

sage: function('f')(x)-function('f')(x)
0
sage: function('f')(x)-SR('f(x)')
0
sage: loads(dumps(SR('f(x)')))-SR('f(x)')
0
sage: function('f')(x)-SR('f(x)')
f(x) - f(x)
sage: loads(dumps(function('f')(x)))-function('f')(x)
-f(x) + f(x)
sage: loads(dumps(function('f')(x)))-SR('f(x)')
0

As you can see, the different ways of creating a symbolic function start out giving the same result. However, as soon as pickling has been involved, something changes, and the results are no longer compatible. From that point onwards, SR('f(x)') seems to produce results consistent with pickling, but function('f')(x) is reliably incompatible (in fact, further investigation shows that function('f')(x) returns non-identical results across different invocations of pickle). So there's some nasty cache wiping/corruption going on somewhere.
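To make the suspected mechanism concrete, here is a plain-Python sketch. It is not Sage or Pynac code: `ToyFunction`, `RegisteredFunction`, `naive_function`, `registered_function` and the two registries are hypothetical stand-ins that assume a symbolic function is identified by object identity. Since `multiprocessing` pickles the objects it sends to and from worker processes, a round trip through `pickle` is a reasonable proxy. Default pickling builds a second object that prints as `f(x)` but is not the registered `f`; routing unpickling through a name lookup via `__reduce__` returns the canonical object instead.

```python
import pickle

# Hypothetical registries: each maps a function name to one canonical object.
_NAIVE_REGISTRY = {}
_REGISTERED_REGISTRY = {}


class ToyFunction:
    """Toy symbolic function; two instances are 'the same' only if they are the same object."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"{self.name}(x)"


class RegisteredFunction(ToyFunction):
    """Same toy, but unpickling is routed back through the registry."""

    def __reduce__(self):
        # Rebuild by calling registered_function(name) in the unpickling process,
        # which returns that process's canonical instance.
        return (registered_function, (self.name,))


def naive_function(name):
    # Returns the canonical ToyFunction for `name`, creating it on first use.
    if name not in _NAIVE_REGISTRY:
        _NAIVE_REGISTRY[name] = ToyFunction(name)
    return _NAIVE_REGISTRY[name]


def registered_function(name):
    # Same lookup, for RegisteredFunction instances.
    if name not in _REGISTERED_REGISTRY:
        _REGISTERED_REGISTRY[name] = RegisteredFunction(name)
    return _REGISTERED_REGISTRY[name]


# Default pickling copies the instance's __dict__ into a brand-new object.
f = naive_function("f")
f_copy = pickle.loads(pickle.dumps(f))
print(f, f_copy)                  # f(x) f(x)
print(f is f_copy)                # False: a second, unregistered 'f' now exists
print(naive_function("f") is f)   # True: the registry still returns the original

# With __reduce__, the round trip yields the canonical object again.
g = registered_function("g")
g_copy = pickle.loads(pickle.dumps(g))
print(g is g_copy)                # True
```

With `__reduce__`, an object pickled in a worker is rebuilt by a name lookup in whichever process unpickles it, so the parent gets back its own canonical object. Whether something comparable fits Pynac's archive-based pickling is beyond what this toy can show.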

You can see part of it here:

sage: explain_pickle(dumps(function('f')(x)))
pg_Expression = unpickle_global('sage.symbolic.expression', 'Expression')
si = unpickle_newobj(pg_Expression, ())
unpickle_build(si, (0r, ['x'], 'GARC\x03\tfunction\x00class\x00symbol\x00x\x00name\x00seq\x00python\x00f\x00sage_ex\x00\x01\x08\x01\x02\x02\n\x02"\x03\x04\n\x00+\x001\x00"\x07'))
si

As you can see, there's a rather opaque string involved. The unpickle_build is part of the getstate/setstate protocol, and reading sage.symbolic.expression.Expression.__setstate__ and sage.symbolic.expression.Expression.__getstate__ shows that a Pynac archive is involved. It's not hard to imagine the coordination problems that can arise when using such low-level tools. This needs a Pynac expert. See pynac/pynac#349 (if GitHub is preferred for tracking Pynac issues).
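For readers unfamiliar with the getstate/setstate protocol, here is a plain-Python sketch. The class `ArchivedExpression` is hypothetical, and its byte-string state only loosely mimics an archive. On unpickling, Python creates the object with `__new__` (the `NEWOBJ` opcode) and then passes the saved state to `__setstate__` (the `BUILD` opcode). These appear to be what `unpickle_newobj` and `unpickle_build` denote in the `explain_pickle` output. `__init__` never runs, so anything a constructor would normally register is not re-registered unless `__setstate__` does it explicitly.

```python
import pickle
import pickletools


class ArchivedExpression:
    """Hypothetical object whose pickled state is an opaque byte string."""

    init_calls = 0  # counts how often __init__ runs

    def __init__(self, text):
        ArchivedExpression.init_calls += 1
        self.text = text

    def __getstate__(self):
        # Serialize the object into an opaque blob (a stand-in for an archive).
        return self.text.encode("utf-8")

    def __setstate__(self, state):
        # Runs on an instance made by __new__; __init__ is skipped.
        self.text = state.decode("utf-8")


e = ArchivedExpression("f(x) + x")
calls_before = ArchivedExpression.init_calls

data = pickle.dumps(e)
e_copy = pickle.loads(data)

print(e_copy.text)                                     # f(x) + x
print(ArchivedExpression.init_calls == calls_before)   # True: no __init__ call
print(e_copy is e)                                     # False

# The pickle stream creates the object (NEWOBJ), then restores its state (BUILD).
ops = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
print("NEWOBJ" in ops, "BUILD" in ops)                 # True True
```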

@DeRhamSource
Mannequin

DeRhamSource mannequin commented Nov 21, 2019

comment:7

Since false results are produced, this seems to be a severe bug. I'd suggest either fixing it for milestone 9 or at least disabling the feature or adding a warning to the documentation until a solution is found.

@egourgoulhon
Member Author

comment:8

Replying to @DeRhamSource:

Since false results are produced, this seems to be a severe bug. I'd suggest either fixing it for milestone 9 or at least disabling the feature or adding a warning to the documentation until a solution is found.

Fortunately, with Python 3 (and hence Sage 9.0), there is no longer any silent error: a TypeError exception is raised instead. So there is no need to disable a feature (parallelization) that is heavily used in tensor calculus on manifolds (see e.g. many of these examples).

@DeRhamSource
Mannequin

DeRhamSource mannequin commented Nov 21, 2019

comment:9

Ah, I see. So in conclusion, Python 2 will not even be supported anymore?

Sorry for interfering then. :)

@egourgoulhon
Member Author

comment:10

Replying to @DeRhamSource:

Ah, I see. So in conclusion, Python 2 will not even be supported anymore?

Python 2 is almost dead: https://pythonclock.org/

@mkoeppe
Contributor

mkoeppe commented Aug 1, 2020

comment:13

What about this ticket?

@mjungmath

comment:14

This bug is really annoying for applications: parallelization is needed to speed things up, especially in this case. I thought you might have some ideas to contribute. If not, my apologies.

@mkoeppe
Contributor

mkoeppe commented Aug 1, 2020

comment:15

Could you try to update & clarify the ticket description please? It's OK to delete everything related to python2. We no longer support it.

@egourgoulhon
Member Author

Attachment: error_message_Sage_9.2.beta6.txt

@mkoeppe
Contributor

mkoeppe commented Aug 4, 2020

comment:17

Well, there are lots of libraries involved in symbolics, so lots of questions.

I guess the first question to ask is whether the parallelization that you use involves several threads in one process, or several processes.

The reported error comes from Maxima, which runs in ECL. Is ECL prepared and configured for threadsafe operation? Are the parts of Maxima that the symbolics code uses threadsafe? (The assumptions code that I touched in #30074 is definitely not threadsafe.)

@egourgoulhon
Member Author

comment:18

See #31047 for a related issue.

@egourgoulhon
Member Author

comment:20

Replying to @mkoeppe:

Could you try to update & clarify the ticket description please? It's OK to delete everything related to python2. We no longer support it.

Done.

@egourgoulhon
Member Author

comment:21

Replying to @mkoeppe:

Well, there are lots of libraries involved in symbolics, so lots of questions.

I guess the first question to ask is whether the parallelization that you use involves several threads in one process, or several processes.

The parallelization is based on the Python module multiprocessing, so it involves several processes.

The reported error comes from Maxima, which runs in ECL. Is ECL prepared and configured for threadsafe operation? Are the parts of Maxima that the symbolics code uses threadsafe? (The assumptions code that I touched in #30074 is definitely not threadsafe.)

There is no error when the involved symbolic expressions do not contain any symbolic function, so I would say that this is not an issue with Maxima and parallelization, but rather an issue with Sage's symbolic functions (cf. comment:4 and #31047).

@egourgoulhon
Member Author

Attachment: error_message_Sage_9.3.beta6.txt

@dexsda

dexsda commented Oct 4, 2024

This bug is still afflicting parallel computations, which is a significant drawback. Has there been any progress on this at all?
