MPFR test failures on Solaris 10 update 4 on host 't2' #6453
Comments
comment:1
I should have stated the versions of the software. On t2, gcc was configured as:
kirkby@t2:[~] $ cat /etc/release
kirkby@t2:[~] $ uname -a
sage-4.1.alpha2, mpfr-2.4.1 with no patches applied. |
config.log showing no optimisation on the command line. Still 20 tests fail |
comment:2
Attachment: config.log
One 'fix' appears to be to use a gcc no later than 4.2.4. See the table below for the number of failures with each version of gcc.
gcc-3.4.3 : 0 MPFR failures
On my own machine, there are none no matter what version of gcc I use. At the time I first reported this, 't2' ran Solaris 10 update 4. It is now running Solaris 10 update 7, but this has not changed the result. |
comment:3
Sorry, here is the data in a table.
|
comment:4
This is a compiler bug. See http://websympa.loria.fr/wwsympa/arc/mpfr/2009-07/msg00049.html.
|
comment:6
Since this only appears so far to have happened on a Solaris machine with a T2+ processor, and the patch will slow down the code on every machine, I will create a patch which is only applied on Solaris and only on the sun4v architecture. There is reason to believe it will only affect this type of processor too - not all SPARC processors. (Apparently memset is implemented differently on that). I think that's a better solution than to apply it everywhere. Leave it with me. I'll do this. Dave |
Author: Paul Zimmermann, David Kirkby |
Changed keywords from none to compiler bug |
comment:7
I've implemented Paul's patch, but in a way which should cause minimal impact on performance.
The machine 't2' is sun4v, as it has two Sun T2+ processors. Hence the patch will not be applied on more common machines like my Sun Blade 2000 with its UltraSPARC II processors, which is a sun4u system.
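As a rough illustration of the decision described in this comment, the sketch below applies the patch by default only when the OS is SunOS and the kernel architecture is sun4v, and lets INCLUDE_MPFR_PATCH override that default. It is not the code from the spkg. The function name decide_mpfr_patch, the printed words, the treatment of 0 as "force off", and the rejection of any other value are assumptions made for the example. Only the sun4v check and the meaning of INCLUDE_MPFR_PATCH=1 come from this ticket.

```shell
# Hypothetical helper, not taken from the spkg.
# $1 = OS name (as printed by 'uname -s')
# $2 = kernel architecture (as printed by 'arch -k' on Solaris)
# $3 = value of INCLUDE_MPFR_PATCH (may be empty)
decide_mpfr_patch() {
    os=$1
    karch=$2
    override=$3

    # An explicit override wins over the architecture-based default.
    case "$override" in
        1)  echo apply;   return 0 ;;
        0)  echo skip;    return 0 ;;   # assumption: 0 forces the patch off
        "") ;;                          # unset or empty: use the default below
        *)  echo invalid; return 1 ;;   # assumption: other values are rejected
    esac

    # Default: patch only Solaris on sun4v (for example the T2+ machine 't2').
    if [ "$os" = "SunOS" ] && [ "$karch" = "sun4v" ]; then
        echo apply
    else
        echo skip
    fi
}
```

On a Solaris machine this could be called as decide_mpfr_patch "$(uname -s)" "$(arch -k)" "${INCLUDE_MPFR_PATCH:-}". Passing the values in as arguments keeps the logic easy to check on machines that are not sun4v.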
When we distribute binaries for Sage, I suggest they should be built on a Sun with the very first release of Solaris 10 (i.e. NOT 't2', which runs Solaris 10 update 7). I can set up an old machine for that purpose. When building binaries we should set INCLUDE_MPFR_PATCH to 1, so the patch is applied. Then the code should work on any Solaris 10 SPARC system. (Since my Sun Blade 2000 is not sun4v, the patch would not normally be installed, but I would force it to be installed in this case.)
How to test.
There is a reasonably comprehensive set of messages, which are output on any Solaris system. The last few messages are specific to the machine's architecture and setting of the variable INCLUDE_MPFR_PATCH. Here's the specific last part of the message on my personal machine, which is sun4u, and so not the troublesome sun4v, so there is no need for me to patch this. (I say troublesome, but I expect this is a compiler bug).
$ export INCLUDE_MPFR_PATCH=foobar
A few other things are done in this patch.
|
comment:8
I forgot to give the location of the patch ! All relevant files are in the directory: http://sage.math.washington.edu/home/kirkby/Solaris-fixes/mpfr/ The actual patch is http://sage.math.washington.edu/home/kirkby/Solaris-fixes/mpfr/mpfr-2.4.1p0.spkg |
comment:9
I updated the package changing just some text, which appeared to have caused some confusion to a potential reviewer. In particular, I did not enable any checks in MPFR - they were already enabled. So my patch will not slow the build process. Previously I posted outputs from my own Sun Blade 2000 computer, which is a sun4u architecture and so did NOT need patching.
Here's the default output seen on 't2' with this new patch:
Notice how it differs from that on my own machine? Here's the output from 'uname' and 'arch -k' on both my Sun Blade 2000 (kestrel) and a Sun T5240 (t2). First, 't2':
Now my home machine 'kestrel', which is actually a Blade 2000, not a Blade 1000 as the output says. The Blade 1000 and Blade 2000 share the same motherboard.
On either system it is possible to override the default. I've put the code in /tmp/kirkby/sage-4.1/ on 't2' and changed the permissions of all files so any user can write to them. This will allow testing by others. (Of course it could break too, if two people start testing together, but that is a risk I will take.) It should be noted that we are not really any closer to solving this, as at least 4 explanations have been given by different people:
Note, although I don't believe there are any plans to support Solaris 9, I don't actually see why Sage should not work on a Solaris 9 system. A Solaris 9 system could be using the even older sun4m architecture. Although I've not checked it, the updated .spkg file should work on that too, but will not apply the patch by default. Dave |
comment:10
This looks more and more like a Solaris bug. The following bit of code, compiled in 32-bit mode, should add the number 2 raised to the power of 31 to another 2 raised to the power of 31 to get a total of 2 raised to the power of 32, which is 0 in 32-bit mode. But it dumps core on 't2', and not on my Sun Blade 2000 or on any non-Solaris machine on which people have tested this. [You might wonder why I spelled out 'raised to the power of', but the formatting goes a bit crazy if I write 2^31 plus 2^31 is equal to 2^32; perhaps you see that.] It also dumps core with the Sun compiler.
I still believe that, in the short term, the fix here should be applied. Even if it can't be guaranteed to work in every case, it allows MPFR to build and pass all the tests, so it will allow progress to be made in Sage. The fact that I have been careful to apply this only to sun4v machines should ensure it has no impact anywhere else. Dave |
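The test program itself is not reproduced in this copy of the ticket, and the lines below are not a substitute for it. They only illustrate the arithmetic claim, that 2 to the power 31 added to itself is 2 to the power 32, which leaves 0 once only the low 32 bits are kept. The sketch assumes a shell whose arithmetic is at least 64 bits wide (true of bash and dash on common 64-bit systems). The variable names are arbitrary.

```shell
# Assumes shell arithmetic is at least 64 bits wide.
a=2147483648                    # 2 to the power 31
sum=$((a + a))                  # 2 to the power 32 when computed in 64 bits
wrapped=$((sum & 0xFFFFFFFF))   # keep only the low 32 bits, as a 32-bit unsigned sum would
echo "$sum $wrapped"
```

With 64-bit shell arithmetic this prints 4294967296 followed by 0.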
comment:12
David: Here's an updated SPKG with all changes committed in your name: http://sage.math.washington.edu/home/mvngu/patch/mpfr-2.4.1.p0.spkg I have renamed it from |
Merged: Sage 4.1.1.alpha1 |
comment:13
Successfully compiles on Solaris/t2. All 148 tests passed. (It even builds OK on Linux.) |
Reviewer: Minh Van Nguyen |
comment:14
For the record, this is not a compiler bug, but a bug in Sun's implementation of memset() on the sun4v architecture (i.e. the CoolThreads machines). The bug has been confirmed by Sun. |
comment:15
The confirmation from Sun can be found at this sage-devel thread. |
Upstream: Reported upstream. Developers acknowledge bug. |
Changed upstream from Reported upstream. Developers acknowledge bug. to Fixed upstream, in a later stable release. |
comment:17
Oops, Sun have fixed this, not just acknowledged the bug. Hence I'm changing the 'Report Upstream' to reflect this. |
I found that when trying to build Sage on t2.math.washington.edu there
are problems with 'mpfr', with 20 out of 148 tests failing.
I downloaded the mpfr 2.4.1 source and compiled it with the same gcc
optimisation level as used in Sage (-O2). Again mpfr failed 20 tests.
I then changed to an optimisation level of 1 in the MPFR source (outside Sage). Again 20 tests failed.
I then used no optimisation, which resulted in a 100% pass rate.
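For anyone wanting to repeat the comparison above, here is a minimal dry-run sketch. It only prints the commands for each optimisation level and runs nothing. It assumes the usual configure-based build of the MPFR source tree with CFLAGS passed on the configure command line. mpfr_check_cmd is a made-up helper name, and any options for locating GMP or MPIR are left out.

```shell
# Print (not run) the build-and-test commands for one optimisation level.
# mpfr_check_cmd is a hypothetical helper name, not part of MPFR or Sage.
mpfr_check_cmd() {
    printf './configure CFLAGS="%s" && make && make check\n' "$1"
}

# The three levels tried in this report: -O2, -O1, and no optimisation.
for level in -O2 -O1 -O0; do
    mpfr_check_cmd "$level"
done
```

If the same source tree is reused between runs, it would need cleaning (for example with make distclean) before reconfiguring with different flags.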
On my Blade 2000 (hostname kestrel), things are very different, as the following table shows.
(kestrel runs Solaris 10 update 6)
(t2 runs Solaris 10 update 4)
I assumed this problem was due to optimisation in Sage and that removing the optimisation on Solaris would solve this. But that is not the case.
The reason for the failures is still unknown.
There may be some advantage in recompiling mpir with lower optimisation, despite the fact that mpir did pass all its tests, since mpfr relies upon mpir.
Upstream: Fixed upstream, in a later stable release.
CC: @sagetrac-drkirkby @zimmermann6 nguyenminh2@gmail.com
Component: porting: Solaris
Keywords: compiler bug
Author: Paul Zimmermann, David Kirkby
Reviewer: Minh Van Nguyen
Merged: Sage 4.1.1.alpha1
Issue created by migration from https://trac.sagemath.org/ticket/6453