Add patches for LLVM 3.8.0 #15417

Merged
3 commits merged Apr 22, 2016

tkelman
Contributor

@tkelman tkelman commented Mar 9, 2016

I think everything in llvm-3.7.1.patch and llvm-3.7.1_2.patch made it upstream for 3.8.0, aside from the DLL Makefile fix. Will have to test with cmake + shared libraries on Windows too.

I'm borrowing a patch from Rust to fix compilation with the win32-threads mingw-w64 variant that we've been using, ref rust-lang/rust#30448 (comment) and rust-lang/llvm@69ef168. If we do want to make use of the parallel codegen support in LLVM (when possible), which this patch disables, we can make the patch Windows-only.

nalimilan referenced this pull request Mar 9, 2016
And make ORC JIT the default for those LLVM versions. This does
not activate the new LLVM versions by default yet, to give time
to update everything on CI, etc., but doing the actual activation
is a one line change after this.
@yuyichao
Contributor

yuyichao commented Mar 9, 2016

Also http://reviews.llvm.org/D17165 is needed, or I believe the backtrace tests in replutil will fail.

@tkelman
Contributor Author

tkelman commented Mar 9, 2016

Haven't tried this anywhere other than win32 yet. If you want to push that patch here before I can get to it, be my guest.

@nalimilan
Member

Indeed, I confirm the replutil test fails with vanilla LLVM 3.8. Currently retrying with the patch.

@tkelman
Contributor Author

tkelman commented Mar 10, 2016

Confirmed on Linux, patch added. Somebody should email LLVM's release manager and ask for D17165 (rL260791) to be backported to the release_38 branch so it makes it into 3.8.1.

@nalimilan
Member

RPM builds went fine with the patch (except for one random failure).

@tkelman
Contributor Author

tkelman commented Mar 10, 2016

Win64 actually segfaults in the subarray test, so something is wrong here. Happens with or without the D17165 patch though.

@yuyichao
Contributor

Is it repeatable? Reproducible with single test? Possible to get a backtrace? =)

@tkelman
Contributor Author

tkelman commented Mar 10, 2016

Yes, yes, taking forever under gdb so don't know yet.

@tkelman
Contributor Author

tkelman commented Mar 11, 2016

Is it expected that something that takes 10 minutes outside gdb would take 12 hours and counting inside it? #14846 maybe?

@vtjnash
Sponsor Member

vtjnash commented Mar 11, 2016

yes, that's a known bug in gdb

@tkelman
Contributor Author

tkelman commented Mar 12, 2016

Does LLDB work on Windows or should I delta-debug test/subarray.jl outside of gdb to get a smaller repro?

@yuyichao
Contributor

Maybe you can "bisect" the test to see if there's a smaller repro.

@tkelman
Contributor Author

tkelman commented Mar 12, 2016

That's what delta debugging is. The technique predates bisecting and is where git et al got the idea from.
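For readers unfamiliar with the term, here is the idea behind delta debugging (Zeller's ddmin). You split a failing input into chunks, keep any single chunk or complement that still fails, and otherwise split more finely, until no single piece can be removed. Below is a minimal C sketch over an array of item IDs (think: lines of test/subarray.jl). The predicate `needs_3_and_6` and the `MAX_ITEMS` limit are hypothetical stand-ins; in practice the predicate would be "julia still segfaults on this reduced file".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ITEMS 64  /* sketch limit: inputs must have at most this many items */

/* Returns true if this candidate input still reproduces the failure. */
typedef bool (*fails_fn)(const int *items, size_t n);

/* Hypothetical stand-in for "julia segfaults on this file":
   fails only when both item 3 and item 6 are present. */
static bool needs_3_and_6(const int *items, size_t n) {
    bool has3 = false, has6 = false;
    for (size_t i = 0; i < n; i++) {
        has3 |= items[i] == 3;
        has6 |= items[i] == 6;
    }
    return has3 && has6;
}

/* Copy chunk i (of k roughly equal chunks) of in[0..n) into out; return its length. */
static size_t chunk(const int *in, size_t n, size_t k, size_t i, int *out) {
    size_t lo = i * n / k, hi = (i + 1) * n / k;
    memcpy(out, in + lo, (hi - lo) * sizeof *in);
    return hi - lo;
}

/* Copy everything except chunk i (of k) into out; return its length. */
static size_t complement(const int *in, size_t n, size_t k, size_t i, int *out) {
    size_t lo = i * n / k, hi = (i + 1) * n / k, len = 0;
    for (size_t j = 0; j < n; j++)
        if (j < lo || j >= hi)
            out[len++] = in[j];
    return len;
}

/* ddmin: shrink items[0..*n) in place to a failing subset from which
   no single item can be removed without the failure disappearing. */
static void ddmin(int *items, size_t *n, fails_fn fails) {
    int cand[MAX_ITEMS];
    size_t k = 2;  /* current number of chunks */
    while (*n >= 2) {
        bool reduced = false;
        /* 1. Does any single chunk fail on its own? */
        for (size_t i = 0; i < k && !reduced; i++) {
            size_t len = chunk(items, *n, k, i, cand);
            if (len > 0 && fails(cand, len)) {
                memcpy(items, cand, len * sizeof *cand);
                *n = len;
                k = 2;
                reduced = true;
            }
        }
        /* 2. Does removing any single chunk keep the failure? */
        for (size_t i = 0; i < k && !reduced; i++) {
            size_t len = complement(items, *n, k, i, cand);
            if (len > 0 && fails(cand, len)) {
                memcpy(items, cand, len * sizeof *cand);
                *n = len;
                k = k > 2 ? k - 1 : 2;
                reduced = true;
            }
        }
        /* 3. Nothing worked: split more finely, or stop at single items. */
        if (!reduced) {
            if (k >= *n)
                break;
            k = 2 * k < *n ? 2 * k : *n;
        }
    }
}
```

Starting from the items 0 through 7, `ddmin(items, &n, needs_3_and_6)` narrows the input down to just {3, 6}. Unlike a plain bisection, it can keep two pieces that are far apart in the input.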

@JeffBezanson
Sponsor Member

If you don't need to debug jitted code, you can patch around the gdb performance problem with

--- a/src/jitlayers.cpp
+++ b/src/jitlayers.cpp
@@ -187,7 +187,7 @@ void NotifyDebugger(jit_code_entry *JITCodeEntry)
     }
     __jit_debug_descriptor.first_entry = JITCodeEntry;
     __jit_debug_descriptor.relevant_entry = JITCodeEntry;
-    __jit_debug_register_code();
+    //__jit_debug_register_code();
 }
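Some background on why that one line matters. gdb's JIT interface (described in the GDB manual under "JIT Compilation Interface") works as follows. The JIT keeps a linked list of in-memory object files in the global `__jit_debug_descriptor`. It calls the empty function `__jit_debug_register_code`, on which gdb sets a breakpoint. Each call makes gdb stop and process the newly registered object. Skipping the call avoids that per-function cost, at the price of gdb not seeing JIT-compiled code. The C sketch below follows the struct layout from that manual section. `register_jit_object`, its `notify` flag and the `debugger_notifications` counter are hypothetical and are not Julia's `NotifyDebugger`. The `notify = false` path corresponds to commenting out the call as in the patch above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as documented in the GDB manual's JIT Compilation Interface. */
typedef enum { JIT_NOACTION = 0, JIT_REGISTER_FN, JIT_UNREGISTER_FN } jit_actions_t;

struct jit_code_entry {
    struct jit_code_entry *next_entry;
    struct jit_code_entry *prev_entry;
    const char *symfile_addr;   /* in-memory object file with debug info */
    uint64_t symfile_size;
};

struct jit_descriptor {
    uint32_t version;
    uint32_t action_flag;       /* holds a jit_actions_t value */
    struct jit_code_entry *relevant_entry;
    struct jit_code_entry *first_entry;
};

/* gdb sets a breakpoint on this function. The empty asm statement keeps
   the compiler from deciding the call has no effect and deleting it. */
void __attribute__((noinline)) __jit_debug_register_code(void) {
    __asm__ volatile("" ::: "memory");
}

/* Version is set statically because gdb may read it before main runs. */
struct jit_descriptor __jit_debug_descriptor = { 1, JIT_NOACTION, NULL, NULL };

static unsigned debugger_notifications = 0;  /* hypothetical, for illustration */

/* Hypothetical helper: push `entry` onto the descriptor's list and, only
   when `notify` is true, call the function gdb has a breakpoint on. */
static void register_jit_object(struct jit_code_entry *entry, bool notify) {
    entry->prev_entry = NULL;
    entry->next_entry = __jit_debug_descriptor.first_entry;
    if (entry->next_entry)
        entry->next_entry->prev_entry = entry;
    __jit_debug_descriptor.first_entry = entry;
    __jit_debug_descriptor.relevant_entry = entry;
    __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
    if (notify) {
        __jit_debug_register_code();  /* the call the patch comments out */
        debugger_notifications++;
    }
}
```

With `notify = false` the list is still kept up to date, so a debugger attached later could in principle walk it. gdb simply never gets stopped for each new function.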

@tkelman
Contributor Author

tkelman commented Mar 13, 2016

Turned out not too bad to reduce it:

$ usr/bin/julia-debug -e 'versioninfo()'
Julia Version 0.5.0-dev+3133
Commit c1e81c66* (2016-03-13 12:47 UTC)
DEBUG build
Platform Info:
  System: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.8.0 (ORCJIT, skylake)

Tony@LAPTOP-O230JCFF ~/julia
$ echo 'sub(sub(reshape(1:13^3, 13, 13, 13), 3:7, 6, :), 1:2:5, :, 1:2:5)' > repro.jl     
Tony@LAPTOP-O230JCFF ~/julia
$ gdb --args usr/bin/julia-debug repro.jl
GNU gdb (GDB) (Cygwin 7.10.1-1) 7.10.1
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-cygwin".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from usr/bin/julia-debug...done.
(gdb) r
Starting program: /home/Tony/julia/usr/bin/julia-debug repro.jl
[New Thread 732.0x1870]
[New Thread 732.0x1b2c]
[New Thread 732.0x2744]
[New Thread 732.0x2974]
[New Thread 732.0x2a40]
[New Thread 732.0x1358]
[New Thread 732.0x22d8]
[New Thread 732.0x1acc]
[New Thread 732.0x1748]
[New Thread 732.0x28c0]
[New Thread 732.0xf8]

Program received signal SIGSEGV, Segmentation fault.
0x0000000322ce0515 in julia_reindex_1190 (V=..., idxs=..., subidxs=...)
    at essentials.jl:25
25      essentials.jl: No such file or directory.
(gdb) bt
#0  0x0000000322ce0515 in julia_reindex_1190 (V=..., idxs=..., subidxs=...)
    at essentials.jl:25
#1  julia_reindex_1190 (V=..., idxs=..., subidxs=...) at subarray.jl:130
#2  0x0000000322ce0632 in jlcall_reindex_1190 ()
#3  0x000000006ea10812 in jl_call_method_internal (meth=0x1065c70e0, args=0x146e000,
    nargs=4) at /home/Tony/julia/src/julia_internal.h:69
#4  0x000000006ea16bc8 in jl_apply_generic (args=0x146e000, nargs=4)
    at /home/Tony/julia/src/gf.c:1853
#5  0x0000000322cc635a in julia_sub_1177 (V=..., I...=<optimized out>) at subarray.jl:91
#6  julia_sub_1177 (V=..., I...=0x1072372d0) at subarray.jl:193
#7  0x000000006ea10812 in jl_call_method_internal (meth=0x106d6b920, args=0x146e210,
    nargs=5) at /home/Tony/julia/src/julia_internal.h:69
#8  0x000000006ea16bc8 in jl_apply_generic (args=0x146e210, nargs=5)
    at /home/Tony/julia/src/gf.c:1853
#9  0x000000006ea2820d in do_call (args=0x106e870d0, nargs=5, locals=0x0, nl=0,
    ngensym=0) at /home/Tony/julia/src/interpreter.c:65
#10 0x000000006ea28acb in eval (e=0x106ee0ff0, locals=0x0, nl=0, ngensym=0)
    at /home/Tony/julia/src/interpreter.c:185
#11 0x000000006ea27e38 in jl_interpret_toplevel_expr (e=0x106ee0ff0)
    at /home/Tony/julia/src/interpreter.c:25
#12 0x000000006ea4735d in jl_toplevel_eval_flex (e=0x106ee0ef0, fast=1)
    at /home/Tony/julia/src/toplevel.c:541
#13 0x000000006ea1b700 in jl_parse_eval_all (
    fname=0x106e86ed0 "C:\\cygwin64\\home\\Tony\\julia\\repro.jl", len=36, content=0x0,
    contentlen=0) at /home/Tony/julia/src/ast.c:784
#14 0x000000006ea4755a in jl_load (
    fname=0x106e86ed0 "C:\\cygwin64\\home\\Tony\\julia\\repro.jl", len=36)
    at /home/Tony/julia/src/toplevel.c:579
#15 0x000000006ea475b1 in jl_load_ (str=0x106eabc90)
    at /home/Tony/julia/src/toplevel.c:585
#16 0x00000003228b8051 in julia_include_649 (
    fname=<error reading variable: Cannot access memory at address 0x1>) at boot.jl:264
#17 0x000000006ea10812 in jl_call_method_internal (meth=0x103548dd0, args=0x146efe0,
    nargs=2) at /home/Tony/julia/src/julia_internal.h:69
#18 0x000000006ea16bc8 in jl_apply_generic (args=0x146efe0, nargs=2)
    at /home/Tony/julia/src/gf.c:1853
#19 0x0000000322a57a30 in julia_include_from_node1_850 (_path=...) at loading.jl:417
#20 0x0000000322a57da0 in jlcall_include_from_node1_850 ()
#21 0x000000006ea10812 in jl_call_method_internal (meth=0x105936cc0, args=0x146f440,
    nargs=2) at /home/Tony/julia/src/julia_internal.h:69
---Type <return> to continue, or q <return> to quit---
#22 0x000000006ea16bc8 in jl_apply_generic (args=0x146f440, nargs=2)
    at /home/Tony/julia/src/gf.c:1853
#23 0x00000003228d1cff in julia_process_options_577 (opts=...) at client.jl:262
#24 0x00000003228d4e2c in julia__start_575 () at client.jl:318
#25 0x00000003228d5b2d in jlcall.start_575 ()
#26 0x000000006ea10812 in jl_call_method_internal (meth=0x105b55ae0, args=0x146fbd0,
    nargs=1) at /home/Tony/julia/src/julia_internal.h:69
#27 0x000000006ea16bc8 in jl_apply_generic (args=0x146fbd0, nargs=1)
    at /home/Tony/julia/src/gf.c:1853
#28 0x000000000040165c in jl_apply (args=0x146fbd0, nargs=1)
    at /home/Tony/julia/ui/../src/julia.h:1263
#29 0x00000000004029ed in true_main (argc=1, argv=0x4639f38)
    at /home/Tony/julia/ui/repl.c:544
#30 0x000000000040305c in wmain (argc=1, argv=0x4639f38, envp=0x4638350)
    at /home/Tony/julia/ui/repl.c:656
#31 0x000000000040140c in __tmainCRTStartup ()
    at /usr/src/debug/mingw64-x86_64-runtime-4.0.5-1/crt/crtexe.c:329
#32 0x000000000040153b in mainCRTStartup ()
    at /usr/src/debug/mingw64-x86_64-runtime-4.0.5-1/crt/crtexe.c:212

@tkelman
Contributor Author

tkelman commented Mar 13, 2016

Does not segfault at -O1. @code_llvm output for -O1 and default (-O2) here: https://gist.github.com/0582c044eaeefd78c503

@yuyichao
Contributor

Anything holding back this PR? There has been frequent breakage on llvm 3.8 recently and while we may not be able to use llvm 3.8 by default due to windows (or whatever) issues, having this merged will make testing easier.

@tkelman
Contributor Author

tkelman commented Mar 31, 2016

Doesn't hurt to merge it I suppose. I'll test again to see whether the segfault is still an issue.

edit: doesn't build right now

/home/Tony/julia/src/debuginfo.cpp: In member function ‘virtual void JuliaJITEventListener::_NotifyObjectEmitted(const llvm::object::ObjectFile&, const llvm::object::ObjectFile&, const llvm::RuntimeDyld::LoadedObjectInfo&)’:
/home/Tony/julia/src/debuginfo.cpp:340:53: error: request for member ‘get’ in ‘Section.llvm::object::content_iterator<content_type>::operator-><llvm::object::SectionRef>()->llvm::object::SectionRef::getAddress()’, which is of non-class type ‘uint64_t {aka long long unsigned int}’
                 SectionAddr = Section->getAddress().get();
                                                     ^
Makefile:99: recipe for target 'debuginfo.o' failed

@tkelman
Contributor Author

tkelman commented Apr 2, 2016

12c18c9 fixes the build failure, but sub(sub(reshape(1:13^3, 13, 13, 13), 3:7, 6, :), 1:2:5, :, 1:2:5) still segfaults.

@yuyichao
Contributor

yuyichao commented Apr 8, 2016

The debug info patch should be replaced/combined with the one in http://reviews.llvm.org/D18583 .

@tkelman tkelman force-pushed the tk/llvm3.8patches branch 2 times, most recently from 376436b to 72d2739 Compare April 12, 2016 06:13
@tkelman
Contributor Author

tkelman commented Apr 12, 2016

Okay, done but do double-check. Needs a dependency since the patches touch the same files. Can you contact the LLVM release manager and ask for these things to be backported to the release_38 branch so they are included in 3.8.1?

The Windows segfault looks like it may have gone away, so we might actually be able to upgrade if we want; we'd just have to prepare the binaries for Windows and OSX CI. Edit: never mind, it's still there, my Make.user wasn't right.

@yuyichao
Contributor

> Needs a dependency since the patches touch the same files.

Might actually be easier to merge the two. Otherwise LGTM.

> Can you contact the LLVM release manager and ask for these things to be backported to the release_38 branch so they are included in 3.8.1?

Not sure about the right procedure, and I thought @Keno had already done that?

@Keno
Member

Keno commented Apr 12, 2016

I never made a formal request, so it probably got lost. Do note that D17165 got reverted and replaced by a different patch.

@Keno
Member

Keno commented Apr 12, 2016

Never mind, the patch that replaces it is D18583, which is the second patch here.

@yuyichao yuyichao mentioned this pull request Apr 13, 2016
@@ -414,7 +414,7 @@ LLVM_FLAGS += --enable-libcpp
 endif # USE_LIBCPP
 ifeq ($(OS), WINNT)
 LLVM_FLAGS += --with-extra-ld-options="-Wl,--stack,8388608" LDFLAGS=""
-LLVM_CPPFLAGS += -D__USING_SJLJ_EXCEPTIONS__ -D__CRT__NO_INLINE
+LLVM_CPPFLAGS += -D__USING_SJLJ_EXCEPTIONS__ -D__CRT__NO_INLINE -DMINGW_HAS_SECURE_API=1
Sponsor Member

what's this for? according to google, this flag will cause the addition of a dependency on msvcrt80+

Contributor Author

citation needed. the 32 bit mingw-w64 compilers used to be built without the secure api enabled by default in the cygwin packaging. that's been fixed now, but requires updating the cygwin packages on the buildbots before we can remove this. @staticfloat what's the process for updating cygwin on the windows buildbots?

Sponsor Member

the so-called secure api is a Windows Vista / msvcrt80+ feature

Contributor Author

It's implemented in mingw-w64's runtime. Only works on Vista or newer, but LLVM's unit tests are now assuming it's present without checking.

Contributor Author

aren't you logged in through cygwin's ssh though? to really update everything, you need to update cygwin without any cygwin applications running

Sponsor Member

Interestingly enough, cygwin will kill your ssh session, and continue the update. I then restart the server through Horizon, and it seems to work just fine. A cleaner solution is to login over RDP and do everything in cmd.exe of course, but that's a much bigger hassle because it requires you to have the auto-generated windows password written down, etc...

Contributor Author

clever cygwin. works for me.

Contributor Author

did that manage to break codesigning? the win32 buildbot apparently needs to be woken back up. what is Horizon and could you send me credentials via email so I can do some of these remote admin things?

Sponsor Member

Coming at you over email.

remove secure api flag that isn't needed if cygwin is up to date
@tkelman
Contributor Author

tkelman commented Apr 22, 2016

I concatenated the two patches together, hopefully that'll work. Still help wanted on the win64 subarray segfault, but if this is green then it shouldn't hurt anything to merge it.

@yuyichao
Contributor

For the segfault, the backtrace suggests that it happens in reindex and not sub directly. It might be helpful to "inline" the sub method to further reduce the test case.

@tkelman
Contributor Author

tkelman commented Apr 22, 2016

There's also a new segfault in bitarray:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x3b57e5e1 -- [inline] at .\bitarray.jl:1936
map at .\tuple.jl:84
while loading C:\cygwin64\home\Tony\julia\test\bitarray.jl, in expression starting on line 1198
[inline] at .\bitarray.jl:1936
map at .\tuple.jl:84
[inline] at .\bitarray.jl:1936
map at .\tuple.jl:84
unknown function (ip: 000000003B57E2A0)
[inline] at /home/Tony/julia/src/home/Tony/julia/src\julia_internal.h:69
jl_call_method_internal at /home/Tony/julia/src/home/Tony/julia/src\gf.c:1430
cat at .\bitarray.jl:1936
[inline] at /home/Tony/julia/src/home/Tony/julia/src\julia_internal.h:69
jl_call_method_internal at /home/Tony/julia/src/home/Tony/julia/src\gf.c:1430
[inline] at /home/Tony/julia/src/home/Tony/julia/src\julia.h:1339
jl_apply at /home/Tony/julia/src/home/Tony/julia/src\builtins.c:505
check_bitop at C:\cygwin64\home\Tony\julia\test\bitarray.jl:14
[inline] at /home/Tony/julia/src/home/Tony/julia/src\julia_internal.h:69
jl_call_method_internal at /home/Tony/julia/src/home/Tony/julia/src\gf.c:1430
do_call at /home/Tony/julia/src/home/Tony/julia/src\interpreter.c:58
eval at /home/Tony/julia/src/home/Tony/julia/src\interpreter.c:181
jl_toplevel_eval_flex at /home/Tony/julia/src/home/Tony/julia/src\toplevel.c:535
jl_parse_eval_all at /home/Tony/julia/src/home/Tony/julia/src\ast.c:794
[inline] at .\essentials.jl:82
include_string at .\loading.jl:379
unknown function (ip: 000000000166B143)
[inline] at /home/Tony/julia/src/home/Tony/julia/src\julia_internal.h:69
jl_call_method_internal at /home/Tony/julia/src/home/Tony/julia/src\gf.c:1430
include_from_node1 at .\loading.jl:428
unknown function (ip: 00000000233CD54F)
[inline] at /home/Tony/julia/src/home/Tony/julia/src\julia_internal.h:69
jl_call_method_internal at /home/Tony/julia/src/home/Tony/julia/src\gf.c:1430
[inline] at .\util.jl:179
runtests at C:\cygwin64\home\Tony\julia\test\testdefs.jl:7
#16 at C:\cygwin64\home\Tony\julia\test\runtests.jl:36
unknown function (ip: 0000000001660036)
[inline] at /home/Tony/julia/src/home/Tony/julia/src\julia_internal.h:69
jl_call_method_internal at /home/Tony/julia/src/home/Tony/julia/src\gf.c:1430
[inline] at /home/Tony/julia/src/home/Tony/julia/src\julia.h:1339
jl_apply at /home/Tony/julia/src/home/Tony/julia/src\builtins.c:505
[inline] at .\promotion.jl:229
#290 at .\multi.jl:1017
run_work_thunk at .\multi.jl:747
[inline] at .\multi.jl:1017
#289 at .\event.jl:46
[inline] at /home/Tony/julia/src/home/Tony/julia/src\julia_internal.h:69
jl_call_method_internal at /home/Tony/julia/src/home/Tony/julia/src\gf.c:1430
[inline] at /home/Tony/julia/src/home/Tony/julia/src\julia.h:1339
jl_apply at /home/Tony/julia/src/home/Tony/julia/src\task.c:249
Allocations: 212296270 (Pool: 211359900; Big: 936370); GC: 686
Worker 4 terminated.    From worker 6:       * sorting               in  94.31 seconds, maxrss  712.34 MB
        From worker 4:       * bitarray             ERROR (unhandled task failure): EOFError: read end of file

@vtjnash
Sponsor Member

vtjnash commented Apr 22, 2016

checking code_llvm(STDOUT, Base.reindex, (Base.SubArray{Int64, 3, Base.ReshapedArray{Int64, 3, Base.UnitRange{Int64}, Tuple{}}, Tuple{Base.UnitRange{Int64}, Base.NoSlice, Base.Colon}, false}, Tuple{Base.UnitRange{Int64}, Base.NoSlice, Base.Colon}, Tuple{Base.StepRange{Int64, Int64}, Base.Colon, Base.StepRange{Int64, Int64}}), false, true), i think this is an llvm bug

we emitted:

  %45 = load %StepRange.10* %44, align 16, !dbg !35, !tbaa %jtbaa_immut

llvm emitted:

        movq    32(%r15), %rdi
        movaps  40(%r15), %xmm0

(where %r15 was %3, a 16-byte aligned jl_value_t*)
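Spelling out the arithmetic behind that diagnosis: the `movq 32(%r15)` suggests the StepRange's first field sits at offset 32 from %r15. The 16-byte `movaps` that loads the remaining two fields therefore reads from offset 40. `movaps` requires a 16-byte aligned memory operand, and 40 is not a multiple of 16. So with %r15 16-byte aligned the load faults, which is consistent with the segfault. The C check below makes those offsets explicit. `StepRange64` is a hypothetical C mirror of an immutable StepRange{Int64} (fields start, step, stop). `RANGE_OFFSET` is inferred from the assembly above, not taken from any Julia header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of Julia's immutable StepRange{Int64,Int64}. */
typedef struct { int64_t start, step, stop; } StepRange64;

/* Offset of the range inside the object %r15 points to, as inferred from
   `movq 32(%r15), %rdi` (an assumption for this sketch). */
#define RANGE_OFFSET 32

/* For a 16-byte aligned base, base + off is aligned to `align`
   (a power of two no larger than 16) exactly when off % align == 0. */
static bool aligned_from_16(size_t off, size_t align) {
    return off % align == 0;
}
```

`start` lands at offset 32, which is fine for any load. `step` lands at 40, which is 8-byte but not 16-byte aligned, so a 16-byte `movaps` starting there cannot be legal. A `movaps` covering offsets 32 to 47 (start and step) would have been aligned; the split placed the 16-byte load one field too late.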

@tkelman tkelman merged commit 3399d44 into master Apr 22, 2016
@tkelman tkelman deleted the tk/llvm3.8patches branch April 22, 2016 10:24
@tkelman
Contributor Author

tkelman commented Apr 23, 2016

may 25th is going to be the deadline for requesting backports into llvm 3.8.1: http://lists.llvm.org/pipermail/llvm-dev/2016-April/098637.html

@tkelman
Contributor Author

tkelman commented May 4, 2016

The bitarray segfault went away with #16011, maybe that just changes the code paths getting hit though?

@yuyichao
Contributor

I'm seeing a similar segfault on linux x64 too. This seems to be an LLVM 3.8 bug when splitting an aligned aggregate load into scalar loads.

It transforms

  %26 = load %StepRange.12, %StepRange.12* %25, align 16, !tbaa !6

into

  %.elt = getelementptr inbounds %StepRange.12, %StepRange.12* %16, i64 0, i32 0
  %.unpack = load i64, i64* %.elt, align 16
  %.elt2 = getelementptr inbounds %StepRange.12, %StepRange.12* %16, i64 0, i32 1
  %.unpack3 = load i64, i64* %.elt2, align 16
  %.elt4 = getelementptr inbounds %StepRange.12, %StepRange.12* %16, i64 0, i32 2
  %.unpack5 = load i64, i64* %.elt4, align 16

which is impossible, since three addresses 8 bytes apart cannot all be 16-byte aligned.
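The alignment a split load may legitimately claim follows from the original one. A field at byte offset `off` inside an aggregate loaded with `align A` is only guaranteed the largest power of two dividing both A and off. Below is a C sketch of that rule, using the same bit trick as LLVM's `MinAlign` helper in llvm/Support/MathExtras.h. `split_alignments` is a hypothetical helper that assumes every field is 8 bytes wide, as for StepRange{Int64}.

```c
#include <stdint.h>

/* Largest power of two dividing both a and b: the lowest set bit of (a | b).
   For b == 0 and a power of two a, this is simply a. */
static uint64_t min_align(uint64_t a, uint64_t b) {
    uint64_t x = a | b;
    return x & (1 + ~x);   /* same as x & -x */
}

/* Hypothetical helper: the alignment each 8-byte field may claim when an
   aggregate load marked `align base_align` is split into scalar loads. */
static void split_alignments(uint64_t base_align, int nfields, uint64_t *out) {
    for (int i = 0; i < nfields; i++)
        out[i] = min_align(base_align, (uint64_t)i * 8);
}
```

For the three-field StepRange load above this gives `align 16`, `align 8` and `align 16`, not 16 on all three. The middle load is the one the transformed IR over-promises.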

This seems to be fixed on 3.9 so there's hope that there's a patch that we can backport...

7 participants