Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support building of system image in binary builds #9376

Merged
merged 7 commits into from
Dec 25, 2014

Conversation

staticfloat
Copy link
Member

I'm going to wait for travis to turn green before I merge

@tkelman
Copy link
Contributor

tkelman commented Dec 16, 2014

Hey, that was fast, thanks. That segfault's no good though...

@staticfloat
Copy link
Member Author

Yep. I did a bit of a hack-n-slash port, which is why I wanted to see what Travis thought of it. ;)

@staticfloat
Copy link
Member Author

Looks like this is good to go. I'll squash then merge.

@staticfloat
Copy link
Member Author

On second thought, nobody but me has used this yet. I'll let it settle a little and see if there are improvements that need making.

@tkelman
Copy link
Contributor

tkelman commented Dec 16, 2014

On second thought, nobody but me has used this yet. I'll let it settle a little and see if there are improvements that need making.

Yeah I like that plan

@few
Copy link
Contributor

few commented Dec 16, 2014

Nice feature! Did you test it on windows? I see too possible issues.

  1. You call WinRPM.installed(), but my WinRPM version doesn't have this function.
  2. There's something wrong with the binutils installed by WinRPM
    It installs two ld.exe, but one of them has size 0. (Maybe it's intended to be a symlink or hardlink)
    And that's the one you return.
    ok: WinRPM\deps\usr\i686-w64-mingw32\sys-root\mingw\i686-w64-mingw32\bin\ld.exe
    broken: WinRPM\deps\usr\i686-w64-mingw32\sys-root\mingw\bin

@tkelman
Copy link
Contributor

tkelman commented Dec 16, 2014

No I haven't had a chance to test this yet, thank you @few for doing some testing before I could get to it!

If you download the 0.4 nightlies from https://status.julialang.org/ this should be built-in, so let's make sure this is working properly on master then we can update this PR with the final version.

Does WinRPM not have an installed function? Whoops.

Yeah we should be going deeper into sys-root\mingw$(Sys.ARCH)-w64-mingw32\bin. I think the sys-root\mingw\bin is where the Linux-host copy of mingw ld (or a symlink to it) lives on a cross-compile system.

@few
Copy link
Contributor

few commented Dec 17, 2014

Now trying with 0.4 nightly:
Here is what it takes to get find_system_linker() to work:
1)

julia> find_system_linker()
ERROR: error compiling find_system_linker: unsupported or misplaced expression "using" in function find_system_linker

I guess using "using" is not possible inside a function.

  1. WinRPM has no installed() function

  2. The PATH separator on windows is ; and not :

  3. The path to ld.exe needs another "$(Sys.ARCH)-w64-mingw32" before "bin".

Now onto running build_sysimg("E:\sys").

  1. libssp is called libssp-0 on windows
  2. I get undefined references
E:\sys.o:./sysimg.jl:295: undefined reference to `_alloca'
E:\sys.o:./sysimg.jl:295: undefined reference to `__divdi3'
E:\sys.o:./sysimg.jl:295: undefined reference to `__fixunsdfdi'
E:\sys.o:./sysimg.jl:295: undefined reference to `__moddi3'
E:\sys.o:./sysimg.jl:295: undefined reference to `_resetstkoflw'
  1. I couldn't figure out how to fix those. So I ended up installing gcc (using WinRPM) and called that. This produced a sys.dll.

So much for the success.

  1. When using the new sys.dll, julia (reproducibly) segfaults right at startup.
Starting program: E:\Programme\Julia-0.4.0-dev\bin\julia-debug.exe -J E:\\sys.dll
[New Thread 1756.0x768]
[New Thread 1756.0x278]
[Thread 1756.0x278 exited with code 0]
DW_FORM_strp pointing outside of .debug_str section [in module E:\sys.dll]

Program received signal SIGSEGV, Segmentation fault.
0x6cd0357e in ptrhash_peek_bp (h=0x4fc1264, key=0x4fc1478) at ptrhash.c:24
24      ptrhash.c: No such file or directory.
(gdb) bt
#0  0x6cd0357e in ptrhash_peek_bp (h=0x4fc1264, key=0x4fc1478) at ptrhash.c:24
#1  0x6cd03622 in ptrhash_get (h=0x4fc1264, key=0x4fc1478) at ptrhash.c:24
#2  0x6cc9abcc in jl_get_binding_ (m=0x4fc1258, var=0x4fc1478, st=0x0) at module.c:134
#3  0x6cc9ad19 in jl_get_binding (m=0x4fc1258, var=0x4fc1478) at module.c:160
#4  0x6cc9b61e in jl_get_global (m=0x4fc1258, var=0x4fc1478) at module.c:322
#5  0x6ccdb5d0 in jl_restore_system_image (fname=0x268c58 "E:\\sys.dll") at dump.c:1411
#6  0x6ccd2a36 in _julia_init (rel=JL_IMAGE_CWD) at init.c:984
#7  0x6ccd374f in julia_init (rel=JL_IMAGE_CWD) at task.c:275
#8  0x0040226d in wmain (argc=0, argv=0x265cec, envp=0x2686f0) at repl.c:355
#9  0x004013f0 in __tmainCRTStartup () at /usr/src/debug/mingw64-i686-runtime-3.3.0-1/crt/crtexe.c:329
#10 0x7c81776f in RegisterWaitForInputIdle () from C:\WINDOWS\system32\kernel32.dll
#11 0x00000000 in ?? ()
  1. Feature request:
    Could you separate the link step in build_sysimg() into its own function? This way it becomes possible to repeat it without the other (long running) steps in case something went wrong there.

@tkelman
Copy link
Contributor

tkelman commented Dec 17, 2014

Ouch. Okay, this clearly needs a lot of work still. I was wanting to get it merged to master sooner rather than later since it kept getting conflicts from everyone else wanting to add new command-line flags (seriously can we try to stop doing that, am I the only one who'd rather see fewer of them, not more?), and probably doesn't break anything that was previously working.

@few
Copy link
Contributor

few commented Dec 17, 2014

Forget the segfault. This was me passing -J sys.dll instead of -J sys.ji

@jakebolewski
Copy link
Member

everyone else wanting to add new command-line flags (seriously can we try to stop doing that, am I the only one who'd rather see fewer of them, not more?),

This PR adds new command line flag as well...

@tkelman
Copy link
Contributor

tkelman commented Dec 17, 2014

This PR adds new command line flag as well...

Right which is why the version for master had to be rebased so many times. I don't think this PR is breaking (actually, it might break contrib/windows/prepare-julia-env.bat which should probably be updated too, even though it's being replaced here), so I think it should be okay for backporting considering the advantages it brings. Rigorous semver would need a minor-number bump, but we're still in 0.x.y.

@staticfloat do you mind if I commit Windows fixes to this branch?

@staticfloat
Copy link
Member Author

Go right ahead. Coding is a low priority for me, so I might not take a look at this for a day or two, although I will try and do something about the OSX nightlies.

@tkelman
Copy link
Contributor

tkelman commented Dec 18, 2014

Was out today but I'll work on the Windows side of this over the next few days.

Considering #9380 this should not be merged until the mac issues on master can be resolved.

@tkelman
Copy link
Contributor

tkelman commented Dec 21, 2014

This is proving to be tricky. If I add a -v to the $(CXX) invocation that links sys.dll in a from-source build, it's actually running this:

 /usr/lib/gcc/x86_64-w64-mingw32/4.8.2/collect2.exe --sysroot=/usr/x86_64-w64-mingw32/sys-root
-m i386pep --shared -Bdynamic -e DllMainCRTStartup --enable-auto-image-base
-o /home/Tony/julia/usr/lib/julia/sys.dll /usr/x86_64-w64-mingw32/sys-root/mingw/lib/../lib/dllcrt2.o
/usr/x86_64-w64-mingw32/sys-root/mingw/lib/../lib/crtbegin.o -L/home/Tony/julia/usr/lib/julia
-L/home/Tony/julia/usr/lib -L/home/Tony/julia/usr/bin -L/usr/lib/gcc/x86_64-w64-mingw32/4.8.2
-L/usr/lib/gcc/x86_64-w64-mingw32/4.8.2/../../../../x86_64-w64-mingw32/lib/../lib
-L/usr/x86_64-w64-mingw32/sys-root/mingw/lib/../lib
-L/usr/lib/gcc/x86_64-w64-mingw32/4.8.2/../../../../x86_64-w64-mingw32/lib
-L/usr/x86_64-w64-mingw32/sys-root/mingw/lib /home/Tony/julia/usr/lib/julia/sys.o
--unresolved-symbols ignore-all -ljulia -lssp -lstdc++ -lmingw32 -lgcc_s -lgcc -lmoldname -lmingwex
-lmsvcrt -ladvapi32 -lshell32 -luser32 -lkernel32 -lmingw32 -lgcc_s -lgcc -lmoldname -lmingwex
-lmsvcrt /usr/x86_64-w64-mingw32/sys-root/mingw/lib/../lib/crtend.o

So we probably need to ensure the mingw-runtime is installed? Any better ideas than requiring WinRPM.install("gcc") and linking with the compiler?

{ "build", required_argument, 0, 'b' },
{ "lisp", no_argument, &lisp_prompt, 1 },
{ "help", no_argument, 0, 'h' },
{ "sysimage", required_argument, 0, 'J' },
{ "code-coverage", optional_argument, 0, 'c' },
{ "cpu-target", required_argument, 0, 'C' },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

following gcc/clang, we should probably call this -march=

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:/ where was the bikeshedding in #8074 ? Does getopt let you do long args with a single dash? If not then the closest we could get would be --march=.... We could look into switching to whatever LLVM uses for command-line option parsing, but that's not a right-before-the-tag type change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ew. That's such a terrible option name and we don't support the single dash form. I'd prefer to use better names, although in this case I can kind of see the argument for matching the compiler weirdo names like march.

@skumagai
Copy link
Contributor

@tkelman as per your comment in #9380, I've started to build this PR. What I'm doing is

git checkout origin/release-0.3   
git rebase origin/sf/build_sysimg3.0
make clean && make

Because I started to build from fresh clone, this will take for a while to finish. I'll let you know the result soon.

@tkelman
Copy link
Contributor

tkelman commented Dec 21, 2014

@skumagai thanks so much! very very helpful.

@skumagai
Copy link
Contributor

@tkelman The build generated working julia (including successful make check).

@staticfloat
Copy link
Member Author

Thanks for picking up the torch on windows, Tony. And thank you for testing stuff out for us Seiji. Very helpful!

@tkelman
Copy link
Contributor

tkelman commented Dec 21, 2014

Still not quite working. I'm going to try adding most of that crt junk to FLAGS and see if we can get away with just binutils and runtime.

This commit adds a few new pieces of functionality:

* The `contrib/build_sysimg.jl` script which builds a new system image.  This method can save the system image wherever the user desires, e.g. it could be stored in `~/.julia`, to allow for per-user system images each customized with packages in their own `userimg.jl`.  Or on a restricted system, this allows for creation of a system image without root access.

* The removal of compile-time `JULIA_CPU_TARGET`, in favor of runtime `--cpu-target`/`-C` command-line flags which default to `"native"` but can be set to `"native"`, `"i386"`, `"core2"` or any other LLVM cpu target.  This allows the creation of a system image targeting user-supplied cpu features, e.g. `cd base; ../julia -C i386 --build /tmp/sys_i386 sysimg.jl`.

* I implemented runtime selection of the cpu target by adding a new member to the `jl_compileropts_t` structure called `cpu_target`.

* Because all julia executables are now created equal, (rather than before where a julia executable needed to have the same `JULIA_CPU_TARGET` set internally as the system image had when it was built) we need to know what CPU feature set the system image is targeting before we initialize code generation.  So a new function `jl_get_system_image_cpu_target()` is exported, which does exactly what it sounds like.

* I added newlines to the end of a few error messages.

* I found an old parser option `-T` which hadn't been removed yet, so I took the opportunity to do so.

* Documentation on this has been added to doc/devdocs/sysimg.rst

Forgot missing colon
@tkelman
Copy link
Contributor

tkelman commented Dec 25, 2014

I think that all unnecessary changes to Base that we have in here, (including samefile) should be removed

Wha? There are no changes to Base here. A few changes to src that I think should be okay, otherwise it's just docs and Makefile tweaks.

Doesn't adding the .ji extension in the second samefile call satisfy the issue about nonexistent files, at least for the purposes we're using it for here?

I don't think we should push the release back for this. If we don't like the state this PR is in, we should push this PR back to the next release.

I'd be fine, though slightly disappointed, with that option. However I don't want to ship sys.dll by default on win32 with [half] broken backtraces, so if we let this PR slide to 0.3.5 then we should at least cherry-pick 367fd29 (I probably should've pushed that straight to release-0.3 instead of this PR branch) before tagging 0.3.4.

@staticfloat
Copy link
Member Author

The good news is, modulo whatever we're doing with build_sysimg.jl, it looks to me like core Julia is in good shape, from the binaries linked above. I tested Base.runtests(["all"]) on these guys (windows included), and all tests pass.

There are no changes to Base here.

Sorry, I thought the Base.samefile() function was included here. I'm thinking we should revert that from release-0.3, since it obviously fails for nonexistant files, and it's not a clear-cut decision on what we should do for those.

@tkelman
Copy link
Contributor

tkelman commented Dec 25, 2014

it looks to me like core Julia is in good shape, from the binaries linked above. I tested Base.runtests(["all"]) on these guys (windows included), and all tests pass.

Yes, that is good news. I feel more confident about that kind of thing now than with previous releases thanks to AppVeyor, but occasionally things sneak by so always good to check.

Sorry, I thought the Base.samefile() function was included here.

Base.samefile has existed for a very long time, even on release-0.3, but I only recently found out about it.

I'm thinking we should revert that from release-0.3, since it obviously fails for nonexistant files, and it's not a clear-cut decision on what we should do for those.

What, specifically, are you proposing we revert? I believe in the current code as of bf042d7, in both places that Base.samefile gets called, at least one of its inputs will always exist. I think this is preferable to the obviously-broken-on-case-insensitive-filesystems string comparison that was being used before my changes.

@staticfloat
Copy link
Member Author

I'm sorry, I was completely confused. I thought due to recent discussion on samefile() that we had recently added that. As that is not the case, I think we're good here. I vote we just merge now that it's pointing at .ji files, and call it a release.

tkelman added a commit that referenced this pull request Dec 25, 2014
Support building of system image in binary builds
@tkelman tkelman merged commit 8273e83 into release-0.3 Dec 25, 2014
@tkelman tkelman deleted the sf/build_sysimg3.0 branch December 25, 2014 08:28
@tkelman
Copy link
Contributor

tkelman commented Dec 25, 2014

The only remaining issues here as far as I can tell are:

i wonder if you should just loop over Sys.dllist(), rather than assuming the name is sys, and do the Base.samefile comparison against each one in turn

Which does make sense to me. We should do that on master, get a feel for it.

Anyway, checklist time? Holiday-themed release, apparently?

@tkelman tkelman mentioned this pull request Dec 27, 2014
@tkelman
Copy link
Contributor

tkelman commented Feb 15, 2015

Ouch, @staticfloat, we goofed here. The julia-debug.exe segfault on release-0.3 that I mentioned at #10058 (comment) was caused by the changes in the first commit here, 86b1094.

We aren't testing julia-debug on AppVeyor to save time, so this went unnoticed for a while. We should really revisit running the tests on the Windows buildbots, where we build a little less frequently and aren't as constrained on time.

@vtjnash
Copy link
Member

vtjnash commented Feb 15, 2015

ah yes, there's where someone tried to open sys.dll on julia-debug.exe on windows: 86b1094#diff-669d4cc5c9c8f4573c5f8d57f5dcab20R1200

(it's mildly OK on other platforms since libjulia isn't linked to anything else on those platforms, but we should probably be loading it with the platform-specific LOAD_ONLY flags, so it doesn't run code or link itself into anything)

@tkelman
Copy link
Contributor

tkelman commented Feb 15, 2015

backtrace, if at all useful https://gist.github.com/d1774892dda5962820a2

@vtjnash
Copy link
Member

vtjnash commented Feb 15, 2015

this causes libjulia and libjulia-debug to get loaded into the same process. the net effect is that symbol resolution becomes partially ambiguous and the result is a segfault shortly thereafter. the backtrace isn't particularly meaningful, since the fault could happen anywhere, the behavior is simply undefined if multiple libjulia versions get loaded into the same process.

@staticfloat
Copy link
Member Author

Ah, whoops. Yes, this makes a lot of sense. I'll try to test out a PR today.

@staticfloat
Copy link
Member Author

Wait, I'm confused. Why does this cause libjulia and libjulia-debug to both get loaded? I'm only loading sys.dll here, not libjulia.dll.

@staticfloat
Copy link
Member Author

At least, those are the ones I'm loading directly.

@vtjnash
Copy link
Member

vtjnash commented Feb 15, 2015

julia-debug has already loaded libjulia-debug. sys.dll forces libjulia.dll to also get loaded in the same process

@staticfloat
Copy link
Member Author

I see. This is because of this line in Makefile, where we link against libjulia only on Windows.

I can't find any mention of LOAD_ONLY flags. I found a RTLD_NOLOAD flag, but that doesn't seem to do what we'd want it to do. I noticed that I don't dlclose() what I open with dlopen() here. If I did that, do you think it would fix things?

I'll note here that I don't get a segfault when starting the latest 0.4 julia-debug.exe on Win64 right now. @tkelman is there anything special I need to do in order to make it segfault? I'm currently building 0.3 right now, we'll see what happens.

@vtjnash
Copy link
Member

vtjnash commented Feb 15, 2015

no, that wouldn't help because the damage has already been done. i think tkelman is running the spawn test to trigger this. i don't know exactly what flags should be used, there are a couple that might be correct: https://msdn.microsoft.com/en-us/library/windows/desktop/ms684179%28v=vs.85%29.aspx

@tkelman
Copy link
Contributor

tkelman commented Feb 16, 2015

@staticfloat on master the segfault happens in the spawn test, #10111. On release-0.3 it happens just from starting julia-debug when sys.dll is present.

@staticfloat
Copy link
Member Author

I'm having a hard time triggering this bug, so it's going to be hard for me to come up with a fix and see that it's made a difference. This is on my win64 buildbot. With a clean build on 0.3-release:

$ ./julia-debug.exe
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.3.6-pre+73 (2015-02-15 00:00 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 50ea614 (1 day old release-0.3)
|__/                   |  x86_64-w64-mingw32

julia> Base.runtests("spawn")
     * spawn
       [stdio passthrough ok]
    SUCCESS

julia>

Not only does it startup, but it runs the spawn test fine. I'll try to test master later.

@tkelman
Copy link
Contributor

tkelman commented Feb 16, 2015

How was that built/installed? If it was doing make dist then running the installer, then that deletes sys.dll. This is mostly a problem for source builds at this time, but will become important when we want to start distributing sys.dll.

@staticfloat
Copy link
Member Author

It was running within the source tree of a buildbot.

@tkelman
Copy link
Contributor

tkelman commented Feb 17, 2015

Was sys.dll present? If not, where did it get deleted?

@staticfloat
Copy link
Member Author

Yes, it's present. Here's it running on 0.3.6 on the buildbot:

$ ./julia-debug.exe
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.3.6 (2015-02-17 22:12 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-w64-mingw32

julia> filter( x -> contains(x, "sys."), Sys.dllist())
1-element Array{String,1}:
 "C:\\cygwin\\home\\vagrant\\buildbot\\slave\\package_win8_1-x64\\build\\usr\\lib\\julia\\sys.DLL"

julia> Base.runtests("spawn")
     * spawn
       [stdio passthrough ok]
    SUCCESS

@tkelman
Copy link
Contributor

tkelman commented Feb 18, 2015

I'm confused then. What is the buildbot doing differently than my machine? I forget, what version of windows are the VM's using?

@staticfloat
Copy link
Member Author

They're running 8.1
-E

On Wed, Feb 18, 2015 at 12:35 PM, Tony Kelman notifications@github.com
wrote:

I'm confused then. What is the buildbot doing differently than my machine?
I forget, what version of windows are the VM's using?


Reply to this email directly or view it on GitHub
#9376 (comment).

@tkelman
Copy link
Contributor

tkelman commented Feb 20, 2015

Base.runtests never uses debug Julia (maybe it should when run from debug julia). And due to 2670c77, on master I currently get a Julia and the system image were compiled for different architectures when I try to do Base.runtests("spawn") from julia-debug.exe.

@tkelman
Copy link
Contributor

tkelman commented Feb 21, 2015

Seems to be something different between Windows 7 and 8+. Using a VM of Windows Server 2012 R2 (which should be similar to Windows 8), I can run the spawn test just fine with julia-debug on master (via usr/bin/julia-debug test/runtests.jl all). Not sure what's causing the difference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants