static compile round 2 #4898

Merged
merged 29 commits into from
Dec 13, 2013

Conversation

vtjnash
Sponsor Member

@vtjnash vtjnash commented Nov 23, 2013

This is largely complete. I'll probably try to merge it piecemeal, but I thought people might like to see the status of this.

Big TODO items remaining:

  • respect the jl_set_imaging_mode flag in some essential places (currently only sys0.ji gets the benefits of this)
  • clean-up some platform-specific code (currently targets Mac)
  • precompile a much greater set of functions
  • lookup all external / ccall functions at runtime
  • add support for windows
  • improve efficiency of ccall function lookup
  • investigate cause of repl crash when executing the first input line
  • enough comments to make Jeff happy
  • jl_load_and_lookup(libname, funcname, libptrgv)
  • reduce size of sysimg file such that this pull request doesn't affect the size of the file
  • misc cleanup requested by Jeff
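
The runtime-lookup items in the list above (looking up ccall targets at runtime and making that lookup cheaper) could be sketched roughly as follows. This is an illustration only, not the code in this pull request: `sketch_load_and_lookup` and `lib_cache` are hypothetical names loosely modeled on the `jl_load_and_lookup(libname, funcname, libptrgv)` item, and the sketch assumes a POSIX system with `dlopen` and `dlsym`. The idea is that the library handle is cached in a caller-owned slot, so the library is opened at most once.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper (an assumption, not Julia's jl_load_and_lookup).
 * `lib_cache` stands in for a per-library global that remembers the handle. */
static void *sketch_load_and_lookup(const char *libname, const char *funcname,
                                    void **lib_cache)
{
    if (*lib_cache == NULL) {
        /* A NULL libname asks dlopen for the main program's own symbols. */
        *lib_cache = dlopen(libname, RTLD_LAZY);
        if (*lib_cache == NULL)
            return NULL; /* the library could not be opened */
    }
    /* Later calls skip dlopen and go straight to the symbol lookup. */
    return dlsym(*lib_cache, funcname);
}
```

A caller would keep one `void *` per library, initialized to NULL, and pass its address on every call. Caching the returned function pointer as well would avoid the repeated `dlsym`, which is probably closer to what "improve efficiency of ccall function lookup" is after. On some systems the program has to be linked with `-ldl`.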

Based on:
Squash reloadso changes for cleaner history
Sanitize macro names in llvm to avoid implied symbol
Re-add jl_dump_bitcode and clean up debugging code
generate llvm global variables for most literal_pointer_val
switch to llvm-managed global variables, load and save them
get much further in the static compile boot process
(barely) working static-compiled repl
static compiling
@aviks
Member

aviks commented Nov 23, 2013

Woo hoo!

Will there be any hooks to compile (and load) packages?

@timholy
Sponsor Member

timholy commented Nov 23, 2013

This is huge, especially if it will (eventually) be possible to leverage for packages!

@StefanKarpinski
Sponsor Member

Yes, this is great to see. Very, very exciting.

@staticfloat
Sponsor Member

@vtjnash when you're ready for testing, just let me know and I'll try to break it on all the boxes I run Julia on. ;)

@JeffBezanson
Sponsor Member

This is looking pretty good. It is actually a fairly non-invasive change; basically all we needed were some tables to map things to their names in the dynamic symbol table. And the same changes will surely work alongside MCJIT, but @loladiro can confirm that.

@vtjnash
Sponsor Member Author

vtjnash commented Nov 24, 2013

yeah, it's almost surprising how minimal the changes are.

well, sort-of. originally i used some tables to map them to names, but that caused some issues with the linker, aside from being inefficient. in the second commit here, i changed to just mapping them to indices in a private julia table.

the MCJIT should work about the same, except we will need to dump all of the modules
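
The switch from names to indices described above can be pictured with a small sketch. It is an illustration under assumptions, not the code from this pull request: every name in it (`sketch_slots`, `sketch_slot_for`, `sketch_load_slot`, `sketch_restore_slot`) is hypothetical. Each value that compiled code refers to gets a slot in a private table, the compiled code remembers only the slot index, and the slots are repointed when a saved image is loaded, so no symbol names ever reach the linker.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_SLOTS 64

/* Private table: slot i holds the current address of the i-th registered value. */
static void *sketch_slots[SKETCH_MAX_SLOTS];
static size_t sketch_nslots = 0;

/* While compiling: return the slot index for `value`, adding it on first use. */
static size_t sketch_slot_for(void *value)
{
    for (size_t i = 0; i < sketch_nslots; i++)
        if (sketch_slots[i] == value)
            return i;
    assert(sketch_nslots < SKETCH_MAX_SLOTS);
    sketch_slots[sketch_nslots] = value;
    return sketch_nslots++;
}

/* In generated code: read the value through its slot, not a baked-in address. */
static void *sketch_load_slot(size_t index)
{
    assert(index < sketch_nslots);
    return sketch_slots[index];
}

/* When a saved image is loaded: repoint an existing slot at the new address. */
static void sketch_restore_slot(size_t index, void *new_address)
{
    assert(index < sketch_nslots);
    sketch_slots[index] = new_address;
}
```

The linear search in `sketch_slot_for` keeps the sketch short; a real table would presumably use a hash map from pointer to index.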

@ghost ghost assigned JeffBezanson Nov 25, 2013
@vtjnash
Sponsor Member Author

vtjnash commented Dec 1, 2013

it is very exhilarating to watch julia launch so fast: on some win32 testing where I would launch time julia-readline, wait for the prompt, hit ^D, and wait for it to close, average timings dropped from > 8 seconds to < 2 seconds!

this just has one small item remaining (merging function pointer lookups in ccall, for better efficiency).

@johnmyleswhite
Member

Just tried this. It's amazing how fast the REPL starts on my machine with this branch. Well done!

@staticfloat
Sponsor Member

Wow. Just... wow. I figured I had to get in on the buzz and see how much of a difference this actually makes. On my (newer) MacBook Pro:

$ time julia -e 'return 0'

real  0m3.078s
user  0m3.055s
sys   0m0.118s
$ time ./julia-fast -e 'return 0'

real  0m0.251s
user  0m0.232s
sys   0m0.071s

This is awesome. Can't wait till we can apply this to packages as well!

@Keno
Member

Keno commented Dec 1, 2013

0.3 is gonna be awesome :)

@johnmyleswhite
Member

In case it's helpful information, this passes all tests on my OS X machine, but segfaults as soon as I type 1 + 1 in the REPL.

@vtjnash
Sponsor Member Author

vtjnash commented Dec 2, 2013

@johnmyleswhite I'm not sure what is going wrong there -- for some reason, jl_parse_input_line is returning (<null>, <null>) (i.e. a tuple of 2 NULLs). If you hit enter on the first repl prompt, the later ones seem to accept input just fine.

@StefanKarpinski
Sponsor Member

That's great. I really do think that we should consider moving some of the parts of Base that are less commonly used and require loading external libraries out. The linear algebra stuff is getting a bit out of hand – I mean, I love that stuff, but it's starting to be a lot of stuff to ship with.

@tshort
Contributor

tshort commented Dec 13, 2013

I'm late to the party, but I'd like to add my YAHOOO! and thanks to Jameson and Isaiah.

@ViralBShah
Member

I am late too - but this is really amazing. We probably could move some of the more uncommon linear algebra stuff out of base, but hey, with these load times, I am encouraged to add stuff rather than remove!

@StefanKarpinski
Sponsor Member

Well, I think the end goal here should be matching load times of python and ruby, so a 10x further speedup would be ideal.

@ivarne
Sponsor Member

ivarne commented Dec 14, 2013

Baseline memory usage might also be a good reason to push less commonly used stuff out of Base.

@tknopp
Contributor

tknopp commented Dec 14, 2013

I would like to second that it would be great if base Julia were a little more lightweight. This is especially important when embedding Julia. Lua is so successful as an embedded language because it is so lightweight.

From my perspective it would be great if there were, besides Base, about 4-5 modules living in the Julia tree that are precompiled and part of the Julia distribution. The Pkg management in Julia is great, but I think there is too large a gap between putting things into Base and putting things into a package. And it's quite hard to draw a distinct line. Just an example: I have been looking for a median filter, and this would be a perfect example of a function that should not be part of Base but part of a "Signal" module, which could live in the Julia source tree. The fft routines are an example of something I think should not be part of Base but should still be in one of the "high quality" modules that could live in the Julia repo.

@JeffBezanson
Sponsor Member

I agree it is hard to know where the line is. I think the default should be "batteries included", but it would be great if it were easier to remove pieces you don't need.

A lot of our memory use on startup is just openblas. Particularly with multiple threads, openblas allocates a large amount of memory.

@ViralBShah
Member

My default is also to have "batteries included", but I do believe there may be a few things we can move out of Base. Even so, it is unlikely to have any impact on startup time or memory.

With @vtjnash 's patch for setting the openblas threads, we should see much lower memory utilization. Also, the default max number of threads for openblas is now much lower in our build than before.

@tknopp
Contributor

tknopp commented Dec 15, 2013

@JeffBezanson: Does "batteries included" also mean "available at startup"? In Python I also have to "import os" although it is in the standard library.
But anyway, I think that it makes sense to split Base into 5 modules forming the standard library of Julia. Then make it easy to choose which of these are available at startup.

@JeffBezanson
Sponsor Member

I think that's very reasonable; if it's easy to change what's available at startup, then that choice doesn't matter so much.

@ivarne
Sponsor Member

ivarne commented Dec 15, 2013

It is fairly easy to add using LinAlg to the juliarc file. The only problem then is that you get a fragmented environment where different machines have different configurations. Maybe we could ship a default juliarc file that includes the standard packages.

@johnmyleswhite
Member

I'd prefer that there be a command-line switch like --minimal that avoids pulling in things like LinAlg. This means that almost all users will get the full system, but conservative users will still have an easy way to remove things they don't want.

@tknopp
Contributor

tknopp commented Dec 15, 2013

What's wrong with a user base choosing different default imports? If Julia wants to be more than a "Matlab environment" it should be easy to get a small memory footprint and a short startup time.

I also do not get what's so bad about explicit imports. In most programming languages one has to import something before it is usable (import os in Python). If it should be there at startup, it can be put into juliarc.jl.

@johnmyleswhite
Member

What's wrong with the user base having different defaults is that you have to go to more work to make sure that code is portable across different Julia installations.

Personally, I think Python's approach to hiding basic functionality in modules is an example of bad design for a language used for scientific programming. Both Matlab and R introduce a huge amount of functionality by default and this is something I think is very desirable for aiding discoverability. I would be very sad to see Julia stop doing this. The hiding of functionality is the main reason I did not use Python historically.

@tknopp
Contributor

tknopp commented Dec 15, 2013

What is available at the REPL and what is available when executing a Julia program are different things. If you have one Base module and 5 Standard modules, the convention could be to import all of them in the REPL but only the Base module when executing a Julia program.

This would not make writing portable code hard. One would have to import the Standard modules when one uses them in a different module.

For discoverability there can be other solutions than importing everything.

The reason I am proposing the two-level structure is that currently it is quite hard to get things into Base. There are various functions in Images.jl that IMHO are "batteries" and should be included (e.g. a Gaussian filter). But proposing this might lead to a controversial discussion. On the other hand, I would assume that everyone would agree that this should go into a "second level stdlib".

@tknopp
Contributor

tknopp commented Dec 15, 2013

I have created #5155 for a dedicated discussion on a Base/Standard module.

@gitfoxi
Contributor

gitfoxi commented Dec 15, 2013

Julia has two competing notions of hierarchical namespace. One is Module, the other is Type. If you were only allowed one notion of hierarchical namespace, which would you choose?

Taking this to its logical conclusion, imagine a global Type hierarchy that includes every Type -- and so every Method -- in every Module. If I call a specific Method on a specific Type it is up to the compiler to resolve the code to execute. Whether that code is in some pre-compiled base .so that is mmap'd on startup, another .so that isn't, fresh source code in my ~/tmp directory or in a package on Github doesn't matter to me as long as the system has a way to find, compile and execute it. I shouldn't have to write using or import for anything. If I call allknowledge = translate(transmogrify(url"wikipedia.org", :LightPurple), MongolianLanguage()) then that ends my conversation with the computer. Leave the compiling to the compiler -- that's what it's good at.

In all seriousness though, just having a function available in a so/dylib/dll isn't something you should optimize. It's the OS' job to figure it out. Otherwise you'd have hundreds of files in /usr/lib and -- oh, you do? Well, hundreds more than that if you can imagine, it would look like /usr/include.

@tknopp
Contributor

tknopp commented Dec 15, 2013

I have not said one function per dll. Modularization should be done in a sane manner. And if done right it actually helps a lot structuring source code. I agree that not needing using seems to be the perfect world but I do not see how this could be implemented efficiently.

@JeffBezanson
Sponsor Member

The only purpose of modules and using is to control the answer to the question "when I say x, what x does it refer to?" If that were entirely automated and there were no using, it would be equivalent to having a single namespace since the only x you could get would be whatever the language picks.

@StefanKarpinski
Sponsor Member

Also, types aren't a hierarchical namespace in any way that makes sense to me.

@JeffBezanson
Sponsor Member

One could actually argue that everything exported should be part of a single namespace, such as Base or some common pool. Then you'd have some of the properties of a single namespace but still be able to hide definitions.

@toivoh
Contributor

toivoh commented Dec 15, 2013

In the name of modularity, please don't pool all exported definitions! I for one like to organize my code using modules internal to my modules. I want those to be able to export things that are not ultimately exported. I also think that it's really helpful to have some explicit control of what gets imported into a given module, so that I know what it depends on.

@JeffBezanson
Sponsor Member

I agree; I was playing a bit of devil's advocate. Modules are one of those things where people tend to expect it to read their mind and know what they want, but many people want something different.

@toivoh
Contributor

toivoh commented Dec 15, 2013

Oh, I should have seen that :)

@JeffBezanson
Sponsor Member

I am noticing this change has disrupted my workflow --- I'm used to reading email or something while waiting for julia to start up, and that is not possible anymore :)

@vtjnash
Sponsor Member Author

vtjnash commented Dec 16, 2013

So true.

Also, I like disruptive changes :P

@aviks
Member

aviks commented Dec 16, 2013

While developing the JavaCall.jl package, most bugs cause a segfault. Further, a JVM cannot be loaded twice in the same process, even after being destroyed cleanly... so the module cannot be reloaded easily. This change has therefore saved me many hours of time in the last two weeks.

@tkelman tkelman deleted the jn/static_compile_2 branch April 19, 2015 11:30