Can the REPL be made more efficient? #15101

Closed
waTeim opened this issue Feb 16, 2016 · 32 comments
Labels
performance Must go faster REPL Julia's REPL (Read Eval Print Loop)

Comments

@waTeim
Contributor

waTeim commented Feb 16, 2016

Interactive commands, such as up-arrow to recall previous commands, appear to be just-in-time compiled. The extra time is negligible on a typical system, but on an ARM processor with a slow flash filesystem the initial response takes more than 5 seconds. It is fast after that. The same is true of other code that must be JIT-compiled, but basic navigation in the REPL calls attention to itself.

Is there a way to have this functionality compiled into the image, or is it already?
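
A rough way to see the first-call cost described above is to time the same call twice in a fresh session. This is only a sketch: `sumsq` is a made-up stand-in, not one of the REPL's own functions, and the timings will vary by machine.

```julia
# Hypothetical stand-in for any function that has not been compiled yet.
sumsq(xs) = sum(abs2, xs)

xs = collect(1.0:10.0)

# First call: includes just-in-time compilation of sumsq for Vector{Float64}.
t_first = @elapsed sumsq(xs)

# Second call: reuses the code compiled during the first call.
t_second = @elapsed sumsq(xs)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

The gap between the two timings is the kind of one-time delay that shows up on the first up-arrow press.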

@tkelman tkelman added performance Must go faster REPL Julia's REPL (Read Eval Print Loop) labels Feb 16, 2016
@tkelman
Contributor

tkelman commented Feb 16, 2016

There is the list of manual precompile statements that could maybe use some updating.
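
For readers unfamiliar with what such a statement looks like, here is a minimal sketch. `history_prev` is a hypothetical function invented for illustration and is not part of Base or the REPL; only `precompile` itself is a real Base function.

```julia
# Hypothetical stand-in for a REPL helper; not a real Base function.
history_prev(lines::Vector{String}, idx::Int) = lines[max(idx - 1, 1)]

# Ask the compiler to generate code for this exact signature
# before the function is ever called.
ok = precompile(history_prev, (Vector{String}, Int))
println("precompile succeeded: ", ok)

# The first real call can now reuse the code generated above.
println(history_prev(["first", "second"], 2))
```

Each entry in a manual precompile list names one function and one concrete tuple of argument types, so the list has to be regenerated when signatures change.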

@ViralBShah
Member

This is especially noticeable on arm.

@timholy
Member

timholy commented Feb 16, 2016

SnoopCompile is pretty useful for figuring out which functions need precompilation.

@ViralBShah
Member

That is really cool! Thanks, I did not know about it.

@waTeim
Contributor Author

waTeim commented Mar 17, 2016

Well I attempted to use this but encountered problems (tested on a package).

julia> import SnoopCompile

julia> SnoopCompile.@snoop "/tmp/kalman_compiles.csv" begin include("kalman.jl") end
Turning on compiler logging
error: src/codegen.cpp: No such file or directory
make: *** No targets specified and no makefile found.  Stop.
ERROR: failed process: Process(`sh snoop.sh ''`, ProcessExited(2)) [2]

Just checking that include by itself does actually work.

julia> include("kalman.jl")
kalman

Is the problem that this won't work unless it has access to the source, or unless julia is built from source? If so, that's a problem, because such a thing is pretty much an impossibility on the target system.

There's just so much slowness, 2 - 3 minutes to precompile is not uncommon.

@timholy
Member

timholy commented Mar 17, 2016

For julia-0.4, SnoopCompile doesn't work if your julia is from a binary install. If you have a julia-0.5 binary install, it should (now) work if you check out the master branch of SnoopCompile.

@waTeim
Contributor Author

waTeim commented Mar 19, 2016

Oh that's helpful. The only precompilation stuff that occurred was in Base; this gist has the output of it. I think that this is what happens simply when the package is imported, but actually using the functions in it there are not that many. All in all it takes about 3-5 minutes for everything to get underway.

Kinda unrelated, but is the following an idea? Create another module that uses all of the kalman functions and see where that leads?

@waTeim
Contributor Author

waTeim commented Mar 20, 2016

I have a list of things to precompile now. I followed my own advice and now I have 6 precompile files. It took about 5 minutes to run SnoopCompile on everything. That's about right. I'd like to precompile everything. But it's mostly in Base (245); there are about 270 total.

The documentation says that you can't precompile functions in another module? How do I get around this? The userimg.jl approach was documented to have serious drawbacks. What now?

@timholy
Member

timholy commented Mar 20, 2016

If you add the lines relevant to Base to base/precompile.jl and rebuild julia, how much of a performance improvement does this yield?

As for precompiling across modules, aside from userimg.jl, I'm not sure there is a possible solution that doesn't involve changing core julia code. @vtjnash is probably the best person to address that. You can also read some of his blog posts on http://juliacomputing.com/ to see if any ideas occur to you.

@waTeim
Contributor Author

waTeim commented Mar 20, 2016

Are there instructions on how to cross-compile Julia? Attempting to re-compile Julia on this ARM host is just not possible. I looked into the compile-all possibility and AOT Julia; I feel this might work, but when I tried it, Julia crashed. That was an older version (a nightly from 57 days ago); I've since replaced it with the build from yesterday, but have not yet reattempted. The idea in #14995 about enabling Julia to cross-compile-all sounds like it would definitely work. This discussion is getting pretty far away from making the REPL more responsive, but it's still applicable, I guess. It seems that for ARM, what is really needed is a way to get most everything in Base native.

@vtjnash
Member

vtjnash commented Mar 21, 2016

This discussion is getting pretty far away from making the REPL more responsive

agreed, let's continue on julia-dev.

the __precompile__ line causes julia to split the library load into a separate process, hiding that work from SnoopCompile (if you used @profile first, you should see that compilation time was not the bottleneck for the current process, but just a blackbox wait). launch with --compilecache=no to setup the test environment.

AOT Julia is now exercised by the mac buildbot to prevent regressions.

@tkelman
Contributor

tkelman commented Mar 21, 2016

exercised by the mac buildbot

CI and buildbot are entirely separate. This is running on mac Travis right now but that's going to have to be temporary, once we can bring back full testing we'll have to move the compile-all testing elsewhere.

@waTeim
Contributor Author

waTeim commented Mar 27, 2016

Let me take a step back here for a moment.

  1. Is there a way to cross-compile Julia? Is there a way to run the Julia build from source on Linux (Intel) so that what gets created is an ARM executable?
  2. If there isn't a way to cross-compile, is using qemu possible?

@yuyichao
Contributor

You can use qemu to cross-compile julia. Just set CC and CXX to the cross-compiler and it should work. I only tried using a sysroot with most of the deps installed.

@eschnett
Contributor

I have used qemu to set up a Raspberry Pi ARM environment on my laptop. Obviously there's a speed penalty from qemu's CPU emulation, and certain qemu restrictions mean that I cannot increase the amount of memory available in the emulator as much as I'd like. However, overall there is a slight speed benefit, likely because hard disk access is faster on my laptop.

Setting this up was "straightforward" (the quotes indicate sarcasm), and "only" a matter of following the relevant instructions. However, the relevant instructions I found were almost incomprehensible, and spread over different web sites that partly contradict each other. The disk image I managed to create doesn't actually boot successfully, so that I have to boot it into rescue mode and then manually start /sbin/init...

@yuyichao
Contributor

Just to note that what I meant was qemu user mode. The system mode could be more straightforward if you already have a system set up....

@waTeim
Contributor Author

waTeim commented Mar 27, 2016

Ok! So what I'm taking away from this is it's possible but is extremely not straightforward and undocumented?

@yuyichao
Contributor

It's not much less straightforward than other projects plus getting binfmt_support working.

@waTeim
Contributor Author

waTeim commented Mar 27, 2016

Let me see if I can sum things up. First off, it's not documented and I'm on my own, but it might simply be a matter of typing make. The particular problems I may encounter can be addressed later; first I have to set up the cross-compile environment, which is not Julia-specific. It seems there are 2 main options.

Create/copy an ARM distro, use qemu (system), log in, download the Julia source, compile, etc. By the way, has anyone looked into using Yocto Linux? That actually is straightforward: just follow the instructions and at the end you have a system. There's the possibility of also getting Julia built using that, but I think it would be difficult because the Julia build deviates from strict use of autotools or cmake. However, I have not personally used it like that; I've just used the RPMs that it creates. This has the added feature of creating a cross-compiler based on a metadata description of the processor.

Or use qemu (user) and configure binfmt-support, after having obtained a cross-compiler, to support Julia-ARM compiling itself during execution of precompile.jl? The trick here is to get and invoke the right cross-compiler for the target architecture, and then somehow enable Julia-ARM via binfmt?

@tkelman
Contributor

tkelman commented Mar 28, 2016

You might not need to use binfmt. We have documented support for cross compiling from linux to windows by setting XC_HOST=x86_64-w64-mingw32 and the final bootstrap steps get run under wine. You could probably extend that setup with an arm cross compiler prefix and use qemu as the launcher program to do bootstrap. Or not worry about bootstrap, just cross compile all the deps then copy things over to the arm board to finish with bootstrap.

@waTeim
Contributor Author

waTeim commented Mar 29, 2016

I was able to use qemu-system-arm and virsh, on ubuntu 15 to install the ARM version of Fedora 23 (server). I'm currently installing software tools for Julia compile as well as looking into setting it to use multiple cores. Will update later with full info.

@waTeim
Contributor Author

waTeim commented Apr 1, 2016

So when trying to use Fedora, I am getting struck by the same issue that others are: #10602. primes.jl returns an InexactError, and random.jl ends in a segfault.

@nalimilan
Member

I wonder what could be so specific to Fedora...

@waTeim
Contributor Author

waTeim commented Apr 1, 2016

Maybe it is something specific to the emulated processor? This is using the 'virt' machine (<type arch='armv7l' machine='virt'>hvm</type>). Here it is:

processor   : 0
model name  : ARMv7 Processor rev 1 (v7l)
BogoMIPS    : 125.00
Features    : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm 
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part    : 0xc0f
CPU revision    : 1

Hardware    : Generic DT based system
Revision    : 0000
Serial      : 0000000000000000

@waTeim
Contributor Author

waTeim commented Apr 1, 2016

Avoiding the error in random.jl by leaving random.jl out of the sysimage just leads to a later failure:

error during bootstrap:
LoadError(at "sysimg.jl" line 281: LoadError(at "irrationals.jl" line 95: Base.AssertionError(msg="Float64(π) == Float64(big(π))")))
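
The condition in that assertion can also be checked by hand at a prompt, which might help narrow down whether the emulated CPU or the big-number library is rounding differently. A small sketch, using only the expression from the error message plus `reinterpret` to show the raw bits:

```julia
a = Float64(pi)        # Float64 value of the built-in constant
b = Float64(big(pi))   # BigFloat value of pi, rounded to Float64

# This is the comparison that the bootstrap assertion performs.
println(a == b)

# Raw bit patterns make a one-bit rounding difference visible.
println(reinterpret(UInt64, a))
println(reinterpret(UInt64, b))
```

On a correctly working platform the two bit patterns are identical; a mismatch in the last bit would point at the BigFloat conversion path rather than at irrationals.jl itself.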

@waTeim
Contributor Author

waTeim commented Apr 6, 2016

Circumventing the precompile and Fedora issues by attempting the compile-all method on a Raspberry Pi also fails; details can be found in #15783.

@waTeim
Contributor Author

waTeim commented Apr 17, 2016

So it looks like I might be stuck with the option of modifying precompile.jl, but how do I get a list (for this issue) of all of the REPL-related functions?

@waTeim
Contributor Author

waTeim commented Apr 17, 2016

Oh, related: does the precompile trick (modification of sys.so?) work generally for 2 different ARM targets? Would compiling on host 1 (ARM1) work for ARM2? In other words, does compilation on ARM result in generic ARM code, or does it need to be forced to be generic?
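
As a first diagnostic for that question, the two boards can at least be compared from inside Julia. This sketch assumes a Julia version that exposes these constants in `Sys` (they exist in 1.x; older releases may have used different names). It does not by itself say whether a system image built on one board will load on the other.

```julia
# Print what this Julia build reports about the host it is running on.
# Run on both ARM boards and compare the output.
println("arch:      ", Sys.ARCH)       # Symbol naming the architecture
println("cpu name:  ", Sys.CPU_NAME)   # CPU name as detected by LLVM
println("machine:   ", Sys.MACHINE)    # target triple of this build
println("word size: ", Sys.WORD_SIZE)  # 32 or 64
```

If the CPU names differ, native code generated on one board may use instructions the other lacks, which is the situation the question is worried about.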

@waTeim
Contributor Author

waTeim commented Apr 18, 2016

Compiling on a Raspberry Pi 3 doesn't work either, as that distro uses a different glibc than the target host. Can someone simply run compile-all for the generic ARM build and provide that as a separate "accelerated" version, or does that not work because of differing processor types?

@JeffBezanson
Member

#20793 Helps repl latency a bit.

@ViralBShah
Member

PackageCompiler.jl and SnoopCompile.jl work reasonably well now. There is generally effort towards reducing compiler latency. I think we can close this issue and file more specific issues.

@StefanKarpinski
Member

We don't need an issue this vague to track this—we know that REPL startup and everything else should be as fast as we can make it.

10 participants