Packages take too long to load #7280

Closed
lindahua opened this issue Jun 16, 2014 · 24 comments
Labels
packages Package management and loading

Comments

@lindahua
Contributor

I know this has been discussed many times, and support for (cached) precompiled packages is in the plans. However, I think it is useful to have a dedicated issue for this.

Package loading times are getting worse as packages grow. Many important packages take seconds to load. This is already causing unnecessary tension between building sophisticated packages and keeping load times down.

I measured the load times of some important packages (on a quite decent Mac Pro with an i7 CPU and 16 GB of RAM). Results are below:

Package Name    Load time (seconds)
StatsBase       0.92
Graphs          1.04
Distributions   3.07
DataFrames      4.26
PyPlot          7.15
Winston         8.19
Gadfly          16.50

For comparison, Python packages are much bigger but load much faster: e.g. scipy takes about 70 ms to load, while matplotlib takes about 92 ms.
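For reference, a minimal sketch of how such load times can be measured (Julia 0.3-era timing functions; run each package in its own fresh session; the exact methodology behind the table above is not spelled out here):

tic()
using DataFrames                      # one package per fresh session
t = toq()                             # elapsed seconds, without printing
println("DataFrames: ", round(t, 2), " seconds")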

I think it is time we seriously consider this and get package precompilation and caching in place as soon as possible.

@JeffBezanson
Member

dup of #4373

I know everybody is frustrated, myself included.
Conditional dependencies should help Gadfly, which loads slowly mostly because it depends on almost every other major package. Other than that, we need more analysis. It is possible we are hitting some bad critical paths deep in the system.

@StefanKarpinski
Member

One possibility might be to precompile more code that we know common packages need.
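A sketch of the idea: add explicit precompile statements to the system image build (e.g. in base/precompile.jl) so the generated code ships in sys.so. The signatures below are illustrative only, not an actual proposed list.

# Illustrative only: force particular method specializations into the system image
precompile(setindex!, (Dict{ASCIIString,Any}, Int, ASCIIString))
precompile(join, (Vector{UTF8String}, ASCIIString))
precompile(print, (IOBuffer, Float64))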

@timholy
Member

timholy commented Jun 16, 2014

@JeffBezanson, should we interpret your comment as meaning that you think there may be room for substantially faster loading without the need for a julia analog of *.pyc files?

@JeffBezanson
Member

Yes. Something in there is not scaling well. I have some old performance data where Winston used to load in ~5 seconds, and now it takes 9 seconds on the same machine.

It is possible that compiling top-level expressions is an issue. The vast majority of top-level expressions should not require native code gen (e.g. they just add definitions), but we compile a lot of them out of pure fear that somebody will benchmark something inside one.
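For example (an illustration of the two cases, not code from any particular package):

f(x) = 2x            # definition only: adds a method, needs no native code

s = 0.0
for i in 1:10^6      # a real top-level loop: the kind of thing people benchmark,
    s += sin(i)      # and where compiling the expression actually pays off
end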

There is also a large amount of type diversity that is making it harder for the compiler to converge. My favorite example is that the package manager code uses 17 different types of Dicts (when I last counted about a year ago). I don't know how to address this problem.

Julia itself could also probably start up faster. Of course it's gotten much better, but we're still looking at ~0.5 second. I suspect a lot of time is spent deserializing the system image data, but this deserves a close look now that the "first 90%" of startup time has been solved.

We also spend a lot of time in the LLVM optimizer (~17% of Winston loading in my year-ago measurement), which may not be worth it for lots of code, but this is tough to solve.

Type inference also spends too much time analyzing itself, due to every type in the system streaming through it. It's very hard to avoid this without pessimizing user code. We might need some special cases for inference.jl, or just rewrite in C, which will also make bootstrapping faster and easier.

@StefanKarpinski
Member

My favorite example is that the package manager code uses 17 different types of Dicts (when I last counted about a year ago). I don't know how to address this problem.

I think you're talking about addressing the problem in general, but we could stop using such specifically typed dictionaries in the package manager.

@StefanKarpinski
Member

Type inference also spends too much time analyzing itself, due to every type in the system streaming through it. It's very hard to avoid this without pessimizing user code. We might need some special cases for inference.jl, or just rewrite in C, which will also make bootstrapping faster and easier.

I would be sad to see such a major piece of the implementation move from Julia to C.

@JeffBezanson
Member

A lot of the "type diversity" comes from having too many types generally. For example, all of ASCIIString, UTF8String, and ByteString are used for Dict keys. That's 9 types of basic String=>String Dicts for no real reason (and if you throw in String, 16 types). Of course not all of these are used, but that's the space the type entropy is trying to occupy.
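A concrete illustration, using the Julia 0.3-era string types mentioned above:

# Every key/value combination is a distinct Dict type, and each one gets its
# own specializations of every Dict method:
typeof(Dict{ASCIIString,ASCIIString}()) == typeof(Dict{UTF8String,ASCIIString}())  # false
# With ASCIIString, UTF8String and ByteString all appearing as keys and values,
# that is 3 × 3 = 9 String=>String Dict types (16 once abstract String is added).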

@timholy
Member

timholy commented Jun 16, 2014

I like the idea of studying it. AFAICT, as reported by ProfileView.view(C=true), all the time is spent inside C code. At least on Linux, the ability to get useful lookups from C instruction pointers (ips) is so limited that it's hard to study. When I do @profile reload("DataFrames"), the only useful hits are in jl_load (big surprise) and jl_expand. I'd wager that over 90% of the ips are not usefully decoded. But basically it looks like all the time is in libjulia.so, even if I delete sys.so.
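For reference, the setup behind that observation looks roughly like this (same calls as mentioned above; assumes ProfileView.jl is installed):

using ProfileView
Profile.init(10^7, 0.01)       # large buffer, 10 ms sampling interval
@profile reload("DataFrames")
ProfileView.view(C=true)       # include C frames (libjulia.so, flisp, LLVM) in the view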

@lindahua
Contributor Author

Isn't caching the compiled image somewhere an easier solution? (As we discussed in #4373; @JeffBezanson, thanks for pointing to that issue.)

I think it is pretty acceptable if a package compiles as fast as the Julia Base itself.

@JeffBezanson
Member

I think both caching and speeding things up are worthwhile. But if we can make loading faster without caching, it will help package authors iterate faster for example. Also, caching is not as easy as it sounds --- it is very hard to know if some existing native code is consistent with all the currently loaded definitions. Furthermore, many packages like to make run-time decisions during loading, effectively making their code impossible to cache.
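For example, a hypothetical package making a load-time decision (the module and constants are made up for illustration):

module ExamplePkg
# This choice is made while the package loads, so a cached image built in one
# environment could bake in the wrong answer for another.
if OS_NAME == :Darwin
    const BACKEND = :cocoa
elseif isdefined(Main, :PyPlot)
    const BACKEND = :pyplot
else
    const BACKEND = :x11
end
end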

@StefanKarpinski
Member

Explicit caching is also an option – i.e. making package caching opt-in. Major packages would clearly do so to improve their load times. That would make it clearer that run-time decisions are either going to be cached or have to happen after loading the cached code.

@timholy
Member

timholy commented Jun 16, 2014

OK, I basically got complete backtraces for @profile reload("DataFrames"). It looks like the large majority of the time is being spent in flisp.c's apply_cl. EDIT: there are three hotspots inside that function, but it's not easy to tell which dominates, because we can't look up source-code lines in C code (all we get are instruction-pointer offsets).

I'd be happy to post the data somewhere, if others want to analyze it too.

@timholy
Member

timholy commented Jun 16, 2014

In the meantime here's a gist. These are the triggering lines, sorted in alphabetical order. (Sampling interval was 10ms, just to avoid overfilling the profile buffer.) inference.jl accounts for almost none of the time (5 out of 876 triggers).

@JeffBezanson
Member

That would be good news, if the result is accurate. It will probably be easier to optimize the front-end than inference.jl. One promising option is to port the front-end to Chicken Scheme, which many people seem to feel is a great Scheme-to-C compiler.

@StefanKarpinski
Member

Cough, JuliaParser, cough.

@StefanKarpinski
Member

Although Chicken may be good too.

@JeffBezanson
Member

Julia is not fast enough for our needs :-P

@timholy
Member

timholy commented Jun 16, 2014

Agreed that interpretation is key. FYI here is what I did: compile the debug version of Julia, then

Profile.init(10^7, 0.01)          # large buffer, 10 ms sampling interval
using StatsBase                   # loaded beforehand, outside the profiled call
@profile reload("DataFrames")     # I chose DataFrames for its lack of macros
ips, lidict = Profile.retrieve()  # raw instruction pointers + line-info lookup dict
using HDF5, JLD
@save "/tmp/reloaddataframes.jld" ips lidict   # save the raw profile data via JLD

For some reason, this works beautifully on one machine (a SandyBridge Xeon E5-2650 system running CentOS 6.4), but the C lookups are nearly useless on my laptop (i7-640LM running Kubuntu 14.04). That's part of why I'm offering to post the data, if you want it.

You can extract the triggering line by doing something like this:

ends = find(ips .== 0)            # zeros separate individual backtraces
starts = [4, ends[1:end-1].+4]    # on my machine there are 3 ips for the signal handler, etc.
for i = 1:length(starts); println(lidict[ips[starts[i]]]); end   # print the line that triggered each sample

@timholy
Member

timholy commented Jun 19, 2014

Jeff, since 76% of the samples were collected in flisp.c, let me ask: does the lisp code only run during the lowering step? Rather than having to deal with the full complexity of caching an .so file, what about caching just the lowered representation? Seems like it might provide a 4x speed bump, which would be quite noticeable, and it's something that I (naively) imagine might not take many lines of code.
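Roughly, the part that would be cached (Julia 0.3-era functions; a sketch, not an implementation):

src = "f(x) = x + 1"
ex = parse(src)         # surface AST, produced by the flisp parser
lowered = expand(ex)    # lowered AST, produced by the flisp lowering pass
# A cache would serialize `lowered`, keyed on the source file's contents.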

@JeffBezanson
Member

It's parsing and lowering. Generally lowering takes a bit longer than parsing. The only difficulty with caching this is that the cache for one file has to be invalidated if it uses a macro defined in another file that changes.
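For example (file names hypothetical):

# util.jl
macro logged(ex)
    :(println("calling: ", $(string(ex))); $(esc(ex)))
end

# main.jl would contain:  @logged f(1)
# Its cached lowered form contains whatever @logged expanded to at cache time,
# so editing util.jl has to invalidate main.jl's cached code.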

Jameson started implementing this: #5061

@lindahua
Contributor Author

Cutting the loading time to half or a third of what it is now would make many packages much nicer to use.

@timholy
Member

timholy commented Jun 19, 2014

The only difficulty with caching this is that the cache for one file has to be invalidated if it uses a macro defined in another file that changes.

Ugh. That's a tough one. I suppose having to explicitly declare one's dependencies doesn't count as a solution. Do we even have an analog of @which that works for macros? Even worse, what happens when the user renames the file containing the macro?

@dcjones
Contributor

dcjones commented Jun 21, 2014

If significant time is spent in LLVM optimizer passes, would it be practical to add a switch to julia to disable all or some of them? The issue I have when developing Gadfly is having to reload it and draw one plot over and over: if I tweak something, it takes about 40 seconds to see the result. Since I'm only drawing one plot, I imagine some of that optimization is counterproductive.

@harryprince

harryprince commented Mar 11, 2018

Gadfly is too slow; ggplot2 or R's base plotting is better. Because of that, I don't use Julia to draw plots.
