Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework stdlib caching #48069

Merged
merged 1 commit into from
Mar 24, 2023
Merged

Rework stdlib caching #48069

merged 1 commit into from
Mar 24, 2023

Conversation

vchuravy
Copy link
Member

@vchuravy vchuravy commented Jan 1, 2023

New

In discussion with @KristofferC I decided that it might be better to do this step by step.
So for now this just updates the infrastructure to generate cache files for "upgradable stdlibs" like DelimitedFiles and SparseArrays.

Old

This is mostly meant to start a discussion about how to get to this point.
One of my motivations behind package images was to make it possible to upgrade stdlibs
independently of Julia. For this we need to move stdlibs out of the system image and
instead cache them with package images.

Right now REPL is the only sysimage included here since otherwise precompilation will simply hang.

Results

  • sys.so shrinks from 173M to 96M
  • With julia --startup-file=no startup time improves to 77ms from 127ms on my machine
  • The current strategy of pre-loading stdlibs increases loading time to ~3s most of that time is spent in method_table_insert

@vchuravy vchuravy added this to the 1.10 milestone Jan 1, 2023
@vchuravy vchuravy added compiler:latency Compiler latency pkgimage speculative Whether the change will be implemented is speculative stdlib Julia's standard library labels Jan 1, 2023
@LilithHafner
Copy link
Member

  • Random redefines select_pivot ...

This was introduced in https://github.com/JuliaLang/julia/pull/45222/files to allow QuickerSort to use randomized pivot selection but still function (albeit with deterministic pivot selection) without Random loaded. I think that the best solution will be to move the complex and high performance sorting algorithms out of base and into a stdlib that depends on Random. The sorting stdlib would still pirate Base.sort, just as the Random stdlib pirates Base.rand.

@KristofferC
Copy link
Member

The sorting stdlib would still pirate Base.sort, just as the Random stdlib pirates Base.rand.

That sounds bad. There's already a bunch of headaches with stdlibs type pirating so I don't think adding more is a good idea, or?

@KristofferC
Copy link
Member

KristofferC commented Jan 1, 2023

Another TODO is perhaps to move the precompile stuff from generate_precompile.jl into their respective stdlib.

Would also be interesting to run a PkgEval here just to see how bad the type piracy from LinearAlgebra and Random is.

@vchuravy
Copy link
Member Author

vchuravy commented Jan 1, 2023

Another TODO is perhaps to move the precompile stuff from generate_precompile.jl into their respective stdlib.

That's indeed my first TODO :) That needs to happen before we do a honest evaluation of the latency of this change.

@LilithHafner
Copy link
Member

A difference between Sorting and Random would be that sort! and friends could still work without Sorting loaded, they would just be faster with Sorting.

@KristofferC
Copy link
Member

This is side tracking the discussion of the PR a bit but why would we not want sort to be fast by default?

@quinnj
Copy link
Member

quinnj commented Jan 2, 2023

My understanding is that we need a basic sort for the compiler that doesn't need to be hyper-optimized, so we have a simple implementation for that. Then later, we define all the fancier radixsort et al algorithms that may have other dependencies that are optimized.

@LilithHafner
Copy link
Member

why would we not want sort to be fast by default?

I would want sort to be fast by default when the Sorting stdlib is part of the sysimage, but if someone is stripping out stdlibs then presumably they care about reducing the amount of code they are using (for sysimage size, bug surface area, oor some other reason they find worthwhile). In that case I'd like to only include highly optimized sorting when they or any of their dependents (e.g. probably DataFrames) explicitly ask for it.

My understanding is that we need a basic sort for the compiler that doesn't need to be hyper-optimized

Yep! It is a heap sort (unstable, n log n, rarely slower than 3x the runtime of Base.sort, and sometimes faster, implementation here). Unfortunately, we can't use it as the default without Sorting loaded because we guarantee stability by default.

@vchuravy
Copy link
Member Author

vchuravy commented Jan 2, 2023

The goal here is to remove stdlibs permanently from the sysimg. So we will have to figure out a solution to this (and yes this might mean including the code from two locations)

@LilithHafner
Copy link
Member

If replacing the method overwrite with an ordinary invalidation would work I could hack that out pretty easily

sort.jl

"""
    _select_pivot(lo::Integer, hi::Integer, ::Any) -> deterministic
    _select_pivot(lo::Integer, hi::Integer, ::Nothing) -> randomized

This internal method exists to allow the Random stdlib to redefine Base.Sort's `select_pivot` 
function without triggering Revise's method overwrite errors. Base.Sort defines the `::Any` 
method and Random defines the `::Nothing` method. The function is always called with 
`nothing` as its third argument.
"""
_select_pivot(lo::Integer, hi::Integer, ::Any) = typeof(hi-lo)(hash(lo) % (hi-lo+1)) + lo
select_pivot(lo::Integer, hi::Integer) = _select_pivot(lo::Integer, hi::Integer, nothing)

Random.jl

_select_pivot(lo::Integer, hi::Integer, ::Nothing) = rand(lo:hi)

@StefanKarpinski
Copy link
Member

Can we just have a separate non-public sorting function that the compiler can use that does whatever?

@oscardssmith
Copy link
Member

We already have Core.Compiler.Sort (that is based on Heapsort)

@petvana
Copy link
Member

petvana commented Jan 3, 2023

Just very naive question. Would it make sense to create a single Stdlib.jl library for loading a single Stdlib.ji and Stdlib.so file? Then, transforming generate_precompile.jl would be much easier and it may overcome the type piracy problem as well, right?

@vchuravy
Copy link
Member Author

vchuravy commented Jan 3, 2023

Just very naive question. Would it make sense to create a single Stdlib.jl library for loading a single Stdlib.ji and Stdlib.so file? Then, transforming generate_precompile.jl would be much easier and it may overcome the type piracy problem as well, right?

Partially yes, I was thinking of something like Session.jl to stash all of the methods needed for REPL and to auto-load Pkg, but the piracy issue are not solved by that. I don't want all of the stdlibs in the same image.

@vchuravy vchuravy force-pushed the vc/slimmer_sysimg branch from 195056e to a1a9194 Compare March 22, 2023 14:55
@vchuravy vchuravy changed the title Excise stdlibs from system image Rework stdlib caching Mar 22, 2023
@vchuravy vchuravy marked this pull request as ready for review March 22, 2023 14:56
@vchuravy vchuravy force-pushed the vc/slimmer_sysimg branch 3 times, most recently from 12c1f14 to 28e6493 Compare March 24, 2023 00:08
@vchuravy vchuravy removed the speculative Whether the change will be implemented is speculative label Mar 24, 2023
@vchuravy vchuravy force-pushed the vc/slimmer_sysimg branch from 28e6493 to 50d4632 Compare March 24, 2023 12:10
@vchuravy vchuravy removed the compiler:latency Compiler latency label Mar 24, 2023
Copy link
Member

@staticfloat staticfloat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vchuravy vchuravy merged commit cac267c into master Mar 24, 2023
@vchuravy vchuravy deleted the vc/slimmer_sysimg branch March 24, 2023 16:04
@giordano
Copy link
Contributor

sys.so shrinks from 173M to 96M

% ls -lh usr/lib/julia/sys.so
-rwxr-xr-x 1 mose mose 173M Mar 25 19:03 usr/lib/julia/sys.so

Am I missing something?

@vchuravy
Copy link
Member Author

Should have been more clear about this, but this PR is no longer removing all stdlibs from the sysimage.
It just reworked how the caching is done so that Kristoffer more easily can explore upgradable stdlibs.

@KristofferC
Copy link
Member

Are all the empty .release.image files in stdlib really necessary?

@vchuravy
Copy link
Member Author

We can move them into a separate folder or something, but at lease my Makefile-fu doesn't let me avoid them... We don't know the cache suffix we will choose.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkgimage stdlib Julia's standard library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants