Precompile all the CodeInstances #42016
I am hoping I'll have time to get back to this over the holidays. If anyone who knows this part of the codebase has a chance to glance at this and see if it's moving in the right direction, I'd appreciate it.
fec3b59 to 20a6c8a
If a deserialized MethodInstance has CodeInstances that are not in the cache, recache it. Currently this never runs, and will require other changes to have a functional consequence.
This "turns on" the new functionality
20a6c8a to 63e6242
The ordering of recaching types, deserializing external methods, and recaching them is proving to be a bit tricky.
This moves extra-root serialization and deserialization into the block for methods rather than the one for method instances. It also takes large strides towards removing the need for `currently_(de)serializing` by determining the proper course of action locally.
Doesn't quite work correctly yet (there's still some roots-numbering confusion), but it doesn't crash for all packages.
63e6242 to 57a768b
Also make small changes in the order in which the newrootsindex gets marked, as serialization of the roots themselves might trigger the assert in ircode.c
If we start deserialization and some methods already have new roots that haven't been serialized, we need to pretend that deserialization happened before any new roots were added. Thus we "move them out of the way" and then restore them at the end of deserialization. This also points out that we need to encode all new roots with relative indexing regardless of serialization status. This further decouples ircode.c from dump.c.
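The "move them out of the way, then restore" step can be pictured with a toy model. This is a minimal sketch, not the code in `dump.c`: the `Roots` struct, its `nbase` field, and the functions `stash_new_roots` and `restore_new_roots` are hypothetical names, and roots are modeled as strings rather than Julia objects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ROOTS 64

/* Toy roots table (hypothetical layout). */
typedef struct {
    const char *items[MAX_ROOTS];
    size_t len;    /* current number of roots */
    size_t nbase;  /* roots that existed before any session-local additions */
} Roots;

/* Move every root added after `nbase` into `stash` and truncate the table,
 * so deserialization sees the table as if nothing new had been added.
 * Returns the number of roots stashed. */
static size_t stash_new_roots(Roots *r, const char **stash)
{
    size_t n = r->len - r->nbase;
    memcpy(stash, r->items + r->nbase, n * sizeof *stash);
    r->len = r->nbase;
    return n;
}

/* Re-append the stashed roots after deserialization has added its own. */
static void restore_new_roots(Roots *r, const char **stash, size_t n)
{
    assert(r->len + n <= MAX_ROOTS);
    memcpy(r->items + r->len, stash, n * sizeof *stash);
    r->len += n;
}
```

In this model the restored roots end up at different absolute positions than before, which is consistent with the commit's point that new roots need relative indexing regardless of serialization status.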
In incremental mode, there may be no particular reason to compress until we serialize. It was hoped this would resolve the remaining issues, but it doesn't: a method that first gets compiled against "old" roots but in a different environment can end up with broken root-references.
This abandons the attempt to put everything into a single index and instead sets up a table of blocks indexed by the build_id of the toplevel module. Most of the machinery is hidden behind an API defined in methods.c, so that one could modify the implementation fairly simply.
Small fixes
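As an illustration of a table of root blocks keyed by build_id, here is a minimal sketch. It is not the API defined in `methods.c`: `RootBlock`, `RootBlocks`, `blocks_append`, and `blocks_lookup` are hypothetical names, and the fixed-size array is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 16

/* One contiguous run of roots added on behalf of a single toplevel module. */
typedef struct {
    uint64_t build_id;  /* identifies the module that added the roots */
    size_t start;       /* absolute index of the block's first root */
    size_t count;       /* number of roots in the block */
} RootBlock;

typedef struct {
    RootBlock blocks[MAX_BLOCKS];
    size_t nblocks;
    size_t nroots;      /* total roots across all blocks */
} RootBlocks;

/* Record that `count` roots were appended for `build_id`.
 * Returns 0 on success, -1 if the table is full. */
static int blocks_append(RootBlocks *rb, uint64_t build_id, size_t count)
{
    if (rb->nblocks == MAX_BLOCKS)
        return -1;
    rb->blocks[rb->nblocks++] = (RootBlock){build_id, rb->nroots, count};
    rb->nroots += count;
    return 0;
}

/* Translate (build_id, relative index) into an absolute root index.
 * The relative index counts across all blocks with that build_id.
 * Returns -1 if the module has no such root. */
static long blocks_lookup(const RootBlocks *rb, uint64_t build_id, size_t relative)
{
    for (size_t i = 0; i < rb->nblocks; i++) {
        const RootBlock *b = &rb->blocks[i];
        if (b->build_id != build_id)
            continue;
        if (relative < b->count)
            return (long)(b->start + relative);
        relative -= b->count;  /* skip past this block */
    }
    return -1;
}
```

With this layout a serialized reference only needs the pair (build_id, relative index), so it stays valid no matter how many roots other modules added first.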
This feels very close. I'm still having some memory-corruption issues but the overall shape seems essentially complete. Some of the bigger changes:
There's significant risk my window for working on this is about to close again, but it would be lovely to get it working so we can at least test it out.
During incremental compilation (aka, package precompilation), wait to call `jl_compress_ir` until the moment of method serialization. This increases the efficiency of potential root-order transformations (e.g., #42016). Aside from possible memory constraints, there appears to be little downside to waiting.
I am thinking about splitting this into smaller PRs, to make it easier to debug & review. First one is up at #43759.
Prior to this PR, Julia's precompiled `*.ji` files saved just two categories of code: unspecialized method definitions and type-specialized code for the methods defined by the package. Any novel specializations needed from other code (Base, other packages) were not saved, and therefore effectively thrown away.

This PR caches all the code---internal or external---called during package definition that hadn't been previously inferred. This makes precompilation more intuitive (now it saves all relevant inference results), and substantially reduces latency for inference-bound packages.

Closes #42016
Fixes #35972

Issue #35972 arose because codegen got started without re-inferring some discarded CodeInstances. This forced the compiler to insert a `jl_invoke`. This PR fixes the issue because needed CodeInstances are no longer discarded by precompilation.
commit 88a1cbf
Author: Tim Holy <tim.holy@gmail.com>
Date: Sat Feb 19 03:14:45 2022 -0600
Copy over CodeInstances too

commit 48c91b3
Author: Tim Holy <tim.holy@gmail.com>
Date: Thu Feb 17 14:22:02 2022 -0600
Exclude MethodInstances that don't link to worklist module
This should prevent us from serializing too much code.

commit 3241c4c
Author: Tim Holy <tim.holy@gmail.com>
Date: Thu Feb 17 12:23:12 2022 -0600
Add invalidation test

commit ead1fd9
Author: Tim Holy <tim.holy@gmail.com>
Date: Thu Feb 17 10:23:52 2022 -0600
Fix a failure to invalidate

commit b44a8fc
Author: Tim Holy <tim.holy@gmail.com>
Date: Thu Jan 27 02:54:47 2022 -0600
Serialize external CodeInstances

Prior to this PR, Julia's precompiled `*.ji` files saved just two categories of code: unspecialized method definitions and type-specialized code for the methods defined by the package. Any novel specializations needed from other code (Base, other packages) were not saved, and therefore effectively thrown away.

This PR caches all the code---internal or external---called during package definition that hadn't been previously inferred. This makes precompilation more intuitive (now it saves all relevant inference results), and substantially reduces latency for inference-bound packages.

Closes JuliaLang#42016
Fixes JuliaLang#35972

Issue JuliaLang#35972 arose because codegen got started without re-inferring some discarded CodeInstances. This forced the compiler to insert a `jl_invoke`. This PR fixes the issue because needed CodeInstances are no longer discarded by precompilation.
Prior to this PR, Julia's precompiled `*.ji` files saved just two categories of code: unspecialized method definitions and type-specialized code for the methods defined by the package. Any novel specializations of methods from Base or previously-loaded packages were not saved, and therefore effectively thrown away.

This PR caches all the code---internal or external---called during package definition that hadn't been previously inferred, as long as there is a backedge linking it back to a method owned by a module being precompiled. (The latter condition ensures it will actually be called by package methods, and not merely transiently generated for the purpose of, e.g., metaprogramming or variable initialization.) This makes precompilation more intuitive (now it saves all relevant inference results), and substantially reduces latency for inference-bound packages.

Closes #42016
Fixes #35972

Issue #35972 arose because codegen got started without re-inferring some discarded CodeInstances. This forced the compiler to insert a `jl_invoke`. This PR fixes the issue because needed CodeInstances are no longer discarded by precompilation.
Prior to this PR, Julia's precompiled `*.ji` files saved just two categories of code: unspecialized method definitions and type-specialized code for the methods defined by the package. Any novel specializations of methods from Base or previously-loaded packages were not saved, and therefore effectively thrown away.

This PR caches all the code---internal or external---called during package definition that hadn't been previously inferred, as long as there is a backedge linking it back to a method owned by a module being precompiled. (The latter condition ensures it will actually be called by package methods, and not merely transiently generated for the purpose of, e.g., metaprogramming or variable initialization.) This makes precompilation more intuitive (now it saves all relevant inference results), and substantially reduces latency for inference-bound packages.

Closes #42016
Fixes #35972

Issue #35972 arose because codegen got started without re-inferring some discarded CodeInstances. This forced the compiler to insert a `jl_invoke`. This PR fixes the issue because needed CodeInstances are no longer discarded by precompilation.

(cherry picked from commit df81bf9)
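The backedge condition amounts to a reachability check: an externally owned instance is kept only if following backedges (callers) eventually reaches a method owned by a module being precompiled. The following is a minimal sketch on a toy graph, not the traversal used in Julia's serializer: `Node`, `reaches_worklist`, and `should_cache` are hypothetical names, and array indices stand in for MethodInstances.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 32
#define MAX_EDGES 8

/* Toy stand-in for a MethodInstance (hypothetical layout). */
typedef struct {
    bool in_worklist;          /* owned by a module being precompiled? */
    int backedges[MAX_EDGES];  /* indices of callers */
    int nbackedges;
} Node;

/* Depth-first search along backedges; `visited` guards against cycles. */
static bool reaches_worklist(const Node *nodes, int n, int idx, bool *visited)
{
    if (idx < 0 || idx >= n || visited[idx])
        return false;
    visited[idx] = true;
    if (nodes[idx].in_worklist)
        return true;
    for (int i = 0; i < nodes[idx].nbackedges; i++) {
        if (reaches_worklist(nodes, n, nodes[idx].backedges[i], visited))
            return true;
    }
    return false;
}

/* Decide whether the instance at `idx` should be written to the cache. */
static bool should_cache(const Node *nodes, int n, int idx)
{
    bool visited[MAX_NODES] = {false};
    assert(n <= MAX_NODES);
    return reaches_worklist(nodes, n, idx, visited);
}
```

An instance with no callers, or one whose callers form a cycle that never touches the worklist, is skipped. That corresponds to the "merely transiently generated" case in the description.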
This is an attempt at allowing packages to save the entirety of their type-inferred code, which may result in a substantial reduction in latency for many packages. Currently we cache `CodeInstance`s for the module in which they are defined, but not ones that get generated by being called from some other package. This is essentially a replacement for #32705, attempting to solve the "roots problem" without the massive duplication that doomed that approach. One interesting feature of the approach I've taken here is that it allows us to save type-inferred code even if it's hidden behind a runtime dispatch (no backedges from "owned" methods required).

Very briefly, our serialization ultimately refers to a list of objects (the `roots` associated with each method) by index. The problem is, fresh compilation might add to the roots table, and since it grows by order-of-insertion, there's a big risk:

- Suppose PkgA compiles `Base.push!(::Vector{TypPkgA}, ::TypPkgA)` for some new type defined in PkgA. Thus the corresponding `Base` method `push!(a::Vector{T}, item) where T` gets `TypPkgA` inserted in its `roots` table.
- Suppose it lands at index 90 of the `roots` table. So, when precompilation saves the `*.ji` cache file, root index 90 will "mean" `TypPkgA`.
- Now suppose the user runs `using PkgB` first, then say `using PkgA`. Let's imagine that `PkgB` also generates a CodeInstance for the same method, and inserts some other type, `TypPkgB`. Since this package got loaded first, `TypPkgB` occupies index 90 in the `roots` table, and some code has already been deserialized and cached in the global code store assuming this to be the case (so we can't change it).
- Next comes `using PkgA`. When deserialized, the `CodeInstance` for `Base.push!(::Vector{TypPkgA}, ::TypPkgA)` will refer to index 90 in the roots table, but unfortunately this now means `TypPkgB`. 💣

The starting point for this PR was the question, "If that's so bad, why can we get away with caching CodeInstances for internal methods?" The reason is that when a package is serialized, it has two properties: (1) the methods the package "owns" were never defined before loading the package, so there is no risk that some other package will "get there first" and insert things earlier in the `roots` table; and (2) it is serialized in a "clean" environment, with nothing extra added, so we can be sure that all the objects in the `roots` table are available in the package and the things it depends on.

This PR takes the approach that we might be able to safely cache new CodeInstances for old modules if we are careful about how we encode root indexes. The idea is to encode the "extra" root indexes not with an absolute index, but a relative one. Essentially, whenever you're about to extend the `roots` table from outside the module, you record the index of the "old" end of the table, and everything that gets added henceforth will be encoded via relative index, i.e., from the old end of the roots table rather than an absolute index. When you deserialize, the idea is that you use the relative offset from whatever the roots table end now happens to be. Since we still have the "clean environment" property (2) above, this seems likely to be fairly well controlled.

This is definitely not working yet, but I think it may be close. My sense is that what's missing is mostly or entirely on the deserialization side: `TAG_EXTERN_METHODROOT`, we'd want to re-serialize and update the global code cache with absolute (traditional) encoding of `roots`. This is necessary to enable the next package to exploit the same process. Effectively, the relative indexing can only last during the window in which a package is being loaded, and once that's over we have to set the state of the system to be the same as if we had never used the relative offsets.

I don't have a complete understanding of this aspect of `dump.c`, so I could really use some help. I do think this will be a pretty big step forward, so I hope one or more experts can coach me through the remainder. If, that is, the strategy isn't doomed for reasons I've failed to consider.
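The index collision and the relative-index remedy described above can be replayed in a toy model. This is a minimal sketch under stated assumptions, not the code in `dump.c`: `RootTable`, `roots_push`, `encode_relative`, `decode_relative`, and `replay_scenario` are hypothetical names, roots are modeled as strings, and the table is assumed to hold 90 earlier roots so that the new one lands at index 90.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ROOTS 256

/* Toy stand-in for a method's roots table: grows by order of insertion. */
typedef struct {
    const char *items[MAX_ROOTS];
    size_t len;
} RootTable;

/* Append a root and return its absolute index. */
static size_t roots_push(RootTable *t, const char *root)
{
    assert(t->len < MAX_ROOTS);
    t->items[t->len] = root;
    return t->len++;
}

/* Serialization side: express an absolute index relative to the "old end"
 * of the table, i.e. its length before the package added anything. */
static size_t encode_relative(size_t absolute, size_t old_end)
{
    assert(absolute >= old_end);
    return absolute - old_end;
}

/* Deserialization side: resolve a relative index against the position where
 * this package's roots were appended in the current session. */
static const char *decode_relative(const RootTable *t, size_t block_start, size_t relative)
{
    assert(block_start + relative < t->len);
    return t->items[block_start + relative];
}

/* Replays the PkgA/PkgB scenario. Returns 1 if the absolute index resolves
 * to TypPkgB (the bug) while the relative index resolves to TypPkgA. */
static int replay_scenario(void)
{
    /* Precompiling PkgA in a clean session: 90 roots already exist. */
    RootTable build = {0};
    for (int i = 0; i < 90; i++)
        roots_push(&build, "base-root");
    size_t old_end = build.len;                      /* 90 */
    size_t abs_a = roots_push(&build, "TypPkgA");    /* absolute index 90 */
    size_t rel_a = encode_relative(abs_a, old_end);  /* relative index 0 */

    /* User session: PkgB loads first and takes absolute index 90. */
    RootTable session = {0};
    for (int i = 0; i < 90; i++)
        roots_push(&session, "base-root");
    roots_push(&session, "TypPkgB");

    /* Loading PkgA appends its roots at the current end of the table. */
    size_t block_start = session.len;                /* 91 */
    roots_push(&session, "TypPkgA");

    int absolute_is_wrong = strcmp(session.items[abs_a], "TypPkgB") == 0;
    int relative_is_right =
        strcmp(decode_relative(&session, block_start, rel_a), "TypPkgA") == 0;
    return absolute_is_wrong && relative_is_right;
}
```

The sketch leaves out the final step the description calls for, namely rewriting the loaded code back to absolute indexes once loading finishes so that the next package starts from an ordinary table.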