Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

staticdata: Some refactors for clarity #52852

Merged
merged 1 commit into from
Jan 12, 2024
Merged

staticdata: Some refactors for clarity #52852

merged 1 commit into from
Jan 12, 2024

Conversation

Keno
Copy link
Member

@Keno Keno commented Jan 10, 2024

I was reading this code as part of looking into #52797. To recap, when we serialize objects for package images, we classify each method as either internal or external, depending on whether the method extends a function in the ambient set of modules ( system image, other previously loaded package images, etc) or whether it is private to the module we're currently serializing (in which case we call it internal).

One of my primary confusions in reading this code was that the new_specializations object holds only external code instances in most of the code, but towards the end of loading, we used to also add any internal code instances that had dependencies on other external method instances in order to share the code instance validation code between the two cases.

I don't think this design is that great. Conceptually, internal CodeInstances are already in the CI cache at the point that validation happens (their max_world is just set to 1, so they're not eligible to run). We do guard the cache insertion by a check whether a code instance already exists, which also catches this case, but I think it's cleaner to just not add any internal code instances to the list and instead simply re-activate the code instance in situ.

Another issue with the old code that I didn't observe, but I think is possible in theory is that any CodeInstances that are not part of the cache hierarchy (e.g. because they were created by external abstract interpreters) should not accidentally get added to the cache hierarchy by this code path. I think this was possible in the old code, but I didn't observe it.

With this change, new_specializations now always only holds external cis, so rename it appropriately. While we're at it, also do some other clarity changes.

@Keno Keno requested review from vtjnash and timholy January 10, 2024 21:26
@@ -3454,8 +3456,6 @@ static void jl_restore_system_image_from_stream_(ios_t *f, jl_image_t *image, jl
jl_code_instance_t *ci = (jl_code_instance_t*)obj;
assert(s.incremental);
ci->min_world = world;
if (ci->max_world != 0)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really the key part of this change. Everything else is largely cosmetic.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if this was before implemented as if (ci->max_world != 0 && !in(ci, new_specializations)) abort() it would never have failed, but was some relic of some past thing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there two kinds of code instances like that:

  1. Internal ones that require validation (max_world == 1), which are handled by the extra cache walk you commented on below
  2. Internal ones that don't (max_world == (size_t)-1), which don't need any extra handling.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, yeah, I guess that does make sense.

alright, SGTM/LGTM

Comment on lines 1181 to 1189
} else {
// Likely internal. Find the CI already in the cache hierarchy.
for (jl_code_instance_t *codeinst = caller->cache; codeinst; codeinst=codeinst->next) {
if (codeinst->min_world != minworld)
continue;
if (codeinst->max_world != WORLD_AGE_REVALIDATION_SENTINEL)
continue;
codeinst->max_world = maxvalid;
}
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This visitor seems slightly uncomfortable to me. There seems an assumption to me that all CI are handled by jl_insert_backedges as well, which is not true (some don't have any edges at all)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it doesn't have any edges (and it is internal), the serializer doesn't set the max_world to WORLD_AGE_REVALIDATION_SENTINEL, and no further processing is required.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it wasn't internal? Clearly I don't remember well how this part of the code is currently implemented, as it has changed too many times

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not internal, it's in the next_ext_cis list and will not got through this code path.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't really like this iterator, though maybe it is unavoidable to bring it back. It seems to imply that it is forbidden for multiple codeinst to exist for roughly the same set of conditions (as they might have different edge constraints), and that isn't really something we want to disallow. And seems like it implies that deserialization did a bad job at tracking the content it loaded, which seems awkward and unfortunate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand what you're saying. There's nothing preventing multiple code instances from being in this cache. Are you saying that the edges are really a property of the CodeInstances rather than the MethodInstances? In that case, I think I would agree with that, but that's not really how things are set up right now. I really think this is a string improvement over what we used to do, which would result in essentially the same end state, but much more convoluted. If we want to do even better than this, I'm all in favor of course.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is also just the comment about "Likely internal". Should that say simply "Internal"? (and optionally assert that with a pkgimage id lookup via the address)

Yeah, agreed that edges right now are coalesced into the MethodInstance even though they belong to the CodeInstance. It is one of the things that #52415 is intended to start working towards fixing, as it better stores (some of) these edges, so they are not lost into the Method.roots table instead (as happens now)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably possible for something to be neither external nor internal depending on what external absints are doing, which is the case I was hedging here. I can change the comment if you prefer though.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like it would be an error to end up here though in any other case, right? Since this is validating edges, except those edges will validate incorrectly for any absint. Particularly any absint that doesn't do vanilla inference and/or that uses overlay tables, this code will be wrong for them and invalid to reach here with those.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it shouldn't end up here (though I think it would probably be reasonably safe). We'll want to think about what the proper thing to do for external absints is, but for now, I think it would be fine.

@aviatesk
Copy link
Sponsor Member

aviatesk commented Jan 11, 2024

Another issue with the old code that I didn't observe, but I think is possible in theory is that any CodeInstances that are not part of the cache hierarchy (e.g. because they were created by external abstract interpreters) should not accidentally get added to the cache hierarchy by this code path. I think this was possible in the old code, but I didn't observe it.

I filed this as an issue (#48453), and actually happens for some external AbstractInterpreters. We currently avoid this issue with this hack, but it's great if pkgimage avoids caching external CodeInstances for now.

@aviatesk
Copy link
Sponsor Member

Confirmed this fixes #48453.

I was reading this code as part of looking into #52797. To recap, when we serialize
objects for package images, we classify each method as either internal or external,
depending on whether the method extends a function in the ambient set of modules (
system image, other previously loaded package images, etc) or whether it is private
to the module we're currently serializing (in which case we call it internal).

One of my primary confusions in reading this code was that the `new_specializations`
object holds only external code instances in most of the code, but towards the end
of loading, we used to also add any internal code instances that had dependencies on
other external method instances in order to share the code instance validation code
between the two cases.

I don't think this design is that great. Conceptually, internal CodeInstances are
already in the CI cache at the point that validation happens (their max_world is just
set to 1, so they're not eligible to run). We do guard the cache insertion by a check
whether a code instance already exists, which also catches this case, but I think
it's cleaner to just not add any internal code instances to the list and instead
simply re-activate the code instance in situ.

Another issue with the old code that I didn't observe, but I think is possible in theory
is that any CodeInstances that are not part of the cache hierarchy (e.g. because
they were created by external abstract interpreters) should not accidentally get added
to the cache hierarchy by this code path. I think this was possible in the old code,
but I didn't observe it.

With this change, `new_specializations` now always only holds external cis, so rename
it appropriately. While we're at it, also do some other clarity changes.
@Keno Keno merged commit 9836f2d into master Jan 12, 2024
4 of 7 checks passed
@Keno Keno deleted the kf/extcirefactor branch January 12, 2024 14:18
aviatesk added a commit to aviatesk/JET.jl that referenced this pull request Jan 15, 2024
With the introduction of `CodeInstance` serialization support in
JuliaLang/julia#52852, it's now reasonable to experiment with persisting
the analysis cache during precompilation.

There's an important consideration: the `codeinst.max_world` might
occasionally be set to `WORLD_AGE_REVALIDATION_SENTINEL` on the C side
under certain conditions. This assignment should ideally be reserved for
proper `CodeInstance`s (but it is not currently) and supposed to be
fixed up by a subsequent serialization pass. To address this, this
commit also adds an `__init__` hook that overrides such `max_world`s
at package loading time.
aviatesk added a commit to aviatesk/JET.jl that referenced this pull request Jan 16, 2024
With the introduction of `CodeInstance` serialization support in
JuliaLang/julia#52852, it's now reasonable to experiment with persisting
the analysis cache during precompilation.

There's an important consideration: the `codeinst.max_world` might
occasionally be set to `WORLD_AGE_REVALIDATION_SENTINEL` on the C side
under certain conditions. This assignment should ideally be reserved for
proper `CodeInstance`s (but it is not currently) and supposed to be
fixed up by a subsequent serialization pass. To address this, this
commit also adds an `__init__` hook that overrides such `max_world`s
at package loading time.
aviatesk added a commit to aviatesk/JET.jl that referenced this pull request Jan 16, 2024
With the introduction of `CodeInstance` serialization support in
JuliaLang/julia#52852, it's now reasonable to experiment with persisting
the analysis cache during precompilation.

There's an important consideration: the `codeinst.max_world` might
occasionally be set to `WORLD_AGE_REVALIDATION_SENTINEL` on the C side
under certain conditions. This assignment should ideally be reserved for
proper `CodeInstance`s (but it is not currently) and supposed to be
fixed up by a subsequent serialization pass. To address this, this
commit also adds an `__init__` hook that overrides such `max_world`s
at package loading time.
aviatesk added a commit to aviatesk/JET.jl that referenced this pull request Jan 16, 2024
With the introduction of `CodeInstance` serialization support in
JuliaLang/julia#52852, it's now reasonable to experiment with persisting
the analysis cache during precompilation.

There's an important consideration: the `codeinst.max_world` might
occasionally be set to `WORLD_AGE_REVALIDATION_SENTINEL` on the C side
under certain conditions. This assignment should ideally be reserved for
proper `CodeInstance`s (but it is not currently) and supposed to be
fixed up by a subsequent serialization pass. To address this, this
commit also adds an `__init__` hook that overrides such `max_world`s
at package loading time.
aviatesk added a commit to aviatesk/JET.jl that referenced this pull request Jan 16, 2024
With the introduction of `CodeInstance` serialization support in
JuliaLang/julia#52852, it's now reasonable to experiment with persisting
the analysis cache during precompilation.

There's an important consideration: the `codeinst.max_world` might
occasionally be set to `WORLD_AGE_REVALIDATION_SENTINEL` on the C side
under certain conditions. This assignment should ideally be reserved for
proper `CodeInstance`s (but it is not currently) and supposed to be
fixed up by a subsequent serialization pass. To address this, this
commit also adds an `__init__` hook that overrides such `max_world`s
at package loading time.
aviatesk added a commit to aviatesk/JET.jl that referenced this pull request Jan 16, 2024
With the introduction of `CodeInstance` serialization support in
JuliaLang/julia#52852, it's now reasonable to experiment with persisting
the analysis cache during precompilation.

There's an important consideration: the `codeinst.max_world` might
occasionally be set to `WORLD_AGE_REVALIDATION_SENTINEL` on the C side
under certain conditions. This assignment should ideally be reserved for
proper `CodeInstance`s (but it is not currently) and supposed to be
fixed up by a subsequent serialization pass. To address this, this
commit also adds an `__init__` hook that overrides such `max_world`s
at package loading time.
Comment on lines +1615 to +1616
if (jl_atomic_load_relaxed(&ci->inferred) && ptrhash_has(&s->callers_with_edges, ci->def))
newci->max_world = WORLD_AGE_REVALIDATION_SENTINEL;
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Keno Do you think it is necessary to assign this sentinel to CodeInstances that aren't part of the cache hierarchy (those created by external abstract interpreters)? Currently, the max_world for these CodeInstances is being set to the sentinel, but it appears they aren't corrected until deserialization.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be unconditional here, since it always will at least need to be set to the current value on deserialize?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants