Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make precise semantics of world-age increments in top-level thunks #55145

Closed
Keno opened this issue Jul 16, 2024 · 1 comment
Closed

Make precise semantics of world-age increments in top-level thunks #55145

Keno opened this issue Jul 16, 2024 · 1 comment

Comments

@Keno
Copy link
Member

Keno commented Jul 16, 2024

Recording some extended discussion with @vtjnash earlier today:

Right now, the semantics of world-age increment during top-level evaluation are ill-defined. It basically happens in the interpreter if codegen reaches here:

ct->world_age = jl_atomic_load_acquire(&jl_world_counter);

That's mostly every statement, but not all top-level evaluation goes through this point, so there can be some suprises.

In codegen, world age increment happens at the beginning of any top-level evaluation and then there is a post-processing pass that increments the world age before any call instruction.

julia/src/codegen.cpp

Lines 8996 to 9001 in 0945b9d

if (toplevel && !ctx.is_opaque_closure) {
LoadInst *world = ctx.builder.CreateAlignedLoad(ctx.types().T_size,
prepare_global_in(jl_Module, jlgetworld_global), ctx.types().alignof_ptr);
world->setOrdering(AtomicOrdering::Acquire);
ctx.builder.CreateAlignedStore(world, world_age_field, ctx.types().alignof_ptr);
}

julia/src/codegen.cpp

Lines 9646 to 9654 in 0945b9d

// we're at toplevel; insert an atomic barrier between every instruction
// TODO: inference is invalid if this has any effect (which it often does)
LoadInst *world = new LoadInst(ctx.types().T_size,
prepare_global_in(jl_Module, jlgetworld_global), Twine(),
/*isVolatile*/false, ctx.types().alignof_ptr, /*insertBefore*/&I);
world->setOrdering(AtomicOrdering::Acquire);
StoreInst *store_world = new StoreInst(world, world_age_field,
/*isVolatile*/false, ctx.types().alignof_ptr, /*insertBefore*/&I);
(void)store_world;

There are problems with this. The first is that inference does not model internal world age increments (and even if did, it would basically just end up giving up, because nothing is guaranteed anymore after a world-age increment). This has occasionally lead to complaints (#24316). We currently hack around this by syntactic detection of statements that affect world-age-partitioned global data structure and prohibiting inference of such statements. As noted in the issue this heuristic is incomplete as the existence of Core.eval implies that not all such modifications are syntactically visible (similar issues apply to concurrency).

A second issue is that #55032 has likely introduced divergent behavior between the interpreter and compiler with respect to observing world-age changes. This is probably not a huge problem, but would still break great to avoid.

At the same time, inference of top-level expressions can be valuable particularly when people run benchmarks or tests in toplevel scope, so we would like to preserve our ability to do so. Additionally, we want inference to always be correct, even when run on top-level code (see also e.g. #55144), so it is unsatisfying for inference to be unsound on some valid inputs, even if we never run it in practice.

We discussed various possible improvements. One well-liked solution was to increment world-ages only at the top of any :toplevel argument with corresponding changes in lowering to split definitions into more such blocks. The advantage of such a setup would be that individual top-level code blocks would always be legal to infer, with the world age effects well guarded by Expr(:toplevel). However, this solution appears unworkable because method and struct definitions are permitted access to local-scope variables that do not cross Expr(:toplevel) boundaries.

After some further discussion, we reached the following proposal:

  1. We stop incrementing the world age for every statement. Instead, world age (at toplevel) is incremented to the latest world age at only the following places:
    a. Entry into toplevel evaluation (i.e. at the beginning of Core.eval)
    b. After any evaluation of :method
    c. On the bindings branch, after any introduction of a new binding
  2. When inference's dataflow encounters a world-age affecting expr, it gives up and returns early, giving imprecise (although as precise as possible given the unkown new world age), but correct results for all toplevel code.

There is some tricky interaction with the assignment semantics to global bindings on the bindings branch. In particular, we don't want global assignment to existing bindings to pessimize inference, but at the same time a binding partition replacement requires a world age increment. Since these operations are not fully syntactically separated, global assignment must gain conditional semantic world-age increments. The problem with this is that, as designed, whether an assignment (at toplevel) updates or replaces a global depends on the partition of said binding in the latest world age, which is not a property accessible to inference. A proposal here is to change the semantics to throw a ConcurrencyViolationError if the binding partition in the executing world is not the binding partition in the latest world. This is a little bit unsatisfying as assignment to outdated binding partitions is well defined in non-toplevel code, but possibly the best we can do. It should also not happen in practice unless there's very racy concurrenct evaluation going on.

@Keno Keno pinned this issue Nov 8, 2024
Keno added a commit that referenced this issue Nov 9, 2024
This PR introduces a new, toplevel-only, syntax form `:worldinc`
that semantically represents the effect of raising the current
task's world age to the latest world for the remainder of the
current toplevel evaluation (that context being an entry to
`eval` or a module expression). For detailed motivation on why
this is desirable, see #55145, which I won't repeat here, but
the gist is that we never really defined when world-age increments
and worse are inconsistent about it. This is something we need
to figure out now, because the bindings partition work will make
world age even more observable via bindings.

Having created a mechanism for world age increments, the big question
is one of policy, i.e. when should these world age increments be
inserted.

Several reasonable options exist:
1. After world-age affecting syntax constructs (as proprosed in #55145)
2. Option 1 + some reasonable additional cases that people rely on
3. Before any top level `call` expression
4. Before any expression at toplevel whatsover

As in example, case, consider `a == a` at toplevel. Depending on the semantics
that could either be the same as in local scope, or each of the four
world age dependent lookups (three binding lookups, one method lookup
could occur in a different world age).

The general tradeoff here is between the risk of exposing the user to
confusing world age errors and our ability to optimize top-level code
(in general, any :worldinc statement will require us to fully pessimize
or recompile all following code).

This PR basically implements option 2 with the following semantics:

1. The interpreter explicit raises the world age only at `:worldinc`
   exprs or after `:module` exprs.
2. The frontend inserts `:worldinc` after all struct definitions, method
   definitions, `using` and `import.
3. The `@eval` macro inserts a worldinc following the call to `eval` if at toplevel
4. A literal (syntactic) call to `include` gains an implicit `worldinc`.

Of these the fourth is probably the most questionable, but is necessary
to make this non-breaking for most code patterns. Perhaps it would
have been better to make `include` a macro from the beginning (esp because
it already has semantics that look a little like reaching into the calling
module), but that ship has sailed.

Unfortunately, I don't see any good intermediate options between
this PR and option #3 above. I think option #3 is closes to what
we have right now, but if we were to choose it and actually fix the
soundness issues, I expect that we would be destroying all performance
of global-scope code. For this reason, I would like to try to make the
version in this PR work, even if the semantics are a little ugly.

The biggest pattern that this PR does not catch is:
```
eval(:(f() = 1))
f()
```

We could apply the same `include` special case to eval, but given
the existence of `@eval` which allows addressing this at the macro
level, I decided not to. We can decide which way we want to go
on this based on what the package ecosystem looks like.
Keno added a commit that referenced this issue Nov 10, 2024
This PR introduces a new, toplevel-only, syntax form `:worldinc`
that semantically represents the effect of raising the current
task's world age to the latest world for the remainder of the
current toplevel evaluation (that context being an entry to
`eval` or a module expression). For detailed motivation on why
this is desirable, see #55145, which I won't repeat here, but
the gist is that we never really defined when world-age increments
and worse are inconsistent about it. This is something we need
to figure out now, because the bindings partition work will make
world age even more observable via bindings.

Having created a mechanism for world age increments, the big question
is one of policy, i.e. when should these world age increments be
inserted.

Several reasonable options exist:
1. After world-age affecting syntax constructs (as proprosed in #55145)
2. Option 1 + some reasonable additional cases that people rely on
3. Before any top level `call` expression
4. Before any expression at toplevel whatsover

As in example, case, consider `a == a` at toplevel. Depending on the semantics
that could either be the same as in local scope, or each of the four
world age dependent lookups (three binding lookups, one method lookup
could occur in a different world age).

The general tradeoff here is between the risk of exposing the user to
confusing world age errors and our ability to optimize top-level code
(in general, any :worldinc statement will require us to fully pessimize
or recompile all following code).

This PR basically implements option 2 with the following semantics:

1. The interpreter explicit raises the world age only at `:worldinc`
   exprs or after `:module` exprs.
2. The frontend inserts `:worldinc` after all struct definitions, method
   definitions, `using` and `import.
3. The `@eval` macro inserts a worldinc following the call to `eval` if at toplevel
4. A literal (syntactic) call to `include` gains an implicit `worldinc`.

Of these the fourth is probably the most questionable, but is necessary
to make this non-breaking for most code patterns. Perhaps it would
have been better to make `include` a macro from the beginning (esp because
it already has semantics that look a little like reaching into the calling
module), but that ship has sailed.

Unfortunately, I don't see any good intermediate options between
this PR and option #3 above. I think option #3 is closes to what
we have right now, but if we were to choose it and actually fix the
soundness issues, I expect that we would be destroying all performance
of global-scope code. For this reason, I would like to try to make the
version in this PR work, even if the semantics are a little ugly.

The biggest pattern that this PR does not catch is:
```
eval(:(f() = 1))
f()
```

We could apply the same `include` special case to eval, but given
the existence of `@eval` which allows addressing this at the macro
level, I decided not to. We can decide which way we want to go
on this based on what the package ecosystem looks like.
Keno added a commit that referenced this issue Nov 14, 2024
This PR introduces uses the new, toplevel-only, syntax form `:latestworld`
that semantically represents the effect of raising the current
task's world age to the latest world for the remainder of the
current toplevel evaluation (that context being an entry to
`eval` or a module expression). For detailed motivation on why
this is desirable, see #55145, which I won't repeat here, but
the gist is that we never really defined when world-age increments
and worse are inconsistent about it. This is something we need
to figure out now, because the bindings partition work will make
world age even more observable via bindings.

Having created a mechanism for world age increments, the big question
is one of policy, i.e. when should these world age increments be
inserted.

Several reasonable options exist:
1. After world-age affecting syntax constructs (as proprosed in #55145)
2. Option 1 + some reasonable additional cases that people rely on
3. Before any top level `call` expression
4. Before any expression at toplevel whatsover

As in example, case, consider `a == a` at toplevel. Depending on the semantics
that could either be the same as in local scope, or each of the four
world age dependent lookups (three binding lookups, one method lookup
could occur in a different world age).

The general tradeoff here is between the risk of exposing the user to
confusing world age errors and our ability to optimize top-level code
(in general, any :worldinc statement will require us to fully pessimize
or recompile all following code).

This PR basically implements option 2 with the following semantics:

1. The interpreter explicit raises the world age only at `:latestworld`
   exprs, after `:module` exprs, or at the beginning of the top-level
   exprs inside `:toplevel` and `:module`.
2. The frontend inserts `:latestworld` after all struct definitions, method
   definitions, `using` and `import.
3. The `@eval` macro inserts a worldinc following the call to `eval` if at toplevel
4. A literal (syntactic) call to `include` gains an implicit `worldinc`.

Of these the fourth is probably the most questionable, but is necessary
to make this non-breaking for most code patterns. Perhaps it would
have been better to make `include` a macro from the beginning (esp because
it already has semantics that look a little like reaching into the calling
module), but that ship has sailed.

Unfortunately, I don't see any good intermediate options between
this PR and option #3 above. I think option #3 is closes to what
we have right now, but if we were to choose it and actually fix the
soundness issues, I expect that we would be destroying all performance
of global-scope code. For this reason, I would like to try to make the
version in this PR work, even if the semantics are a little ugly.

The biggest pattern that this PR does not catch is:
```
begin
    eval(:(f() = 1))
    f()
end
```

We could apply the same `include` special case to eval, but given
the existence of `@eval` which allows addressing this at the macro
level, I decided not to. We can decide which way we want to go
on this based on what the package ecosystem looks like.
Keno added a commit that referenced this issue Nov 19, 2024
This PR introduces uses the new, toplevel-only, syntax form `:latestworld`
that semantically represents the effect of raising the current
task's world age to the latest world for the remainder of the
current toplevel evaluation (that context being an entry to
`eval` or a module expression). For detailed motivation on why
this is desirable, see #55145, which I won't repeat here, but
the gist is that we never really defined when world-age increments
and worse are inconsistent about it. This is something we need
to figure out now, because the bindings partition work will make
world age even more observable via bindings.

Having created a mechanism for world age increments, the big question
is one of policy, i.e. when should these world age increments be
inserted.

Several reasonable options exist:
1. After world-age affecting syntax constructs (as proprosed in #55145)
2. Option 1 + some reasonable additional cases that people rely on
3. Before any top level `call` expression
4. Before any expression at toplevel whatsover

As in example, case, consider `a == a` at toplevel. Depending on the semantics
that could either be the same as in local scope, or each of the four
world age dependent lookups (three binding lookups, one method lookup
could occur in a different world age).

The general tradeoff here is between the risk of exposing the user to
confusing world age errors and our ability to optimize top-level code
(in general, any :worldinc statement will require us to fully pessimize
or recompile all following code).

This PR basically implements option 2 with the following semantics:

1. The interpreter explicit raises the world age only at `:latestworld`
   exprs, after `:module` exprs, or at the beginning of the top-level
   exprs inside `:toplevel` and `:module`.
2. The frontend inserts `:latestworld` after all struct definitions, method
   definitions, `using` and `import.
3. The `@eval` macro inserts a worldinc following the call to `eval` if at toplevel
4. A literal (syntactic) call to `include` gains an implicit `worldinc`.

Of these the fourth is probably the most questionable, but is necessary
to make this non-breaking for most code patterns. Perhaps it would
have been better to make `include` a macro from the beginning (esp because
it already has semantics that look a little like reaching into the calling
module), but that ship has sailed.

Unfortunately, I don't see any good intermediate options between
this PR and option #3 above. I think option #3 is closes to what
we have right now, but if we were to choose it and actually fix the
soundness issues, I expect that we would be destroying all performance
of global-scope code. For this reason, I would like to try to make the
version in this PR work, even if the semantics are a little ugly.

The biggest pattern that this PR does not catch is:
```
begin
    eval(:(f() = 1))
    f()
end
```

We could apply the same `include` special case to eval, but given
the existence of `@eval` which allows addressing this at the macro
level, I decided not to. We can decide which way we want to go
on this based on what the package ecosystem looks like.
Keno added a commit that referenced this issue Nov 20, 2024
This PR introduces uses the new, toplevel-only, syntax form `:latestworld`
that semantically represents the effect of raising the current
task's world age to the latest world for the remainder of the
current toplevel evaluation (that context being an entry to
`eval` or a module expression). For detailed motivation on why
this is desirable, see #55145, which I won't repeat here, but
the gist is that we never really defined when world-age increments
and worse are inconsistent about it. This is something we need
to figure out now, because the bindings partition work will make
world age even more observable via bindings.

Having created a mechanism for world age increments, the big question
is one of policy, i.e. when should these world age increments be
inserted.

Several reasonable options exist:
1. After world-age affecting syntax constructs (as proprosed in #55145)
2. Option 1 + some reasonable additional cases that people rely on
3. Before any top level `call` expression
4. Before any expression at toplevel whatsover

As in example, case, consider `a == a` at toplevel. Depending on the semantics
that could either be the same as in local scope, or each of the four
world age dependent lookups (three binding lookups, one method lookup
could occur in a different world age).

The general tradeoff here is between the risk of exposing the user to
confusing world age errors and our ability to optimize top-level code
(in general, any :worldinc statement will require us to fully pessimize
or recompile all following code).

This PR basically implements option 2 with the following semantics:

1. The interpreter explicit raises the world age only at `:latestworld`
   exprs, after `:module` exprs, or at the beginning of the top-level
   exprs inside `:toplevel` and `:module`.
2. The frontend inserts `:latestworld` after all struct definitions, method
   definitions, `using` and `import.
3. The `@eval` macro inserts a worldinc following the call to `eval` if at toplevel
4. A literal (syntactic) call to `include` gains an implicit `worldinc`.

Of these the fourth is probably the most questionable, but is necessary
to make this non-breaking for most code patterns. Perhaps it would
have been better to make `include` a macro from the beginning (esp because
it already has semantics that look a little like reaching into the calling
module), but that ship has sailed.

Unfortunately, I don't see any good intermediate options between
this PR and option #3 above. I think option #3 is closes to what
we have right now, but if we were to choose it and actually fix the
soundness issues, I expect that we would be destroying all performance
of global-scope code. For this reason, I would like to try to make the
version in this PR work, even if the semantics are a little ugly.

The biggest pattern that this PR does not catch is:
```
begin
    eval(:(f() = 1))
    f()
end
```

We could apply the same `include` special case to eval, but given
the existence of `@eval` which allows addressing this at the macro
level, I decided not to. We can decide which way we want to go
on this based on what the package ecosystem looks like.
Keno added a commit that referenced this issue Nov 21, 2024
This PR introduces a new, toplevel-only, syntax form `:worldinc` that
semantically represents the effect of raising the current task's world
age to the latest world for the remainder of the current toplevel
evaluation (that context being an entry to `eval` or a module
expression). For detailed motivation on why this is desirable, see
#55145, which I won't repeat here, but the gist is that we never really
defined when world-age increments and worse are inconsistent about it.
This is something we need to figure out now, because the bindings
partition work will make world age even more observable via bindings.

Having created a mechanism for world age increments, the big question is
one of policy, i.e. when should these world age increments be inserted.

Several reasonable options exist:
1. After world-age affecting syntax constructs (as proprosed in #55145)
2. Option 1 + some reasonable additional cases that people rely on
3. Before any top level `call` expression
4. Before any expression at toplevel whatsover

As an example, case, consider `a == a` at toplevel. Depending on the
semantics that could either be the same as in local scope, or each of
the four world age dependent lookups (three binding lookups, one method
lookup) could (potentially) occur in a different world age.

The general tradeoff here is between the risk of exposing the user to
confusing world age errors and our ability to optimize top-level code
(in general, any `:worldinc` statement will require us to fully
pessimize or recompile all following code).

This PR basically implements option 2 with the following semantics:

1. The interpreter explicit raises the world age only at `:worldinc`
exprs or after `:module` exprs.
2. The frontend inserts `:worldinc` after all struct definitions, method
definitions, `using` and `import.
3. The `@eval` macro inserts a worldinc following the call to `eval` if
at toplevel
4. A literal (syntactic) call to `include` gains an implicit `worldinc`.

Of these the fourth is probably the most questionable, but is necessary
to make this non-breaking for most code patterns. Perhaps it would have
been better to make `include` a macro from the beginning (esp because it
already has semantics that look a little like reaching into the calling
module), but that ship has sailed.

Unfortunately, I don't see any good intermediate options between this PR
and option #3 above. I think option #3 is closest to what we have right
now, but if we were to choose it and actually fix the soundness issues,
I expect that we would be destroying all performance of global-scope
code. For this reason, I would like to try to make the version in this
PR work, even if the semantics are a little ugly.

The biggest pattern that this PR does not catch is:
```
eval(:(f() = 1))
f()
```

We could apply the same `include` special case to eval, but given the
existence of `@eval` which allows addressing this at the macro level, I
decided not to. We can decide which way we want to go on this based on
what the package ecosystem looks like.
@Keno
Copy link
Member Author

Keno commented Nov 21, 2024

Addressed in #56509

@Keno Keno closed this as completed Nov 21, 2024
@DilumAluthge DilumAluthge unpinned this issue Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant