RFC: introduce runtime representation of broadcast fusion #23692
Awesome! Would be great if it had some tests and examples for actually modifying broadcast behaviour for custom container types.
Haha. From my description above ("experiments with variations ... can now be implemented"), I was implying that someone else should do that 😛
Cool. It's awesome that this is being looked at. :) Thanks @vtjnash. Since I'm not sure of any preceding discussions: was this approach vs others considered? For example, one might consider lazy but non-parser-fusing broadcast? This problem seems at first blush possible to deal with using Julia types, dispatch, and dead-simple dot parsing where each dot-call or dot-op makes exactly one
@andyferris, see e.g. #19198 for discussion of making broadcast lazy vs materializing.
@@ -2914,7 +2914,7 @@ Base.literal_pow(::typeof(^), ::PR20530, ::Val{p}) where {p} = 2
 p = 2
 @test x^p == 1
 @test x^2 == 2
-@test [x,x,x].^2 == [2,2,2]
+@test_broken [x, x, x].^2 == [2, 2, 2] # literal_pow violates referential transparency
What result does this give now?
On master, I get `[x,x,x].^2 == [2,2,2]`, but I guess this PR breaks this? Temporarily or permanently?
I don't think `literal_pow` really played well with `broadcast` and dot syntax to begin with. It was a little unpredictable when `literal_pow` or regular `^` would be called...
Yes, on this PR you get `[1, 1, 1]`.
Probably also the best path for fixing #22932 (comment)
What happens in this PR to
No, they are currently passed as three arguments:
Are there cases where that makes a significant performance difference?
@mbauman, I thought that it did, but now I'm trying a simple example and I'm not seeing much difference:
julia> f!(dest, x) = broadcast!(x -> x^2 + 3x - 5x^3, dest, x);
julia> g!(dest, x) = broadcast!((x,y,z) -> x^2 + 3y - 5z^3, dest, x,x,x);
julia> x = rand(10^4); y = similar(x);
julia> @btime f!($y, $x);
  7.167 μs (0 allocations: 0 bytes)
julia> @btime g!($y, $x);
  7.183 μs (0 allocations: 0 bytes)
(I wonder if the compiler got better since I last tried this?)
It would also be nice to see a proof-of-concept implementation of a custom array type for which fusion is disabled, just to make sure that this is feasible with a reasonable amount of code given the new design.
That's precisely what I'm working on right now. I understand this much better after playing with it for a little bit. Here's a simple example:
Expands out into:
So, now I think you'll need to overload It'd be really nice if we could pass the objects themselves to the Edit: note that it's easy to introspect these things by evaluating a partial expansion:
Ok, I don't have the complete thread here yet, but it looks like we'll want to add a recursive pre-walk to make it easy to modify the call tree and its arguments. For example, we want to recursively identify that, e.g., In this case, we'd then swap the Does this plan of attack sound reasonable and achievable? I definitely will need more time to process how this becomes an extensible interface. Others are very welcome to pitch in. :)
@mbauman, if this analysis of the fusion call graph occurs at runtime, is there an issue with type inference? Functions that disable fusion still need to be inferrable.
It's not – that later information is never produced in this representation.
Right, but that's exactly what we'd need to address something like #22932 (comment). I was simply trying to identify if/how it'd be possible to get there from the current implementation in a way that's easy to extend. This is very clever and definitely an improvement in many cases… but it blows up fast due to all the nested types. For example, this definition is no longer inferable:
# Conflicts:
#	src/julia-syntax.scm
#	test/broadcast.jl
#	test/ranges.jl
I merged this into master. Here's a fun demo.

function Base.broadcast(f, r::UnitRange)
    if all_ops_ursafe(f)
        UnitRange(f(first(r)), f(last(r)))
    elseif all_ops_rsafe(f)
        range(f(first(r)), f(step(r)), length(r))
    else
        broadcast(f, Broadcast.DefaultArrayStyle{1}(), nothing, nothing, r)
    end
end
all_ops_ursafe(f::Broadcast.Fusion) = all_ops_ursafe(f.f)
all_ops_ursafe(f::Broadcast.FusionCall) = all_ops_ursafe(f.f) && all(all_ops_ursafe, f.args)
all_ops_ursafe(::typeof(+)) = true
all_ops_ursafe(::typeof(-)) = true
all_ops_ursafe(f::Broadcast.FusionConstant{<:Real}) = true
all_ops_ursafe(f::Broadcast.FusionArg) = true
all_ops_ursafe(arg) = false
all_ops_rsafe(f::Broadcast.Fusion) = all_ops_rsafe(f.f)
all_ops_rsafe(f::Broadcast.FusionCall) = all_ops_rsafe(f.f) && all(all_ops_rsafe, f.args)
all_ops_rsafe(::typeof(+)) = true
all_ops_rsafe(::typeof(-)) = true
all_ops_rsafe(::typeof(*)) = true
all_ops_rsafe(::typeof(/)) = true
all_ops_rsafe(f::Broadcast.FusionConstant) = true
all_ops_rsafe(f::Broadcast.FusionArg) = true
all_ops_rsafe(arg) = false
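The demo above walks the `Fusion`/`FusionCall` types from this PR's branch. The same style of introspection can be sketched against the `Broadcast.Broadcasted` type available in Julia 1.x, whose `f` and `args` fields hold the function and its (possibly nested) arguments. This is only an illustration under that assumption: `only_addsub` is a hypothetical name, not part of Base.

```julia
using Base.Broadcast: Broadcasted, broadcasted, materialize

# Hypothetical predicate: does a lazy broadcast tree use only + and -?
only_addsub(bc::Broadcasted) = only_addsub(bc.f) && all(only_addsub, bc.args)
only_addsub(::typeof(+)) = true
only_addsub(::typeof(-)) = true
only_addsub(::Function) = false   # any other function disqualifies the tree
only_addsub(x) = true             # leaves: arrays, numbers, and so on

x = [1.0, 2.0, 3.0]
shifted = broadcasted(+, broadcasted(-, x, 1), 2)   # (x .- 1) .+ 2, not yet evaluated
scaled  = broadcasted(+, broadcasted(*, 2, x), 1)   # (2 .* x) .+ 1, not yet evaluated

only_addsub(shifted)   # true
only_addsub(scaled)    # false, because of the *
materialize(scaled)    # [3.0, 5.0, 7.0]
```

Because the whole expression is an ordinary value, the check is plain dispatch plus recursion over a tuple, with no parser involvement.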
Overall I'm pretty optimistic about this and think we probably want this for 1.0. My two concerns:
I suppose this is OK, but I don't really like this PR anymore, since it seemed like it only addressed part of the problem. I started a new project at master...vtjnash:jn/lazydotfuse to experiment with being fully lazy and it seemed better. It needs updating for the changes to the broadcast API.
Neat demo
I didn't spend a lot of time looking it over, but IIUC Conversely, this PR doesn't seem to suffer from that problem at all. What specifically seemed missing to you?
The operations are realized at the "end" of the broadcasting, so the effect of it is transparent to user code. This PR still does an awkward amount of work in scheme. The other branch eliminates that and moves everything into Julia. The net result is that the default is for Julia to build effectively the same representation as this PR, while allowing libraries to implement more accurate control over the transform.
So it seems like what @vtjnash has done on that branch is exactly what I had envisioned when I proposed doing this with lazy operations instead of as a purely syntactic transformation. The key insight is that we were already effectively deciding where the materialization boundary is in the syntactic approach: the materialization boundary is exactly where dot fusion stops.
It might be worth exporting the linked lists as a standard library package?
It seems the questions here are diverging further and further from the core of this PR. Please open new, focused issues instead, and cross-reference back to this one. Issues are cheap.
What do folks think I should do about this? I'm currently blocked on #25267. I could push a branch that has the requisite changes in it, but it is going to fail nanosoldier. Or if it doesn't, it's only because we don't have sufficiently comprehensive broadcasting tests. I guess the real question is whether we want to merge something that has known performance regressions for the sake of the API change, and assume that the performance fix will come.
I would say do the API changes now and performance improvements during the alpha.
Can't this be done for 1.1? Right now, there is no documented API for making
I agree, @stevengj – hooking into broadcast should be explicitly not a stable API in 1.0.
Agreed this is a tough call. You outline the dangers well. The flip side is that it's a bit unsatisfying to say "Julia 1.0 is here! It's got a stable API except for broadcasting!" I suspect the coming PR is the last remaining component needed to avoid having that kind of caveat, and to me that's sufficiently attractive that I'm going to keep plugging away at this, get the PR submitted, and y'all can decide what to do with it. If the goal were only to allow broadcasting to be lazy, then I'd say that's not important enough to worry about. But as a refresher, the real problems to be fixed are:
If we do this later, then my crystal ball suggests it might look something like this:
If none of these sound serious to others, don't let this stand in the way of the 0.7-alpha release.
@timholy, the minimal change to address some of those issues would be something like #22063: a way to disable fusion for certain argument types, without any more complicated stuff like laziness or this PR. If we ever change broadcast to be fully lazy or whatever, we can still implement (Honestly, I don't see the big deal about ranges. Just don't use
It would certainly be better to be able to say "everything is perfect and we won't change it", but we're really out of time to keep working on this and it doesn't seem ready yet. Since broadcasting and dot syntax are such new and novel features, I don't think it's the end of the world to say that:
For the vast majority of Julia users, the "broadcasting API" is how broadcasting behaves and how they use it, not how it's implemented and extended. If/when we do change the internals of broadcasting, we can work closely with packages that hook into it to ensure a smooth transition so that non-library-author Julia users are none the wiser that anything has changed.
It's possible to handle, it just requires some code duplication.
Do we still need all the
@andreasnoack, please don't derail this conversation to rehash that. I've explained why that's needed elsewhere and will be happy to do so again but not here.
New version of this PR in #25377.
We're deprecating That fact then makes the alternative less attractive:
So we'd have to disable fusion for all ranges, and that presumably means ending fusion for quite a few operations. Totally doable, of course, just unattractive.
@timholy, as you mentioned above, we could define
One problem with
julia> Base.broadcast(::typeof(+), ::UnitRange, ::Int) = "crazy"
julia> (1:5) .+ 1
5-element Array{Int64,1}:
 2
 3
 4
 5
 6
julia> n = 1
1
julia> (1:5) .+ n
"crazy"
But I agree that deep nesting is rare, and I could live without preserving the range in such cases.
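For comparison, here is a small check of the same two expressions, assuming a Julia 1.x session in which literals are passed to the broadcast machinery as ordinary arguments (the behavior described in the later commit message). It is a sketch of the expected symmetry, not a reproduction of the session above.

```julia
# With no literal-slurping, a literal and a variable holding the same value
# reach the broadcast machinery as the same kind of argument.
n = 1
a = (1:5) .+ 1   # literal argument
b = (1:5) .+ n   # same value held in a variable

a == b                    # true
typeof(a) === typeof(b)   # true
a isa AbstractRange       # true: the range is preserved in both cases
```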
Okay, then let's just define
We have that currently, so that wouldn't be a change. But that does take us back to
That said, not everything in 1.0 is going to be satisfying. Still, this problem and others are fixed in #25377. It's just a question of whether people are scared by it, and of course to finish up the loose ends. As an alternative, what about removing the literal-slurping from the lisp fusion code? Are there big negatives? That would at least make broadcasting work in simple cases.
At a huge performance cost, I thought you said?
Last I checked, there was a significant performance penalty to simple expressions if you remove the literal-slurping. But this doesn't seem to be the case any more?
I'm not sure what changed? Another problem with removing literal-slurping is that it eliminates the type-stability of Unitful expressions involving literal powers. One possibility would be to only do literal-slurping for
Not as bad now. For
julia> @btime broadcast!(+, $b, $a, 1);
  12.715 ns (0 allocations: 0 bytes)
and #25377 gives
julia> @btime broadcast!(+, $b, $a, 1);
  17.991 ns (0 allocations: 0 bytes)
If we don't do #25377, that would be an improvement. Then, #22063 might be enough to prevent the GPU people from hating us. You could conceivably rework it around Still less flexible than #25377, but it's a much smaller change, and it automatically fixes the
@inline (f::FusionCall)(args...) = f.f(map(a -> a(args...), f.args)...)
# TODO: calling _apply on map _apply is not handled by inference
# for now, we unroll some cases and generate others, to help it out
#@inline (f::FusionApply)(args...) = Core._apply(f.f, map(a -> a(args...), f.args)...)
Equivalently, I think this is also `f.f(flatten(map(a -> a(args...), f.args)...)...)` where `flatten(args...) = Core._apply(tuple, args...)`.
This patch represents the combined efforts of four individuals, over 60 commits, and an iterated design over (at least) three pull requests that spanned nearly an entire year (closes #22063, #23692, #25377 by superseding them).

This introduces a pure Julia data structure that represents a fused broadcast expression. For example, the expression `2 .* (x .+ 1)` lowers to:

```julia
julia> Meta.@lower 2 .* (x .+ 1)
:($(Expr(:thunk, CodeInfo(:(begin
      Core.SSAValue(0) = (Base.getproperty)(Base.Broadcast, :materialize)
      Core.SSAValue(1) = (Base.getproperty)(Base.Broadcast, :make)
      Core.SSAValue(2) = (Base.getproperty)(Base.Broadcast, :make)
      Core.SSAValue(3) = (Core.SSAValue(2))(+, x, 1)
      Core.SSAValue(4) = (Core.SSAValue(1))(*, 2, Core.SSAValue(3))
      Core.SSAValue(5) = (Core.SSAValue(0))(Core.SSAValue(4))
      return Core.SSAValue(5)
  end)))))
```

Or, slightly more readably as:

```julia
using .Broadcast: materialize, make
materialize(make(*, 2, make(+, x, 1)))
```

The `Broadcast.make` function serves two purposes. Its primary purpose is to construct the `Broadcast.Broadcasted` objects that hold onto the function, the tuple of arguments (potentially including nested `Broadcasted` arguments), and sometimes a set of `axes` to include knowledge of the outer shape. The secondary purpose, however, is to allow an "out" for objects that _don't_ want to participate in fusion. For example, if `x` is a range in the above `2 .* (x .+ 1)` expression, it needn't allocate an array and operate elementwise; it can just compute and return a new range. Thus custom structures are able to specialize `Broadcast.make(f, args...)` just as they'd specialize on `f` normally to return an immediate result.

`Broadcast.materialize` is the identity for everything _except_ `Broadcasted` objects, for which it allocates an appropriate result and computes the broadcast. It does two things: it `instantiate`s the outermost `Broadcasted` object to compute its axes and then `copy`s it.

Similarly, an in-place fused broadcast like `y .= 2 .* (x .+ 1)` uses the exact same expression tree to compute the right-hand side of the expression as above, and then uses `materialize!(y, make(*, 2, make(+, x, 1)))` to `instantiate` the `Broadcasted` expression tree and then `copyto!` it into the given destination.

Altogether, this forms a complete API for custom types to extend and customize the behavior of broadcast (fixes #22060). It uses the existing `BroadcastStyle`s throughout to simplify dispatch on many arguments:

* Custom types can opt out of broadcast fusion by specializing `Broadcast.make(f, args...)` or `Broadcast.make(::BroadcastStyle, f, args...)`.
* The `Broadcasted` object computes and stores the type of the combined `BroadcastStyle` of its arguments as its first type parameter, allowing for easy dispatch and specialization.
* Custom broadcast storage is still allocated via `broadcast_similar`; however, instead of passing just a function as a first argument, the entire `Broadcasted` object is passed as a final argument. This potentially allows for much more runtime specialization dependent upon the exact expression given.
* Custom broadcast implementations for a `CustomStyle` are defined by specializing `copy(bc::Broadcasted{CustomStyle})` or `copyto!(dest::AbstractArray, bc::Broadcasted{CustomStyle})`.
* Fallback broadcast specializations for a given output object of type `Dest` (for the `DefaultArrayStyle` or another such style that hasn't implemented assignments into such an object) are defined by specializing `copyto!(dest::Dest, bc::Broadcasted{Nothing})`.

As it fully supports range broadcasting, this now deprecates `(1:5) + 2` to `.+`, just as had been done for all `AbstractArray`s in general.

As a first-mover proof of concept, LinearAlgebra uses this new system to improve broadcasting over structured arrays. Before, broadcasting over a structured matrix would result in a sparse array. Now, broadcasting over a structured matrix will _either_ return an appropriately structured matrix _or_ a dense array. This does incur a type instability (in the form of a discriminated union) in some situations, but thanks to type-based introspection of the `Broadcasted` wrapper, commonly used functions can be special-cased to be type stable. For example:

```julia
julia> f(d) = round.(Int, d)
f (generic function with 1 method)

julia> @inferred f(Diagonal(rand(3)))
3×3 Diagonal{Int64,Array{Int64,1}}:
 0  ⋅  ⋅
 ⋅  0  ⋅
 ⋅  ⋅  1

julia> @inferred Diagonal(rand(3)) .* 3
ERROR: return type Diagonal{Float64,Array{Float64,1}} does not match inferred return type Union{Array{Float64,2}, Diagonal{Float64,Array{Float64,1}}}
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] top-level scope

julia> @inferred Diagonal(1:4) .+ Bidiagonal(rand(4), rand(3), 'U') .* Tridiagonal(1:3, 1:4, 1:3)
4×4 Tridiagonal{Float64,Array{Float64,1}}:
 1.30771  0.838589   ⋅          ⋅
 0.0      3.89109   0.0459757   ⋅
  ⋅       0.0       4.48033    2.51508
  ⋅        ⋅        0.0        6.23739
```

In addition to the issues referenced above, it fixes:

* Fixes #19313, #22053, #23445, and #24586: Literals are no longer treated specially in a fused broadcast; they're just arguments in a `Broadcasted` object like everything else.
* Fixes #21094: Since broadcasting is now represented by a pure Julia data structure, it can be created within `@generated` functions and serialized.
* Fixes #26097: The fallback destination-array specialization method of `copyto!` is specifically implemented as `Broadcasted{Nothing}` and will not be confused by `nothing` arguments.
* Fixes the broadcast-specific element of #25499: The default base broadcast implementation no longer depends upon `Base._return_type` to allocate its array (except in the empty or concretely-typed cases). Note that the sparse implementation (#19595) is still dependent upon inference and is _not_ fixed.
* Fixes #25340: Functions are treated like normal values just like arguments and only evaluated once.
* Fixes #22255, and is performant with 12+ fused broadcasts. Okay, that one was fixed on master already, but this fixes it now, too.
* Fixes #25521.
* The performance of this patch has been thoroughly tested through its iterative development process in #25377. There remain [two classes of performance regressions](#25377) that Nanosoldier flagged.
* #25691: Constant literals still lose their constant-ness upon going through the broadcast machinery. I believe quite a large number of functions would need to be marked as `@pure` to support this, including functions that are intended to be specialized.

(For bookkeeping, this is the squashed version of the [teh-jn/lazydotfuse](#25377) branch as of a1d4e7e. Squashed and separated out to make it easier to review and commit)

Co-authored-by: Tim Holy <tim.holy@gmail.com>
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
Co-authored-by: Andrew Keller <ajkeller34@users.noreply.github.com>
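To make the `make`/`materialize` pipeline described in the commit message concrete, the following sketch builds the same tree by hand. It assumes a Julia 1.x session, where the constructor is exposed as `Broadcast.broadcasted` rather than the `Broadcast.make` name used in the message; everything else follows the description above.

```julia
using Base.Broadcast: Broadcasted, broadcasted, materialize, materialize!

x = [1.0, 2.0, 3.0]

# Build the lazy tree for 2 .* (x .+ 1) by hand; nothing is computed yet.
bc = broadcasted(*, 2, broadcasted(+, x, 1))
bc isa Broadcasted            # true
bc.args[2] isa Broadcasted    # true: the nested x .+ 1 node

# materialize allocates the result and runs one fused loop.
y = materialize(bc)           # [4.0, 6.0, 8.0]

# In-place counterpart of z .= 2 .* (x .+ 1).
z = similar(x)
materialize!(z, bc)

# Anything that is not a Broadcasted passes through untouched.
materialize(x) === x          # true

# Ranges opt out of fusion: the result is computed eagerly as a range.
r = broadcasted(+, 1:5, 1)
r isa AbstractRange           # true
```

The last two lines show the "out" mentioned above: for a range argument the constructor returns a finished value, and `materialize` then has nothing left to do.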
This preserves all dot-fusion information in the type hierarchy, permitting complex runtime analysis and re-dispatch overrides. Since no information is obscured in a new type, experiments with variations on #22060 and #22053 can now be implemented in pure Julia just by adding new dispatch rules. Also, since no new types are created, these can now be easily serialized, and can be returned from generated functions.
fix #21094
fix #22060
fix #22053
replaces #22063
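As an illustration of what "adding new dispatch rules" can look like, here is a minimal sketch of a container that customizes its own broadcast. It is written against the Julia 1.x API (`Broadcast.Broadcasted`, `Broadcast.broadcasted`, `Broadcast.ArrayStyle`) rather than the `Fusion` types of this PR, and `Tagged` and `unwrap` are hypothetical names introduced only for the example.

```julia
using Base.Broadcast: Broadcasted, ArrayStyle, broadcasted, materialize

# Hypothetical wrapper type, used only for illustration.
struct Tagged{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(t::Tagged) = size(t.data)
Base.getindex(t::Tagged, i::Int) = t.data[i]

# Give Tagged its own broadcast style so dispatch can find it.
Base.BroadcastStyle(::Type{<:Tagged}) = ArrayStyle{Tagged}()

# Hypothetical helper: rebuild the lazy tree with every Tagged leaf unwrapped.
unwrap(x) = x
unwrap(t::Tagged) = t.data
unwrap(bc::Broadcasted) = broadcasted(bc.f, map(unwrap, bc.args)...)

# Custom implementation: evaluate on the raw vectors, then rewrap.
# This sketch only handles expressions whose result is a vector.
Base.copy(bc::Broadcasted{ArrayStyle{Tagged}}) = Tagged(materialize(unwrap(bc)))

t = Tagged([1.0, 2.0, 3.0])
r = 2 .* (t .+ 1)   # evaluated on t.data, then rewrapped as a Tagged
```

The only hooks touched are `BroadcastStyle` and `copy`; the dot syntax itself is unchanged, which is the point of keeping the fused expression as a runtime value.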