WIP: make reinterpret work on structs #32660

Closed

andyferris
Member

@andyferris andyferris commented Jul 23, 2019

Currently, reinterpret works on primitive types through the bitcast intrinsic and as an (inner!) constructor for ReinterpretArray. With both the older array element-type casting and the newer ReinterpretArray implementation, it's possible to reinterpret arrays of isbits structs as arrays of other isbits structs, which makes the array version more generic than the non-array version.

This PR is my attempt to copy the way Keno implemented getindex on ReinterpretArray to make reinterpret also work on structs. Semantically, we copy the value to a Ref, copy the bits to a Ref of a different type, and load that. Fortunately, there is enough codegen optimization that, as in the ReinterpretArray case, the overhead is (mostly) removed in the resulting LLVM/native code.

This is work in progress, but I would appreciate early feedback from knowledgeable people on the approach taken in the new reinterpret method.
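
A minimal sketch of the Ref-based copy described above, written with public functions only. The names `ref_reinterpret` and `RGBA8` are hypothetical, `unsafe_copyto!` on byte pointers stands in for whatever copy primitive the PR uses internally, and padding is deliberately not checked here.

```julia
# Hypothetical helper, not the PR's implementation: copy the bits of `x`
# into a Ref of the target type and load the result.
function ref_reinterpret(::Type{Out}, x::In) where {Out, In}
    isbitstype(Out) && isbitstype(In) ||
        throw(ArgumentError("both types must be isbits"))
    sizeof(Out) == sizeof(In) ||
        throw(ArgumentError("types must have the same size"))
    src = Ref{In}(x)
    dst = Ref{Out}()
    GC.@preserve src dst begin
        psrc = Ptr{UInt8}(Base.unsafe_convert(Ptr{In}, src))
        pdst = Ptr{UInt8}(Base.unsafe_convert(Ptr{Out}, dst))
        unsafe_copyto!(pdst, psrc, sizeof(Out))  # byte-wise copy
    end
    return dst[]
end

# Hypothetical example type: four bytes, no padding.
struct RGBA8
    r::UInt8
    g::UInt8
    b::UInt8
    a::UInt8
end

c = RGBA8(0x01, 0x02, 0x03, 0x04)
bits = ref_reinterpret(UInt32, c)      # numeric value depends on endianness
back = ref_reinterpret(RGBA8, bits)    # round trip recovers the struct
```

The round trip `back === c` holds regardless of byte order, which is why the sketch avoids stating the numeric value of `bits`.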

  • Work around bootstrap issues so we don't overwrite the basic reinterpret method.
  • Add some unit tests
  • News/documentation

if isprimitivetype(Out) && isprimitivetype(In)
return bitcast(Out, x)
elseif struct_subpadding(Out, In)
in = Ref{In}(x)
Contributor

@chethega chethega Jul 24, 2019

Couldn't we simply

GC.@preserve in begin return unsafe_load(unsafe_convert(Ptr{Out}, in)) end

Member Author

Yes, that’s probably better

ptr_in = unsafe_convert(Ptr{In}, in)
out = Ref{Out}()
ptr_out = unsafe_convert(Ptr{Out}, out)
GC.@preserve in out begin
Contributor

Shouldn't in and out be preserved before getting the pointer?
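
One way to read this suggestion, as a hedged sketch: obtain the raw pointer only inside the `GC.@preserve` block, so the pointer never exists outside the region where the Ref is marked as in use. The function name `first_byte_in_memory` is hypothetical.

```julia
# Hypothetical helper: read the first byte of a UInt32 as stored in memory.
function first_byte_in_memory(x::UInt32)
    r = Ref(x)
    GC.@preserve r begin
        # The pointer is created and used entirely inside the preserved region.
        p = Ptr{UInt8}(Base.unsafe_convert(Ptr{UInt32}, r))
        return unsafe_load(p)
    end
end

# All four bytes are equal, so the result does not depend on endianness.
first_byte_in_memory(0x01010101)
```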

@vtjnash
Member

vtjnash commented Nov 11, 2021

Since struct layout is not generally defined (other than for C-compatible isbits types), we've previously felt that reinterpret should not be exposing that detail to the user.

@andyferris
Member Author

In that case, is an isbitstype check on In and Out sufficient to ensure correctness?

@BioTurboNick
Contributor

BioTurboNick commented Nov 11, 2021

isbitstype types can still have padding; I believe isbitstype only refers to the presence or absence of references. Are some Julia isbits types not C-compatible?

I guess you'd need to check that both types have the same padding? Though I see you have a note about not being able to check the input's padding, can you say more about that?
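
To illustrate the point about padding in isbits types, here is a small sketch with a hypothetical struct `Padded`. The sizes assume the usual alignment rules (an Int32 field aligned to 4 bytes).

```julia
# Hypothetical struct: 5 bytes of data, but the Int32 must be 4-byte aligned.
struct Padded
    a::Int8
    b::Int32
end

isbitstype(Padded)               # no references, so this is an isbits type
data = sizeof(Int8) + sizeof(Int32)  # bytes that carry data
total = sizeof(Padded)               # bytes the struct occupies
offset_b = Int(fieldoffset(Padded, 2))  # where field `b` starts
```

Under these assumptions `data` is 5 while `total` is 8, and `b` starts at offset 4, leaving three padding bytes after `a`.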

@andyferris
Member Author

Sorry, yes, I remember the issue now. We need to check that any input padding bits are also output padding bits. Maybe even vice versa, just in case there is some UB there. Is there any function lying around that can do that, or that helps to see the padding bits?
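
A rough sketch of such a check using only public reflection functions (`fieldcount`, `fieldoffset`, `fieldtype`, `sizeof`). The names `padding_mask`, `mark_data!`, and `padding_subset` are hypothetical, and the sketch assumes concrete isbits types without Union fields; it is not the helper Base uses.

```julia
# Hypothetical helper: true at every byte of `T` that is padding.
function padding_mask(::Type{T}) where {T}
    mask = trues(sizeof(T))      # assume padding until a field claims the byte
    mark_data!(mask, T, 0)
    return mask
end

# Recursively clear the mask for bytes covered by primitive fields.
function mark_data!(mask, ::Type{T}, base::Int) where {T}
    if fieldcount(T) == 0
        mask[base+1:base+sizeof(T)] .= false   # primitive: all bytes are data
    else
        for i in 1:fieldcount(T)
            mark_data!(mask, fieldtype(T, i), base + Int(fieldoffset(T, i)))
        end
    end
    return mask
end

# True if every padding byte of `In` is also a padding byte of `Out`.
function padding_subset(::Type{In}, ::Type{Out}) where {In, Out}
    sizeof(In) == sizeof(Out) || return false
    return all(.!padding_mask(In) .| padding_mask(Out))
end

padding_mask(Tuple{Int8, Int16})            # padding at the second byte
padding_subset(Tuple{Int8, Int16}, UInt32)  # input padding would become data
padding_subset(UInt32, Tuple{Int8, Int16})  # input has no padding at all
```

Checking the reverse direction as well (the "vice versa" case) would just be a second call with the arguments swapped.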

@BioTurboNick
Contributor

I think amending padding in reinterpretarray.jl to this works:

import Base.padding
import Base.Padding
function padding(T)
    padding = Padding[]
    last_end::Int = 0
    for i = 1:fieldcount(T)
        offset = fieldoffset(T, i)
        fT = fieldtype(T, i)
        if offset != last_end
            push!(padding, Padding(offset, offset-last_end))
        end
        last_end = offset + sizeof(fT)
    end
    if 0 < last_end < sizeof(T)
        push!(padding, Padding(last_end + 1, sizeof(T) - last_end))
    end
    padding
end

struct WithPadding1
    z::Bool
    y::Int16
    x::Int8
end

padding(WithPadding1)
#=
2-element Vector{Padding}:
 Padding(2, 1)
 Padding(6, 1)
=#

The only consumer of padding is CyclePadding, which may add another Padding entry. I'm not certain in which situations the minimum alignment would be greater than the size of the type, so I couldn't test whether anything cares that there are contiguous Padding entries at the end. The reinterpretarray tests pass, at least.

@BioTurboNick
Contributor

I think this works?

@inline function reinterpret(::Type{Out}, x::In) where {Out, In}
    isbitstype(Out) || error("reinterpret target type must be isbits")
    isbitstype(In) || error("reinterpret source type must be isbits")
    sizeof(Out) == sizeof(In) || error("types must be the same size; got $(sizeof(Out)), $(sizeof(In))")
    isprimitivetype(Out) && isprimitivetype(In) && return bitcast(Out, x)
    struct_subpadding(Out, In) || throw(PaddingError(Out, In))
    in = Ref{In}(x)
    out = Ref{Out}()
    GC.@preserve in out begin
        ptr_in = unsafe_convert(Ptr{In}, in)
        ptr_out = unsafe_convert(Ptr{Out}, out)
        _memcpy!(ptr_out, ptr_in, sizeof(Out))
    end
    return out[]
end

There'd still be the Bool trap representation issue from #43035.

I don't think unsafe_load can work as suggested in the code comments, as the unsafe_convert can only turn a ref to type T into a pointer of type T.

This PR actually handles what I was trying to do in #43035 but better. You can even do:

reinterpret(Int64, (Int32(1), Int16(1), Int8(1), Int8(1)))

and just get the Int64 directly.

@BioTurboNick
Contributor

BioTurboNick commented Nov 12, 2021

I suppose one issue might be that reinterpret for arrays currently doesn't consider end-padding when deciding if a reinterpretation is valid. Is this a bug?

reinterpret(Int32, [(Int32(1), Int16(1), Int8(1))]) # works
reinterpret(Int32, [(Int32(1), Int8(1), Int16(1))]) # PaddingError

Changing the padding function as shown above would be a breaking change for reinterpret arrays if any code is currently relying on trailing padding being ignored. I suppose that's related to #41071? Though being able to convert an array to pure bytes is a useful operation...

It could be that reinterpreting to something with less padding should be always allowed (as the current PR mentions in a comment), but it would still be useful to have a potential round-trip available. Perhaps it could check that the bytes that would go into the padding are all zeroed?
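
To make the difference between the two tuple types above concrete, this sketch lists each field's offset and size. The helper name `field_layout` is hypothetical, and the values in the comments assume the usual alignment rules (Int16 aligned to 2 bytes, Int32 to 4).

```julia
# Hypothetical helper: (offset, size) in bytes for every field of `T`.
field_layout(::Type{T}) where {T} =
    [(Int(fieldoffset(T, i)), sizeof(fieldtype(T, i))) for i in 1:fieldcount(T)]

A = Tuple{Int32, Int16, Int8}   # data ends at byte 7: one byte of trailing padding
B = Tuple{Int32, Int8, Int16}   # Int8 ends at byte 5, Int16 starts at 6: interior padding

field_layout(A)   # [(0, 4), (4, 2), (6, 1)]
field_layout(B)   # [(0, 4), (4, 1), (6, 2)]
sizeof(A), sizeof(B)
```

Both types occupy 8 bytes; they differ only in where the single padding byte sits, which is the distinction the array version of reinterpret is reacting to.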

@tkf
Member

tkf commented Nov 12, 2021

I believe we need an unsafe (narrow-contract API), lower-level, and less magic version of reinterpret that can be used for arbitrary pointer-free types. We need to allow structs with Union fields and padded fields. This is required for supporting rich pointer-free immutable types on GPUs and in lock-free data structures. The idea is to support something like unsafe_bitcast(NTuple{4,Int32}, Some{Union{Float64,Missing}}(0)) to get an opaque sequence of bytes that can be used with low-level atomics (e.g., torn relaxed store / load) and GPU APIs (e.g., shfl). The only thing we need in such a use case is a round trip within a single process.

For such a use case, we need very predictable behavior and need to avoid the "magic behavior" of reinterpret(T, x), where T may act on elements when x is an array (or a tuple, as was suggested in #43035). For example, it's reasonable to try to type-pun a static array.

Here's a sketch of the API

Base.unsafe_bitcast(T::Type, x::S) -> y::T

Interpret the bit representation of x of type S as the bit representation of type T.

The unsafe_ prefix reflects that:

  • The invariants imposed by the constructor of T are not enforced for the value y.
  • If the fields of S are padded, the values of some fields of T may be undefined.
  • The bit representation y may not be usable across Julia processes. Invoking unsafe_bitcast(S, y) when y was produced in a different Julia process is undefined behavior unless S and its subcomponents do not contain Union fields.

The roundtrip is guaranteed to be ===-identical:

unsafe_bitcast(typeof(x), unsafe_bitcast(T, x)) === x

The target type T and the source type S both must be concrete and have identical size. Both T and S must not contain boxed Julia values. They can contain Union fields.

Examples

julia> x = Some{Union{UInt64,Nothing}}(0);

julia> y = Base.unsafe_bitcast(UInt128, x);

julia> Base.unsafe_bitcast(typeof(x), y) === x
true

Implementation is trivial since I'm not suggesting to detect padding inconsistency:

function unsafe_bitcast(::Type{T}, x::S) where {T,S}
    isconcretetype(T) || throw(ArgumentError("output type $T is not concrete"))
    datatype_pointerfree(T) ||
        throw(ArgumentError("output type $T may contain a boxed object"))
    datatype_pointerfree(S) ||
        throw(ArgumentError("input type $S may contain a boxed object"))
    sizeof(T) == sizeof(S) || throw(ArgumentError("different input and output sizes"))
    output = Ref{T}()
    input = Ref{S}(x)
    GC.@preserve output input begin
        po = Ptr{UInt8}(pointer_from_objref(output))
        pi = Ptr{UInt8}(pointer_from_objref(input))
        _memcpy!(po, pi, sizeof(S))
    end
    return output[]
end

@Seelengrab
Contributor

> We need to allow structs of Union fields and padded fields. This is required for supporting rich pointer-free immutable types in GPU and lock-free data structures.

Could you expand on that a little? I'm not familiar with GPU programming and I don't immediately see why Union fields follow from that. Also, if only one of the union fields is allowed to be active at one time (it's not a C union, where memory is aliased, right?), then how/why does reinterpreting/bitcasting that make sense?

@BioTurboNick
Contributor

If unsafe_bitcast exists for exact preservation, which does seem like a useful function, what if reinterpret skipped over padding, instead of needing them to be compatible?

Should reinterpret be able to take the 56 meaningful bits in (Int32(1), Int16(1), Int8(1)) and create a Tuple{Int32, Int8, Int16}?

@tkf
Member

tkf commented Nov 13, 2021

>> We need to allow structs of Union fields and padded fields. This is required for supporting rich pointer-free immutable types in GPU and lock-free data structures.

> Could you expand on that a little? I'm not familiar with GPU programming and I don't immediately see why Union fields follow from that.

The emphasis is on "supporting rich pointer-free immutable types", not on GPU. What I wanted to say was that, if I need to use objects containing Union values on GPU, I need something like unsafe_bitcast. If you want concrete (but messy ATM) code, there's https://github.com/JuliaFolds/FoldsCUDA.jl/blob/master/src/unionvalues.jl. (But my code is likely to contain some undefined behavior. Trying to do it the "right way" points me to the memcpy-based type punning I suggested above.)

> Also, if only one of the union fields is allowed to be active at one time (it's not a C union, where memory is aliased, right?), then how/why does reinterpreting/bitcasting that make sense?

Julia's Union is more like C++'s variant than union when stored somewhere. So, it makes sense to bundle the active type tag and the value of the active type in a single chunk of opaque bytes.
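
A small, hedged illustration of the "tag plus value" layout: the wrapper type from the thread's earlier example needs more room than the UInt64 payload alone, because the active-type tag is stored next to it. The exact size is an implementation detail; the earlier example in this thread round-trips through UInt128, which suggests 16 bytes on a typical 64-bit build.

```julia
# Type taken from the earlier unsafe_bitcast example in this thread.
T = Some{Union{UInt64, Nothing}}

payload = sizeof(UInt64)   # bytes needed for the largest union member
stored  = sizeof(T)        # bytes for payload, type tag, and any padding
stored > payload           # the tag has to live somewhere
```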

> what if reinterpret skipped over padding, instead of needing them to be compatible?

Do you mean packing the bytes when the in-memory representation contains padding? I can't think of a use for this for me personally. But if it has some use somewhere, I can imagine that having it in Base makes sense, since it's not clear how to access all this data layout information without touching the internals.

@BioTurboNick
Contributor

BioTurboNick commented Nov 13, 2021

>> what if reinterpret skipped over padding, instead of needing them to be compatible?

> Do you mean packing the bytes when the in-memory representation contains padding? I can't think of a use for this for me personally. But if it has some use somewhere, I can imagine that having it in Base makes sense, since it's not clear how to access all this data layout information without touching the internals.

Yeah. There were a couple people who raised a desire to convert tuples of various isbitstypes to tuples of other isbitstypes, which is how I got interested in this PR.

@tkf tkf mentioned this pull request Nov 13, 2021
@tkf
Member

tkf commented Nov 13, 2021

OK, since it sounds like the use case of reinterpret somewhat extends beyond what I had in mind, I opened #43065 to focus discussion on the API I proposed.

@BioTurboNick
Contributor

BioTurboNick commented Nov 13, 2021

I believe I have a solution that implements my more general proposal.

Is it possible to do a pull request to a pull request?

https://github.com/BioTurboNick/julia/tree/reinterpret-all

What I've added:

  • If padding is mismatched, the copy is broken up by field, using only the non-padding bytes.
  • The padding function now traverses the padding of nested isbitstypes (e.g. Tuple{Tuple{Int8, Int16}, Int8}) and gains an entry for trailing padding.
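
The field-by-field idea can be sketched as follows. This is not the code from the linked branch: the name `data_bytes` is hypothetical, it allocates, and it goes through the existing array version of reinterpret for the primitive leaves.

```julia
# Hypothetical helper: collect only the data bytes of an isbits value,
# walking nested fields and skipping any padding between them.
function data_bytes(x::T) where {T}
    isbitstype(T) || throw(ArgumentError("$T is not an isbits type"))
    # Primitive leaf: take its bytes via the array version of reinterpret.
    isprimitivetype(T) && return collect(reinterpret(UInt8, [x]))
    bytes = UInt8[]
    for i in 1:fieldcount(T)
        append!(bytes, data_bytes(getfield(x, i)))
    end
    return bytes
end

t = (Int32(1), Int16(1), Int8(1))
length(data_bytes(t))   # data bytes only
sizeof(typeof(t))       # in-memory size, including padding
```

For `t`, the packed form has 7 bytes (56 bits) while the tuple occupies 8 bytes in memory under the usual alignment rules; unpacking into another type would walk that type's fields in the same order.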

Simple reinterpretations are still ~1 ns; packed-to-padded and vice versa ~14+ ns; mismatched padded-to-padded is 40+ ns, all without allocations.

A remaining issue is Bool trap representations; however, the current reinterpret allows these trap representations via bitcast, so it seems like that should be a separate issue, if that needs to be hardened.

The rationale I'm working with is that the high-level representation of data should be agnostic to the implementation details of the type. So there shouldn't be any logical barrier to converting a bitstype tuple or struct with fields that add up to 64 bits to any other tuple or struct with fields that add up to 64 bits, for example.

@Seelengrab
Contributor

Seelengrab commented Nov 13, 2021

> A remaining issue is Bool trap representations; however, the current reinterpret allows these trap representations via bitcast, so it seems like that should be a separate issue, if that needs to be hardened.

Since "trap" values are not exclusive to booleans (as the proposed docstring by @tkf mentions, no constructor guarantees are observed), booleans could serve as a good example in the documentation to illustrate why people still have to check whether their desired bitcast is valid for their use case or not. I still don't believe it should check anything or error here, since we have convert for that behavior (well, or we change booleans to care about more than just the LSB, though I'm fairly certain that's more controversial than keeping the behavior as-is).

@BioTurboNick
Contributor

>> A remaining issue is Bool trap representations; however, the current reinterpret allows these trap representations via bitcast, so it seems like that should be a separate issue, if that needs to be hardened.

> Since "trap" values are not exclusive to booleans (as the proposed docstring by @tkf mentions, no constructor guarantees are observed), booleans could serve as a good example in the documentation to illustrate why people still have to check whether their desired bitcast is valid for their use case or not. I still don't believe it should check anything or error here, since we have convert for that behavior (well, or we change booleans to care about more than just the LSB, though I'm fairly certain that's more controversial than keeping the behavior as-is).

Gotcha. Looks like the bool issue is best left for #34909.

@jenkspt

jenkspt commented Nov 28, 2021

Not sure if this is still useful, but #43035 was asking for invalid floats from reinterpret. Here are some examples:

julia> typemax(Int64)
9223372036854775807
julia> reinterpret(Float64, typemax(Int64))
NaN
julia> reinterpret(Int64, NaN64)
9221120237041090560
julia> reinterpret(Float64, typemax(Int64)-1)
NaN
julia> reinterpret(Float64, typemax(Int64)-100)
NaN

@BioTurboNick
Contributor

> Not sure if this is still useful, but #43035 was asking for invalid floats from reinterpret. Here are some examples:
>
> julia> typemax(Int64)
> 9223372036854775807
> julia> reinterpret(Float64, typemax(Int64))
> NaN
> julia> reinterpret(Int64, NaN64)
> 9221120237041090560
> julia> reinterpret(Float64, typemax(Int64)-1)
> NaN
> julia> reinterpret(Float64, typemax(Int64)-100)
> NaN

Thanks, but by invalid, we mean something that would have undefined behavior because the bit representation was not considered valid. NaNs are all valid bit patterns for Float64 and are processed as such.
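
As a small illustration that these NaN bit patterns are ordinary, valid Float64 values, the sketch below round-trips one of them through reinterpret. It relies only on the scalar reinterpret for primitive types, which is a plain bitcast.

```julia
bits = typemax(Int64)              # exponent bits all ones, nonzero mantissa
x = reinterpret(Float64, bits)     # a NaN carrying a payload

isnan(x)                           # it behaves like any other NaN
reinterpret(Int64, x) == bits      # the payload survives the round trip
reinterpret(Int64, NaN) == bits    # the default NaN has a different payload
```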

@BioTurboNick
Contributor

That's in contrast with Bools, which are only considered valid (in Julia/LLVM) if just the first bit is set or all bits are zero. Any other combination of values can lead to unexpected behavior because the compiler assumes the other bits in a Bool are all 0s.
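
For reference, the two valid Bool bit patterns can be inspected safely in the Bool-to-UInt8 direction. The sketch deliberately does not go the other way with other byte values, since that is exactly the problematic case described above.

```julia
sizeof(Bool)                 # a Bool occupies one byte
reinterpret(UInt8, true)     # only the lowest bit is set
reinterpret(UInt8, false)    # all bits are zero
```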

@BioTurboNick
Contributor

>>> A remaining issue is Bool trap representations; however, the current reinterpret allows these trap representations via bitcast, so it seems like that should be a separate issue, if that needs to be hardened.

>> Since "trap" values are not exclusive to booleans (as the proposed docstring by @tkf mentions, no constructor guarantees are observed), booleans could serve as a good example in the documentation to illustrate why people still have to check whether their desired bitcast is valid for their use case or not. I still don't believe it should check anything or error here, since we have convert for that behavior (well, or we change booleans to care about more than just the LSB, though I'm fairly certain that's more controversial than keeping the behavior as-is).

> Gotcha. Looks like the bool issue is best left for #34909.

Looks like that bool issue is about to be fixed by #45689?

@Seelengrab
Contributor

> Looks like that bool issue is about to be fixed by #45689?

It's only booleans that are affected by this, though; other kinds of constructor constraints still don't get checked. In effect, the fix just loses us a convenient example with a builtin datatype 🤷

@BioTurboNick
Contributor

>> Looks like that bool issue is about to be fixed by #45689?

> It's only booleans that are affected by this, though; other kinds of constructor constraints still don't get checked. In effect, the fix just loses us a convenient example with a builtin datatype 🤷

I think it's clear that the compiler considers the high 7 bits to be padding and is free to assume they're 0s, so arguably it's more in line with padding concerns than invalid bit patterns within valid bits. This creates undefined behavior that works sometimes and not in others.

That is:

struct Foo
    x::Int
    function Foo(x::Int)
        # Parenthesize the range check: && binds tighter than ||.
        (x < 0 || x > 1) && throw(ArgumentError("x must be 0 or 1"))
        new(x)
    end
end

This struct considers any other values to be invalid Foos, and it's possible to make an invalid Foo by reinterpreting 4 as a Foo, but the effect of doing that will be defined.
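
A hedged sketch of that scenario. The struct is renamed `CheckedFoo` (a hypothetical name) so the block stands alone, and the value goes through the array version of reinterpret, which already accepts isbits structs.

```julia
# Hypothetical stand-in for Foo: the inner constructor only admits 0 and 1.
struct CheckedFoo
    x::Int
    function CheckedFoo(x::Int)
        (x < 0 || x > 1) && throw(ArgumentError("x must be 0 or 1, got $x"))
        return new(x)
    end
end

# The constructor is bypassed: the bits of 4 are simply viewed as a CheckedFoo.
bad = reinterpret(CheckedFoo, [4])[1]
bad.x   # the stored Int, which the constructor would have rejected
```

The result violates the type's invariant, but reading `bad.x` is an ordinary Int load, so the behavior is wrong from the type's point of view yet still well defined.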

Can we construct an example where behavior would be undefined, and not just wrong?

7 participants