Reinterpret and structure padding #25908

chethega · 2018-02-06T12:31:30Z

versioninfo()
#Julia Version 0.7.0-DEV.3696
#Commit 9f8496c734* (2018-02-02 16:38 UTC)

struct bax2
       a::Int8
       b::Int
       c::Int8
end

A4=[bax2(-1,-1,-1)];
A4r=reinterpret(UInt8, A4);
@show A4r[2];
# A4r[2] = 0xfd
A4r[2]=2;
@show A4r[2];
#A4r[2] = 0xfd

A longer discussion is on discourse https://discourse.julialang.org/t/why-does-reinterpret-cause-an-extra-allocation/8833/24.

My preferred behavior would be the old one for base-arrays (same as unsafe_wrapping the pointer into a new array) and documented UB for abstract arrays (or deprecate reinterpret array for non-contiguous memory). Maybe also special support for offset arrays and contiguous views.

The text was updated successfully, but these errors were encountered:

chethega · 2018-02-06T16:45:08Z

Maybe I should write down better what the issue is: The new reinterpret array works for abstract arrays, and loads from via getindex and stores via setindex! into a temporary that is then re-interpreted via unsafe_load/unsafe_store!.

Now, the 2nd byte of bax2 is structure padding. getindex and setindex! are afaik undefined with respect to how they read/write structure padding. Indeed, one could imagine an abstract array that re-packs structs and never materializes memory for the structure padding. For this reason, parts of reinterpreted abstract arrays of elements with structure padding must be undefined, there is no way around it.

For contiguous memory arrays that use julia conventions for structure layout, we could be faster and without unnecessary UB by directly writing to the memory. The same applies for reading.

This should imho be better documented, and reinterpret should have a special case for the built-in array type.

Keno · 2018-02-06T17:48:46Z

My preferred behavior would be the old one for base-arrays

The whole reason for this change was to get rid of this behavior for arrays so we could have stronger aliasing guarantees on arrays. I'd be fine with disallowing reinterpretarray from data types that contain padding (or throwing an error when accessing those indicies).

chethega · 2018-02-06T18:48:19Z

Can't a reinterpret-array be implemented exactly the same way as a base-array, just with the twist that it has looser aliasing guarantees? That would also make all performance problems go away, like the 20x-30x slowdown for read-access to an UInt8-reinterpreted originally-UInt64 array. Indeed, can't alias-safety be part of the array type (extra type parameter) on the julia side and a jl_array flag on the C side? That way one doesn't even need new builtins for arrayref/arrayset.

Re disallowing/errors: For me a documentation and maybe a one-time warning on construction of the reinterpret-array would be enough ("Warning: You reinterpret from an array with structure padding; all padding bytes are and stay undefined (even if you manage to set them)"). When reinterpreting into a type with padding the UB is more benign (see discourse), not sure whether it warrants a warning.

Keno · 2018-02-06T18:52:29Z

The performance problem is just a temporary regression that I haven't gotten around to fixing yet, as I've mentioned elsewhere I plan to address that before the release.

Can't a reinterpret-array be implemented exactly the same way as a base-array, just with the twist that it has looser aliasing guarantees?

You could add new pointer load intrinsics that take an array and load an arbitrary value from a byte offset. I've been trying to avoid that because it's a bit annoying and frankly reinterpret is a bit of a code smell, but it is certainly possible.

chethega · 2018-02-06T19:19:20Z

An even easier way could be to just store the parent and implement getindex/setindex! directly via unsafe_load / unsafe_store! to reinterpret(Ptr{S},pointer(parent))? That way the aliasing info could be retained on the llvm side (we are, after all, using the same pointer for the access). Or are we loosing all the nice metadata on the bitcast, and this is what you mean by new pointer intrinsic? (new rich_pointer that preserves all aliasing/alignment metadata)

Sorry if the question is somewhat naive; my understanding of llvm is limited.

Keno · 2018-02-06T19:23:39Z

LLVM annotates alias information on accesses, not on pointers, so that doesn't quite work. What does work is a version of unsafe_* that takes which type it should be considered for AA purposes.

chethega · 2018-02-06T19:48:09Z

Hmm. A low-effort variant might be a less_unsafe_* that is allowed to alias with everything (in the same address space)? That way reinterpret-arrays are still slightly slower than base-arrays whenever aliasing-analysis matters, but the main point of this change was to speed up non-reinterpreted base-arrays, right?

I am understanding llvm right that one could simply emit an empty noalias! scope for these pointer accesses? Or does TBAA override the nolias! scoping rules from https://llvm.org/docs/LangRef.html#id921? Or am I just hopelessly confused?

Keno · 2018-02-06T19:49:46Z

You can just leave off the TBAA on pointer loads, which'll be correct, but also pessimize all other memory accesses in the same function.

chethega · 2018-02-07T12:22:06Z

So the low-effort variant would be that the builtin pointerref and pointerset get an additional boolean argument that turns TBAA on or off; whether this is low-effort is questionable (but also needs to be compared to your plans for fixing performance of the current construction); whether this fixes the reinterpret-array speed compared to whatever you had planned is up to benchmarks and context; additionally one loses the (questionable) flexibility of reinterpret-array working on non-contiguous layouts (except if one kept both versions). But optionally disabling TBAA might be also useful for other people who reinterpret via pointers (e.g. operate on structs that are stored in a UInt8-buffer). Slightly higher effort would be what you said, give it an additional type-argument that is used for TBAA (defaulting to the type of the store/load, with "Nothing" meaning no TBAA, i.e. can alias any type)

I am unsure about the semantics of the noalias annotation for function parameters; if I understand the spec right, these introduce extra noalias scopes upon inlining and propagate to all the pointer loads and stores. Is this used at all in julia? Not sure whether this would become painful.

Thanks for the explanation!

In #25908 it was noted that reinterpreting structures with paddings exposes undef LLVM values to user code. This is problematic, because an LLVM undef value is quite dangerous (it can have a different value at every use, e.g. for `a::Bool` undef, we can have `a || !a == true`. There are proposal in LLVM to create values that are merely arbitrary (but the same at every use), but that capability does not currently exist in LLVM. As such, we should try hard to prevent `undef` showing up in a user-visible way. There are several ways to fix this: 1. Wait until LLVM comes up with a safer `undef` and have the value merely be arbitrary, but not dangerous. 2. Always guarantee that padding bytes will be 0. 3. For contiguous-memory arrays, guarantee that we end up with the underlying bytes from that array. However, for now, I think don't think we should make a choice here. Issues like #21912, may play into the consideration, and I think we should be able to reserve making a choice until that point. So what this PR does is only allow reinterprets when they would not expose padding. This should hopefully cover the most common use cases of reinterpret: - Reinterpreting a vector or matrix of values to StaticVectors of the same element type. These should generally always have compatiable padding (if not, reinterpret was likely the wrong API to use). - Reinterpreting from a Vector{UInt8} to a vector of structs (that may have padding). This PR allows this for reading (but not for writing). Both cases are generally better served by the IO APIs, but hopefully this should still allow the common cases. Fixes #25908

Keno mentioned this issue May 23, 2018

Improve reinterpret #27213

Closed

Keno mentioned this issue Jun 30, 2018

Don't allow reinterprets that would expose padding #27877

Merged

Keno closed this as completed in #27877 Jul 7, 2018

chethega mentioned this issue Oct 11, 2018

ReinterpretArray and nested structure padding #29605

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reinterpret and structure padding #25908

Reinterpret and structure padding #25908

chethega commented Feb 6, 2018

chethega commented Feb 6, 2018

Keno commented Feb 6, 2018

chethega commented Feb 6, 2018

Keno commented Feb 6, 2018

chethega commented Feb 6, 2018

Keno commented Feb 6, 2018

chethega commented Feb 6, 2018

Keno commented Feb 6, 2018

chethega commented Feb 7, 2018

Reinterpret and structure padding #25908

Reinterpret and structure padding #25908

Comments

chethega commented Feb 6, 2018

chethega commented Feb 6, 2018

Keno commented Feb 6, 2018

chethega commented Feb 6, 2018

Keno commented Feb 6, 2018

chethega commented Feb 6, 2018

Keno commented Feb 6, 2018

chethega commented Feb 6, 2018

Keno commented Feb 6, 2018

chethega commented Feb 7, 2018