Convincing the compiler that some storage is initialized #1241

kyouko-taiga · 2023-12-23T10:35:49Z

Maintainer

It's been discussed in Teams meetings that Hylo should have a way to get a pointer to uninitialized storage that can be used for initialization. The main use case is to be able to initialize a piece of memory using a foreign function. For example:

@ffi("initialize_long")
fun initialize_long(_ p: PointerToMutable<Int64>) -> Void

public fun main() {
  var n: Int64
  initialize_long(pointer_for_initialization[to: &n])
  print(n)
}

It is impossible to write such a program in Hylo today. As noted by @dabrahams in #1239:

A set subscript would need to project an uninitialized pointer (not pointer-to-uninitialized) and would expect it to get initialized by the calling code. In a function, we can form the pointer to the uninitialized set parameter but we can never convince the compiler that we've initialized the set parameter except by, well, initializing (assigning into) it.

It seems like Dave and I now agree that the feature we really want, at least for the time being, is only the one that lets us implement a Hylo function in C. @ffi does not address this goal because the purpose of this feature is to automatically write the glue code mapping Hylo's types to their C counterparts (e.g. Int8 -> char) and adapt calling conventions. Instead, what we want is to let the compiler assume that the body of a function declared in Hylo can be implemented externally and provided at link time.
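Since the Hylo program above cannot be compiled today, here is a rough analogy in Rust, which also enforces definite initialization. It is only a sketch: `MaybeUninit` stands in for the hypothetical `pointer_for_initialization`, and `initialize_long` is defined locally (an assumption made to keep the example self-contained) where real interop would declare it and resolve it at link time.

```rust
use std::mem::MaybeUninit;

// Stand-in for the foreign function. In real interop this would only be
// declared here and its body would be provided by a C object file.
extern "C" fn initialize_long(p: *mut i64) {
    // SAFETY: the caller passes a valid, aligned, writable pointer.
    unsafe { p.write(42) }
}

fn read_initialized_long() -> i64 {
    // Storage that the compiler treats as uninitialized.
    let mut n = MaybeUninit::<i64>::uninit();
    // Hand out a raw pointer for initialization; `n` is not read here.
    initialize_long(n.as_mut_ptr());
    // SAFETY: `initialize_long` wrote a value through the pointer. This call
    // is where the programmer "convinces" the compiler that `n` is initialized.
    unsafe { n.assume_init() }
}
```

The `assume_init` call is the unchecked step this discussion is about: nothing verifies that the callee actually wrote through the pointer.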

If we still wanted the ability to "convince" the compiler that something is initialized, we could add an unsafe primitive to do that (see my answer in #1239), but I think it is unnecessary. I'd like to explain why in case we revisit this topic in the future.

First, I consider the feature very unsafe because it can easily violate assumptions that are harder to break with other unsafe operations. If you can convince the compiler that some stack-allocated memory is initialized, it will run a deinitializer at the end of its useful lifetime. That is unlike dynamically allocated memory that you have to babysit until you decide it can be reclaimed. In other words, I believe that one takes on more responsibility when they write PointerToMutable<T>.allocate(count: n) than when they write var x: T[n], and this extra responsibility justifies the additional control one gets.
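The point about deinitializers can be illustrated with a small Rust sketch (an analogy, not Hylo semantics). `Tracked` and the two functions are hypothetical names; the only thing they show is that once storage is declared initialized, the compiler schedules the destructor on its own, whereas storage that is never declared initialized gets no automatic cleanup.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;

// A value whose destructor bumps a shared counter.
struct Tracked<'a>(&'a Cell<u32>);

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Returns how many destructors ran when the storage was declared initialized.
fn drops_after_assume_init() -> u32 {
    let drops = Cell::new(0);
    {
        let mut slot = MaybeUninit::<Tracked<'_>>::uninit();
        slot.write(Tracked(&drops));
        // SAFETY: `slot` was written on the previous line. From here on the
        // compiler owns the cleanup and drops `_value` at the end of the scope.
        let _value = unsafe { slot.assume_init() };
    }
    drops.get()
}

// Returns how many destructors ran when the storage stayed "maybe uninitialized".
fn drops_without_assume_init() -> u32 {
    let drops = Cell::new(0);
    {
        let mut slot = MaybeUninit::<Tracked<'_>>::uninit();
        slot.write(Tracked(&drops));
        // `MaybeUninit` never drops its contents, so nothing runs here.
    }
    drops.get()
}
```

If the `assume_init` claim were false, the automatically scheduled destructor would run on garbage, which is why lying about stack storage is more dangerous than mismanaging memory one has to release by hand.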

Second, if we put language interoperability aside, I am not convinced there are enough compelling use cases for this feature. Most of them are covered by Union<A, B> which can do the job safely and probably more efficiently, in part because in general one doesn't have intimate knowledge about the memory layout of A and B and in part because we can teach the compiler about the semantics of Union<A, B>. The exception would be the cases where one can dispense with the discriminator because some other part of their business logic can decide what's in the union. Those cases exist, but I think one should write them using buffers of bytes. Existing examples include our UTF8Array.

We can also imagine writing some helper types around this idea, or coming up with a list of recipes for developers on a bit budget. The following shows how one can implement a type whose payload can be initialized later without compiler oversight. A few features needed to compile it are still missing, but all are on the roadmap.

public type DeferredInitialized<T: Movable & Deinitializable> {

  @aligned(MemoryLayout<T>.alignment())
  var payload: Int8[MemoryLayout<T>.size()]

  public init() {
    &payload = .new(fill: (_, x) => &x = 0)
  }

  public fun unsafe_initialize(_ v: sink T) {
    let p = PointerToMutable<T>(base: Builtin.address(of: payload))
    p.unsafe_initialize_pointee(v)
  }

  public property unsafe_initialized_value: T {
    let {
      let p = Pointer<T>(base: Builtin.address(of: payload))
      yield p.unsafe[]
    }
    inout {
      let p = PointerToMutable<T>(base: Builtin.address(of: payload))
      yield &p.unsafe[]
    }
    sink {
      let p = PointerToMutable<T>(base: Builtin.address(of: payload))
      return p.unsafe_pointee()
    }
  }

  public fun unsafe_deinitialize() sink {
    unsafe_initialized_value.deinit()
  }

}

Sure, your code will have to deal with DeferredInitialized<T> rather than just T but I think the notional indirection is justified. It compels you to think more carefully about the initialization state of the payload when you want to interact with it and when your instances go out of scope. The memory reinterpretation shenanigans could also probably be optimized away.
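For readers who want something that compiles today, the same shape can be sketched in Rust. This is an analogy under stated assumptions rather than a translation: `MaybeUninit<T>` replaces the aligned byte array (it already has the size and alignment of `T`), nothing is zero-filled, and the method names simply mirror the Hylo sketch above.

```rust
use std::mem::MaybeUninit;

/// A payload whose initialization state is tracked by the programmer,
/// not by the compiler.
pub struct DeferredInitialized<T> {
    payload: MaybeUninit<T>,
}

impl<T> DeferredInitialized<T> {
    /// Creates the wrapper without touching the payload bytes.
    pub fn new() -> Self {
        Self { payload: MaybeUninit::uninit() }
    }

    /// Safety: the payload must not hold a live value, or that value leaks.
    pub unsafe fn unsafe_initialize(&mut self, v: T) {
        self.payload.write(v);
    }

    /// Safety: the payload must have been initialized.
    pub unsafe fn unsafe_initialized_value(&self) -> &T {
        unsafe { self.payload.assume_init_ref() }
    }

    /// Safety: the payload must have been initialized and not yet deinitialized.
    pub unsafe fn unsafe_deinitialize(&mut self) {
        unsafe { self.payload.assume_init_drop() }
    }
}
```

As in the Hylo version, client code handles the wrapper rather than a bare `T`, and every access that depends on the initialization state is spelled `unsafe`.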

lucteo · 2023-12-23T10:53:10Z

Collaborator

For systems programming, the DeferredInitialized solution would still be too costly:

- We should not need to fill the array of bytes with 0.
- We should not require an extra move to initialize the object.

Imagine that (for some reason) I want to implement an Array that has local storage and doesn't do heap allocation. In that case, the memory layout would be something like [size, e1, e2, e3, ..., e100]. To initialize such an object, we only need to touch the bytes for the size part of the object and can leave the bits of the other elements untouched. Also, moving such an object would be potentially costly, as we would be moving each element in turn. Constructing the object in place would make more sense.
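To make the layout concrete, here is a minimal Rust sketch of such an array (Rust is used only because it compiles today; `InlineArray` and its methods are hypothetical names). Creating it writes only `len`, and only the first `len` slots are ever read or dropped.

```rust
use std::mem::MaybeUninit;

/// Fixed-capacity array stored inline as [len, e1, ..., eN], no heap allocation.
pub struct InlineArray<T, const N: usize> {
    len: usize,
    slots: [MaybeUninit<T>; N],
}

impl<T, const N: usize> InlineArray<T, N> {
    /// Only `len` receives a meaningful value; the slots stay uninitialized.
    pub fn new() -> Self {
        Self { len: 0, slots: std::array::from_fn(|_| MaybeUninit::uninit()) }
    }

    /// Appends `value`, or hands it back if the array is full.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.slots[self.len].write(value);
        self.len += 1;
        Ok(())
    }

    /// Exposes only the initialized prefix.
    pub fn as_slice(&self) -> &[T] {
        // SAFETY: the first `len` slots were initialized by `push`, and
        // `MaybeUninit<T>` has the same layout as `T`.
        unsafe { std::slice::from_raw_parts(self.slots.as_ptr() as *const T, self.len) }
    }
}

impl<T, const N: usize> Drop for InlineArray<T, N> {
    fn drop(&mut self) {
        for slot in &mut self.slots[..self.len] {
            // SAFETY: slots below `len` hold live values.
            unsafe { slot.assume_init_drop() };
        }
    }
}
```

Note that `push` still pays for a bounds check and takes its argument by value, so this sketch covers the zero-fill concern but not the extra-move one.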

3 replies

kyouko-taiga Dec 23, 2023
Maintainer Author

We don't have to pay the move. We can write an initializer that takes a lambda accepting a set parameter to build the object in place (cf. the emplace_back discussion). As for the zero-initialization, well, reading from uninitialized storage has undefined behavior. So the buffer of your array would have to be some kind of UnsafeBuffer<T, n>, most of whose API would be unsafe. I would be more inclined to offer a Swift-like with_temporary_allocation(capacity:do:) function because it would make the responsibility to deal with the memory yourself more explicit.
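As a rough illustration of the scoped-allocation idea, here is a Rust sketch. The function name mirrors the one mentioned above, but the signature is an assumption: the capacity is a const generic rather than a run-time argument, and the buffer is an array of `MaybeUninit` slots that exists only for the duration of the closure.

```rust
use std::mem::MaybeUninit;

/// Lends `N` uninitialized slots to `body` for the duration of the call.
/// `body` is responsible for initializing what it reads and for dropping
/// whatever it initialized.
pub fn with_temporary_allocation<T, R, const N: usize>(
    body: impl FnOnce(&mut [MaybeUninit<T>; N]) -> R,
) -> R {
    let mut buffer: [MaybeUninit<T>; N] = std::array::from_fn(|_| MaybeUninit::uninit());
    body(&mut buffer)
}

/// Example client: fills four slots starting at `start`, then sums them.
pub fn sum_of_four_from(start: u32) -> u32 {
    with_temporary_allocation(|buffer: &mut [MaybeUninit<u32>; 4]| {
        for (i, slot) in buffer.iter_mut().enumerate() {
            slot.write(start + i as u32);
        }
        // SAFETY: every slot was written in the loop above.
        buffer
            .iter()
            .map(|slot| unsafe { slot.assume_init_read() })
            .sum::<u32>()
    })
}
```

The unsafe part is confined to the closure, which is what makes the caller's responsibility explicit; elements with destructors would have to be dropped by the closure before it returns.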

Regardless of the specifics of this discussion, I think there is a point at which we won't be able to promise the ability to create safe abstractions and unlimited powers for low-level bit trickery at the same time. If we can't label Hylo as a language for systems programming without being able to do absolutely everything one can write in C, then I'm not sure I want the label.

I'd also like to remind everyone that so far we've driven most of our design decisions with concrete use cases. I can't say that there exists no possible situation in which one may want to use Builtin.mark_state (there likely is one), but until we find a legitimate one that can't be worked around with the techniques I have mentioned (and perhaps others), we should not rush to add any new unsafe feature.

lucteo Dec 25, 2023
Collaborator

> We don't have to pay the move. We can write an initializer that takes a lambda accepting a set parameter to build the object in place (cf. the emplace_back discussion).

Probably I'm not fully understanding what you are saying here, but I read the above as a mechanism for providing something equivalent to a pointer to an uninitialised object (that I can later initialise in the lambda I'm providing). I think this solves my needs. It does that by making things more complex (i.e., making compilation times longer), and probably without actually improving safety (I can do all the unsafe things I wanted with pointers to uninitialised storage).

> As for the zero-initialization, well, reading from uninitialized storage has undefined behavior.

In my example, I was not proposing to read uninitialised storage. I was proposing to read from the allocated buffer only the amount that we've initialised. If we did not store any element, we should initialise & use only the size part; if we have 3 elements, we should be reading only [size, e1, e2, e3], and so on.

> I'd also like to remind that so far we've driven most of our design decisions with concrete use cases.

I think we are discussing three concrete examples here (and in the concurrency discussions):

- the above example with Array that doesn't use dynamic memory
- the concurrency example in which the spawn frame is initialised by calling an external function
- the concurrency example in which the awaited value is initialised at the end of the spawned computation.

In all of these examples, we are essentially finding a way to give the user access to memory to initialise it later, and the compiler would have to trust that the objects are properly initialised (especially if we want to implement these as efficiently as possible). This is exactly the use case for providing a pointer to uninitialised memory.

kyouko-taiga Dec 25, 2023
Maintainer Author

The non-dynamic array example is not compelling. If you can't pay for zero-initialization, then I doubt that you can pay to check bounds to access/append/remove elements. There are unavoidable run-time costs to use a memory-safe resizable collection unless your type system can provide much stronger guarantees than the ones we can. If you're ready to abandon safety, then you should use whatever mechanism we'll provide to let you write alloca.

The concurrency examples will eventually be implemented by the compiler, which will have the privilege to initialize things whenever it wants. So I'm not convinced you need a new unsafe feature here either. Besides, the fact that you can use the new external functions to write your library is proof that your interop use cases are already covered.
