Typestate pointers API #726

kyouko-taiga · 2023-05-19T12:29:15Z

kyouko-taiga
May 19, 2023
Maintainer

I'd like to discuss the design of the API for manipulating pointers in Val. As discussed with other contributors already, we have the opportunity to experiment with typestates to create a "safer" API than Swift's. One challenge, therefore, is to balance safety with convenience.

Note: The remainder of this post assumes familiarity with Swift's design.

Pointer properties

There are three ways to obtain a pointer:

1. Get a pointer to some existing storage
2. Get a pointer to newly dynamically allocated memory
3. Get a pointer returned from a (possibly foreign) library function

Case (3) is distinct because (1) and (2) will let us infer much more information about the properties of a pointer and the value(s) it references.

Because Val is statically typed, we can know the type of the referenced memory if we get a pointer to existing storage and the size of the referred memory by querying the metatype of the pointee's type. If we allocate memory dynamically, we can know the alignment of the pointer.

The memory referenced by a pointer can be in one of three states: initialized, uninitialized, or undetermined.
Uninitialized memory must be initialized before it is accessed for reading (i.e., with capability let, inout, or sink). Initialized memory must be deinitialized before it is accessed for re-initialization (i.e., with capability set) or deallocated.

Finally, a pointer may or may not hold the capability to modify its referenced memory.

Based on those properties, we can start with the following set of types:

[Initialized|Uninitialized][Mutable]Pointer<T>: pointer to memory of type T.
[Initialized|Uninitialized][Mutable]RawPointer: pointer to untyped memory.

The Initialized and Uninitialized qualifiers indicate whether the pointer is known to refer to initialized or uninitialized memory, respectively. A pointer without any of these qualifiers is considered to be referencing undetermined memory.

The Mutable qualifier indicates whether the pointer is allowed to modify the referenced memory. (Note: a non-Mutable pointer may reference memory that is mutable through another pointer.)
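To make the typestate idea concrete, here is a minimal sketch in Rust (chosen only because it can be compiled and run; it is not the Val API). The names `MutPtr`, `Uninit`, `Init`, `take`, and `round_trip` are assumptions made for illustration. The state lives in a type parameter, and every state change consumes the old pointer, which roughly mirrors `sink` methods.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::marker::PhantomData;

/// State marker: the referenced memory is known to be uninitialized.
pub struct Uninit;
/// State marker: the referenced memory is known to be initialized.
pub struct Init;

/// A mutable pointer to a `T` whose initialization state is the type parameter `S`.
/// The type is deliberately neither `Copy` nor `Clone`, so each state change consumes it.
pub struct MutPtr<T, S> {
    raw: *mut T,
    _state: PhantomData<S>,
}

impl<T> MutPtr<T, Uninit> {
    /// Allocates storage for one `T`; the result is statically uninitialized.
    pub fn allocate() -> Self {
        let layout = Layout::new::<T>();
        assert!(layout.size() > 0, "zero-sized pointees are out of scope here");
        // SAFETY: `layout` has a non-zero size.
        let raw = unsafe { alloc(layout) } as *mut T;
        if raw.is_null() {
            handle_alloc_error(layout);
        }
        MutPtr { raw, _state: PhantomData }
    }

    /// Consumes the pointer, stores `value`, and returns it in the initialized state.
    pub fn initialize(self, value: T) -> MutPtr<T, Init> {
        // SAFETY: `raw` is valid, aligned, and uniquely owned by `self`.
        unsafe { self.raw.write(value) };
        MutPtr { raw: self.raw, _state: PhantomData }
    }

    /// Consumes the pointer and releases its storage.
    pub fn deallocate(self) {
        // SAFETY: `raw` came from `alloc` with this exact layout.
        unsafe { dealloc(self.raw as *mut u8, Layout::new::<T>()) };
    }
}

impl<T> MutPtr<T, Init> {
    /// Moves the pointee out, handing back a pointer to the now uninitialized storage.
    pub fn take(self) -> (MutPtr<T, Uninit>, T) {
        // SAFETY: the state parameter records that the storage holds a valid `T`.
        let value = unsafe { self.raw.read() };
        (MutPtr { raw: self.raw, _state: PhantomData }, value)
    }
}

/// Walks through allocate, initialize, consume, and deallocate.
pub fn round_trip() -> String {
    let p = MutPtr::<String, Uninit>::allocate();
    let p = p.initialize(String::from("Mango"));
    let (p, s) = p.take();
    p.deallocate();
    s
}
```

Calling `deallocate` on an initialized pointer, or `take` on an uninitialized one, is a type error in this sketch. It only holds together because `MutPtr` cannot be copied.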

There are cases where the size of an allocation can only be known at run-time. A recurring pattern in C APIs, for example, is to call a function that returns a pointer to an array along with the number of elements in that array.

APIs

I will now describe the APIs I envision for pointers, starting with the basis and then commenting on some convenient extensions.

Basis

I propose the following basis for the types listed above:

/// A type representing a pointer to some allocation.
public trait PointerTrait {}

/// A pointer to typed, mutable storage.
public type MutablePointer<Pointee>: PointerTrait {

  /// Returns a copy of `self` assumed to be referencing initialized memory.
  public unsafe fun assumed_initialized() sink -> InitializedMutablePointer<Pointee>

  /// Returns a copy of `self` assumed to be referencing uninitialized memory.
  public unsafe fun assumed_uninitialized() sink -> UninitializedMutablePointer<Pointee>

}

/// A pointer to typed, initialized, and mutable storage.
public type InitializedMutablePointer<Pointee>: PointerTrait {

  /// Returns the result of calling `action` with the value referenced by `self`.
  public fun with_pointee<E, R>(_ action: inout ([E](let Pointee) inout -> R)) -> R
  
  /// Returns the result of calling `action` with the value referenced by `self`.
  public fun with_mutable_pointee<E, R>(_ action: inout ([E](inout Pointee) inout -> R)) -> R

  /// Returns `(p, r)` where `r` is the result of calling `action` with the value referenced by
  /// `self` and `p` is a pointer referencing the storage this value occupied.
  public fun with_consumed_pointee<E, R>(
    _ action: inout ([E](sink Pointee) inout -> R)
  ) sink -> (UninitializedMutablePointer<Pointee>, R)

  /// Projects a pointer to `value`.
  public static subscript to(_ value: inout Pointee): Self { let }

}

/// A pointer to typed, uninitialized, and mutable storage.
public type UninitializedMutablePointer<Pointee>: PointerTrait {

  /// Returns `(p, r)` where `r` is the result of calling `action` with the uninitialized
  /// storage referenced by `self` and `p` is a pointer referencing the same storage.
  public fun with_pointee<E, R>(
    _ action: inout ([E](set Pointee) inout -> R)
  ) sink -> (InitializedMutablePointer<Pointee>, R)

  /// Deallocates the memory previously allocated at `self`.
  public fun deallocate() sink

  /// Allocates memory for an instance of `Pointee` aligned at `alignment`.
  public static fun allocate(
    aligned_at alignment: Int = MemoryLayout<Pointee>.alignment
  ) -> Self

  /// Projects a pointer to `value`.
  public static subscript to(_ value: set Pointee): Self { let }

}

/// A pointer to untyped, mutable storage.
public type MutableRawPointer: PointerTrait {

  /// Returns a copy of `self` assumed to be referencing initialized memory.
  public unsafe fun assumed_initialized() sink -> InitializedMutableRawPointer

  /// Returns a copy of `self` assumed to be referencing uninitialized memory.
  public unsafe fun assumed_uninitialized() sink -> UninitializedMutableRawPointer

  /// Returns a copy of `self` assumed to be referencing memory of type `T`.
  public unsafe fun assumed_typed<T>() sink -> MutablePointer<T>

}

/// A pointer to untyped, initialized, and mutable storage.
public type InitializedMutableRawPointer: PointerTrait {

  /// Returns a copy of `self` assumed to be referencing memory of type `T`.
  public unsafe fun assumed_typed<T>() sink -> InitializedMutablePointer<T>

}

/// A pointer to untyped, uninitialized, and mutable storage.
public type UninitializedMutableRawPointer: PointerTrait {

  /// Returns a copy of `self` assumed to be referencing memory of type `T`.
  public unsafe fun assumed_typed<T>() sink -> UninitializedMutablePointer<T>

  /// Deallocates the memory previously allocated at `self`.
  public fun deallocate() sink

}

/// A pointer to typed storage.
public type Pointer<Pointee>: PointerTrait {

  /// Returns a copy of `self` assumed to be referencing initialized memory.
  public unsafe fun assumed_initialized() sink -> InitializedPointer<Pointee>

  /// Returns a copy of `self` assumed to be referencing uninitialized memory.
  public unsafe fun assumed_uninitialized() sink -> UninitializedPointer<Pointee>

}

/// A pointer to typed, initialized storage.
public type InitializedPointer<Pointee>: PointerTrait {

  /// Returns the result of calling `action` with the value referenced by `self`.
  public fun with_pointee<E, R>(_ action: inout ([E](let Pointee) inout -> R)) -> R

  /// Projects a pointer to `value`.
  public static subscript to(_ value: Pointee): Self { let }

}

/// A pointer to typed, uninitialized storage.
public type UninitializedPointer<Pointee>: PointerTrait {}

/// A pointer to untyped storage.
public type RawPointer: PointerTrait {

  /// Returns a copy of `self` assumed to be referencing initialized memory.
  public unsafe fun assumed_initialized() sink -> InitializedRawPointer

  /// Returns a copy of `self` assumed to be referencing uninitialized memory.
  public unsafe fun assumed_uninitialized() sink -> UninitializedRawPointer

  /// Returns a copy of `self` assumed to be referencing memory of type `T`.
  public unsafe fun assumed_typed<T>() sink -> Pointer<T>

}

/// A pointer to untyped, initialized storage.
public type InitializedRawPointer: PointerTrait {

  /// Returns a copy of `self` assumed to be referencing memory of type `T`.
  public unsafe fun assumed_typed<T>() sink -> InitializedPointer<T>

}

/// A pointer to untyped, uninitialized storage.
public type UninitializedRawPointer: PointerTrait {

  /// Returns a copy of `self` assumed to be referencing memory of type `T`.
  public unsafe fun assumed_typed<T>() sink -> UninitializedPointer<T>

}

Note: UninitializedPointer<T> might be a useless abstraction.

Notice that allocate(aligned_at:) doesn't accept a count argument (unlike in Swift). That's because I think there's a better way to express that information and encode it in the type system. Specifically, Val has a way to represent fixed-size buffers: T[n]. So I think it would make sense to reuse this feature to keep track of allocation sizes for typed pointers:

To allocate a single T, one uses UninitializedMutablePointer<T>.
To allocate multiple Ts, one uses UninitializedMutablePointer<T[n]>.

I can think of two options for untyped pointers. The first is to define a dedicated type parameterized by a size, e.g., UninitializedMutableSizedPointer<n>. The second is to use UninitializedMutablePointer<UInt8[n]> and manually convert to another type as necessary. I propose to choose the second option for the sake of economy.
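As a rough Rust analogy for encoding the allocation size in the pointee type, the sketch below computes allocation layouts from a single `T` and from a fixed-size array of `T`. The function names `layout_of_one` and `layout_of_buffer` are assumptions; `[T; N]` stands in for Val's `T[n]`.

```rust
use std::alloc::Layout;

/// Layout of an allocation holding exactly one `T`.
pub fn layout_of_one<T>() -> Layout {
    Layout::new::<T>()
}

/// Layout of an allocation holding `N` contiguous `T`s; `N` is part of the type,
/// so no separate run-time count has to be passed around.
pub fn layout_of_buffer<T, const N: usize>() -> Layout {
    Layout::new::<[T; N]>()
}
```

A byte buffer such as `layout_of_buffer::<u8, 16>()` plays the role of the untyped, sized allocation described above.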

About access effects on higher-order functions

There's a current limitation in Val's type system that hampers the expressiveness of higher-order functions. Namely, without support for polymorphic effects, it is not possible to define a function that accepts a lambda with an arbitrary effect on its environment.

One workaround is to accept a lambda with inout access and wrap every lambda with an immutable environment into another instance at the call site. This trick is not powerful enough to deal with sink parameters, though.
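Rust's closure traits offer a loose analogy for this workaround, sketched below under the assumption that `FnMut` corresponds to a lambda with `inout` access to its environment and `FnOnce` to one that consumes it. The names `call_twice` and `demo` are made up for the example.

```rust
/// Calls `action` twice. `FnMut` plays the role of a lambda with `inout`
/// access to its environment.
pub fn call_twice<R>(mut action: impl FnMut() -> R) -> (R, R) {
    let first = action();
    let second = action();
    (first, second)
}

pub fn demo() -> ((usize, usize), (u32, u32)) {
    let greeting = String::from("hi");
    // Read-only environment: accepted because every `Fn` closure is also `FnMut`.
    let lengths = call_twice(|| greeting.len());

    let mut n = 0u32;
    // Mutating environment: accepted directly.
    let counts = call_twice(|| {
        n += 1;
        n
    });

    // A closure that consumes its environment, such as `move || drop(greeting)`,
    // is only `FnOnce` and would be rejected by `call_twice`.
    (lengths, counts)
}
```

The last comment corresponds to the `sink` case that the workaround cannot cover.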

Pointer arithmetic

Additive operations on pointers make sense when those pointers refer to array members. Hence, I propose to only extend [Initialized|Uninitialized][Mutable]Pointer<T> when its pointee is a buffer. Leveraging Val's ability to use generic types as traits, we could provide the following extensions:

extension [Initialized|Uninitialized][Mutable]Pointer where Pointee: Buffer {

  /// Returns a pointer to the element at `position` in the referenced buffer.
  public fun offset(by position: Int) -> [Initialized|Uninitialized][Mutable]Pointer<Buffer.Element>

}

Note: the operation is safe because we can check that position is in bounds.
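A minimal Rust sketch of a bounds-checked offset, assuming the buffer length is carried by the pointee type. `BufPtr` and its methods are hypothetical names, and returning `None` instead of trapping is a choice made for this example only.

```rust
/// A pointer to a buffer of `N` elements; the length lives in the type.
pub struct BufPtr<T, const N: usize> {
    base: *const [T; N],
}

impl<T, const N: usize> BufPtr<T, N> {
    /// Creates a pointer to `buffer`.
    pub fn to(buffer: &[T; N]) -> Self {
        BufPtr { base: buffer as *const [T; N] }
    }

    /// Returns a pointer to the element at `position`, or `None` when out of bounds.
    pub fn offset(&self, position: usize) -> Option<*const T> {
        if position < N {
            // `wrapping_add` is a safe method; `position < N` keeps the result
            // inside the buffer.
            Some((self.base as *const T).wrapping_add(position))
        } else {
            None
        }
    }
}
```

Computing the element address needs no `unsafe`; only dereferencing the result does.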

Subtraction by a pointer is a special case. It is meaningful if and only if the two operands are pointers within the same array. Unless we encode provenance in the type system, I think we have no choice but to define an unsafe subtraction on all pointers of the same type:

extension PointerTrait {

  /// Returns the number of `Pointee` instances that fit between `self` and `other`.
  ///
  /// - Requires: `self` and `other` are pointers to elements of the same buffer.
  public unsafe fun infix- (_ other: Self) -> Int

}
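
For comparison, Rust exposes pointer difference as an unsafe operation with essentially the same precondition. The wrapper below is a sketch; `distance` is a hypothetical name, while `offset_from` is the standard library method it delegates to.

```rust
/// Returns the number of `T` instances between `first` and `last`.
///
/// # Safety
/// Both pointers must point into (or one past the end of) the same allocation,
/// and `T` must not be zero-sized.
pub unsafe fn distance<T>(last: *const T, first: *const T) -> isize {
    // SAFETY: the caller upholds the same-allocation requirement.
    unsafe { last.offset_from(first) }
}
```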

Access

Under Val's rules, a member projection causes self to have the same access effect as the projected value. As a result, we can't use an inout subscript to mutably access a pointee referenced by a let-bound pointer.

Nonetheless, we could provide a let projection on Initialized[Mutable]Pointer<T> to access the pointee's value without having to write a lambda, and additionally an inout variant on mutable pointers.

extension InitializedPointer {

  /// Projects the value referenced by this pointer.
  public property pointee: Pointee { let }

}

extension InitializedMutablePointer {

  /// Projects the value referenced by this pointer.
  public property pointee: Pointee { let inout }

}

Initialization and assignment

In most cases, initializing the memory referenced by a pointer simply consists of storing an existing value. For convenience, then, we could provide the following extension:

extension UninitializedMutablePointer {

  /// Initializes the referenced memory with `value`.
  public fun initialize(
    to value: sink Pointee
  ) sink -> @discardable InitializedMutablePointer<Pointee> {
    self.with_pointee(fun(v) { &v = value }).0
  }

}

Similarly, we could provide the following extension for assignment.

extension InitializedMutablePointer {

  /// Assigns the referenced memory to `value`.
  public fun assign(_ value: sink Pointee) {
    self.with_mutable_pointee(fun(v) { &v = value })
  }

}
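
The difference between initialization and assignment can be observed in Rust with a value that counts its own destruction. This is a sketch under the assumption that `set` corresponds to writing without dropping and `inout` assignment to dropping the old value first; `Noisy` and `initialize_then_assign` are hypothetical names.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;

/// A value that records when it is dropped.
pub struct Noisy<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Returns the drop count after initialization, after assignment, and after
/// explicit deinitialization.
pub fn initialize_then_assign() -> (u32, u32, u32) {
    let drops = Cell::new(0);
    let mut slot = MaybeUninit::<Noisy>::uninit();

    // Initialization: the previous bytes are garbage, so nothing is dropped.
    let value = slot.write(Noisy { drops: &drops });
    let after_init = drops.get();

    // Assignment: the old value is dropped as part of storing the new one.
    *value = Noisy { drops: &drops };
    let after_assign = drops.get();

    // `MaybeUninit` never drops its contents, so deinitialize explicitly.
    // SAFETY: the slot was initialized above.
    unsafe { slot.assume_init_drop() };
    let after_deinit = drops.get();

    (after_init, after_assign, after_deinit)
}
```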

Outstanding issues

Statically unknown sizes

There are cases where the size of an allocation can only be known at run-time. A recurring pattern, for example, is to call a function that returns a pointer to an array along with the number of elements in that array. In such a situation, we can't use Pointer<T[n]> because we don't know n at compile-time.

Taking inspiration from Swift, we could provide additional abstractions to handle these cases. Because we need one buffer pointer type per combination of pointer properties, I propose to define a single parameterizable abstraction.

/// A pointer to an arbitrarily sized buffer.
public type BufferPointer<P: PointerTrait> {

  /// The base address of this buffer.
  public let base: P

  /// The number of elements in the buffer.
  public let count: Int

}
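
A Rust sketch of the same shape, assuming a raw pointer as the base. `library_numbers` is a hypothetical stand-in for a foreign function that returns a base address together with an element count, and `as_slice` is an illustrative helper.

```rust
use std::slice;

/// A pointer to a buffer whose element count is only known at run time.
pub struct BufferPointer<P> {
    /// The base address of this buffer.
    pub base: P,
    /// The number of elements in the buffer.
    pub count: usize,
}

impl<T> BufferPointer<*const T> {
    /// Views the buffer as a slice.
    ///
    /// # Safety
    /// `base` must point to `count` initialized `T`s that outlive `'a`.
    pub unsafe fn as_slice<'a>(&self) -> &'a [T] {
        // SAFETY: forwarded to the caller.
        unsafe { slice::from_raw_parts(self.base, self.count) }
    }
}

/// Stand-in for a library call that reports a base address and a length.
pub fn library_numbers() -> BufferPointer<*const i32> {
    static NUMBERS: [i32; 3] = [7, 8, 9];
    BufferPointer { base: NUMBERS.as_ptr(), count: NUMBERS.len() }
}
```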

Partially initialized memory

The proposed typestate model cannot represent partially initialized memory. As it stands, the model is an all-or-nothing deal. UninitializedMutablePointer<T[n]> and InitializedMutablePointer<T[n]> are pointers to n uninitialized or initialized Ts. There's no way to represent a buffer where only the ith element is initialized.

The conservative approach to deal with this situation would be to consider a partially initialized buffer to have an undetermined state and use unsafe conversions to get pointers to individual elements. Here's an example of the whole protocol:

fun use(_ p: InitializedPointer<Int[4]>) {}

public fun main() {
  let p = UninitializedMutablePointer<Int[4]>.allocate()
  let q = MutablePointer(p)
  for i in 0 ..< q.count {
    unsafe q.offset(by: i).assumed_uninitialized().initialize(to: 42)
  }
  let r = unsafe q.assumed_initialized()
  use(.new(r))
}

The allocation creates a pointer p to a fully uninitialized buffer. p is weakened as a pointer to undetermined memory q. In the loop, we create undetermined pointers to single instances using offset(by:), assume they point to uninitialized memory with assumed_uninitialized(), and initialize the referenced memory. Out of the loop, we know that all elements have been initialized, so strengthening q with assumed_initialized() is safe. Finally, we weaken r to drop its mutation capability and call use(_:).
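Rust handles the same situation with a per-element "maybe initialized" wrapper and one unsafe strengthening step at the end. The sketch below follows the protocol above with `MaybeUninit`; `use_buffer` and `fill_and_use` are hypothetical names.

```rust
use std::mem::{self, MaybeUninit};

/// Consumer that requires a fully initialized buffer.
fn use_buffer(values: &[i32; 4]) -> i32 {
    values.iter().sum()
}

pub fn fill_and_use() -> i32 {
    // Undetermined state: each slot may or may not be initialized.
    let mut buffer: [MaybeUninit<i32>; 4] = [MaybeUninit::uninit(); 4];

    // Initialize the elements one at a time.
    for slot in buffer.iter_mut() {
        slot.write(42);
    }

    // Strengthening: the loop touched every slot, so the whole buffer is initialized.
    // SAFETY: `[MaybeUninit<i32>; 4]` and `[i32; 4]` have the same layout.
    let ready: [i32; 4] = unsafe { mem::transmute(buffer) };
    use_buffer(&ready)
}
```

As in the Val example, the compiler cannot check the claim made by the final conversion; it rests on the programmer's reasoning about the loop.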

Pointers to opaque types

C APIs often use pointers to opaque types to represent "handles" to various objects. For example, consider the following C header:

/// A handle to a container instance.
typedef struct somelib_container_t* somelib_container_handle_t;

/// Creates an empty container.
somelib_container_handle_t somelib_container_new();

/// Deletes a container.
void somelib_container_delete(somelib_container_handle_t);

somelib_container_handle_t is a handle that does not expose any information about the object it actually represents.

A simple way to represent such handles is to use untyped pointers, since RawPointer roughly corresponds to const void*. However, this approach erases the tiny bit of type safety offered by handles. In the above example, somelib_container_handle_t is more precise than void*.

In Swift, C handles are typically represented with OpaquePointer, which isn't much better than using a raw pointer. The problem, however, is that there is no obvious way to represent the type of an incomplete C struct.

One way to work around this issue could be to represent incomplete C structs as uninhabited types on the Val side, thus ensuring that no instance can ever be created. Unfortunately, Val currently has only one uninhabited type. It's called Never and is merely an alias for an empty sum type. We could add Swift-like enums to circumvent this issue.

namespace SomeLib {

  enum Container {}

  @ffi("somelib_container_new")
  fun container_new() -> Pointer<Container>

}
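
The same idea can be sketched in Rust with an empty enum behind a raw pointer. Everything here is an assumption for illustration: the two functions mock the C library with a heap-allocated integer rather than binding to real foreign code. Note that Rust's own FFI guidance prefers a zero-sized `#[repr(C)]` struct for opaque types, because references to uninhabited types are hazardous.

```rust
/// Uninhabited stand-in for the incomplete C struct `somelib_container_t`.
pub enum Container {}

/// A typed handle; it cannot be confused with a handle to another opaque type.
pub type ContainerHandle = *mut Container;

/// Mock of `somelib_container_new`: the real representation stays hidden
/// behind the handle.
pub fn container_new() -> ContainerHandle {
    Box::into_raw(Box::new(0u64)) as ContainerHandle
}

/// Mock of `somelib_container_delete`.
///
/// # Safety
/// `handle` must come from `container_new` and must not have been deleted yet.
pub unsafe fn container_delete(handle: ContainerHandle) {
    // SAFETY: the caller guarantees the handle wraps a live `Box<u64>`.
    drop(unsafe { Box::from_raw(handle as *mut u64) });
}
```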

Unsoundness

It is easy to "cheat" Val safety guards using the API presented in this proposal. For example, consider the following program:

fun eat(_ s: sink String) {}

public fun main() {
  let fruit = "Mango"
  let p = InitializedMutablePointer.to[fruit]
  let q = p.copy()
  // <-- projection of `fruit` ends here; `q` is dangling
  eat(fruit)
  print(q.pointee)
}

More generally, copying any pointer is enough to break the relationships Val could use to soundly reason about the assumptions encoded in the typestates. Therefore, I am not sure we can claim any operation on pointers' pointees to be safe.

Other considerations

Alignment

I have considered adding a set of types for pointers with known alignment, but eventually thought such an addition was not worth its complexity.

Provenance

The provenance of a pointer identifies the original allocation from which its value is derived. For example, consider the following C++ program:

#include <iostream>

int main() {
  int x = 1, y = 2;
  int* p = &x;
  int* q = &y + 1;
  std::cout << p << " " << q << std::endl;
}

It is likely that this program will print the same address twice. However, an optimizer will consider p and q to be distinct pointers to preserve the soundness of its optimizations. The extra information that is used to distinguish pointers to the same address is called provenance. In this example, the provenance of p isn't the same as that of q and therefore p and q are not equal in the eyes of an optimizer.

We could encode provenance in the type system even if this information gets eventually erased. Rust has started exploring this direction.

dabrahams · 2023-05-19T16:39:47Z

dabrahams
May 19, 2023
Maintainer

A lot to read here. A few (overly terse) comments after a quick scan:

The empty trait is a red flag for me. There's nothing you can do with that, that I can't do with any, so it is meaningless.
Maybe the root trait for pointers has all the operations, but they are all unsafe (perhaps except for equatability), and then there are various refined traits that refine particular operations into safe ones?
I don't think your allocation scheme using fixed-sized buffers works. We need to be able to allocate space for a dynamic number of Ts.

More generally, copying any pointer is enough to break the relationships Val could use to soundly reason about the assumptions encoded in the typestates. Therefore, I am not sure we can claim any operation on pointers' pointees to be safe.

Alternatively, copyability can be an unsafe operation.

Oh! I think we need unsafe conformances:

conformance SomePointer: unsafe Copyable { ... }

4 replies

kyouko-taiga May 20, 2023
Maintainer Author

The empty trait is a red flag for me. There's nothing you can do with that, that I can't do with any, so it is meaningless.

Although in this particular case a sum type might be more appropriate than a trait (I have to remember Val isn't Swift), I disagree on the principle that empty traits are red flags.

A trait is a way to give a name to an abstraction that will help us make sense of our code. For example, the signature any PointerTrait -> Int carries more intent than Any -> Int. The fact that an abstraction has no additional programmable properties relative to another isn't a reason to reject it. If it were, a feature like newtype would be useless.

Maybe the root trait for pointers has all the operations, but they are all unsafe (perhaps except for equatability), and then there are various refined traits that refine particular operations into safe ones?

I think we're going in the wrong direction. It is always safe to copy the bytes of a pointer's value. You can also copy the copy ad eternam and never cause any issue. The same can be said about escaping and comparison (LLVM says pointers are Comparable the same way Int is). Only pointer arithmetic isn't defined for all possible pointer values.

An operation shouldn't be unsafe only because it can enable another operation to cause UB in an arbitrarily distant future. Otherwise, we could play the blame game and claim that any safe operation from the pointer's copy to its eventual dereferencing is equally at fault.
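Rust draws the line in the same place, which the sketch below illustrates: copying and comparing raw pointers need no `unsafe`, while reading through one does. The function name `copy_and_compare` is made up for the example.

```rust
pub fn copy_and_compare() -> (bool, i32) {
    let fruit = 5;
    let p: *const i32 = &fruit;

    // Copying the pointer's value is a safe operation.
    let q = p;
    // So is comparing two pointers.
    let same = p == q;

    // Dereferencing requires `unsafe`: only the programmer knows that `fruit`
    // is still alive at this point.
    let value = unsafe { *q };
    (same, value)
}
```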

In some cases, using a pointer to access its pointee is safe. Unfortunately, tracking the extrinsic relationship between a pointer and its pointee does not fit Val's design philosophy. So I'm starting to think that trying to encode that safety in the type system is a non-goal. In fact, if we provide safe abstractions for common uses, like tail-allocated and ref-counted storage, then I think the only motivation left to use pointers will be to interact with foreign libraries and sidestep a limitation of Val's projections. We can't uphold safety in those situations.

I wouldn't mind if all pointee-accessing operations were marked unsafe. The typestate system would still have value because it would enshrine the protocol one is supposed to follow to manipulate memory, thereby eliminating some obvious traps.

It's kind of a checklist: you get a RawPointer from a library:

Do I know the type of the referenced memory? assumed_typed<T> => Pointer<T>
Do I know that the memory is initialized? assumed_initialized => InitializedPointer<T>
...

It's not safe, but it's safer and probably more learnable.
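The checklist can be sketched in Rust as a chain of consuming, unsafe conversions. The type and method names mirror the proposal, but this is an illustration and not the Val API; `library_value` is a hypothetical stand-in for a library call, and `read` is an assumed accessor marked unsafe in line with the suggestion above.

```rust
use std::ffi::c_void;

/// A pointer to untyped storage in an undetermined state.
pub struct RawPointer(*const c_void);
/// A pointer to typed storage in an undetermined state.
pub struct Pointer<T>(*const T);
/// A pointer to typed, initialized storage.
pub struct InitializedPointer<T>(*const T);

impl RawPointer {
    /// # Safety
    /// The memory must be suitably sized and aligned for a `T`.
    pub unsafe fn assumed_typed<T>(self) -> Pointer<T> {
        Pointer(self.0 as *const T)
    }
}

impl<T> Pointer<T> {
    /// # Safety
    /// The memory must hold an initialized `T`.
    pub unsafe fn assumed_initialized(self) -> InitializedPointer<T> {
        InitializedPointer(self.0)
    }
}

impl<T: Copy> InitializedPointer<T> {
    /// # Safety
    /// The pointee must still be alive.
    pub unsafe fn read(&self) -> T {
        // SAFETY: forwarded to the caller.
        unsafe { *self.0 }
    }
}

/// Stand-in for a library call that returns untyped memory.
pub fn library_value() -> RawPointer {
    static ANSWER: u32 = 42;
    RawPointer(&ANSWER as *const u32 as *const c_void)
}
```

Each question in the checklist becomes one explicit `unsafe` call, so the assumptions are visible at the call site even though none of them is checked.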

I don't think your allocation scheme using fixed-sized buffers works. We need to be able to allocate space for a dynamic number of Ts.

That is why I proposed BufferPointer<P>. There's a paragraph about it.

dabrahams May 20, 2023
Maintainer

I think you're probably entirely right and my brain has just not been brain-ing very well for the past couple of days. I'll come back to this when I've been able to think it through clearly.

lucteo May 20, 2023
Collaborator

Maybe I'm not fully understanding this, but to me allowing unsoundness is a dangerous path. Val promises safety and I believe that the compiler should strive to guarantee it.

In the example from "Unsoundness" section, I would see some unsafe to be required. Probably, the operation that needs to be marked as unsafe is the one that accesses the value of the pointer. After all, we don't know if that value is still valid, if it's not accessed by some other code at the same time, etc.

kyouko-taiga May 21, 2023
Maintainer Author

allowing unsoundness is a dangerous path

The unsafe API will always create avenues for unsoundness. The point is precisely to allow operations for which soundness can't be proven statically. It is simply not possible to create a type system guaranteeing that pointers are safe without whole-program analysis or imposing heavy restrictions on their use.

Val promises safety and I believe that the compiler should strive to guarantee it.

The compiler promises safety by default: by default, all operations are proven safe statically. You can opt out using unsafe APIs.

In the example from "Unsoundness" section, I would see some unsafe to be required.

Yes, I am suggesting that pointee-accessing operations always be unsafe. Strengthening typestate conversions (e.g., assumed_initialized), deallocation, and pointer difference should be unsafe too. Copying and comparison are safe and I believe we can make pointer/integer arithmetic safe (e.g., advancing a pointer to an array member).

lucteo · 2023-05-20T18:01:13Z

lucteo
May 20, 2023
Collaborator

How is Val planning to handle different allocators?

To me, it feels important for certain applications to be able to allocate memory in different ways. Examples: common heap, special heaps, optimized allocators, video memory, different drivers, etc.

A custom allocation also means custom deallocation. This means that the member deallocate will probably not work with custom allocators.

3 replies

lucteo May 20, 2023
Collaborator

Another example that just came to my mind: how can we allocate stack memory for stackful coroutines?

kyouko-taiga May 21, 2023
Maintainer Author

We can deal with custom allocators by removing deallocate from the pointer API so that it becomes a method of the referenced memory's allocator. We could also have deallocate accept an inout parameter. We don't want to store the deallocator with the pointer because that would be an inversion of the whole/part relationship.

There's nothing special about stack memory. We can offer alloca at the Val level without a deallocate method and we are free to do whatever we want at the language implementation level.
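A small Rust sketch of deallocation as a method of the allocator, with the allocator passed mutably (the analogue of an inout parameter). `CountingAllocator` is a hypothetical type that simply forwards to the global allocator and tracks live allocations; the pointer itself stores no reference to it.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// An allocator that counts how many of its allocations are still live.
pub struct CountingAllocator {
    live: usize,
}

impl CountingAllocator {
    pub fn new() -> Self {
        CountingAllocator { live: 0 }
    }

    /// Allocates uninitialized storage for one `T`.
    pub fn allocate<T>(&mut self) -> *mut T {
        let layout = Layout::new::<T>();
        assert!(layout.size() > 0, "zero-sized types are out of scope here");
        // SAFETY: `layout` has a non-zero size.
        let p = unsafe { alloc(layout) } as *mut T;
        if p.is_null() {
            handle_alloc_error(layout);
        }
        self.live += 1;
        p
    }

    /// Releases storage previously returned by `allocate`.
    ///
    /// # Safety
    /// `p` must come from `allocate::<T>` on this allocator and must not have
    /// been deallocated already.
    pub unsafe fn deallocate<T>(&mut self, p: *mut T) {
        // SAFETY: forwarded to the caller.
        unsafe { dealloc(p as *mut u8, Layout::new::<T>()) };
        self.live -= 1;
    }

    /// The number of allocations that have not been released yet.
    pub fn live(&self) -> usize {
        self.live
    }
}
```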

dabrahams Jun 13, 2023
Maintainer

It's important that we not invert the whole-part relationship, as C++ did, in dealing with allocators. It resulted in a giant explosion of complexity in the standard library. AFAICT, the principled approach is that an allocator is a kind of collection. Thus, Array is a contiguous-memory allocator (for example). I don't think any of this should affect the pointer API.

dabrahams · 2023-12-27T22:09:59Z

dabrahams
Dec 27, 2023
Maintainer

@kyouko-taiga I think we can close this discussion now, as we have decided to go with a simpler model. The complexity introduced in unsafe code by transforming different kinds of pointers to track type state is thought to be as likely to lead to bugs as it is to help.

0 replies
