Typestate pointers API #726
Replies: 3 comments 7 replies
-
A lot to read here. A few (overly terse) comments after a quick scan:
Alternatively, copyability can be an unsafe operation. Oh! I think we need unsafe conformances:
|
Beta Was this translation helpful? Give feedback.
-
How is Val planning to handle different allocators? To me, it feels important for certain applications to be able to allocate memory in different ways. Examples: common heap, special heaps, optimized allocators, video memory, different drivers, etc. A custom allocation also means custom deallocation. This means, that the member |
Beta Was this translation helpful? Give feedback.
-
@kyouko-taiga I think we can close this discussion now, as we have decided to go with a simpler model. The complexity introduced in unsafe code by transforming different kinds of pointers to track type state is thought to be as likely to lead to bugs as it is to help. |
Beta Was this translation helpful? Give feedback.
-
I'd like to design of the API for manipulating pointers in Val. As discussed with other contributors already, we have the opportunity to experiment with typestates to create a "safer" API than Swift. One challenge, therefore, is to balance safety with convenience.
Note: The remainder of this post assumes familiarity with Swift's design.
Pointer properties
There are three ways to obtain a pointer:
Case (3) is a distinct because (1) and (2) will let us infer much more information about the properties of a pointer and the value(s) it references.
Because Val is statically typed, we can know the type of the referenced memory if we get a pointer to existing storage and the size of the referred memory by querying the metatype of the pointee's type. If we allocate memory dynamically, we can know the alignment of the pointer.
The memory referenced by a pointer can be in either initialized, uninitialized, or undetermined.
Uninitialized memory must be initialized before it is accessed for reading (i.e., with capability
let
,inout
, orsink
). Initialized memory must be deinitialized before it is accessed for re-initialization (i.e., with capabilityset
) or deallocated.Finally, a pointer may or may not hold the capability to modify its referenced memory.
Based on those properties, we can start with the following set of types:
[Initialized|Uninitialized][Mutable]Pointer<T>
: pointer to memory of typeT
.[Initialized|Uninitialized][Mutable]RawPointer
: pointer to untyped memory.The
Initialized
andUninitialized
qualifiers indicate whether the pointer is known to refer to initialized or uninitialized memory, respectively. A pointer without any of these qualifiers is considered to be referencing undetermined memory.The
Mutable
qualifier indicates whether the pointer is allowed to modify the referenced memory. (Note: a non-Mutable
pointer may reference memory that is mutable through another pointer.)There are cases where the size of an allocation can only be known at run-time. A recurring pattern in C APIs, for example, is to call a function that returns a pointer to an array along with the number of elements in that array.
APIs
I will now describe the APIs I envision for pointers, starting with the basis and then commenting on some convenient extensions.
Basis
I propose the following basis for the types listed above:
Note:
UninitializedPointer<T>
might be a useless abstraction.Notice that
allocate(alignment:)
doesn't accept acount
argument (unlike in Swift). That's because I think there's a better way to express that information and encode it in the type system. Specifically, Val has a way to represent fixed-size buffers:T[n]
. So I think it would make sense to reuse this feature to keep track of allocation sizes for typed pointers:T
, one usesUninitializedMutablePointer<T>
.T
s, one usesUninitializedMutablePointer<T[n]>
.I can think of two options for untyped pointers. The first is to define a dedicated type parameterized by a size, e.g.,
UnintializedMutableSizedPointer<n>
. The second is to useUninitializedMutablePointer<UInt8[n]>
and manually convert to another type as necessary. I propose to choose the second option for the sake of economy.About access effects on higher-order functions
There's a current limitation in Val's type system that hampers the expressiveness of higher-order functions. Namely, without support for polymorphic effects, it is not possible to define a function that accept a lambda with any arbitrary effect on its environment.
One workaround is to accept a lambda with
inout
access and wrap every lambda with an immutable environment into another instance at call site. This trick is not not powerful enough to deal withsink
parameters, though.Pointer arithmetic
Additive operations on pointers make sense when those pointers refer to array members. Hence, I propose to only extend
[Initialized|Uninitialized][Mutable]Pointer<T>
when its pointee is a buffer. Leveraging Val's ability to use generic types as traits, we could provide the following extensions:Note: the operation is safe because we can check that
position
is in bounds.Subtraction by a pointer is a special case. It is meaningful if and only if the two operands are pointers within the same array. Unless we encode provenance in the type system, I think we have no choice but to define an unsafe subtraction on all pointers of the same type:
Access
Under Val rules, a member projection causes
self
to have the same access effect as the projected value. As a result, we can't use aninout
subscript to access mutably a pointee referenced by alet
-bound pointer.Nonetheless, we could provide a
let
projection onInitialized[Mutable]Pointer<T>
to access the pointee's value without having to write a lambda, and additionally aninout
variant on mutable pointers.Initialization and assignment
In most cases, initializing the memory referenced by a pointer simply consists of storing an existing value. For convenience, then, we could provide the following extension:
Similarly, we could provide the following extension for assignment.
Outstanding issues
Statically unknown sizes
There are cases where the size of an allocation can only be known at run-time. A recurring pattern, for example, is to call a function that returns a pointer to an array along with the number of elements in that array. In such a situation, we can't use
Pointer<T[n]>
because we don't known
at compile-time.Taking inspiration from Swift, we could provide additional abstractions to handle these cases. Because we need one buffer pointer type per combination of pointer properties, I propose to define a single parameterizable abstraction.
Partially initialized memory
The proposed typestate model cannot represent partially initialized memory. As it stands, the model is kind of a all or nothing deal.
UninitializedMutablePointer<T[n]>
andInitializedMutablePointer<T[n]>
are pointers ton
uninitialized or initializedT
s. There's no way to represent a buffer where only thei
th element is initialized.The conservative approach to deal with this situation would to consider a partially initialized buffer to have an undetermined state and use unsafe conversions to get pointers to individual elements. Here's an example of the whole protocol:
The allocation creates a pointer
p
to a fully uninitialized buffer.p
is weakened as a pointer to undetermined memoryq
. In the loop, we create undetermined pointers to single instances usingoffset(by:)
, assume they point to uninitialized memory withassumed_unintialized()
, and initialize the referenced memory. Out of the loop, we know that all elements have been initialized, so strenghteningq
withassumed_initialized()
is safe. Finally, we weakenr
to drop its mutation capability and calluse(_:)
.Pointers to opaque types
C APIs often uses pointers to opaque types to represent "handles" to various objects. For example, consider the following C header:
somelib_container_handle_t
is a handle that does not expose any information about the object it actually represents.A simple way to represent such handles is to use untyped pointers, since
RawPointer
roughly corresponds toconst void*
. However, this approach erases the tiny bit of type safety offered by handles. In the above example,somelib_container_handle_t
is more precise thanvoid*
.In Swift, C handles are typically represented with
OpaqueType
, which isn't much better than using a raw pointer. The problem, however, is that there is no obvious way to represent the type of an incomplete C struct.One way to work around this issue could be to represent incomplete C structs as unhabited types on the Val side, this ensuring that no instance can every be created. Unfortunately, Val currently has only one unhabited type. It's called
Never
and is merely an alias for an empty sum type. We could add Swift-like enums to circumvent this issue.Unsoundness
It is easy to "cheat" Val safety guards using the API presented in this proposal. For example, consider the following program:
More generally, copying any pointer is enough to break the relationships Val could use to soundly reason about the assumptions encoded in the typestates. Therefore, I am not sure we can claim any operation on pointers' pointees to be safe.
Other considerations
Alignment
I have considered adding a set of types for pointers with known alignment, but eventually thought such an addition was not worth its complexity.
Provenance
The provenance of a pointer identifies the original allocation from which its value is derived. For example, consider the following C++ program:
It is likely that this program will print the same address twice. However, an optimizer will consider
p
andq
to be distinct pointers to preserve the soundness of their optimizations. The extra information that is used to distinguish pointers to the same address is called provenance. In this example, the provenance ofp
isn't the same as that ofq
and thereforep
andq
are not equal in the eye of an optimizer.We could encode provenance in the type system even if this information gets eventually erased. Rust has started exploring this direction.
Beta Was this translation helpful? Give feedback.
All reactions