# Proposal: Support function parameter type constraints/wildcards #9723
Comments
Over in #9260, there was a lot of discussion about how to make something like this work. Introducing proper parametric types is a possible way out, but then we have two different systems for generic types: one can still return anonymous types from comptime functions, which gives introspection, reification, and custom …
Thanks for the reference! I hadn't seen that issue yet. Reading through it was informative regarding the challenges of supporting this kind of inference without explicitly separating type constructor functions syntactically. One "automatic" equivalent for this proposal would be to enable type inference only for the inner-most function including a …
I definitely agree that adding … On the other hand, I'm not sure that I agree that there's a feature gap between … Can you spot a counter-example of something that's supported by …?
I was thinking along the lines of additional constraints and comptime checks on the type parameter. But you're right, those can be implemented with helper functions, or even inline:

```zig
const Foo = struct(T: type) {
    value: if (meta.trait.isNumber(T)) T else @compileError("Not a number!"),
};
```

One thing you can't do with parametric structs is add or omit fields depending on the inputs, but I don't think this is used much anyway. There might be more, but I can't think of anything right now.

I'm now sort of in favor of this proposal. The idea to build a generic type system from first-class types and compile-time code execution is both clever and powerful, but it creates a lot of dark corners semantically, and it makes many things that work very simply in other languages either difficult or impossible. From a usability perspective, having a simple parametric type system, and maybe interfaces, would remove all magic from 95% of the use cases. Personally, I think that would be an appropriate power/complexity tradeoff for a language that aims to be a C replacement.
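For contrast, here is what adding or omitting fields looks like with today's comptime type functions, which parametric structs could not express (a small illustrative example of mine, not from the thread):

```zig
// Today's Zig: a comptime function returning a type can change the
// field set based on its inputs; a parametric struct(...) could not.
fn Payload(comptime with_len: bool) type {
    return if (with_len)
        struct { data: [*]u8, len: usize }
    else
        struct { data: [*]u8 };
}
```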
Thanks for the input so far! To continue fleshing things out, I wanted to add another potential extension to this proposal.

### Part C: Generalized "Fat" Pointers

The main idea here is that … This entails a syntax for slices: … Since … With that connection in mind, we can generalize this to lots of other fat pointer types, as follows:

#### Extending slices to multi-dimensional arrays

With the explicit …

This can even be generalized to tensor types by introducing a most general n-dimensional array type:

```zig
StridedTensor = struct(T: type, N: usize, dims: [N]usize, strides: [N]usize)
```

Notice that this "parent" type can parameterize all of the above types. Furthermore, it permits us to have slice types with a dynamic number of dimensions:
N-D slices are useful for certain varieties of scientific and statistical computation (for instance, computing the variance along an axis, performing a generalized matrix multiplication/sum, or dynamically re-shaping structured data). Strided N-D slices support "views" of N-D slices. Such a type would be generated when slicing n-dimensional arrays (e.g. …).

As one would expect, these behave as one big parameterized type "family". The above type aliases for arrays/tensors would all be "built-in" to Zig, but it's reasonable to ask how similar "aliases" can be defined for user-defined types. Here's a gist with some details.

#### Fat pointers to user-defined types

This also provides a method to support C-style "flexible array members" in a safe way (see #173):

```zig
const FlexibleType = struct(N: usize) {
    foo: f32,
    bar: [10]u32,
    flex: [N]f32,
};

const foo = fn(x: *FlexibleType(var)) void {
    // impl...
};
```

or to enforce invariants between arrays/slices within a type:

```zig
const ConstrainedArrays = struct(N: usize, M: usize, K: usize) {
    M1: *[N][M]u32,
    M2: *[M][K]f32,
};
```

The ABI implications of supporting runtime type parameters like this are very significant, in particular because offset/size calculations can require computation, slowing down field access significantly. In order to control this, it seems reasonable to enforce:
- Just as with slices/arrays, a raw …

† The cast from … Note that these are the same rules that would be used for casting slices/arrays, as well.

#### Other Extensions

It might make sense to consider expanding … This still needs more careful thought before it's ready to be considered as a proper proposal, though. Furthermore, if we do expand …

#### Are these generalizations worth it?

**Generalized slices.** Multi-dimensional slice types, partially-constrained arrays, and n-dimensional tensors appear to bring high value for structured computation, especially smoother inter-op with heterogeneous targets (including SIMD, GPU, etc.). For these types, the syntax change is relatively small and intuitive. Furthermore, there is a clear boundary to the syntax/feature: "…"

**Fat pointers to user-defined types.** On the other hand, expanding this to user-defined fat pointers seems comparatively limited in value, at least without a use case on hand. I left it included here for completeness, particularly w.r.t. C inter-op, but it's unlikely to meet Zig's standards for complexity/power.

On the optimistic side of things, any ideas that allow a user to explicitly opt in to a broader range of fat pointers, with predictable performance trade-offs, may make this feature more appealing (perhaps a re-parameterization of …).
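To illustrate the ABI concern raised above about runtime offset/size calculations, here is a small sketch of mine (layout and syntax hypothetical):

```zig
// Hypothetical: with runtime parameter N, any field placed after an
// [N]-sized field no longer has a compile-time-constant offset.
const RuntimeSized = struct(N: usize) {
    head: u32,    // offset 0 (constant)
    data: [N]f32, // offset 4 (constant), size 4 * N (runtime)
    tail: u32,    // offset 4 + 4 * N: accessing `x.tail` needs a
                  // multiply-add at runtime instead of a fixed offset
};
```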
Since there are quite a few concepts in play now, here's a quick glossary in plain English. Any "hole" in a type pattern has 3 different "ABI modes" that affect casting and specialization:
…

In some sense, … Meanwhile, … Together, these form the 3 primary "ABI modes".

Note: * …
This proposal provides some insight into a problematic `anytype` case:

```zig
const Any = struct { v: anytype };

comptime {
    var a = Any{ .v = @as(f32, 4) };
    const pf: *f32 = &a.v; // this should probably work, right?
    const pa = &a.v; // is this the same as pf? Or can it change the type of a.v?
    a.v = @as(u32, 10);
    const wtf = pf.*; // what does this do?
}
```

To support this use-case, we can allow the "type patterns" above in place of the usual type for a declared variable. We say that … The crucial …:

```zig
comptime {
    // `any` in a type means that there's some comptime-known substitution,
    // and that this may _change_ across the flow of the program.
    // We start with a concrete type (f32), but it's allowed to change:
    var a: any = @as(f32, 4);
    const pf: *any = &a; // pf is allowed to point at any number of types
                         // (the compiler will track the active type)
    const pa = &a; // This could be inferred as `*any` or `*f32`
    a = @as(u32, 10); // This changes the underlying type of the comptime variable.
                      // If pa is `*f32`, compile error now or upon de-referencing/observing pa.
    const wtf = pf.*; // This de-references `pf` based on its current "concrete" type, which is `*u32`
}
```

or, if we still want to use a …:

```zig
const Wrapper = struct(T: type) { v: T };

comptime {
    // We start with a concrete type == Wrapper(f32), but it's allowed to change:
    var a: Wrapper(any) = Wrapper(f32){ .v = @as(f32, 4) };
    const pf: *Wrapper(any) = &a.v;
    const pa = &a.v; // This could be inferred as `*Wrapper(any)` or `*Wrapper(f32)`
    a.v = @as(u32, 10); // This changes the underlying type of the comptime variable.
                        // If pa is `*Wrapper(f32)`, compile error now or upon de-referencing/observing pa.
    const wtf = pf.*; // This de-references `pf` based on its current "concrete" type, which is `*Wrapper(u32)`
}
```

I'm not sure I'm sold on these dynamically-typed behaviors in general. Nonetheless, this was a quick demonstration of what they might look like in combination with this proposal.
Do you think the C++ solution could be useful here?

```cpp
template <typename T>
concept Addable = requires(T x)
{
    x + x;
};

template <Addable T>
T sum(T a, T b)
{
    return a + b;
}

template <typename T> requires Addable<T>
T add5(T a)
{
    return a + 5;
}
```
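For comparison, a rough analog of the `Addable` concept is already expressible in today's Zig with a comptime check helper rather than new syntax (my sketch, not from the thread):

```zig
// Comptime "concept" check: reject types that don't support `+`.
fn requireAddable(comptime T: type) void {
    switch (@typeInfo(T)) {
        .Int, .Float, .ComptimeInt, .ComptimeFloat => {},
        else => @compileError(@typeName(T) ++ " is not Addable"),
    }
}

fn sum(a: anytype, b: @TypeOf(a)) @TypeOf(a) {
    comptime requireAddable(@TypeOf(a));
    return a + b;
}
```

The difference is that the constraint lives in the function body, not the signature, which is part of what this proposal aims to improve.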
## Motivation

The goals of this proposal are to … make the use of run-time vs. `comptime` polymorphism more symmetric, in terms of safety and syntax.

The key mechanism I propose is a limited, predictable form of pattern matching for types in function parameters, which is intended to allow straightforward enforcement of type constraints at compile-time and run-time.
## Proposal

### Part A: Introduce `any T` as a type constructor wildcard

Pattern matching at compile-time amounts to constrained `comptime` polymorphism. This can make it easier to infer a function's behavior from its prototype alone.

Compare the monomorphic version of a basic `append` function with its generic counterparts:

These functions are intended to generalize `u8_append` by allowing variance over the inner type parameter `u8`. However, the programmer must make a difficult choice: either they write `generic_append1`, providing an uninformative function signature which supports convenient type inference, or they write `generic_append2`, with an informative function signature and automatically-enforced constraints, but which requires explicitly passing in the correct type separately.

Wouldn't it be nice if we could just express this variance directly? We'd ideally like to be able to write functions like:
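Sketching what these signatures might look like, based on the descriptions above (my reconstruction; bodies and the `ArrayList` constructor are illustrative, and the last function assumes `ArrayList` were exposed as a `type(...)` constructor):

```zig
// Monomorphic: works only for u8.
fn u8_append(list: *ArrayList(u8), item: u8) !void {
    try list.append(item);
}

// Generic via `anytype`: convenient inference, uninformative signature.
fn generic_append1(list: anytype, item: anytype) !void {
    try list.append(item);
}

// Generic via an explicit comptime type parameter: informative and
// checked, but T must be passed separately at every call site.
fn generic_append2(comptime T: type, list: *ArrayList(T), item: T) !void {
    try list.append(item);
}

// Proposed: T is inferred from `list`, and `item: T` must match.
fn append(list: *ArrayList(any T), item: T) !void {
    try list.append(item);
}
```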
As you can see, by allowing implicit equality constraints between wildcards, this also provides a simple declarative syntax for expressing certain parameter type requirements, such as when two types must agree, and it allows expressing some parameter-dependent return types more directly (avoiding complex `@typeInfo(@TypeOf(param))` expressions in some cases).

#### Changes required
At first glance, this inference appears to be a basic unification problem. However, the proposal above is incompatible with Zig's existing `fn(…) type` type constructors, since multiple functions may return instances of the same `struct {…}` definition and also perform non-invertible transformations on their parameters. So naively adding pattern matching like this for `fn(…) type` is undecidable for the compiler, and even if it weren't, it would be confusing and unpredictable for the user, given multiple overlapping "interfaces" to one underlying type.

The problem here is that `type` objects have no canonical constructor function whose parameters we could associate with the `type` and use for unification. To enable this, we take inspiration from #1717 and borrow the `fn` syntax for `struct`. A `struct` block can now take its own set of parameters:
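A minimal sketch of the proposed syntax (my illustration, consistent with the `FixedArray(u8, 5)` usage discussed below):

```zig
// Sketch: a struct block with its own parameter list.
const FixedArray = struct(T: type, len: usize) {
    items: [len]T,
};
```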
`FixedArray` is called a type constructor function (a `type(...)`), distinguishing these from fully-constructed types (e.g. `FixedArray(u8, 5)` or `u8`) and from existing function-returning types (`fn(…) type`).

The key restriction that makes this proposal possible is that we only allow inferring parameters to `type(...)` functions, not to `fn(...) type` functions. In the compiler, a constructed `type` saves the parameters provided to its `type(...)` function, so that these can be used for inference at the call site of a function as above.

In general, a `type(A,B,C)` can be used like a `fn(A,B,C) type`, except that it supports type inference. With this restriction, the inference problem is decidable for the compiler: it becomes "word matching", with the input type containing no metavariables. It's also intended to be easy to understand for the user (feedback welcome, of course).

#### Extending to `union`

For completeness, `union` will need the same treatment as `struct`, but we already have a `union(tag)` syntax for tagged unions. Thankfully, the existing syntax isn't incompatible with a naive solution, although it is a bit awkward:
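One possible reading of that naive solution (purely my guess at the syntax, since the original example is not shown here) places the parameter list alongside the existing tag clause:

```zig
// Hypothetical syntax: parameters after the union's tag clause.
const Either = union(enum)(L: type, R: type) {
    left: L,
    right: R,
};
```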
Any suggestions to improve this syntax are more than welcome.
#### Backwards compatibility

Existing usage of `fn(...) type` would be backwards-compatible with this change, but these `type` objects are not `type(...)` functions, so they cannot be used for inference. To support inference, the module author has to expose a `type(...)` function, either by defining it at the top level or, less commonly, by returning one from a function unevaluated.

### Part B: Runtime Erasure via `erased T`

As an extension to this proposal, we can support `erased T` to indicate an inferred parameter that should be constrained but not used to specialize the function. Note that this is the typical pattern used for runtime polymorphism. To put some familiar examples in the new terminology:

- `void *` as a run-time polymorphic type is really a `* erased T`
- `[*]T` can be viewed as `*[erased]T`

In each of these cases, the types pass through a function which "forgets" a certain part of the concrete type it operates on, and then delegates to other functions implemented on the concrete (fully-specified) type.
#### Mechanics

When `erased T` is used as a wildcard pattern for a function parameter, any constraints on `T` are still enforced statically at the call site, as usual. However, `T` is not allowed to be used inside the function, and the function is not compile-time specialized (i.e. monomorphized) on `T`.

This feature is useful, for example, for ensuring at compile-time that multiple parameters are type-compatible, without specializing the function at `comptime`:

…

This can also be used to make sure that a method dispatch table (i.e. vtable) is compatible with an object it is used with. It also makes the use of run-time vs. `comptime` polymorphism more symmetric. Here's an example, where the "vtable" is a single function pointer:
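A sketch of the kind of example this describes (my illustration; the proposal's actual example is not shown here). The context pointer is erased, so one compiled copy serves every `T`, while the signature still ties `ctx` and `f` together:

```zig
// Hypothetical: a single-entry "vtable" as one function pointer.
// `ctx: * erased T` is constrained to match f's parameter type at the
// call site, but this function itself is never specialized on T.
fn dispatch(ctx: * erased T, f: fn (*T) void) void {
    // T cannot be named or dereferenced here; we may only pass it on.
    f(ctx);
}

fn bump(x: *u32) void {
    x.* += 1;
}

// Usage: T is inferred as u32, and the pairing is checked statically:
//   var counter: u32 = 0;
//   dispatch(&counter, bump);
```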
As you'd expect with run-time polymorphic interfaces, `erased` types live behind pointers, since their sizes are unknown. Such a pointer must be cast to a non-erased `type` before being dereferenced.

Note: In theory, it may be possible to dereference and use an `erased` type for a limited set of accesses/operations, provided that the operation is solely a function of its non-erased parameters. However, such behavior is considered out-of-scope for this proposal.

#### Debug Safety
I believe this opens the door to additional run-time safety in debug builds: for instrumentation, the compiler should be able to replace an `erased` pointer with a fat pointer describing its `erased` parameters. Then, when a specialized function `fn(x: *A)` is cast to an `erased` function pointer `*fn(x: * erased T)`, it can be wrapped in a check that inspects the fat-pointer information to make sure that the underlying type is `A`.

## Appendix: Type constructor functions as variables ("Sandwich" Inference)
"Sandwich" inference is when one of the type constructor functions in the signature is a metavariable to be inferred.
The sandwich part refers to the fact that the variables can appear in-between constant terms, rather than just at the top-most level of parameters. This can lead to pretty strange and complex patterns:
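An illustrative pattern of my own making (the original examples are not shown here): the inferred metavariable is the constructor `C` itself, sandwiched between the concrete outer pointer and the concrete `u8` argument:

```zig
// Hypothetical: `any C` is an inferred type *constructor*, not a type,
// appearing in-between constant terms of the pattern.
fn strange(x: *(any C)(u8, any len)) void {
    // ...
}
```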
This inference is perfectly decidable as a unification problem, but these patterns often feel more confusing than helpful. For that reason, it may be preferable to forbid sandwich inference, at least until a clear use case arises.
## Appendix: Syntax and keywords

In order to disambiguate between inferred parts of a term and constants, we should require `erased`/`any` for every appearance of a metavariable `T`. Of course, a metavariable `T` should never shadow a constant.

The remaining details of the syntax are much more debatable, in particular the usage of `erased`/`any`. There are other variations we might consider, as well:

…

I've been preferring Option 1 on the grounds that Option 2 is too verbose and Option 4 is misleading by labelling comptime specialization with the `var` keyword. Option 3 is a good candidate as well, since it doesn't require introducing a new keyword, but it seems unintuitive to have a potentially-constrained metavariable labelled with `undefined`, even if it's true that you're not allowed to use it within the function.

Of course, this discussion is closely related to #5893.
If this proposal is accepted, we might also want to consider an alternative slice syntax such as `*[var]T`, since many newcomers to the language seem to be surprised that `[]T` is actually a reference type.

## Appendix: `@typeInfo`/`@Type`
`@typeInfo`/`@Type` would be supported exactly as they are today. In particular, they would not support creating a `type(...)` function.

There are places in the standard library (e.g. enums.zig, mem.zig, meta.zig) that use `@Type` today. These use cases would benefit from a built-in `@TypeFunction()` which converts a `fn(...) TypeInfo` function into a `type(...)` function, if inference is desired for these families of reified types.