Runtime type checking support #8211

roberth · 2023-04-12T16:14:26Z

Is your feature request related to a problem? Please describe.

The details of how to perform runtime type checking are a point of contention
Lazy type checking is too hard, with the effect that functions become more strict than they should be, leading to unnecessary infinite recursions and potentially making performance worse
Strictness of { foo, ... }: functions is not obvious. (this is secondary)

Describe the solution you'd like

A syntax for type checks
- Lazy type checks
- Strict type checks
A clear distinction between strict and lazy arguments
- as in evaluation order, which is highly related, so I think we should include this in the design for consistency
Runtime deduplication of runtime type checks using pointer equality

Introductory example

I don't care much about the syntax specifics, but involving @ lets us not steal used syntax. Don't let that distract you.

# single parameter lambda with strict check
# this checks `a` before evaluating the body
a @! builtins.isString:
  "hello " + a;

This is equivalent to

f' = a:
  assert builtins.isString a; "hello " + a;

Not much gained, I would agree. The real value is in lazy checks.

Lazy checks, how and why

# single parameter lambda with lazy check
# this function returns `{ sayHi = <thunk>; }` before `a` is checked
g = a @? builtins.isString :
    { sayHi = "hello " + a; };

The equivalent would be

g = a:
  let a_ = assert isString a; a;
  in let a = a_; in
  { sayHi = "hello " + a; };

While arguably that expression could be simplified, it illustrates the two points of friction:

having to define a let binding
dealing with two names instead of one

The benefit of lazy checks comes from the improvement to error messages. While strict checks come with a pretty good location, having the root cause close to one trace item earlier, a lack of checks, which is effectively required for lazy check today, culminate deep in some library code, far away from the root cause.

A less obvious benefit also comes from this depth. Usually an argument is propagated a few times before it is used, which means that potentially many functions exist in between, which all have a reason to add a type check. Implemented naively, similar to the code above, this creates a chain of thunks that adds basically no value. This can be prevented by introducing a new type of thunk, tLazyCheck that references the thunk to check, the checking function, and perhaps the location of the check for error reporting (fits after #8209).
However, before constructing a tLazyCheck, we can perform a pointer comparison on the checking function and avoid allocating a thunk if it's already going to be checked by that function. (Doesn't work for alternating check functions, but that seems like it would be rare.) This almost completely removes the memory overhead of lazy checks.

A useful, coherent syntax

Now that we have a syntax for strictness and laziness, we can generalize it to lambdas with formals. This seems like a natural extension based on the preceding.

# this function returns `{ sayHi = <thunk>; }` before `a` is checked
{ a @? builtins.isString, ... }:
  { sayHi = "hello " + a; };

Now that we have a symbol for laziness, we might even extend the language a bit to support lazy formals.

# Return { result = <thunk> } before checking isAttrs a
{ ... }?:
{
  result = a;
}

Module system style checks

The module system performs a non-obvious kind of runtime type checking: it checks everything, but only one Value at a time. (Usually and ideally)

This means that it is non-strict in the sense of evaluation, but strict in the sense of checking everything; almost a paradox when presented this way.

The module system is good at type checking, so this behavior has surprised people when they try to use it as a strict (in the sense of evaluation) type checker.

The @? type checking operator does not support this. Instead, or in addition, we could support this style of type checking. It relies on returning a new value that represents the checked value. @= might be a good symbol, as it effectively creates a new binding.
With it, structures can be checked deeply without loss of laziness, and lazy checks can be combined.
This format also allows parameters to be altered. A non-idempotent check may undo referential transparency if don't remove the optimization of repeated checks.

{ a @= checks.attrsOf checks.str }:
{
  b = a.b;
}

We can guarantee that a.b (and therefore b) is a string or exception. This is also the case if we used a.b through some other functions, but best of all, the exception will originate from the formal above, instead of whatever code triggered the check.

We might remove @? in favor of @= to simplify things.

Syntax details

As mentioned, the syntax I've used can perhaps be improved.

? already has a meaning: default value. Use a different symbol like ~, perhaps? I don't think that steals from path syntax.
~ and ! have an established, compatible meaning in Haskell.

This proposal only extends the function syntax. Operator priorities shouldn't be an issue afaict. I don't think I've proposed syntax that's already part of a possible valid expression.

We might also consider reusing a keyword. Perhaps in instead of @?, let instead of @=, !in instead of @!. This gets a bit ambiguous because of negation though. It also doesn't help much towards a syntax for lazy formals.

Describe alternatives you've considered

Not an alternative but an addition: a half-strict check, that checks eagerly when the value isn't still a thunk. This adds too little value in my opinion, and does not support check deduplication on those non-thunk values, which could lead to surprising performance problems if the check is expensive.

Remove @? in favor of @=.

Additional context

@zimbatm mentioned such an idea to me quite some time ago, that we might store a type with thunks. I think this elaboration shows the feasibility and merit of it.

@hsjobeki has done some research into syntax for type signatures, motivated by hoogle.dev's need for standardization and side benefits for other tooling. He does not propose an addition or change to the language syntax, as that would be mostly counterproductive. Runtime type checking is inherently part of the language though, so improvements to it are library improvements or changes to the syntax.

Priorities

Add 👍 to issues you find important.

The text was updated successfully, but these errors were encountered:

L-as · 2023-04-13T09:52:54Z

Two thoughts:

Maybe laziness should be chosen by user (invoking Nix), not by code.
- Any system like this should be amenable to as much static type checking as possible. We should consider how feasible this is.
  A last point: Is it worth it to invest into the Nix language when we already have Nickel? Seemingly you can automatically convert Nix code without changing semantics trivially. You could also make Nickel's syntax backward compatible with Nix, so no conversion is needed.

roberth added feature Feature request or proposal error-messages Confusing messages and better diagnostics language The Nix expression language; parser, interpreter, primops, evaluation, etc labels Apr 12, 2023

roberth mentioned this issue Apr 12, 2023

lib.path.hasPrefix, lib.path.removePrefix: init NixOS/nixpkgs#210423

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Runtime type checking support #8211

Runtime type checking support #8211

roberth commented Apr 12, 2023

L-as commented Apr 13, 2023

Runtime type checking support #8211

Runtime type checking support #8211

Comments

roberth commented Apr 12, 2023

Introductory example

Lazy checks, how and why

A useful, coherent syntax

Module system style checks

Syntax details

L-as commented Apr 13, 2023