Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thunks that resolve to procs fail to preserve specialization information #4734

Closed
shritesh opened this issue Dec 11, 2022 · 5 comments · Fixed by #4907
Closed

Thunks that resolve to procs fail to preserve specialization information #4734

shritesh opened this issue Dec 11, 2022 · 5 comments · Fixed by #4907
Assignees
Labels
bug Something isn't working mono Relates to the Monomorphization compiler stage

Comments

@shritesh
Copy link
Contributor

shritesh commented Dec 11, 2022

I'm writing a Random Number Generator and wanted to have an easy way to compose generators, so I implemented the following andThen function, that takes in an existing generator, it's init and a "mapping function" for the new generator. This typechecks but panics during codegen.

https://github.com/shritesh/raytrace.roc/blob/shritesh/broken-generator/RNG.roc#L5-L39

Generator a b c : RNG, a, (RNG, b -> c) -> c

# Simple linear congruential generator
RNG := U32

default : RNG
default = @RNG 0

andThen : Generator a b c, a, (d, b -> e) -> Generator d e c
andThen = \generator, init, fn ->
    \rng, newInit, newFn ->
        newRng, value <- generator rng init
        newFn newRng (fn newInit value)

u32 : Generator {} U32 *
u32 = \@RNG seed, {}, fn ->
    value =
        seed
        |> Num.mulWrap 1664525
        |> Num.addWrap 1013904223

    fn (@RNG value) value

real : Generator {} F64 *
real =
    {}, u32Value <- andThen u32 {}
    max = Num.toF64 Num.maxU32 |> Num.add 1

    Num.toF64 u32Value / max

between : Generator { min : F64, max : F64 } F64 *
between =
    { min, max }, realValue <- andThen real {}

    min + (max - min) * realValue

The panic message is:

thread 'main' panicked at 'internal error: entered unreachable code: symbol/layout `21.IdentId(21)` ProcLayout {
    arguments: [
        Builtin(
            Int(
                U32,
            ),
        ),
        Builtin(
            Int(
                U32,
            ),
        ),
        LambdaSet(
            LambdaSet {
                set: [
                    ( 21.21, [LambdaSet(LambdaSet { set: [( 21.27, [])], representation: Interned(1, PhantomData) }), Struct { field_order_hash: FieldOrderHash(0), field_layouts: [] }, LambdaSet(LambdaSet { set: [( 16.50, [Builtin(Int(U64)), Builtin(Int(U64)), Struct { field_order_hash: FieldOrderHash(18117670831054429132), field_layouts: [Struct { field_order_hash: FieldOrderHash(15342153996709418787), field_layouts: [Builtin(Float(F64)), Builtin(Float(F64)), Builtin(Float(F64))] }, Builtin(Int(U32))] }, Builtin(Float(F64))])], representation: Interned(6, PhantomData) })]),
                ],
                representation: Interned(
                    13,
                    PhantomData,
                ),
            },
        ),
    ],
    result: Struct {
        field_order_hash: FieldOrderHash(
            18117670831054429132,
        ),
        field_layouts: [
            Struct {
                field_order_hash: FieldOrderHash(
                    15342153996709418787,
                ),
                field_layouts: [
                    Builtin(
                        Float(
                            F64,
                        ),
                    ),
                    Builtin(
                        Float(
                            F64,
                        ),
                    ),
                    Builtin(
                        Float(
                            F64,
                        ),
                    ),
                ],
            },
            Builtin(
                Int(
                    U32,
                ),
            ),
        ],
    },
    captures_niche: CapturesNiche(
        [
            LambdaSet(
                LambdaSet {
                    set: [
                        ( 21.27, []),
                    ],
                    representation: Interned(
                        1,
                        PhantomData,
                    ),
                },
            ),
            Struct {
                field_order_hash: FieldOrderHash(
                    0,
                ),
                field_layouts: [],
            },
            LambdaSet(
                LambdaSet {
                    set: [
                        ( 16.50, [Builtin(Int(U64)), Builtin(Int(U64)), Struct { field_order_hash: FieldOrderHash(18117670831054429132), field_layouts: [Struct { field_order_hash: FieldOrderHash(15342153996709418787), field_layouts: [Builtin(Float(F64)), Builtin(Float(F64)), Builtin(Float(F64))] }, Builtin(Int(U32))] }, Builtin(Float(F64))]),
                    ],
                    representation: Interned(
                        6,
                        PhantomData,
                    ),
                },
            ),
        ],
    ),
} combo must be in DeclarationToIndex', crates/compiler/mono/src/borrow.rs:165:9
stack backtrace:
   0:        0x104c91b70 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h1543c132bc4e188c
   1:        0x10425b7f4 - core::fmt::write::hda8e8eb84b49cbfc
   2:        0x104c8abb4 - std::io::Write::write_fmt::hb84c8996aec7120c
   3:        0x104c963e0 - std::panicking::default_hook::{{closure}}::hdf06011cb093de6a
   4:        0x104c96160 - std::panicking::default_hook::hd7ceb942fff7b170
   5:        0x104c96820 - std::panicking::rust_panic_with_hook::h053d4067a63a6fcb
   6:        0x104c96754 - std::panicking::begin_panic_handler::{{closure}}::hea9e6c546a23e8ff
   7:        0x104c950c0 - std::sys_common::backtrace::__rust_end_short_backtrace::hd64e012cf32134c6
   8:        0x104c964ec - _rust_begin_unwind
   9:        0x105dcc7ac - core::panicking::panic_fmt::hbfde5533e1c0592e
  10:        0x104909078 - roc_mono::borrow::ParamMap::get_param_offset::hbffeaee219e08b36
  11:        0x10490a0c4 - roc_mono::borrow::BorrowInfState::collect_expr::ha86bfec056116858
  12:        0x10490ab74 - roc_mono::borrow::BorrowInfState::collect_stmt::h43969ec2db8ff6ae
  13:        0x104908a40 - roc_mono::borrow::infer_borrow::h31c2266a2ae147ef
  14:        0x10492be8c - roc_mono::ir::Proc::insert_refcount_operations::h6428695833febccd
  15:        0x1048ad064 - roc_load_internal::file::update::ha42494b085ccc93d
  16:        0x10489ed30 - roc_load_internal::file::state_thread_step::he6a3f4e355eb5b81
  17:        0x1048a2fc8 - roc_load_internal::file::load_multi_threaded::h6f2ed5ce1f232de0
  18:        0x10489c908 - roc_load_internal::file::load::h5c8b96246126fbe7
  19:        0x104848670 - roc_load::load_and_monomorphize::hf3820d3b0f9afd40
  20:        0x1045a90ac - roc_cli::build::build_file::h7d8f2cc16ee70b32
  21:        0x1045b84b4 - roc_cli::build::h23083977acb1da5e
  22:        0x10445aad8 - roc::main::h3012192f41c3b7d7
  23:        0x104455a8c - std::sys_common::backtrace::__rust_begin_short_backtrace::hb993bd9ba7844407
  24:        0x104455aac - std::rt::lang_start::{{closure}}::hb8a8e41cbdc0d669
  25:        0x104c83e24 - std::rt::lang_start_internal::hef2161f9571a51d7
  26:        0x10445cde4 - _main
@ayazhafiz ayazhafiz added bug Something isn't working mono Relates to the Monomorphization compiler stage labels Dec 12, 2022
@ayazhafiz
Copy link
Member

Looks like another lambda set resolution bug. #4745 does not cut this one down. Needs a minimal repro

@ayazhafiz ayazhafiz added the Needs example Needs a Short, Self Contained, Correct (Compilable), Example label Dec 12, 2022
@ayazhafiz ayazhafiz self-assigned this Dec 12, 2022
@ayazhafiz
Copy link
Member

minimization:

interface B exposes [] imports []

Generator : ({} -> U8) -> U8

andThen : Generator -> Generator
andThen = \generator ->
    \newFn ->
        generator \newRng -> newFn newRng

between : Generator
between =
    andThen (\f -> f {})

it : U8
it = 
    between (\{} ->
        between (\{} ->
            10u8
        )
    )

expect
    it == 10u8

@ayazhafiz
Copy link
Member

The elaboration for

Generator : ({} -> U8) -> U8

andThen : Generator -> Generator
andThen = \generator ->
    \newFn ->
        generator \newRng -> newFn newRng

between : Generator
between =
    andThen (\f -> f {})

it : U8
it = 
    between (\{} ->
#   ^^^^^^^
        between (\{} ->
#       ^^^^^^^
            10u8
        )
    )

main = it

is

andThen = \generator -[2]-> \newFn -[8]-> generator \newRng -[10]-> newFn newRng
                                                                                           
between = andThen \f -[12]-> f {}
                                                                                           
it = between \{} -[14]-> between \{} -[15]-> 10
                                                                                           
main = it
                                                                                           
                                                                                           
between : ({} -[[14]]-> U8) -[[8 (({} -[[10 ({} -[[14]]-> U8)]]-> U8) -[[12]]-> U8)]]-> U8
between : ({} -[[15]]-> U8) -[[8 (({} -[[10 ({} -[[15]]-> U8)]]-> U8) -[[12]]-> U8)]]-> U8

which appears to be correct.

There is a missing specialization

thread 'issue_4734' panicked at 'internal error: entered unreachable code: symbol/layout `#UserApp.8` ProcLayout {
    arguments: [
        LambdaSet(
            LambdaSet {
                set: [
                    ( Test.15, []),
                ],
                representation: Interned(
                    0,
                    PhantomData,
                ),
            },
        ),
        LambdaSet(
            LambdaSet {
                set: [
                    ( Test.8, [LambdaSet(LambdaSet { set: [( Test.12, [])], representation: Interned(0, PhantomData) })]),
                ],
                representation: Interned(
                    1,
                    PhantomData,
                ),
            },
        ),
    ],
    result: Builtin(
        Int(
            U8,
        ),
    ),
    captures_niche: CapturesNiche(
        [
            LambdaSet(
                LambdaSet {
                    set: [
                        ( Test.12, []),
                    ],
                    representation: Interned(
                        0,
                        PhantomData,
                    ),
                },
            ),
        ],
    ),
} combo must be in DeclarationToIndex

but there is one present, where the diff is

❯ diff a.d b.d
6c6
<                     ( Test.15, []),
---
>                     ( Test.14, []),

so, we are missing a specialization of \newFn -[8]-> generator \newRng -[10]-> newFn newRng for the 15 variant. Perhaps this is because the lambda set niche of 8 is seen to be the closure 12 in both cases? But the arguments are materially different, so perhaps this suggests that higher up, we only see one layout to specialize for.

@ayazhafiz
Copy link
Member

Okay, so the problem here is that between is seen as a module thunk, and we don't fully preserve the shape of the function it returns when we specialize the thunk.

In particular, we have two calls to between:

between : ({} -[[14]]-> U8) -[[8 (({} -[[10 ({} -[[14]]-> U8)]]-> U8) -[[12]]-> U8)]]-> U8
between : ({} -[[15]]-> U8) -[[8 (({} -[[10 ({} -[[15]]-> U8)]]-> U8) -[[12]]-> U8)]]-> U8

but, since between is a module thunk, we only record its return layout when determining specialization. In both cases, the return layout is the unary lambda set 8 which captures a unary lambda set 12; i.e. the layout

LambdaSet {
    set: [
        ( Test.8, [LambdaSet(LambdaSet { set: [( Test.12, [])], representation: Interned(0, PhantomData) })]),
    ],
    representation: Interned(
        1,
        PhantomData,
    ),
},

We might think that somehow the nested captures of [[8 (({} -[[10 ({} -[[14]]-> U8)]]-> U8) -[[12]]-> U8)]]'s captured function arguments need to be represented in the layout. Unfortunately I don't think that's enough. For example, suppose we had instead

andThen =
  \generator -[2]->
  x = generator \{} -[9]-> 10
  \newFn -[10]-> Num.add (newFn {}) x
                                                
between = andThen \f -[12]-> f {}
                                       
it = between \{} -[14]-> between \{} -[15]-> 10
                                                
main = it

now there are two between specializations

between : ({} -[[14]]-> U8) -[[10 U8]]-> U8
between : ({} -[[15]]-> U8) -[[10 U8]]-> U8

but again, we will only wind up storing [10 U8] as the specialized layout of between. And now there is nothing deeper in the captures of the lambda 10 that can help us, we really do need to preserve the structural layout of the proc that between itself resolves to.

@ayazhafiz
Copy link
Member

@shritesh as a workaround for now, this should work if you explicitly wrap the between and real thunks in a lambda, i.e. something to the effect of

between : Generator { min : F64, max : F64 } F64 *
between = \seed, init, fn ->
    ({ min, max }, realValue <- andThen real {}

    min + (max - min) * realValue) seed init fn

@ayazhafiz ayazhafiz removed the Needs example Needs a Short, Self Contained, Correct (Compilable), Example label Dec 14, 2022
@ayazhafiz ayazhafiz changed the title Code typechecks but fails to codegen Thunks that resolve to procs fail to preserve specialization information Dec 14, 2022
ayazhafiz added a commit that referenced this issue Dec 28, 2022
An "eta-reduced thunk" is a top-level, non-capturing value (compiled as a thunk)
that returns a specialized procedure. Had the value been eta-expanded, it would
be a function that returns another function.

Because
  1. specialized functions are identified by their [function signature][RawFunctionLayout],
     consisting of arguments, lambda set, and return layout
  2. the runtime representation of a function is only its lambda set
  3. top-level, non-capturing values are identified only by their runtime representation

eta-reduced thunks lose information about the specialized procedures they resolve to, as their
layout identification includes only the lambda set of the returned procedure, but that returned
procedure is fully identified by its signature as described in (1).

To recover this information, eta-reduced thunks are installed a niche that is the [function
signature][RawFunctionLayout] of the specialized procedure they return.

**Example**

Consider

```ignore(illustrative)
andThen = \{} ->
    x = 10
    \newFn -[5]-> Num.add (newFn {}) x

between = andThen {}

it = between \{} -[7]-> between \{} -[8]-> 10

main = it
```

Here `between` is an eta-reduced thunk, for which we want two specializations

```ignore(illustrative)
between : ({} -[[7]]-> U8) -[[5 U8]]-> U8
between : ({} -[[8]]-> U8) -[[5 U8]]-> U8
```

Since `between` always resolves to the function `-[5]->`, its layout identification is

```ignore(illustrative)
{} -> [[5 U8]]
```

But, the construction of that returned lambda set `[[5 U8]]` is different between the two
specializations. To distinguish those differences, we attach the returned functions' layout as
a niche to each specialization of the thunk.

```ignore(illustrative)
between spec 1: {} -> [[5 U8]] (niche: η({} -[[7]]-> U8) -[[5 U8]]-> U8)
between spec 1: {} -> [[5 U8]] (niche: η({} -[[8]]-> U8) -[[5 U8]]-> U8)
```

**Composing with multimorphic lambda sets**

Eta-reduced thunk niches compose with specializations involved in multimorphic lambda sets
naturally, because the two concepts are disjoint.

Consider

```ignore(illustrative)
andThen = \x ->
    \newFn -[5]-> Str.concat (newFn {}) (Num.toStr x)

between = if Bool.true
    then andThen 11u8
    else andThen 22u16

it = between \{} -[7]-> between \{} -[8]-> "end"

main = it
```

For this program, we want two specializations of `between`

```ignore(illustrative)
between : ({} -[[7]]-> Str) -[[5 U16, 5 U8]]-> Str
between : ({} -[[8]]-> Str) -[[5 U16, 5 U8]]-> Str
```

It is again enough to specify that each `between` thunk returns a closure data of lambda set
shape `[[5 U16, 5 U8]]`, with an appropriate eta-reduced thunk niche.

The thunk itself does not capture, and the disambiguation of procedures in the multimorphic
lambda set is only relevant at the site of the construction of the lambda set (and its dispatch).

That is, the niches needed to distinguish the constructed procedures in the lambda set are only
relevant
  - **inside** the body of `between` (specifically, inside the specialization of `andThen`)
  - OR, **outside** the body of `between`, when matching on it to make the calls in `it`
And so, nothing extra is needed on the surface of the eta-reduced thunk's niche to identify
this program correctly.

**References**

See also #4734
ayazhafiz added a commit that referenced this issue Dec 28, 2022
An "eta-reduced thunk" is a top-level, non-capturing value (compiled as a thunk)
that returns a specialized procedure. Had the value been eta-expanded, it would
be a function that returns another function.

Because
  1. specialized functions are identified by their [function signature][RawFunctionLayout],
     consisting of arguments, lambda set, and return layout
  2. the runtime representation of a function is only its lambda set
  3. top-level, non-capturing values are identified only by their runtime representation

eta-reduced thunks lose information about the specialized procedures they resolve to, as their
layout identification includes only the lambda set of the returned procedure, but that returned
procedure is fully identified by its signature as described in (1).

To recover this information, eta-reduced thunks are installed a niche that is the [function
signature][RawFunctionLayout] of the specialized procedure they return.

**Example**

Consider

```ignore(illustrative)
andThen = \{} ->
    x = 10
    \newFn -[5]-> Num.add (newFn {}) x

between = andThen {}

it = between \{} -[7]-> between \{} -[8]-> 10

main = it
```

Here `between` is an eta-reduced thunk, for which we want two specializations

```ignore(illustrative)
between : ({} -[[7]]-> U8) -[[5 U8]]-> U8
between : ({} -[[8]]-> U8) -[[5 U8]]-> U8
```

Since `between` always resolves to the function `-[5]->`, its layout identification is

```ignore(illustrative)
{} -> [[5 U8]]
```

But, the construction of that returned lambda set `[[5 U8]]` is different between the two
specializations. To distinguish those differences, we attach the returned functions' layout as
a niche to each specialization of the thunk.

```ignore(illustrative)
between spec 1: {} -> [[5 U8]] (niche: η({} -[[7]]-> U8) -[[5 U8]]-> U8)
between spec 1: {} -> [[5 U8]] (niche: η({} -[[8]]-> U8) -[[5 U8]]-> U8)
```

**Composing with multimorphic lambda sets**

Eta-reduced thunk niches compose with specializations involved in multimorphic lambda sets
naturally, because the two concepts are disjoint.

Consider

```ignore(illustrative)
andThen = \x ->
    \newFn -[5]-> Str.concat (newFn {}) (Num.toStr x)

between = if Bool.true
    then andThen 11u8
    else andThen 22u16

it = between \{} -[7]-> between \{} -[8]-> "end"

main = it
```

For this program, we want two specializations of `between`

```ignore(illustrative)
between : ({} -[[7]]-> Str) -[[5 U16, 5 U8]]-> Str
between : ({} -[[8]]-> Str) -[[5 U16, 5 U8]]-> Str
```

It is again enough to specify that each `between` thunk returns a closure data of lambda set
shape `[[5 U16, 5 U8]]`, with an appropriate eta-reduced thunk niche.

The thunk itself does not capture, and the disambiguation of procedures in the multimorphic
lambda set is only relevant at the site of the construction of the lambda set (and its dispatch).

That is, the niches needed to distinguish the constructed procedures in the lambda set are only
relevant
  - **inside** the body of `between` (specifically, inside the specialization of `andThen`)
  - OR, **outside** the body of `between`, when matching on it to make the calls in `it`
And so, nothing extra is needed on the surface of the eta-reduced thunk's niche to identify
this program correctly.

**References**

See also #4734
ayazhafiz added a commit that referenced this issue Jan 16, 2023
ayazhafiz added a commit that referenced this issue Jan 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working mono Relates to the Monomorphization compiler stage
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants