Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Erroneous bitcast for indefinite array allocation #10

Closed
NeuralCoder3 opened this issue Feb 22, 2022 · 3 comments
Closed

Erroneous bitcast for indefinite array allocation #10

NeuralCoder3 opened this issue Feb 22, 2022 · 3 comments
Assignees

Comments

@NeuralCoder3
Copy link
Collaborator

Allocations with a statically unknown size T::nat lead to <TODO> in LLVM code.

For instance, when an array of the same size as an indefinite array «⊤∷nat; r32» is created in Thorin,
invalid LLVM code is produced by the code generation.

Using Alloc2Malloc:
Before optimizations:

_1176: [:mem, :ptr («3∷nat; r32», 0∷nat)] = :alloc («3∷nat; r32», 0∷nat) mem_1175;
_1190: :ptr («⊤∷nat; r32», 0∷nat) = :bitcast (:ptr («⊤∷nat; r32», 0∷nat), :ptr («3∷nat; r32», 0∷nat)) _1176#1∷i1;
...
_1159: [:mem, :ptr («⊤∷nat; r32», 0∷nat)] = :alloc («⊤∷nat; r32», 0∷nat) _1158;

After optimizations

  _2412: [:mem, :ptr («3∷nat; r32», 0∷nat)] = :malloc («3∷nat; r32», 0∷nat) (mem_2410, 12∷nat);
  _2445: i64 = :bitcast (i64, nat) ⊤∷nat;
  _2449: i64 = :Wrap_mul (3∷nat, 0∷nat) (4∷i64, _2445);
  _2450: nat = :bitcast (nat, i64) _2449;
  _2452: [:mem, :ptr («⊤∷nat; r32», 0∷nat)] = :malloc («⊤∷nat; r32», 0∷nat) (_2412#0∷i1, _2450);

and the corresponding LLVM code:

  %_2755.i8 = call i8* @malloc(i64 12)
  %_2755 = bitcast i8* %_2755.i8 to [3 x float]*
  %_2788 = bitcast i64 <TODO> to i64
  %_2792 = mul nuw nsw i64 4, %_2788
  %_2793 = bitcast i64 %_2792 to i64
  %_2795.i8 = call i8* @malloc(i64 %_2793)

Without Alloc2Malloc:
After optimizations:

_2375: [:mem, :ptr («3∷nat; r32», 0∷nat)] = :alloc («3∷nat; r32», 0∷nat) mem_2374;
_2377: [:mem, :ptr («⊤∷nat; r32», 0∷nat)] = :alloc («⊤∷nat; r32», 0∷nat) _2375#0∷i1;

LLVM: (no malloc or alloc is generated)

%_2449 = getelementptr inbounds [0 x float], [0 x float]* <TODO>, i64 0, i64 %_2447

The problem here is that T::nat is used as size but does not exist in LLVM.
One solution for some cases might be a more extensive tracking of the sizes used in allocation.
For instance, the original allocation uses a constant size of 3 and the bitcasts of the array
are eliminated.
But this propagation does not extend to the newly generated array which should have the same size.

Comment: It is not possible to produce this error using valid impala code.

@leissa leissa self-assigned this Feb 22, 2022
@leissa
Copy link
Member

leissa commented Feb 22, 2022

So, here is the deal:

Thorin at its core is designed to be type-safe. This means you have to track all array sizes explicitly in the type. In order to support unsafe features such as C-like arrays, we need unsafe axioms such as :bitcast. On top of that, indices of arity ⊤∷nat such as 23∷(int ⊤∷nat) encode unsafe indices which makes «⊤∷nat; r32» an unsafe array.

If I understand your use case correctly, the real solution would be to simply not use an unsafe array but a safe array at the allocation. Later on, you can cast the safe array to an unsafe one like so:

N: nat = ...; // doesn't need to be a constant btw
_1: [:mem, :ptr («N; r32», 0∷nat)] = :alloc («N; r32», 0∷nat) mem1;
_2: :ptr («⊤∷nat; r32», 0∷nat) = :bitcast (:ptr («⊤∷nat; r32», 0∷nat), :ptr («N; r32», 0∷nat)) _1#1∷i1;

And here a little background regarding the :alloc vs :malloc:

Both things mean the same thing. The only difference is that the size computation in :alloc happens implicitly by looking at its type whereas :malloc performs this computation explicitly. This just makes our life in the backend a little bit easier, as we have the size computation now explicitly at the term level. I find this situation a bit unpleasant and would love to see a solution where we don't need two axioms which are basically doing the same thing. What is more, the size argument of :malloc isn't type-safe, which is even more weird. I'm happy to hear suggestions :)

Finally, the LLVM backend should probably throw an exception with a reasonable error message instead of just asserting or returning "<TODO>" but this is already covered in issue #9.

Does this solve your issue?

@NeuralCoder3
Copy link
Collaborator Author

In my use case, I get provided with an unsafe array (_2 in the example above) for instance as an argument in a function.
I then want to create an array of the same size.
Therefore, the allocation would need to know the size of the provided array at runtime.
Is this possible or do I have to rely on the user to also provide me with the second array?

@leissa
Copy link
Member

leissa commented Feb 23, 2022

An unsafe Array only makes sense when having a pointer to it and :ptr («3∷nat; r32», 0∷nat) really is the same thing as float* in C. So there is no way for Thorin knowing the size. What is more, if you have an array like «n; r32», this is not a pair consisting of the actual array and a "size" field initialized to n. It is just an array of size n where n is some other term that occurs in the program. This leaves you with the following options.

  1. Use safe arrays instead of unsafe ones. For example, the signature of a matmul would look like this:
    [T: *, n: nat, m: nat, o: nat] -> [a: :ptr «n, m; T», b: :ptr «m, o; T»] -> :ptr «n, o; T»
    
    (I'm taking some liberties with the syntax here, but I guess you see the point).
  2. Instead of directly using Thorin arrays, use this type: [n: Nat, :ptr «n; r32»].
    This is a pair of size and and array of exactly that size.
  3. Alternatively, you can you do sth like this: [i64, :ptr «⊤∷nat; r32»].
    This is basically this:
    struct MyArray {
        size_t size;
        float* array;
    };

Note that with approach 2) and 3) you duplicate size information when doing sth like the matmul. Finally, 3) is an unsafe C-style way of doing things and is only present in Thorin in order to communicate with unsafe code and should only be used when you really have to do this.

@leissa leissa mentioned this issue Dec 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants