We investigate whether a randomly initialized transformer can be trained to execute OISC (one-instruction set computer) programs, as a follow-up to "Looped Transformers as Programmable Computers".
We use SUBLEQ (subtract and branch if less than or equal to zero), a single instruction that suffices to define a universal computer.
```
SUBLEQ(a, b, c):
    mem[b] = mem[b] - mem[a]
    if mem[b] ≤ 0:
        goto instruction c
    else:
        PC = PC + 1
```
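As a quick check of the semantics above, one SUBLEQ step can be sketched in Python (function and variable names here are hypothetical, not from the actual script):

```python
def subleq_step(mem, program, pc):
    """Execute one SUBLEQ instruction; program[pc] is an (a, b, c) triple."""
    a, b, c = program[pc]
    mem[b] -= mem[a]
    # Branch to instruction c if the result is non-positive, else fall through.
    return c if mem[b] <= 0 else pc + 1
```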
The input and output space is parameterized by:
- $N$: number of bits for each stored integer
- $s$: number of scratch-pad columns
- $m$: number of memory locations
- $n$: total number of columns, so the number of instructions is $n - m - s$
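For concreteness, plugging in the sizes used in the training setup (how the columns are partitioned is an assumption here), the column budget works out as:

```python
m, s = 8, 16                  # memory locations, scratch-pad columns
num_instructions = 8          # instruction columns
n = m + s + num_instructions  # total number of columns
assert num_instructions == n - m - s
```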
We use a Python script to simulate SUBLEQ and generate the input/output pairs.
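A minimal version of such a generator might look like this (the actual script's encoding and sampling ranges are not specified, so the names and value ranges below are assumptions):

```python
import random

def random_pair(m=8, num_instructions=8, n_bits=8):
    # Sample a random memory, program, and PC, execute one SUBLEQ step,
    # and return the (input state, output state) training example.
    lo, hi = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    mem = [random.randint(lo, hi) for _ in range(m)]
    program = [(random.randrange(m), random.randrange(m),
                random.randrange(num_instructions))
               for _ in range(num_instructions)]
    pc = random.randrange(num_instructions)
    before = (list(mem), pc)
    a, b, c = program[pc]                # SUBLEQ(a, b, c)
    mem[b] -= mem[a]
    pc = c if mem[b] <= 0 else pc + 1
    return program, before, (mem, pc)
```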
We use the encoder of the vanilla Transformer model.
- Train set: 100k pairs; val set: 5000 pairs. Each input/output pair is a randomly initialized memory and the result of executing one step of SUBLEQ on it. There are 8 memory locations, 8 instructions, and 16 scratch-pad locations.
- Transformer encoder: depth 16 layers, width (hidden dimension) 1024, 4 attention heads
- Loss: L1/L2/Huber
- Scale up the loss by mapping target bits 0 → -100, 1 → +100
- Quantize the model output: x > ε → +100, x < ε → -100
- Optimizer: Adam, SGD
- LR schedule: CosineRestartWithWarmUp, ReduceLROnPlateau
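The target scaling and output quantization above can be sketched as follows (the helper names and the ε default are assumptions; the source does not give a value for ε):

```python
def to_target(bit):
    # Scale up the regression target: bit 0 -> -100, bit 1 -> +100.
    return 200.0 * bit - 100.0

def quantize(x, eps=0.0):
    # Snap a raw model output back to the two target levels:
    # x > eps -> +100, otherwise -> -100.
    return 100.0 if x > eps else -100.0
```

Training regresses onto the ±100 targets; at evaluation time, outputs are quantized back to bits before checking exact-match accuracy.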