Cyrus-0101 · May 21, 2024 · May 20, 2024
diff --git a/README.md b/README.md
@@ -4,6 +4,8 @@
 
 - A hobbyist functional programming language and interpreter project done in Golang as a way of understanding Golang.
 
+> Inspired by Thorsten Ball's books, "Writing an Interpreter in Go" and "Writing a Compiler in Go". All credit goes to him for the inspiration.
+
 ## Introduction
 - The goal of this project is to turn a `tree-walking, on-the-fly evaluating interpreter` into a `bytecode compiler` and a `virtual machine` that executes the bytecode.
 
@@ -300,8 +302,9 @@ cyrus("name"); // Cyrus
 - The above might sound counter-intuitive, that interpreters and compilers are opposites, but while their approach is different, they share a lot of things in their construction. They both have a frontend that reads in source code in the source language, and turns it into a data structure.
 
 - In both, compiler and interpreter, the frontend is usually made up of a lexer and a parser, that generate a syntax tree. In the frontend they have similarities, after that when they both traverse the AST, their paths diverge.
+
 - Let's take a look at the lifecycle of code being translated to machine code below:
-  ![Compiler Lifecycle](/assets/compiler-lifecycle.png)
+    ![Compiler Lifecycle](/assets/compiler-lifecycle.png)
 
 1. The source code is tokenized and parsed by the lexer and parser respectively. This is the frontend. The source code is turned from text to AST.
 
@@ -378,4 +381,33 @@ cyrus("name"); // Cyrus
 
 - We can see we need to implement two instruction types in total: One for pushing to the stack and another for adding values in the stack.
 
-- Let's define the opcodes and how they are encoded in bytecode, then extend the compiler to generate instructions, then create a VM that decodes and executes the instructions. We'll create a new package `code` to define the bytecode instructions and the compiler.
+- Let's define the opcodes and how they are encoded in bytecode, then extend the compiler to generate instructions, then create a VM that decodes and executes the instructions. We'll create a new package `code` to define the bytecode instructions and the compiler.
+
+- What we know is that bytecode is made up of instructions. Each instruction is a series of bytes: an opcode that is exactly one byte wide, followed by zero or more operands.
+
+- In our `code` package we define `Instructions`, a slice of bytes, and `Opcode`, a single byte. We define `Instructions` as `[]byte` because it is far easier to pass around and work with a plain `[]byte`, treating its structure implicitly, than to encode these definitions in Go's type system.
+
+- A `Bytecode` definition is missing from the `code` package because defining it there would cause a nasty import cycle. We define it in the `compiler` package instead.
+
+- What if later on we wanted to push other things to the stack from our Chui code? String literals, for example. Putting those into the bytecode is also possible, since it’s just made of bytes, but it would also be a lot of bloat and would sooner or later become unwieldy.
+
+- That's where `constants` come into play. In this context, “constant” is short for “constant expression” and refers to expressions whose value doesn’t change, is constant, and can be determined at compile time:
+
+    ![Constants](/assets/constants.png)
+
+- This means we don't have to run the program to know what these expressions evaluate to. A compiler can find them in the code and store the values they evaluate to. It can then reference the constants in the instructions it generates instead of embedding the values directly. The reference is a plain integer that serves as an index into the data structure holding all constants, known as the `constant pool`. That is what our compiler will do.
+
+- When we come across an integer literal (a constant expression) while compiling, we’ll evaluate it and keep track of the resulting `*object.Integer` by storing it in memory and assigning it a number.
+- In the bytecode instructions we’ll refer to the `*object.Integer` by this number. When compiling is done and we pass the instructions to the VM for execution, we’ll also hand over all the constants we found, collected in a data structure (our constant pool) where the number assigned to each constant can be used as an index to retrieve it.
+
+- Each definition will have an `Op` prefix, and the value it refers to will be determined by `iota`. `iota` generates increasing byte values for us, because we don’t care about the actual values our opcodes represent. They only need to be distinct from each other and fit in one byte, and `iota` makes sure of that.
+
+- The definition for `OpConstant` says that its only operand is two bytes wide, making it a `uint16` and limiting its maximum value to `65535`. If we include `0`, the number of representable values is `65536`, which should be enough, since I don’t think we’re going to reference more than `65536` constants in our Chui programs.
+- Using a `uint16` instead of, say, a `uint32` also helps keep the resulting instructions smaller, because there are fewer unused bytes.
+
+- We want something working end to end as soon as possible, not a system that can only be turned on once it’s feature-complete. Our goal in [PR #1](https://github.com/Cyrus-0101/chui/pull/1) is to build the smallest possible compiler, which should do only one thing for now: produce two `OpConstant` instructions that later instruct the VM to correctly load the integers 2 and 2 onto the stack.
+- To achieve that, the minimal compiler has to: traverse the AST passed in, find the `*ast.IntegerLiteral` nodes, evaluate them by turning them into `*object.Integer` objects, add the objects to the `constant pool`, and finally emit `OpConstant` instructions that reference the constants in said pool.
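The constant-pool idea described in the README additions above can be sketched in isolation. This is a minimal, self-contained illustration and not the project's compiler: `pool`, `add`, `emitConstant`, and `compileLiterals` are hypothetical names, `opConstant` is assumed to have the value 0, and plain `int64` values stand in for `*object.Integer`.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// opConstant stands in for the project's OpConstant opcode (assumed value 0).
const opConstant byte = 0

// pool is a hypothetical constant pool; int64 stands in for *object.Integer.
type pool struct {
	values []int64
}

// add stores v and returns its index, which becomes the instruction operand.
func (p *pool) add(v int64) int {
	p.values = append(p.values, v)
	return len(p.values) - 1
}

// emitConstant encodes one opcode byte followed by a two-byte big-endian index.
func emitConstant(index int) []byte {
	ins := make([]byte, 3)
	ins[0] = opConstant
	binary.BigEndian.PutUint16(ins[1:], uint16(index))
	return ins
}

// compileLiterals adds each literal to the pool and returns the emitted bytes.
func compileLiterals(p *pool, literals ...int64) []byte {
	var out []byte
	for _, lit := range literals {
		out = append(out, emitConstant(p.add(lit))...)
	}
	return out
}

func main() {
	p := &pool{}
	code := compileLiterals(p, 2, 2)
	fmt.Println(code)     // [0 0 0 0 0 1]
	fmt.Println(p.values) // [2 2]
}
```

Note that this sketch, like the compiler in this diff, does not deduplicate constants: the two literals `2` occupy two separate pool slots (indices 0 and 1).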
diff --git a/assets/constants.png b/assets/constants.png
diff --git a/src/code/code.go b/src/code/code.go
@@ -0,0 +1,142 @@
+// Package code provides functionality for working with bytecode instructions.
+//
+// It defines the Instructions type, which is a slice of bytes, and the Opcode type, which is a byte.
+package code
+
+import (
+	"bytes"
+	"encoding/binary"
+	"fmt"
+)
+
+type Instructions []byte
+
+type Opcode byte
+
+const (
+	OpConstant Opcode = iota
+	OpAdd
+)
+
+// Definition represents the definition of an opcode, including its name and the widths of its operands, which is used to determine how many bytes to read to extract the operands.
+type Definition struct {
+	Name          string
+	OperandWidths []int
+}
+
+// definitions maps opcodes to their definitions.
+var definitions = map[Opcode]*Definition{
+	OpConstant: {"OpConstant", []int{2}},
+	OpAdd:      {"OpAdd", []int{}},
+}
+
+// Lookup() retrieves the definition of an opcode.
+func Lookup(op byte) (*Definition, error) {
+	def, ok := definitions[Opcode(op)]
+	if !ok {
+		return nil, fmt.Errorf("opcode %d undefined", op)
+	}
+
+	return def, nil
+}
+
+// Make() creates a bytecode instruction from an opcode and its operands.
+func Make(op Opcode, operands ...int) []byte {
+	def, ok := definitions[op]
+
+	if !ok {
+		return []byte{}
+	}
+
+	instructionLen := 1
+
+	for _, w := range def.OperandWidths {
+		instructionLen += w
+	}
+
+	instruction := make([]byte, instructionLen)
+	instruction[0] = byte(op)
+	offset := 1
+
+	for i, o := range operands {
+		width := def.OperandWidths[i]
+
+		switch width {
+
+		case 2:
+			binary.BigEndian.PutUint16(instruction[offset:], uint16(o))
+		}
+
+		offset += width
+	}
+
+	return instruction
+}
+
+// String() returns a string representation of the bytecode instructions, including the offset of each instruction in the bytecode.
+func (ins Instructions) String() string {
+	var out bytes.Buffer
+
+	i := 0
+
+	for i < len(ins) {
+		def, err := Lookup(ins[i])
+		if err != nil {
+			fmt.Fprintf(&out, "ERROR: %s\n", err)
+			i++ // advance past the unknown byte; without this the loop never terminates
+			continue
+		}
+
+		operands, read := ReadOperands(def, ins[i+1:])
+
+		fmt.Fprintf(&out, "%04d %s\n", i, ins.fmtInstruction(def, operands))
+
+		i += 1 + read
+	}
+
+	return out.String()
+}
+
+// fmtInstruction() formats an instruction for printing.
+func (ins Instructions) fmtInstruction(def *Definition, operands []int) string {
+	operandCount := len(def.OperandWidths)
+
+	if len(operands) != operandCount {
+		return fmt.Sprintf("ERROR: operand len %d does not match defined %d\n",
+			len(operands), operandCount)
+	}
+
+	switch operandCount {
+
+	case 0:
+		return def.Name
+
+	case 1:
+		return fmt.Sprintf("%s %d", def.Name, operands[0])
+	}
+
+	return fmt.Sprintf("ERROR: unhandled operandCount for %s\n", def.Name)
+}
+
+// ReadOperands() reads the operands of an instruction.
+func ReadOperands(def *Definition, ins Instructions) ([]int, int) {
+	operands := make([]int, len(def.OperandWidths))
+	offset := 0
+
+	for i, width := range def.OperandWidths {
+		switch width {
+
+		case 2:
+			operands[i] = int(ReadUint16(ins[offset:]))
+		}
+
+		offset += width
+	}
+
+	return operands, offset
+}
+
+// ReadUint16() reads a uint16 from a byte slice.
+func ReadUint16(ins Instructions) uint16 {
+	return binary.BigEndian.Uint16(ins)
+}
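To make the role of `OperandWidths` concrete, here is a hedged, standalone sketch of how instruction offsets follow from the operand widths. It does not use the `code` package: `operandWidths` and `instructionOffsets` are hypothetical names, and the opcode values 0 and 1 are assumed to match `OpConstant` and `OpAdd` as assigned by `iota`. Unlike `ReadOperands` above, the sketch also rejects truncated input instead of panicking.

```go
package main

import (
	"errors"
	"fmt"
)

// operandWidths mirrors the shape of the definitions table:
// opcode 0 (OpConstant) has one two-byte operand, opcode 1 (OpAdd) has none.
var operandWidths = map[byte][]int{
	0: {2},
	1: {},
}

// instructionOffsets returns the starting offset of every instruction in ins.
// It reports an error for unknown opcodes and for truncated operands.
func instructionOffsets(ins []byte) ([]int, error) {
	var offsets []int
	i := 0
	for i < len(ins) {
		widths, ok := operandWidths[ins[i]]
		if !ok {
			return offsets, fmt.Errorf("opcode %d undefined at offset %d", ins[i], i)
		}
		size := 1 // the opcode itself
		for _, w := range widths {
			size += w
		}
		if i+size > len(ins) {
			return offsets, errors.New("truncated instruction")
		}
		offsets = append(offsets, i)
		i += size
	}
	return offsets, nil
}

func main() {
	// OpAdd, OpConstant 2, OpConstant 65535: the same stream as TestInstructionsString.
	stream := []byte{1, 0, 0, 2, 0, 255, 255}
	offsets, err := instructionOffsets(stream)
	fmt.Println(offsets, err) // [0 1 4] <nil>
}
```

The offsets 0, 1, and 4 correspond to the `0000`, `0001`, and `0004` prefixes that `Instructions.String()` is expected to print for that stream.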
diff --git a/src/code/code_test.go b/src/code/code_test.go
@@ -0,0 +1,84 @@
+package code
+
+import "testing"
+
+func TestMake(t *testing.T) {
+	tests := []struct {
+		op       Opcode
+		operands []int
+		expected []byte
+	}{
+		{OpConstant, []int{65534}, []byte{byte(OpConstant), 255, 254}},
+		{OpAdd, []int{}, []byte{byte(OpAdd)}},
+	}
+
+	for _, tt := range tests {
+		instruction := Make(tt.op, tt.operands...)
+
+		if len(instruction) != len(tt.expected) {
+			t.Fatalf("instruction has wrong length. want=%d, got=%d",
+				len(tt.expected), len(instruction))
+		}
+
+		for i, b := range tt.expected {
+			if instruction[i] != tt.expected[i] {
+				t.Errorf("wrong byte at pos %d. want=%d, got=%d",
+					i, b, instruction[i])
+			}
+		}
+	}
+}
+
+func TestInstructionsString(t *testing.T) {
+	instructions := []Instructions{
+		Make(OpAdd),
+		Make(OpConstant, 2),
+		Make(OpConstant, 65535),
+	}
+
+	expected := `0000 OpAdd
+0001 OpConstant 2
+0004 OpConstant 65535
+`
+	concatted := Instructions{}
+
+	for _, ins := range instructions {
+		concatted = append(concatted, ins...)
+	}
+
+	if concatted.String() != expected {
+		t.Errorf("instructions wrongly formatted.\nwant=%q\ngot=%q",
+			expected, concatted.String())
+	}
+}
+
+func TestReadOperands(t *testing.T) {
+	tests := []struct {
+		op        Opcode
+		operands  []int
+		bytesRead int
+	}{
+		{OpConstant, []int{65535}, 2},
+	}
+	for _, tt := range tests {
+		instruction := Make(tt.op, tt.operands...)
+
+		def, err := Lookup(byte(tt.op))
+
+		if err != nil {
+			t.Fatalf("definition not found: %q\n", err)
+		}
+
+		operandsRead, n := ReadOperands(def, instruction[1:])
+
+		if n != tt.bytesRead {
+			t.Fatalf("n wrong. want=%d, got=%d", tt.bytesRead, n)
+		}
+
+		for i, want := range tt.operands {
+			if operandsRead[i] != want {
+				t.Errorf("operand wrong. want=%d, got=%d", want, operandsRead[i])
+			}
+		}
+	}
+}
diff --git a/src/compiler/compiler.go b/src/compiler/compiler.go
@@ -0,0 +1,103 @@
+// Package compiler provides functionality for compiling AST nodes into bytecode instructions.
+//
+// It emits the result of the compilation, including the emitted instructions and the constant pool.
+package compiler
+
+import (
+	"chui/ast"
+	"chui/code"
+	"chui/object"
+	"fmt"
+)
+
+type Compiler struct {
+	instructions code.Instructions
+	constants    []object.Object
+}
+
+func New() *Compiler {
+	return &Compiler{
+		instructions: code.Instructions{},
+		constants:    []object.Object{},
+	}
+}
+
+func (c *Compiler) Compile(node ast.Node) error {
+	switch node := node.(type) {
+
+	case *ast.Program:
+		for _, s := range node.Statements {
+			err := c.Compile(s)
+
+			if err != nil {
+				return err
+			}
+		}
+
+	case *ast.ExpressionStatement:
+		err := c.Compile(node.Expression)
+
+		if err != nil {
+			return err
+		}
+
+	case *ast.InfixExpression:
+		err := c.Compile(node.Left)
+
+		if err != nil {
+			return err
+		}
+
+		err = c.Compile(node.Right)
+
+		if err != nil {
+			return err
+		}
+
+		switch node.Operator {
+		case "+":
+			c.emit(code.OpAdd)
+
+		default:
+			return fmt.Errorf("unknown operator %s", node.Operator)
+		}
+
+	case *ast.IntegerLiteral:
+		integer := &object.Integer{Value: node.Value}
+		c.emit(code.OpConstant, c.addConstant(integer))
+	}
+
+	return nil
+}
+
+func (c *Compiler) addConstant(obj object.Object) int {
+	c.constants = append(c.constants, obj)
+
+	return len(c.constants) - 1
+}
+
+func (c *Compiler) emit(op code.Opcode, operands ...int) int {
+	ins := code.Make(op, operands...)
+	pos := c.addInstruction(ins)
+
+	return pos
+}
+
+func (c *Compiler) addInstruction(ins []byte) int {
+	posNewInstruction := len(c.instructions)
+	c.instructions = append(c.instructions, ins...)
+
+	return posNewInstruction
+}
+
+func (c *Compiler) Bytecode() *Bytecode {
+	return &Bytecode{
+		Instructions: c.instructions,
+		Constants:    c.constants,
+	}
+}
+
+type Bytecode struct {
+	Instructions code.Instructions
+	Constants    []object.Object
+}
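The VM that will consume this `Bytecode` is not part of this diff. As a rough sketch of the fetch-decode-execute loop the README describes, the following standalone program runs the instruction stream the compiler above would emit for `2 + 2`. Everything here is an assumption for illustration: `run` is a hypothetical function, `int64` stands in for `object.Object`, and the opcode values mirror the `iota` order in the `code` package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	opConstant byte = iota // push constants[operand] onto the stack
	opAdd                  // pop two values, push their sum
)

// run executes instructions against constants and returns the value left on
// top of the stack. It is a sketch, not the project's VM.
func run(instructions []byte, constants []int64) (int64, error) {
	var stack []int64
	for ip := 0; ip < len(instructions); ip++ {
		switch op := instructions[ip]; op {
		case opConstant:
			if ip+2 >= len(instructions) {
				return 0, fmt.Errorf("truncated operand at offset %d", ip)
			}
			idx := int(binary.BigEndian.Uint16(instructions[ip+1:]))
			if idx >= len(constants) {
				return 0, fmt.Errorf("constant index %d out of range", idx)
			}
			stack = append(stack, constants[idx])
			ip += 2 // skip the two operand bytes
		case opAdd:
			if len(stack) < 2 {
				return 0, fmt.Errorf("stack underflow at offset %d", ip)
			}
			right, left := stack[len(stack)-1], stack[len(stack)-2]
			stack = append(stack[:len(stack)-2], left+right)
		default:
			return 0, fmt.Errorf("opcode %d undefined", op)
		}
	}
	if len(stack) == 0 {
		return 0, fmt.Errorf("empty stack")
	}
	return stack[len(stack)-1], nil
}

func main() {
	// Bytecode for 2 + 2: OpConstant 0, OpConstant 1, OpAdd.
	ins := []byte{opConstant, 0, 0, opConstant, 0, 1, opAdd}
	result, err := run(ins, []int64{2, 2})
	fmt.Println(result, err) // 4 <nil>
}
```

The operand order matters once non-commutative operators are added: the left operand is compiled first, so it sits below the right operand on the stack.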