Complete JIT compilation engine for arm64 target. (#276)
This commit completes the baseline single-pass JIT engine for the arm64 target.
The implementation passes 100% of the specification tests and all the e2e tests
that have been used for amd64. Notably, the engine is stable under high
concurrency, where multiple goroutines hold stores and each has its own
Wasm execution environment.

One thing to note is that the assembler (golang-asm) is not goroutine-safe,
so we have to take a lock around assembler usage; therefore, compilation
cannot scale to multiple CPU cores. This will be resolved once we build our
homemade assembler in #233.

resolves #187

Signed-off-by: Takeshi Yoneda <takeshi@tetrate.io>
mathetake authored Feb 22, 2022
1 parent 43a442f commit 312c0e6
Showing 10 changed files with 216 additions and 149 deletions.
14 changes: 8 additions & 6 deletions README.md
@@ -35,15 +35,17 @@ wazero is an early project, so APIs are subject to change until version 1.0.
There's a concept called an "engine" in wazero (a word commonly used in Wasm runtimes). Engines are responsible for compiling and executing WebAssembly modules.
Two types of engines are available in wazero:

1. _Interpreter_: a naive interpreter-based implementation of Wasm virtual machine. Its implementation doesn't have any platform (GOARCH, GOOS) specific code, therefore _interpreter_ engine can be used for any compilation target available for Go (such as `arm64`).
2. _JIT engine_: compiles WebAssembly modules, generates the machine code, and executing it all at runtime. Currently wazero only implements the JIT compiler for `amd64` target. Generally speaking, _JIT engine_ is faster than _Interpreter_ by order of magnitude. However, the implementation is immature and has a bunch of aspects that could be improved (for example, it just does a singlepass compilation and doesn't do any optimizations, etc.). Please refer to [internal/wasm/jit/RATIONALE.md](internal/wasm/jit/RATIONALE.md) for the design choices and considerations in our JIT engine.
1. _Interpreter_: a naive interpreter-based implementation of a Wasm virtual machine. Its implementation doesn't have any platform-specific (GOARCH, GOOS) code, so the _interpreter_ engine can be used for any compilation target available for Go (such as `riscv64`).
2. _JIT engine_: compiles WebAssembly modules, generates machine code, and executes it, all at runtime. Currently wazero implements the JIT compiler for the `amd64` and `arm64` targets. Generally speaking, the _JIT engine_ is faster than the _Interpreter_ by an order of magnitude. However, the implementation is immature and has a number of aspects that could be improved (for example, it performs only a single-pass compilation and doesn't do any optimizations). Please refer to [internal/wasm/jit/RATIONALE.md](internal/wasm/jit/RATIONALE.md) for the design choices and considerations in our JIT engine.

Both engines pass 100% of the [WebAssembly spec test suite](https://github.com/WebAssembly/spec/tree/wg-1.0/test/core) (on supported platforms).

| Engine | Usage|GOARCH=amd64 | GOARCH=others |
|:----------:|:---:|:-------------:|:------:|
| Interpreter|`wazero.NewEngineInterpreter()`|✅|✅|
| JIT engine |`wazero.NewEngineJIT()`|✅|❌|
| Engine | Usage| amd64 | arm64 | others |
|:---:|:---:|:---:|:---:|:---:|
| Interpreter|`wazero.NewEngineInterpreter()`|✅|✅|✅|
| JIT engine |`wazero.NewEngineJIT()`|✅|✅|❌|

*Note:* JIT does not yet work on Windows. Please use the interpreter and track [this issue](https://github.com/tetratelabs/wazero/issues/270) if interested.

If you choose no configuration, e.g. `wazero.NewStore()`, the interpreter is used. You can also choose explicitly like so:
```go
20 changes: 5 additions & 15 deletions internal/wasm/jit/engine.go
@@ -1,7 +1,6 @@
package jit

import (
"encoding/hex"
"fmt"
"math"
"reflect"
@@ -533,19 +532,11 @@ jitentry:
switch status := e.exitContext.statusCode; status {
case jitCallStatusCodeReturned:
// Meaning that all the function frames above the previous call frame stack pointer are executed.
if e.globalContext.previousCallFrameStackPointer != e.globalContext.callFrameStackPointer {
panic("bug in JIT compiler")
}
case jitCallStatusCodeCallHostFunction:
// Take "callFrameAt(1)" rather than "callFrameTop": the top frame belongs to the host function,
// but when making host function calls, we need to pass the memory instance of the host function's caller.
fn := e.compiledFunctions[e.exitContext.functionCallAddress]
callerCompiledFunction := e.callFrameAt(1).compiledFunction
if buildoptions.IsDebugMode {
if fn.source.FunctionKind == wasm.FunctionKindWasm {
panic("jitCallStatusCodeCallHostFunction is only for host functions")
}
}
saved := e.globalContext.previousCallFrameStackPointer
e.execHostFunction(fn.source.FunctionKind, fn.source.HostFunction,
ctx.WithMemory(callerCompiledFunction.source.ModuleInstance.Memory),
@@ -669,7 +660,9 @@ func (e *engine) addCompiledFunction(addr wasm.FunctionAddress, compiled *compil
}

func compileHostFunction(f *wasm.FunctionInstance) (*compiledFunction, error) {
compiler, err := newCompiler(f, nil)
compiler, done, err := newCompiler(f, nil)
defer done()

if err != nil {
return nil, err
}
@@ -706,7 +699,8 @@ func compileWasmFunction(f *wasm.FunctionInstance) (*compiledFunction, error) {
fmt.Printf("compilation target wazeroir:\n%s\n", wazeroir.Format(ir.Operations))
}

compiler, err := newCompiler(f, ir)
compiler, done, err := newCompiler(f, ir)
defer done()
if err != nil {
return nil, fmt.Errorf("failed to initialize assembly builder: %w", err)
}
@@ -879,10 +873,6 @@ func compileWasmFunction(f *wasm.FunctionInstance) (*compiledFunction, error) {
return nil, fmt.Errorf("failed to compile: %w", err)
}

if buildoptions.IsDebugMode {
fmt.Printf("compiled code in hex: %s\n", hex.EncodeToString(code))
}

return &compiledFunction{
source: f,
codeSegment: code,
7 changes: 4 additions & 3 deletions internal/wasm/jit/jit_amd64.go
@@ -89,13 +89,14 @@ type archContext struct{}
func newArchContext() (ret archContext) { return }

// newCompiler returns a new compiler interface which can be used to compile the given function instance.
// The function returned must be invoked when finished compiling, so use `defer` to ensure this.
// Note: ir param can be nil for host functions.
func newCompiler(f *wasm.FunctionInstance, ir *wazeroir.CompilationResult) (compiler, error) {
func newCompiler(f *wasm.FunctionInstance, ir *wazeroir.CompilationResult) (compiler, func(), error) {
// We can choose an arbitrary number instead of 1024, which indicates the cache size in the compiler.
// TODO: optimize the number.
b, err := asm.NewBuilder("amd64", 1024)
if err != nil {
return nil, fmt.Errorf("failed to create a new assembly builder: %w", err)
return nil, func() {}, fmt.Errorf("failed to create a new assembly builder: %w", err)
}

compiler := &amd64Compiler{
@@ -106,7 +107,7 @@ func newCompiler(f *wasm.FunctionInstance, ir *wazeroir.CompilationResult) (comp
ir: ir,
labels: map[string]*labelInfo{},
}
return compiler, nil
return compiler, func() {}, nil
}

func (c *amd64Compiler) String() string {
108 changes: 84 additions & 24 deletions internal/wasm/jit/jit_arm64.go
@@ -16,6 +16,7 @@ import (
"encoding/binary"
"fmt"
"math"
"sync"
"unsafe"

asm "github.com/twitchyliquid64/golang-asm"
@@ -69,14 +70,27 @@ const (
// engine is the pointer to the "*engine" as uintptr.
func jitcall(codeSegment, engine uintptr)

// golang-asm is not goroutine-safe, so we hold a lock until compilation completes.
// TODO: delete after https://github.com/tetratelabs/wazero/issues/233
var assemblerMutex = &sync.Mutex{}

func unlockAssembler() {
assemblerMutex.Unlock()
}

// newCompiler returns a new compiler interface which can be used to compile the given function instance.
// The function returned must be invoked when finished compiling, so use `defer` to ensure this.
// Note: ir param can be nil for host functions.
func newCompiler(f *wasm.FunctionInstance, ir *wazeroir.CompilationResult) (compiler, error) {
func newCompiler(f *wasm.FunctionInstance, ir *wazeroir.CompilationResult) (c compiler, done func(), err error) {
// golang-asm is not goroutine-safe, so we hold a lock until compilation completes.
// TODO: delete after https://github.com/tetratelabs/wazero/issues/233
assemblerMutex.Lock()

// We can choose an arbitrary number instead of 1024, which indicates the cache size in the compiler.
// TODO: optimize the number.
b, err := asm.NewBuilder("arm64", 1024)
if err != nil {
return nil, fmt.Errorf("failed to create a new assembly builder: %w", err)
return nil, unlockAssembler, fmt.Errorf("failed to create a new assembly builder: %w", err)
}

compiler := &arm64Compiler{
@@ -86,7 +100,7 @@ func newCompiler(f *wasm.FunctionInstance, ir *wazeroir.CompilationResult) (comp
ir: ir,
labels: map[string]*labelInfo{},
}
return compiler, nil
return compiler, unlockAssembler, nil
}

type arm64Compiler struct {
@@ -145,6 +159,7 @@ func (c *arm64Compiler) compile() (code []byte, staticData compiledFunctionStati
return
}

staticData = c.staticData
return
}

@@ -597,6 +612,10 @@ func (c *arm64Compiler) compileExitFromNativeCode(status jitCallStatusCode) erro

// compileHostFunction implements compiler.compileHostFunction for the arm64 architecture.
func (c *arm64Compiler) compileHostFunction(address wasm.FunctionAddress) error {
// The assembler skips the first instruction, so we intentionally add a NOP here.
// TODO: delete after #233
c.compileNOP()

// First we must update the location stack to reflect the number of host function inputs.
c.pushFunctionParams()

@@ -667,6 +686,15 @@ func (c *arm64Compiler) compileSwap(o *wazeroir.OperationSwap) error {
return nil
}

// Only used in tests, but defined in the main file because we sometimes
// need to call it from the main code when debugging.
//nolint:unused
func (c *arm64Compiler) undefined() {
ud := c.newProg()
ud.As = obj.AUNDEF
c.addInstruction(ud)
}

// compileGlobalGet implements compiler.compileGlobalGet for the arm64 architecture.
func (c *arm64Compiler) compileGlobalGet(o *wazeroir.OperationGlobalGet) error {
c.maybeCompileMoveTopConditionalToFreeGeneralPurposeRegister()
@@ -687,7 +715,7 @@ func (c *arm64Compiler) compileGlobalGet(o *wazeroir.OperationGlobalGet) error {
intMov = arm64.AMOVWU
floatMov = arm64.AFMOVS
case wasm.ValueTypeF64:
intMov = arm64.AMOVW
intMov = arm64.AMOVD
floatMov = arm64.AFMOVD
}

@@ -763,7 +791,7 @@ func (c *arm64Compiler) compileReadGlobalAddress(globalIndex uint32) (destinatio
c.compileConstToRegisterInstruction(
// globalIndex is an index to []*GlobalInstance, therefore
// we have to multiply it by the size of *GlobalInstance == the pointer size == 8.
arm64.AMOVW, int64(globalIndex)*8, destinationRegister,
arm64.AMOVD, int64(globalIndex)*8, destinationRegister,
)

// "reservedRegisterForTemporary = &globals[0]"
@@ -773,7 +801,7 @@
reservedRegisterForTemporary,
)

// "destinationRegister = [reservedRegisterForTemporary + destinationRegister] (== &globals[globalIndex])".
// "destinationRegister = [reservedRegisterForTemporary + destinationRegister] (== globals[globalIndex])".
c.compileMemoryWithRegisterOffsetToRegisterInstruction(
arm64.AMOVD,
reservedRegisterForTemporary, destinationRegister,
@@ -1192,7 +1220,7 @@ func (c *arm64Compiler) compileCallImpl(addr wasm.FunctionAddress, addrRegister
compiledFunctionAddressRegister)
} else {
// Shift addrRegister by 3 because the size of *compiledFunction equals 8 bytes.
c.compileConstToRegisterInstruction(arm64.ALSL, 3, addrRegister)
c.compileConstToRegisterInstruction(arm64.ALSLW, 3, addrRegister)
c.compileMemoryWithRegisterOffsetToRegisterInstruction(
arm64.AMOVD,
tmp, addrRegister,
@@ -1465,7 +1493,7 @@ func (c *arm64Compiler) compileDropRange(r *wazeroir.InclusiveRange) error {
c.maybeCompileMoveTopConditionalToFreeGeneralPurposeRegister()

// Save the live values because we pop and release values in drop range below.
liveValues := c.locationStack.stack[c.locationStack.sp-uint64(r.Start):]
liveValues := c.locationStack.stack[c.locationStack.sp-uint64(r.Start) : c.locationStack.sp]
c.locationStack.sp -= uint64(r.Start)

// Note: drop target range is inclusive.
Expand Down Expand Up @@ -1498,6 +1526,8 @@ func (c *arm64Compiler) compileSelect() error {
return err
}

c.markRegisterUsed(cv.register)

x1, x2, err := c.popTwoValuesOnRegisters()
if err != nil {
return err
@@ -1518,7 +1548,7 @@
// So we explicitly assign a general purpose register to x1 here.
if isZeroRegister(x1.register) {
// Mark x2 and cv's registers as used so they won't be chosen.
c.markRegisterUsed(x2.register, cv.register)
c.markRegisterUsed(x2.register)
// Pick the non-zero register for x1.
x1Reg, err := c.allocateRegister(generalPurposeRegisterTypeInt)
if err != nil {
Expand Down Expand Up @@ -1896,7 +1926,7 @@ func (c *arm64Compiler) compileIntegerDivPrecheck(is32Bit, isSigned bool, divide
brIfDividendNotMinInt := c.compilelBranchInstruction(arm64.ABNE)

// Otherwise, we raise overflow error.
c.compileExitFromNativeCode(jitCallStatusIntegerDivisionByZero)
c.compileExitFromNativeCode(jitCallStatusIntegerOverflow)

c.setBranchTargetOnNext(brIfDivisorNonMinusOne, brIfDividendNotMinInt)
}
@@ -2340,24 +2370,37 @@ func (c *arm64Compiler) compileITruncFromF(o *wazeroir.OperationITruncFromF) err
c.compileRegisterToRegisterInstruction(arm64.AMSR, zeroRegister, arm64.REG_FPSR)

var convinst obj.As
if o.InputType == wazeroir.Float32 && o.OutputType == wazeroir.SignedInt32 {
var is32bitFloat = o.InputType == wazeroir.Float32
if is32bitFloat && o.OutputType == wazeroir.SignedInt32 {
convinst = arm64.AFCVTZSSW
} else if o.InputType == wazeroir.Float32 && o.OutputType == wazeroir.SignedInt64 {
} else if is32bitFloat && o.OutputType == wazeroir.SignedInt64 {
convinst = arm64.AFCVTZSS
} else if o.InputType == wazeroir.Float64 && o.OutputType == wazeroir.SignedInt32 {
} else if !is32bitFloat && o.OutputType == wazeroir.SignedInt32 {
convinst = arm64.AFCVTZSDW
} else if o.InputType == wazeroir.Float64 && o.OutputType == wazeroir.SignedInt64 {
} else if !is32bitFloat && o.OutputType == wazeroir.SignedInt64 {
convinst = arm64.AFCVTZSD
} else if o.InputType == wazeroir.Float32 && o.OutputType == wazeroir.SignedUint32 {
} else if is32bitFloat && o.OutputType == wazeroir.SignedUint32 {
convinst = arm64.AFCVTZUSW
} else if o.InputType == wazeroir.Float32 && o.OutputType == wazeroir.SignedUint64 {
} else if is32bitFloat && o.OutputType == wazeroir.SignedUint64 {
convinst = arm64.AFCVTZUS
} else if o.InputType == wazeroir.Float64 && o.OutputType == wazeroir.SignedUint32 {
} else if !is32bitFloat && o.OutputType == wazeroir.SignedUint32 {
convinst = arm64.AFCVTZUDW
} else if o.InputType == wazeroir.Float64 && o.OutputType == wazeroir.SignedUint64 {
} else if !is32bitFloat && o.OutputType == wazeroir.SignedUint64 {
convinst = arm64.AFCVTZUD
}
c.compileSimpleConversion(convinst, generalPurposeRegisterTypeInt)

source, err := c.popValueOnRegister()
if err != nil {
return err
}

destinationReg, err := c.allocateRegister(generalPurposeRegisterTypeInt)
if err != nil {
return err
}

c.compileRegisterToRegisterInstruction(convinst, source.register, destinationReg)
c.locationStack.pushValueLocationOnRegister(destinationReg)

// Obtain the floating point status register value into the general purpose register,
// so that we can check if the conversion resulted in undefined behavior.
@@ -2366,12 +2409,30 @@
// See https://developer.arm.com/documentation/ddi0595/2020-12/AArch64-Registers/FPSR--Floating-point-Status-Register
c.compileRegisterAndConstSourceToNoneInstruction(arm64.ACMP, reservedRegisterForTemporary, 1)

// If so, exit the execution with jitCallStatusCodeInvalidFloatToIntConversion.
br := c.compilelBranchInstruction(arm64.ABNE)
c.compileExitFromNativeCode(jitCallStatusCodeInvalidFloatToIntConversion)
brOK := c.compilelBranchInstruction(arm64.ABNE)

// If so, exit the execution with errors depending on whether or not the source value is NaN.
{
var floatcmp obj.As
if is32bitFloat {
floatcmp = arm64.AFCMPS
} else {
floatcmp = arm64.AFCMPD
}
c.compileTwoRegistersToNoneInstruction(floatcmp, source.register, source.register)
// VS flag is set if at least one of values for FCMP is NaN.
// https://developer.arm.com/documentation/dui0801/g/Condition-Codes/Comparison-of-condition-code-meanings-in-integer-and-floating-point-code
brIfSourceNaN := c.compilelBranchInstruction(arm64.ABVS)

// If the source value is not NaN, the operation was overflow.
c.compileExitFromNativeCode(jitCallStatusIntegerOverflow)
// Otherwise, the operation was invalid as this is trying to convert NaN to integer.
c.setBranchTargetOnNext(brIfSourceNaN)
c.compileExitFromNativeCode(jitCallStatusCodeInvalidFloatToIntConversion)
}

// Otherwise, we branch into the next instruction.
c.setBranchTargetOnNext(br)
c.setBranchTargetOnNext(brOK)
return nil
}

@@ -3336,7 +3397,6 @@ func (c *arm64Compiler) compileModuleContextInitialization() error {
arm64.AMOVD, tmpX,
reservedRegisterForEngine, engineModuleContextGlobalElement0AddressOffset,
)

}

// Update memoryElement0Address and memorySliceLen.