docs: Added initial architecture decision records #1

Merged
merged 8 commits into from
Jun 7, 2024
35 changes: 35 additions & 0 deletions docs/README.md
@@ -0,0 +1,35 @@
# Algorand TypeScript

Algorand TypeScript is a partial implementation of the TypeScript programming language that runs on the Algorand Virtual Machine (AVM). It includes a statically typed framework for development of Algorand smart contracts and logic signatures, with TypeScript interfaces to underlying AVM functionality that works with standard TypeScript tooling.

It maintains the syntax and semantics of TypeScript such that a developer who knows TypeScript can make safe assumptions
about the behaviour of the compiled code when running on the AVM. Algorand TypeScript is also executable TypeScript: it can be
transpiled to EcmaScript, run and debugged on Node.js, and exercised from automated tests.

# Guiding Principals

## Familiarity

Where the base language (TypeScript/EcmaScript) doesn't support a given feature natively (eg. unsigned fixed size integers),
prior art should be used to inspire an API that is familiar to a user of the base language and transpilation can be used to
ensure this code executes correctly.

## Leveraging TypeScript type system

TypeScript's type system should be used wherever possible to ensure code is type safe before compilation, creating a fast
feedback loop and nudging users into the [pit of success](https://blog.codinghorror.com/falling-into-the-pit-of-success/).

## TEALScript compatibility

[TEALScript](https://github.com/algorandfoundation/tealscript/) is an existing compiler from a TypeScript-like language to TEAL. However, its source code is not executable TypeScript, and it does not prioritise semantic compatibility. Wherever possible, Algorand TypeScript should endeavour to be compatible with existing TEALScript contracts and, where that is not possible, migratable with minimal changes.

## Algorand Python

[Algorand Python](https://algorandfoundation.github.io/puya/) is the Python equivalent of Algorand TypeScript. Whilst the primary goal is to produce an API which makes sense in the TypeScript ecosystem, a secondary goal is to minimise the disparity between the two APIs, so that users who choose to, or are required to, develop on both platforms do not face a completely unfamiliar API.

Contributor

Suggested change
## ABI Abstraction
When possible, Algorand TypeScript should avoid putting the cognitive overhead of ABI encoding/decoding on the developer. For example, there should be no difference between AVM byteslices and ABI encoded strings, and they should be directly comparable and compatible until the point of encoding (returning, putting in state, array encoding, logging, etc.).

Contributor Author

I have a slightly more nuanced view of this.

There should be default typing options that meet this principle. Like how in Algorand Python if you use a UInt64 or String then Puya takes care of encoding/decoding for you.

But, I think it's important to also have control over that stuff when you want/need it, or when there isn't a direct translation between AVM primitives and the ABI type. The way this was achieved in Algorand Python was to have primitive types (e.g. UInt64, String) that automatically get decoded and encoded on their way in/out and are represented with the equivalent ABI type in ARC-32/4, but then to expose specific ABI encoded types in a separate (arc4) namespace, so when you want to work with the encoded data you can do that too (and those types all have a .native property to easily decode to the relevant underlying AVM type). Reference: https://algorandfoundation.github.io/puya/lg-types.html

Regardless, this is not relevant to these ADRs because they are talking about AVM primitive types. We plan on having a separate ADR to discuss how ABI types are handled.

Contributor

What are the scenarios where a developer wants to perform operations on the encoded bytes? Seems to be very rare. For edge cases TEALScript provides a rawBytes function that will give you the encoded bytes of the value.

> Regardless, this is not relevant to these ADRs because they are talking about AVM primitive types. We plan on having a separate ADR to discuss how ABI types are handled.

Ah gotcha. I do think they are a bit linked though because compatibility between native types and ABI types is important. It's something developers have tripped up on with Beaker and Algorand Python.

# Architecture decisions

As part of developing Algorand TypeScript we are documenting key architecture decisions using [Architecture Decision Records (ADRs)](https://adr.github.io/). The following are the key decisions that have been made thus far:

- [2024-05-21: Primitive integer types](./architecture-decisions/2024-05-21_primitive-integer-types.md)
- [2024-05-21: Primitive byte and string types](./architecture-decisions/2024-05-21_primitive-bytes-and-strings.md)
122 changes: 122 additions & 0 deletions docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md
@@ -0,0 +1,122 @@
# Architecture Decision Record - Primitive bytes and strings

- **Status**: Draft
- **Owner:** Tristan Menzel
- **Deciders**: Alessandro Cappellato (Algorand Foundation), Joe Polny (Algorand Foundation), Rob Moore (MakerX)
- **Date created**: 2024-05-21
- **Date decided**: N/A
- **Date updated**: 2024-05-22

## Context

See [Architecture Decision Record - Primitive integer types](./2024-05-21_primitive-integer-types.md) for related decision and context.

The AVM's only non-integer type is a variable length byte array. When *not* being interpreted as a `biguint`, leading zeros are significant and length is constant unless explicitly manipulated. Strings can only be represented in the AVM if they are encoded as bytes. The AVM supports byte literals in the form of base16, base64, and utf8 encoded strings. Once a literal has been parsed, the AVM has no concept of the original encoding or of utf8 characters. As a result, whilst a byte array can be indexed to receive a single byte (or a slice of bytes), it cannot be indexed to return a single utf8 *character* unless one assumes all characters in the original string were ASCII (ie. single byte) characters.
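The byte-versus-character distinction can be seen on Node.js with the standard `TextEncoder` API. This is only an illustration of UTF-8 encoding, not of any AVM or Algorand TypeScript API, and the variable names are made up for the example.

```ts
// Encode two strings as UTF-8, which is how string data ends up as bytes
const encoder = new TextEncoder()
const asciiBytes = encoder.encode('abc')
const accentedBytes = encoder.encode('n\u00e9') // 'n' followed by 'é'

// Every ASCII character occupies exactly one byte,
// so a byte index and a character index are the same thing
const asciiSecondByte = asciiBytes[1] // 0x62, the byte for 'b'

// 'é' (U+00E9) is encoded as two bytes (0xC3 0xA9), so the two-character
// string is three bytes long and index 1 yields only the first half of 'é'
const accentedSecondByte = accentedBytes[1] // 0xC3
```

Indexing the second array by byte therefore cannot return the second character, which is the limitation described above.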

EcmaScript provides two relevant types for bytes and strings.

- **string**: The native string type. Supports arbitrary length, concatenation, indexation/slicing of characters plus many utility methods (upper/lower/startswith/endswith/charcodeat/trim etc). Supports concat with binary `+` operator.
- **Uint8Array**: A variable length mutable array of 8-bit numbers. Supports indexing/slicing of 'bytes'.

TEALScript uses a branded string to represent bytes. Base64/Base16 encoding/decoding is performed with specific ops. The prototype of these objects contains string-specific APIs that are not implemented.

Algorand Python has specific [Bytes and String types](https://algorandfoundation.github.io/puya/lg-types.html#avm-types) that have semantics that exactly match the AVM semantics. Python allows for operator overloading so these types also use native operators (where they align to functionality in the underlying AVM).


## Requirements

- Support bytes AVM type and a string type that supports ASCII UTF-8 strings
- Use idiomatic TypeScript expressions for string expressions, including concatenation operator (`+`)
- Semantic compatibility when executing on Node.js (e.g. in unit tests) and AVM

## Principles

- **[AlgoKit Guiding Principles](https://github.com/algorandfoundation/algokit-cli/blob/main/docs/algokit.md#guiding-principles)** - specifically Seamless onramp, Leverage existing ecosystem, Meet devs where they are
- **[Algorand Python Principles](https://algorandfoundation.github.io/puya/principles.html#principles)**
- **[Algorand TypeScript Guiding Principles](../README.md#guiding-principals)**

## Options


### Option 1 - Direct use of native EcmaScript types

```ts
const b1 = "somebytes"

const b2 = new Uint8Array([1, 2, 3, 4])

const b3 = b1 + b1
```

Whilst binary data is often a representation of a utf-8 string, it is not always, so direct use of the string type is not a natural fit. It doesn't allow us to represent alternative encodings (b16/b64) and the existing API surface is very 'string' centric. Much of the API would also be expensive to implement on the AVM, leading to a bunch of 'dead' methods hanging off the type (or a significant amount of work implementing all the methods).

The Uint8Array type is fit for purpose as an encoding mechanism but the API is not as friendly as it could be for writing declarative contracts. The `new` keyword feels unnatural for something that is ostensibly a primitive type. The fact that it is mutable also complicates the implementation the compiler produces for the AVM.
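Two of the points above can be observed directly on Node.js. The following is a minimal sketch using only standard APIs (`TextDecoder`, `TextEncoder`, `Uint8Array`); the variable names are illustrative. It shows that arbitrary bytes do not always survive a round trip through `string`, and that a `Uint8Array` can be changed through any alias.

```ts
// 0xFF is never valid in UTF-8, so decoding replaces it with U+FFFD
const original = new Uint8Array([0xff, 0x00])
const asString = new TextDecoder().decode(original)
const roundTripped = new TextEncoder().encode(asString)
// roundTripped is [0xEF, 0xBF, 0xBD, 0x00]: the original 0xFF byte is lost

// Uint8Array is mutable and shared by reference
const first = new Uint8Array([1, 2, 3, 4])
const alias = first
alias[0] = 9 // first[0] is now 9 as well
```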

### Option 2 - Define a class to represent Bytes

A `Bytes` class is defined with a very specific API tailored to operations which are available on the AVM:

```ts
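// Stand-in for the AVM uint64 type (see the primitive integer types ADR)
type uint64 = bigint

class Bytes {
  private readonly v: string

  constructor(v: string) {
    this.v = v
  }

  concat(other: Bytes): Bytes {
    return new Bytes(this.v + other.v)
  }

  at(x: uint64): Bytes {
    return new Bytes(this.v[Number(x)])
  }

  /* etc */
}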

```

This solution provides great type safety and requires no transpilation to run _correctly_ on Node.js. However, non-primitive types in Node.js have equality checked by reference, and again the `new` keyword feels unnatural. Due to the lack of operator overloading, `+` will not work as expected. Concatenations do not require the same understanding of "order of operations" and nesting as numeric operations though, so a `concat` method isn't as unwieldy (but still isn't idiomatic).

```ts
const a = new Bytes("Hello")
const b = new Bytes("World")

function testValue(x: Bytes) {
  // No compile error, but will work on reference not value
  switch (x) {
    case a:
      return b
    case b:
      return a
  }
  return new Bytes("default")
}
```

To have equality checks behave as expected we would need a transpilation step to replace bytes values in certain expressions with a primitive type.
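The reference-equality pitfall can be reproduced in plain TypeScript. The sketch below uses a cut-down `Bytes` class; the `equals` method is a hypothetical addition, not part of the API proposed above, and only illustrates the kind of value comparison a transpilation step would need to substitute for `===`.

```ts
class Bytes {
  private readonly v: string

  constructor(v: string) {
    this.v = v
  }

  // Hypothetical value comparison; not part of the API proposed in this ADR
  equals(other: Bytes): boolean {
    return this.v === other.v
  }
}

const hello1 = new Bytes('Hello')
const hello2 = new Bytes('Hello')

// Reference comparison: two distinct objects, so this is false on Node.js
const sameReference = hello1 === hello2

// Value comparison: true, matching how two equal byte arrays compare on the AVM
const sameValue = hello1.equals(hello2)
```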

### Option 3 - Implement bytes as a class but define it as a type + factory

We can iron out some of the rough edges of using a class by only exposing a factory method for `Bytes` and a resulting type `bytes`. This removes the need for the `new` keyword and lets us use a 'primitive looking' type alias (`bytes` versus `Bytes` - much like `string` and `String`). We can use [tagged templates](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates) to improve the ux of multipart concat expressions.

```ts

const a = Bytes("Hello")
const b = Bytes.fromHex("ABFF")
const c = Bytes.fromBase64("...")
const d = Bytes.fromInts(255, 123, 28, 20)


function testValue(x: bytes, y: bytes): bytes {
  return Bytes`${x} and ${y}`
}

```
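One way the factory could be made callable both as a plain function and as a template tag is sketched below. This is an assumption-laden illustration, not the project's implementation: the `raw` and `equals` members, the `BytesFactory` interface and the helper names `wrap` and `bytesImpl` are all hypothetical, and only `fromHex` is included from the factory methods shown above.

```ts
// Hypothetical runtime shape for `bytes`; the ADR does not prescribe one
interface bytes {
  readonly raw: Uint8Array
  equals(other: bytes): boolean
}

// Callable as Bytes("...") and as a tagged template Bytes`...`
interface BytesFactory {
  (value: string): bytes
  (strings: TemplateStringsArray, ...parts: bytes[]): bytes
  fromHex(hex: string): bytes
}

const utf8 = new TextEncoder()

function wrap(raw: Uint8Array): bytes {
  return {
    raw,
    equals: (other) => raw.length === other.raw.length && raw.every((b, i) => b === other.raw[i]),
  }
}

function bytesImpl(first: string | TemplateStringsArray, ...parts: bytes[]): bytes {
  if (typeof first === 'string') return wrap(utf8.encode(first))
  // Tagged template call: interleave the literal chunks with the interpolated values
  const out: number[] = []
  first.forEach((literal, i) => {
    utf8.encode(literal).forEach((b) => out.push(b))
    const part = parts[i]
    if (part !== undefined) part.raw.forEach((b) => out.push(b))
  })
  return wrap(Uint8Array.from(out))
}

const Bytes: BytesFactory = Object.assign(bytesImpl, {
  // Assumes well-formed, even-length hex input
  fromHex(hex: string): bytes {
    const raw = new Uint8Array(hex.length / 2)
    for (let i = 0; i < raw.length; i++) {
      raw[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16)
    }
    return wrap(raw)
  },
})

function testValue(x: bytes, y: bytes): bytes {
  return Bytes`${x} and ${y}`
}

const joined = testValue(Bytes('Hello'), Bytes('World'))
```

Because the template tag receives the interpolated values as `bytes` rather than as strings, non-UTF-8 content such as the result of `Bytes.fromHex` can be concatenated without passing through `string`.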

## Preferred option

TBD

## Selected option

TBD
157 changes: 157 additions & 0 deletions docs/architecture-decisions/2024-05-21_primitive-integer-types.md
@@ -0,0 +1,157 @@
# Architecture Decision Record - Primitive integer types

- **Status**: Draft
- **Owner:** Tristan Menzel
- **Deciders**: Alessandro Cappellato (Algorand Foundation), Joe Polny (Algorand Foundation), Rob Moore (MakerX)
- **Date created**: 2024-05-21
- **Date decided**: N/A
- **Date updated**: 2024-05-22

## Context

The AVM supports two integer types in its standard set of ops.

* **uint64**: An unsigned 64-bit integer where the AVM will error on over or under flows
* **biguint**: An unsigned variable bit, big-endian integer represented as an array of bytes with an indeterminate number of leading zeros which are truncated by several math ops. The max size of a biguint is 512-bits. Over and under flows will cause errors.

EcmaScript supports two numeric types.

* **number**: A signed, double-precision (64-bit) floating point value with 53 bits of integer precision, giving a max safe integer value of 2^53 - 1. A number can be declared with a numeric literal, or with the `Number(...)` factory method.
* **bigint**: A signed arbitrary-precision integer with an implementation defined limit based on the platform. In practice this is greater than 512-bit. A bigint can be declared with a numeric literal and `n` suffix, or with the `BigInt(...)` factory method.
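
The mismatch between these EcmaScript types and the AVM types can be checked on Node.js. The following sketch uses only standard language features; the variable names are illustrative.

```ts
// The largest uint64 value, 2^64 - 1
const maxUint64 = 2n ** 64n - 1n

// number: integers above 2^53 - 1 are no longer exact
const unsafeEqual = Number.MAX_SAFE_INTEGER + 1 === Number.MAX_SAFE_INTEGER + 2 // true

// Converting the largest uint64 to a number rounds it up to 2^64
const roundTrip = BigInt(Number(maxUint64)) // maxUint64 + 1n

// bigint: exact, but signed and effectively unbounded for AVM purposes
const belowZero = 0n - 1n // -1n, where the AVM would fail with an under flow
const beyond512Bits = 2n ** 512n // no error, although an AVM biguint is capped at 512 bits
```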

Neither EcmaScript nor TypeScript supports operator overloading, despite some [previous](https://github.com/tc39/notes/blob/main/meetings/2023-11/november-28.md#withdrawing-operator-overloading) [attempts](https://github.com/microsoft/TypeScript/issues/2319) to add it.

TEALScript [makes use of branded `number` types](https://tealscript.netlify.app/guides/supported-types/numbers/) for all bit sizes from 8 to 512. Since the source code is never executed, the safe limits of the `number` type are not a concern. Compiled code does not perform overflow checks on calculations until a return value is being encoded, meaning a uint<8> is effectively a uint<64> until it's returned.

Algorand Python has specific [UInt64 and BigUint types](https://algorandfoundation.github.io/puya/lg-types.html#avm-types) that have semantics that exactly match the AVM semantics. Python allows for operator overloading so these types also use native operators (where they align to functionality in the underlying AVM).


## Requirements

- Support uint64 and biguint AVM types
- Use idiomatic TypeScript expressions for numeric expressions, including mathematical operators (`+`, `-`, `*`, `/`, etc.)
- Semantic compatibility when executing on Node.js (e.g. in unit tests) and AVM

## Principles

- **[AlgoKit Guiding Principles](https://github.com/algorandfoundation/algokit-cli/blob/main/docs/algokit.md#guiding-principles)** - specifically Seamless onramp, Leverage existing ecosystem, Meet devs where they are
- **[Algorand Python Principles](https://algorandfoundation.github.io/puya/principles.html#principles)**
- **[Algorand TypeScript Guiding Principles](../README.md#guiding-principals)**

## Options

### Option 1 - Direct use of native EcmaScript types

EcmaScript's `number` type is ill-suited to representing either AVM type reliably as it does not have the safe range to cover the full range of a uint64. Being a floating point number, it would also require truncating after division.

EcmaScript's `bigint` is a better fit for both types but does not underflow when presented with a negative number, nor does it overflow at any meaningful limit for the AVM types.

Even if we solved the over/under flow checking with transpilation, we would still face the issue that `uint64` and `biguint` would not have discrete types, so we would have no type safety against accidentally passing a `biguint` to a method that expects a `uint64` and vice versa.
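
The lack of discrete types can be shown with plain type aliases. This is a minimal sketch; the alias and function names are illustrative and stand in for whatever names native `bigint` values would be given.

```ts
// Plain aliases of the same primitive are interchangeable to the type checker
type uint64 = bigint
type biguint = bigint

function takesUint64(value: uint64): uint64 {
  return value
}

// A value far too large for 64 bits, typed as biguint
const big: biguint = 2n ** 100n

// Compiles without complaint and, with no bounds check, runs without error too
const accepted = takesUint64(big)
```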

### Option 2 - Define classes to represent the AVM types

A `UInt64` and `BigUint` class could be defined which make use of `bigint` internally to perform maths operations and check for over or under flows after each op.

```ts
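class UInt64 {
  static readonly MAX = 2n ** 64n - 1n

  private value: bigint

  constructor(value: bigint | number) {
    this.value = this.checkBounds(BigInt(value))
  }

  add(other: UInt64): UInt64 {
    return new UInt64(this.value + other.value)
  }

  // Throws on over or under flow, mirroring the AVM
  private checkBounds(value: bigint): bigint {
    if (value < 0n || value > UInt64.MAX) {
      throw new RangeError('uint64 over or under flow')
    }
    return value
  }

  /* etc */
}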

```

This solution provides the ultimate in type safety and semantic/syntactic compatibility, and requires no transpilation to run _correctly_ on Node.js. The semantics should be obvious to anyone familiar with Object Oriented Programming. The downside is that neither EcmaScript nor TypeScript support operator overloading which results in more verbose and unwieldy math expressions.

```ts
const a = new UInt64(500n)
const b = new UInt64(256)

// Not supported (a compile error in TS, unhelpful behaviour in ES)
const c1 = a + b
// Works, but is verbose and unwieldy for more complicated expressions and isn't idiomatic TypeScript
const c2 = a.add(b)

```
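
To give a feel for how quickly method calls become unwieldy, the sketch below computes a simple compound expression. It assumes `sub`, `mul` and `div` methods alongside `add`; these are hypothetical and only follow the pattern of the class above.

```ts
class UInt64 {
  private static readonly MAX = 2n ** 64n - 1n
  readonly value: bigint

  constructor(value: bigint) {
    if (value < 0n || value > UInt64.MAX) {
      throw new RangeError('uint64 over or under flow')
    }
    this.value = value
  }

  add(other: UInt64): UInt64 {
    return new UInt64(this.value + other.value)
  }
  // Hypothetical methods following the same pattern as add
  sub(other: UInt64): UInt64 {
    return new UInt64(this.value - other.value)
  }
  mul(other: UInt64): UInt64 {
    return new UInt64(this.value * other.value)
  }
  div(other: UInt64): UInt64 {
    return new UInt64(this.value / other.value)
  }
}

const principal = new UInt64(1000n)
const rate = new UInt64(5n)
const hundred = new UInt64(100n)

// With operators this would read: principal + (principal * rate) / hundred
const total = principal.add(principal.mul(rate).div(hundred))
```

The reader has to reconstruct the order of operations from the nesting of calls rather than from familiar operator precedence.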

### Option 3 - Use tagged/branded number types

TypeScript allows you to intersect primitive types with a simple interface to brand a value in a way which is incompatible with another primitive branded with a different value within the type system.

```ts
// Constructors
declare function UInt64(v): uint64
declare function BigUint(v): biguint

// Branded types
type uint64 = bigint & { __type?: 'uint64' }
type biguint = bigint & { __type?: 'biguint' }


const a: uint64 = 323n // Declare with type annotation
const b = UInt64(12n) // Declare with factory

// c1 type is `bigint`, but we can mandate a type hint with the compiler (c2)
const c1 = a + b
const c2: uint64 = a + b
// No TypeScript type error, but semantically ambiguous - is a+b performed as a biguint op or a uint64 one and then converted?
// (We could detect this as a compiler error though)
const c3: biguint = a + b

// Type error on b: Argument of type 'uint64' is not assignable to parameter of type 'biguint'. Nice!
test(a, b)

function test(x: uint64, y: biguint) {
  // ...
}

```

This solution looks most like natural TypeScript / EcmaScript and results in math expressions that are much easier to read. The factory methods mimic native equivalents and should be familiar to existing developers.

The drawbacks of this solution are:
- Less implicit type safety, as TypeScript will infer the type of any binary math expression to be the base numeric type (`bigint` in the example above). A type annotation will be required wherever an identifier is declared, and additional type checking will be required by the compiler to catch instances of assigning one numeric type to the other.
- In order to have 'run on Node.js' semantics of a `uint64` or `biguint` match 'run on the AVM', a transpiler will be required to wrap numeric operations in logic that checks for over and under flows.
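
As a rough idea of what such a transpilation step could produce, the sketch below wraps an addition in a bounds check. The helper `checkedUint64` and the function `addExample` are hypothetical names chosen for this illustration; the ADR does not specify what the transpiler would emit.

```ts
type uint64 = bigint & { __type?: 'uint64' }

const UINT64_MAX = 2n ** 64n - 1n

// Hypothetical runtime helper a transpiler could emit; the name is illustrative only
function checkedUint64(result: bigint): uint64 {
  if (result < 0n || result > UINT64_MAX) {
    throw new RangeError(`uint64 over or under flow: ${result}`)
  }
  return result as uint64
}

// Source as written by the contract author:
//   return a + b
// What a transpiler might rewrite it to before running on Node.js:
function addExample(a: uint64, b: uint64): uint64 {
  return checkedUint64(a + b)
}

const small = addExample(checkedUint64(1n), checkedUint64(2n))
```

With the wrapper in place, `addExample(checkedUint64(UINT64_MAX), checkedUint64(1n))` throws on Node.js instead of silently producing 2^64, matching the AVM's behaviour of erroring on overflow.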

A variation of the above with non-optional `__type` tags would prevent accidental implicit assignment errors, but require explicit casting on all ops:

```ts
declare function UInt64(v): uint64
declare function BigUint(v): biguint

type uint64 = bigint & { __type: 'uint64' }
type biguint = bigint & { __type: 'biguint' }

// Require factory or cast on declaration
const a: uint64 = 323n as uint64
const b = UInt64(12n)

// Also require factory or cast on math
let c2: uint64

c2 = a + b // error
c2 = UInt64(a + b) // ok
c2 = (a + b) as uint64 // ok
```

This introduces a degree of type safety at the expense of legibility.

TEALScript uses a similar approach to this, but uses `number` as the underlying type rather than `bigint`, which has the aforementioned downside of not being able to safely represent a 64-bit unsigned integer.


## Preferred option

TBD

## Selected option

TBD