From 4921462c80f06f898eff2efae7f0217e55abd39e Mon Sep 17 00:00:00 2001 From: "Rob Moore (MakerX)" Date: Mon, 27 May 2024 16:47:40 +0800 Subject: [PATCH 1/8] docs: Added initial architecture decision records --- docs/README.md | 35 ++++ .../2024-05-21_primitive-bytes-and-strings.md | 122 ++++++++++++++ .../2024-05-21_primitive-integer-types.md | 157 ++++++++++++++++++ 3 files changed, 314 insertions(+) create mode 100644 docs/README.md create mode 100644 docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md create mode 100644 docs/architecture-decisions/2024-05-21_primitive-integer-types.md diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 00000000..8a377851 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,35 @@ +# Algorand TypeScript + +Algorand TypeScript is a partial implementation of the TypeScript programming language that runs on the Algorand Virtual Machine (AVM). It includes a statically typed framework for development of Algorand smart contracts and logic signatures, with TypeScript interfaces to underlying AVM functionality that works with standard TypeScript tooling. + +It maintains the syntax and semantics of TypeScript such that a developer who knows TypeScript can make safe assumptions +about the behaviour of the compiled code when running on the AVM. Algorand TypeScript is also executable TypeScript that can be run +and debugged on a Node.js virtual machine with transpilation to EcmaScript and run from automated tests. + +# Guiding Principals + +## Familiarity + +Where the base language (TypeScript/EcmaScript) doesn't support a given feature natively (eg. unsigned fixed size integers), +prior art should be used to inspire an API that is familiar to a user of the base language and transpilation can be used to +ensure this code executes correctly. 
+ +## Leveraging TypeScript type system + +TypeScript's type system should be used wherever possible to ensure code is type safe before compilation to create a fast +feedback loop and nudge users into the [pit of success](https://blog.codinghorror.com/falling-into-the-pit-of-success/). + +## TEALScript compatibility + +[TEALScript](https://github.com/algorandfoundation/tealscript/) is an existing TypeScript-like language to TEAL compiler; however, the source code is not executable TypeScript, and it does not prioritise semantic compatibility. Wherever possible, Algorand TypeScript should endeavour to be compatible with existing TEALScript contracts and, where that is not possible, migratable with minimal changes. + +## Algorand Python + +[Algorand Python](https://algorandfoundation.github.io/puya/) is the Python equivalent of Algorand TypeScript. Whilst there is a primary goal to produce an API which makes sense in the TypeScript ecosystem, a secondary goal is to minimise the disparity between the two APIs such that users who choose to, or are required to, develop on both platforms are not facing a completely unfamiliar API. + +# Architecture decisions + +As part of developing Algorand TypeScript we are documenting key architecture decisions using [Architecture Decision Records (ADRs)](https://adr.github.io/). 
The following are the key decisions that have been made thus far: + +- [2024-05-21: Primitive integer types](./architecture-decisions/2024-05-21_primitive-integer-types.md) +- [2024-05-21: Primitive byte and string types](./architecture-decisions/2024-05-21_primitive-bytes-and-strings.md) diff --git a/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md b/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md new file mode 100644 index 00000000..fa1843b0 --- /dev/null +++ b/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md @@ -0,0 +1,122 @@ +# Architecture Decision Record - Primitive bytes and strings + +- **Status**: Draft +- **Owner:** Tristan Menzel +- **Deciders**: Alessandro Cappellato (Algorand Foundation), Joe Polny (Algorand Foundation), Rob Moore (MakerX) +- **Date created**: 2024-05-21 +- **Date decided**: N/A +- **Date updated**: 2024-05-22 + +## Context + +See [Architecture Decision Record - Primitive integer types](./2024-05-21_primitive-bytes-and-strings.md) for related decision and context. + +The AVM's only non-integer type is a variable length byte array. When *not* being interpreted as a `biguint`, leading zeros are significant and length is constant unless explicitly manipulated. Strings can only be represented in the AVM if they are encoded as bytes. The AVM supports byte literals in the form of base16, base64, and utf8 encoded strings. Once a literal has been parsed, the AVM has no concept of the original encoding or of utf8 characters. As a result, whilst a byte array can be indexed to receive a single byte (or a slice of bytes); it cannot be index to return a single utf8 *character* - unless one assumes all characters in the original string were ASCII (ie. single byte) characters. + +EcmaScript provides two relevant types for bytes and strings. + + - **string**: The native string type. 
Supports arbitrary length, concatenation, indexation/slicing of characters plus many utility methods (upper/lower/startswith/endswith/charcodeat/trim etc). Supports concat with binary `+` operator. + - **Uint8Array**: A variable length mutable array of 8-bit numbers. Supports indexing/slicing of 'bytes'. + +TealScript uses a branded string to represent bytes. Base64/Base16 encoding/decoding is performed with specific ops. The prototype of these objects contains string specific apis that are not implemented. + +Algorand Python has specific [Bytes and String types](https://algorandfoundation.github.io/puya/lg-types.html#avm-types) that have semantics that exactly match the AVM semantics. Python allows for operator overloading so these types also use native operators (where they align to functionality in the underlying AVM). + + +## Requirements + +- Support bytes AVM type and a string type that supports ASCII UTF-8 strings +- Use idiomatic TypeScript expressions for string expressions, including concatenation operator (`+`) +- Semantic compatibility when executing on Node.js (e.g. in unit tests) and AVM + +## Principles + +- **[AlgoKit Guiding Principles](https://github.com/algorandfoundation/algokit-cli/blob/main/docs/algokit.md#guiding-principles)** - specifically Seamless onramp, Leverage existing ecosystem, Meet devs where they are +- **[Algorand Python Principles](https://algorandfoundation.github.io/puya/principles.html#principles)** +- **[Algorand TypeScript Guiding Principles](../README.md#guiding-principals)** + +## Options + + +### Option 1 - Direct use of native EcmaScript types + +```ts +const b1 = "somebytes" + +const b2 = new Uint8Array([1, 2, 3, 4]) + +const b3 = b1 + b1 +``` + +Whilst binary data is often a representation of a utf-8 string, it is not always - so direct use of the string type is not a natural fit. It doesn't allow us to represent alternative encodings (b16/b64) and the existing api surface is very 'string' centric. 
Much of the api would also be expensive to implement on the AVM leading to a bunch of 'dead' methods hanging off the type (or a significant amount of work implementing all the methods). + +The Uint8Array type is fit for purpose as an encoding mechanism but the API is not as friendly as it could be for writing declarative contracts. The `new` keyword feels unnatural for something that is ostensibly a primitive type. The fact that it is mutable also complicates the implementation the compiler produces for the AVM. + +### Option 2 - Define a class to represent Bytes + +A `Bytes` class is defined with a very specific API tailored to operations which are available on the AVM: + +```ts +class Bytes { + constructor(v: string) { + this.v = v + } + + concat(other: Bytes): Bytes { + return new Bytes(this.v + other.v) + } + + at(x: uint64): Bytes { + return new Bytes(this.v[x]) + } + + /* etc */ +} + +``` + +This solution provides great type safety and requires no transpilation to run _correctly_ on Node.js. However, non-primitive types in node have equality checked by reference. Again the `new` keyword feels unnatural. Due to lack of overloading, `+` will not work as expected however concatenations do not require the same understanding of "order of operations" and nesting as numeric operations, so a concat method isn't as unwieldy (but still isn't idiomatic). + +```ts +const a = new Bytes("Hello") +const b = new Bytes("World") + +function testValue(x: Bytes) { + // No compile error, but will work on reference not value + switch(x) { + case a: + return b + case b: + return a + } + return new Bytes("default") +} +``` + +To have equality checks behave as expected we would need a transpilation step to replace bytes values in certain expressions with a primitive type. + +### Option 3 - Implement bytes as a class but define it as a type + factory + +We can iron out some of the rough edges of using a class by only exposing a factory method for `Bytes` and a resulting type `bytes`. 
This removes the need for the `new` keyword and lets us use a 'primitive looking' type alias (`bytes` versus `Bytes` - much like `string` and `String`). We can use [tagged templates](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates) to improve the ux of multipart concat expressions. + +```ts + +const a = Bytes("Hello") +const b = Bytes.fromHex("ABFF") +const c = Bytes.fromBase64("...") +const d = Bytes.fromInts(255, 123, 28, 20) + + +function testValue(x: bytes, y: bytes): bytes { + return Bytes`${x} and ${y}` +} + +``` + +## Preferred option + +TBD + +## Selected option + +TBD diff --git a/docs/architecture-decisions/2024-05-21_primitive-integer-types.md b/docs/architecture-decisions/2024-05-21_primitive-integer-types.md new file mode 100644 index 00000000..2dbb4380 --- /dev/null +++ b/docs/architecture-decisions/2024-05-21_primitive-integer-types.md @@ -0,0 +1,157 @@ +# Architecture Decision Record - Primitive integer types + +- **Status**: Draft +- **Owner:** Tristan Menzel +- **Deciders**: Alessandro Cappellato (Algorand Foundation), Joe Polny (Algorand Foundation), Rob Moore (MakerX) +- **Date created**: 2024-05-21 +- **Date decided**: N/A +- **Date updated**: 2024-05-22 + +## Context + +The AVM supports two integer types in its standard set of ops. + +* **uint64**: An unsigned 64-bit integer where the AVM will error on over or under flows +* **biguint**: An unsigned variable bit, big-endian integer represented as an array of bytes with an indeterminate number of leading zeros which are truncated by several math ops. The max size of a biguint is 512-bits. Over and under flows will cause errors. + +EcmaScript supports two numeric types. + +* **number**: A floating point signed value with 64 bits of precision capable of a max safe integer value of 2^53 - 1. A number can be declared with a numeric literal, or with the `Number(...)` factory method. 
+* **bigint**: A signed arbitrary-precision integer with an implementation defined limit based on the platform. In practice this is greater than 512-bit. A bigint can be declared with a numeric literal and `n` suffix, or with the `BigInt(...)` factory method. + +EcmaScript and TypeScript both do not support operator overloading, despite some [previous](https://github.com/tc39/notes/blob/main/meetings/2023-11/november-28.md#withdrawing-operator-overloading) [attempts](https://github.com/microsoft/TypeScript/issues/2319) to do so. + +TealScript [makes use of branded `number` types](https://tealscript.netlify.app/guides/supported-types/numbers/) for all bit sizes from 8 => 512. Since the source code is never executed, the safe limits of the `number` type are not a concern. Compiled code does not perform overflow checks on calculations until a return value is being encoded meaning a uint<8> is effectively a uint<64> until it's returned. + +Algorand Python has specific [UInt64 and BigUint types](https://algorandfoundation.github.io/puya/lg-types.html#avm-types) that have semantics that exactly match the AVM semantics. Python allows for operator overloading so these types also use native operators (where they align to functionality in the underlying AVM). + + +## Requirements + +- Support uint64 and biguint AVM types +- Use idiomatic TypeScript expressions for numeric expressions, including mathematical operators (`+`, `-`, `*`, `/`, etc.) +- Semantic compatibility when executing on Node.js (e.g. 
in unit tests) and AVM + +## Principles + +- **[AlgoKit Guiding Principles](https://github.com/algorandfoundation/algokit-cli/blob/main/docs/algokit.md#guiding-principles)** - specifically Seamless onramp, Leverage existing ecosystem, Meet devs where they are +- **[Algorand Python Principles](https://algorandfoundation.github.io/puya/principles.html#principles)** +- **[Algorand TypeScript Guiding Principles](../README.md#guiding-principals)** + +## Options + +### Option 1 - Direct use of native EcmaScript types + +EcmaScript's `number` type is ill-suited to representing either AVM type reliably as it does not have the safe range to cover the full range of a uint64. Being a floating point number, it would also require truncating after division. + +EcmaScript's `bigint` is a better fit for both types but does not underflow when presented with a negative number, nor does it overflow at any meaningful limit for the AVM types. + +If we solved the over/under flow checking with transpilation we still face an issue that `uint64` and `biguint` would not have discrete types and thus, we would have no type safety against accidentally passing a `biguint` to a method that expects a `uint64` and vice versa. + +### Option 2 - Define classes to represent the AVM types + +A `UInt64` and `BigUint` class could be defined which make use of `bigint` internally to perform maths operations and check for over or under flows after each op. + +```ts +class UInt64 { + + private value: bigint + + constructor(value: bigint | number) { + this.value = this.checkBounds(value) + } + + add(other: UInt64): UInt64 { + return new UInt64(this.value + other.value) + } + + /* etc */ +} + +``` + +This solution provides the ultimate in type safety and semantic/syntactic compatibility, and requires no transpilation to run _correctly_ on Node.js. The semantics should be obvious to anyone familiar with Object Oriented Programming. 
The downside is that neither EcmaScript nor TypeScript support operator overloading which results in more verbose and unwieldy math expressions. + +```ts +const a = UInt64(500n) +const b = Uint64(256) + +// Not supported (a compile error in TS, unhelpful behaviour in ES) +const c1 = a + b +// Works, but is verbose and unwieldy for more complicated expressions and isn't idiomatic TypeScript +const c2 = a.add(b) + +``` + +### Option 3 - Use tagged/branded number types + +TypeScript allows you to intersect primitive types with a simple interface to brand a value in a way which is incompatible with another primitive branded with a different value within the type system. + +```ts +// Constructors +declare function UInt64(v): uint64 +declare function BigUint(v): uint64 + +// Branded types +type uint64 = bigint & { __type?: 'uint64' } +type biguint = bigint & { __type?: 'biguint' } + + +const a: uint64 = 323n // Declare with type annotation +const b = UInt64(12n) // Declare with factory + +// c1 type is `bigint`, but we can mandate a type hint with the compiler (c2) +const c1 = a + b +const c2: uint64 = a + b +// No TypeScript type error, but semantically ambiguous - is a+b performed as a biguint op or a uint64 one and then converted? +// (We could detect this as a compiler error though) +const c3: biguint = a + b + +// Type error on b: Argument of type 'uint64' is not assignable to parameter of type 'biguint'. Nice! +test(a, b) + +function test(x: uint64, y: biguint) { + // ... +} + +``` + +This solution looks most like natural TypeScript / EcmaScript and results in math expressions that are much easier to read. The factory methods mimic native equivalents and should be familiar to existing developers. + +The drawbacks of this solution are: + - Less implicit type safety as TypeScript will infer the type of any binary math expression to be the base numeric type (`number`). 
A type annotation will be required where ever an identifier is declared and additional type checking will be required by the compiler to catch instances of assigning one numeric type to the other. + - In order to have 'run on Node.js' semantics of a `uint64` or `biguint` match 'run on the AVM', a transpiler will be required to wrap numeric operations in logic that checks for over and under flows. + +A variation of the above with non-optional `__type` tags would prevent accidental implicit assignment errors, but require explicit casting on all ops + +```ts +declare function Uint64(v): uint64 +declare function BigUint(v): uint64 + +type uint64 = bigint & { __type: 'uint64' } +type biguint = bigint & { __type: 'biguint' } + +// Require factory or cast on declaration +const a: uint64 = 323n as uint64 +const b = Uint64(12n) + +// Also require factory or cast on math +let c2: uint64 + +c2 = a + b // error +c2 = Uint64(a + b) // ok +c2 = (a + b) as uint64 // ok +``` + +This introduces a degree of type safety at the expense of legibility. + +TealScript uses a similar approach to this, but uses `number` as the underlying type rather than `bigint`, which has the aforementioned downside of not being able to safely represent a 64-bit unsigned integer. 
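As a rough illustration of the second drawback listed above (wrapping numeric operations so that Node.js execution errors on over and under flows, as the AVM does), a transpiler could rewrite `a + b` into a call such as `checkedAdd(a, b)`. The helper names `toUint64`, `checkedAdd`, `checkedSub` and `MAX_UINT64` are hypothetical and used only for illustration; this is a minimal sketch under those assumptions, not the output of any existing compiler.

```ts
// Hypothetical sketch: none of these helper names come from an existing API.
// Branded type as described in Option 3 (optional tag variant).
type uint64 = bigint & { __type?: 'uint64' }

// Largest value an unsigned 64-bit integer can hold.
const MAX_UINT64 = 2n ** 64n - 1n

// Reject values outside the uint64 range, mirroring the AVM erroring on over/under flow.
function toUint64(value: bigint): uint64 {
  if (value < 0n) throw new RangeError('uint64 underflow')
  if (value > MAX_UINT64) throw new RangeError('uint64 overflow')
  return value as uint64
}

// What a transpiler might emit in place of `a + b`.
function checkedAdd(a: uint64, b: uint64): uint64 {
  return toUint64(a + b)
}

// What a transpiler might emit in place of `a - b`.
function checkedSub(a: uint64, b: uint64): uint64 {
  return toUint64(a - b)
}
```

With helpers like these, `checkedSub(toUint64(1n), toUint64(2n))` would throw rather than silently producing `-1n`, which is the behaviour a plain `bigint` subtraction gives on Node.js.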
+ + +## Preferred option + +TBD + +## Selected option + +TBD From ec5e69ec4fa78d3e1dc7adb6fe621b7fe1f45845 Mon Sep 17 00:00:00 2001 From: "Rob Moore (MakerX)" Date: Tue, 28 May 2024 01:42:08 +0800 Subject: [PATCH 2/8] docs: Feedback on primitive ADRs --- .../2024-05-21_primitive-bytes-and-strings.md | 1 + .../2024-05-21_primitive-integer-types.md | 9 +++++---- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md b/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md index fa1843b0..14bb1132 100644 --- a/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md +++ b/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md @@ -105,6 +105,7 @@ const a = Bytes("Hello") const b = Bytes.fromHex("ABFF") const c = Bytes.fromBase64("...") const d = Bytes.fromInts(255, 123, 28, 20) +const e = Bytes`${a} World!` function testValue(x: bytes, y: bytes): bytes { diff --git a/docs/architecture-decisions/2024-05-21_primitive-integer-types.md b/docs/architecture-decisions/2024-05-21_primitive-integer-types.md index 2dbb4380..7a1a1ddb 100644 --- a/docs/architecture-decisions/2024-05-21_primitive-integer-types.md +++ b/docs/architecture-decisions/2024-05-21_primitive-integer-types.md @@ -21,7 +21,7 @@ EcmaScript supports two numeric types. EcmaScript and TypeScript both do not support operator overloading, despite some [previous](https://github.com/tc39/notes/blob/main/meetings/2023-11/november-28.md#withdrawing-operator-overloading) [attempts](https://github.com/microsoft/TypeScript/issues/2319) to do so. -TealScript [makes use of branded `number` types](https://tealscript.netlify.app/guides/supported-types/numbers/) for all bit sizes from 8 => 512. Since the source code is never executed, the safe limits of the `number` type are not a concern. 
Compiled code does not perform overflow checks on calculations until a return value is being encoded meaning a uint<8> is effectively a uint<64> until it's returned. +TealScript [makes use of branded `number` types](https://tealscript.netlify.app/guides/supported-types/numbers/) for all bit sizes from 8 => 512, although it doesn't allow `number` variables, you must specify the actual type you want (e.g. `uint64`). Since the source code is never executed, the safe limits of the `number` type are not a concern. Compiled code does not perform overflow checks on calculations until a return value is being encoded meaning a uint<8> is effectively a uint<64> until it's returned. Algorand Python has specific [UInt64 and BigUint types](https://algorandfoundation.github.io/puya/lg-types.html#avm-types) that have semantics that exactly match the AVM semantics. Python allows for operator overloading so these types also use native operators (where they align to functionality in the underlying AVM). @@ -122,7 +122,9 @@ The drawbacks of this solution are: - Less implicit type safety as TypeScript will infer the type of any binary math expression to be the base numeric type (`number`). A type annotation will be required where ever an identifier is declared and additional type checking will be required by the compiler to catch instances of assigning one numeric type to the other. - In order to have 'run on Node.js' semantics of a `uint64` or `biguint` match 'run on the AVM', a transpiler will be required to wrap numeric operations in logic that checks for over and under flows. 
-A variation of the above with non-optional `__type` tags would prevent accidental implicit assignment errors, but require explicit casting on all ops +TEALScript uses a similar approach to this, but uses `number` as the underlying type rather than `bigint`, which has the aforementioned downside of not being able to safely represent a 64-bit unsigned integer, but does have the advantage of being directly compatible with raw numbers (e.g. `3` rather than `3n`) and JavaScript prototype methods that return `number` like `array.length`. The requirement for semantic compatibility dictates that we need to use `bigint` rather than `number` since it's the correct type to represent the data and we will be able to create wrapper classes for things like arrays that have more explicitly typed methods for things like length. + +A variation of the above with non-optional `__type` tags would prevent accidental implicit assignment errors, but require explicit casting on all ops and any methods that return a `number` such as the `length` method on arrays. ```ts declare function Uint64(v): uint64 declare function BigUint(v): uint64 @@ -143,9 +145,8 @@ c2 = Uint64(a + b) // ok c2 = (a + b) as uint64 // ok ``` -This introduces a degree of type safety at the expense of legibility. +This introduces a degree of type safety with the in-built TypeScript type system at the expense of legibility. -TealScript uses a similar approach to this, but uses `number` as the underlying type rather than `bigint`, which has the aforementioned downside of not being able to safely represent a 64-bit unsigned integer. 
## Preferred option From 29c9a99880128c10b5968cddbe9214213c6959fa Mon Sep 17 00:00:00 2001 From: "Rob Moore (MakerX)" Date: Tue, 28 May 2024 13:22:19 +0800 Subject: [PATCH 3/8] docs: Added more explicit note about + operator for Bytes --- .../2024-05-21_primitive-bytes-and-strings.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md b/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md index 14bb1132..0dfcdb69 100644 --- a/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md +++ b/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md @@ -97,7 +97,7 @@ To have equality checks behave as expected we would need a transpilation step to ### Option 3 - Implement bytes as a class but define it as a type + factory -We can iron out some of the rough edges of using a class by only exposing a factory method for `Bytes` and a resulting type `bytes`. This removes the need for the `new` keyword and lets us use a 'primitive looking' type alias (`bytes` versus `Bytes` - much like `string` and `String`). We can use [tagged templates](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates) to improve the ux of multipart concat expressions. +We can iron out some of the rough edges of using a class by only exposing a factory method for `Bytes` and a resulting type `bytes`. This removes the need for the `new` keyword and lets us use a 'primitive looking' type alias (`bytes` versus `Bytes` - much like `string` and `String`). We can use [tagged templates](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates) to improve the user experience of multipart concat expressions in lieu of having the `+` operator. 
```ts From 42661d004410e68ae1d728221d6762d8f436ccbb Mon Sep 17 00:00:00 2001 From: Joe Polny <50534337+joe-p@users.noreply.github.com> Date: Thu, 30 May 2024 13:12:22 -0400 Subject: [PATCH 4/8] docs: Minor typos Co-authored-by: Neil Campbell --- .../2024-05-21_primitive-bytes-and-strings.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md b/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md index 0dfcdb69..c611a06d 100644 --- a/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md +++ b/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md @@ -9,9 +9,9 @@ ## Context -See [Architecture Decision Record - Primitive integer types](./2024-05-21_primitive-bytes-and-strings.md) for related decision and context. +See [Architecture Decision Record - Primitive integer types](./2024-05-21_primitive-integer-types.md) for related decision and context. -The AVM's only non-integer type is a variable length byte array. When *not* being interpreted as a `biguint`, leading zeros are significant and length is constant unless explicitly manipulated. Strings can only be represented in the AVM if they are encoded as bytes. The AVM supports byte literals in the form of base16, base64, and utf8 encoded strings. Once a literal has been parsed, the AVM has no concept of the original encoding or of utf8 characters. As a result, whilst a byte array can be indexed to receive a single byte (or a slice of bytes); it cannot be index to return a single utf8 *character* - unless one assumes all characters in the original string were ASCII (ie. single byte) characters. +The AVM's only non-integer type is a variable length byte array. When *not* being interpreted as a `biguint`, leading zeros are significant and length is constant unless explicitly manipulated. Strings can only be represented in the AVM if they are encoded as bytes. 
The AVM supports byte literals in the form of base16, base64, and utf8 encoded strings. Once a literal has been parsed, the AVM has no concept of the original encoding or of utf8 characters. As a result, whilst a byte array can be indexed to receive a single byte (or a slice of bytes); it cannot be indexed to return a single utf8 *character* - unless one assumes all characters in the original string were ASCII (ie. single byte) characters. EcmaScript provides two relevant types for bytes and strings. From 1facddfef948ab63b0390e7939941e88896ee502 Mon Sep 17 00:00:00 2001 From: "Rob Moore (MakerX)" Date: Fri, 31 May 2024 20:19:35 +0800 Subject: [PATCH 5/8] docs: PR feedback --- .../2024-05-21_primitive-bytes-and-strings.md | 52 +++++++++++++----- .../2024-05-21_primitive-integer-types.md | 55 +++++++++++++------ 2 files changed, 75 insertions(+), 32 deletions(-) diff --git a/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md b/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md index c611a06d..595181e0 100644 --- a/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md +++ b/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md @@ -5,20 +5,13 @@ - **Deciders**: Alessandro Cappellato (Algorand Foundation), Joe Polny (Algorand Foundation), Rob Moore (MakerX) - **Date created**: 2024-05-21 - **Date decided**: N/A -- **Date updated**: 2024-05-22 +- **Date updated**: 2024-05-31 ## Context See [Architecture Decision Record - Primitive integer types](./2024-05-21_primitive-integer-types.md) for related decision and context. -The AVM's only non-integer type is a variable length byte array. When *not* being interpreted as a `biguint`, leading zeros are significant and length is constant unless explicitly manipulated. Strings can only be represented in the AVM if they are encoded as bytes. The AVM supports byte literals in the form of base16, base64, and utf8 encoded strings. 
Once a literal has been parsed, the AVM has no concept of the original encoding or of utf8 characters. As a result, whilst a byte array can be indexed to receive a single byte (or a slice of bytes); it cannot be indexed to return a single utf8 *character* - unless one assumes all characters in the original string were ASCII (ie. single byte) characters. - -EcmaScript provides two relevant types for bytes and strings. - - - **string**: The native string type. Supports arbitrary length, concatenation, indexation/slicing of characters plus many utility methods (upper/lower/startswith/endswith/charcodeat/trim etc). Supports concat with binary `+` operator. - - **Uint8Array**: A variable length mutable array of 8-bit numbers. Supports indexing/slicing of 'bytes'. - -TealScript uses a branded string to represent bytes. Base64/Base16 encoding/decoding is performed with specific ops. The prototype of these objects contains string specific apis that are not implemented. +The AVM's only non-integer type is a variable length byte array. When *not* being interpreted as a `biguint`, leading zeros are significant and length is constant unless explicitly manipulated. Strings can only be represented in the AVM if they are encoded as bytes. The AVM supports byte literals in the form of base16, base64, and UTF-8 encoded strings. Once a literal has been parsed, the AVM has no concept of the original encoding or of UTF-8 characters. As a result, whilst a byte array can be indexed to receive a single byte (or a slice of bytes); it cannot be indexed to return a single UTF-8 *character* - unless one assumes all characters in the original string were ASCII (i.e. single byte) characters. Algorand Python has specific [Bytes and String types](https://algorandfoundation.github.io/puya/lg-types.html#avm-types) that have semantics that exactly match the AVM semantics. 
Python allows for operator overloading so these types also use native operators (where they align to functionality in the underlying AVM). @@ -26,8 +19,8 @@ Algorand Python has specific [Bytes and String types](https://algorandfoundation ## Requirements - Support bytes AVM type and a string type that supports ASCII UTF-8 strings -- Use idiomatic TypeScript expressions for string expressions, including concatenation operator (`+`) -- Semantic compatibility when executing on Node.js (e.g. in unit tests) and AVM +- Use idiomatic TypeScript expressions for string expressions +- Semantic compatibility between AVM execution and TypeScript execution (e.g. in unit tests) ## Principles @@ -40,6 +33,13 @@ Algorand Python has specific [Bytes and String types](https://algorandfoundation ### Option 1 - Direct use of native EcmaScript types + +EcmaScript provides two relevant types for bytes and strings. + + - **string**: The native string type. Supports arbitrary length, concatenation, indexation/slicing of characters plus many utility methods (upper/lower/startswith/endswith/charcodeat/trim etc). Supports concat with binary `+` operator. + - **Uint8Array**: A variable length mutable array of 8-bit numbers. Supports indexing/slicing of 'bytes'. + + ```ts const b1 = "somebytes" @@ -52,7 +52,29 @@ Whilst binary data is often a representation of a utf-8 string, it is not always The Uint8Array type is fit for purpose as an encoding mechanism but the API is not as friendly as it could be for writing declarative contracts. The `new` keyword feels unnatural for something that is ostensibly a primitive type. The fact that it is mutable also complicates the implementation the compiler produces for the AVM. -### Option 2 - Define a class to represent Bytes + + +### Option 2 - Branded strings (TEALScript approach) + + +TEALScript uses a branded `string` to represent `bytes` and native `string` to represent UTF-8 bytes. Base64/Base16 encoding/decoding is performed with specific methods. 
+ +```typescript +const someString = "foo" +const someHexValue = hex("0xdeadbeef") // branded "bytes" +``` + +Bytes and UTF-8 strings are typed via branded `string` types. UTF-8 strings are the most common use case for strings, thus have the JavaScript `String` prototype functions when working with byteslice, which provides a familiar set of function signatures. This option also enables the usage of `+` for concatenation. + +To differentiate between ABI `string` and AVM `byteslice`, a branded type, `bytes`, can be used to represent non-encoded byteslices that may or may not be UTF-8 strings. + +Additional functions can be used when wanting to have string literals of a specific encoding represent a string or byteslice. + + +The downside of this approach is that the string prototype has a huge range of methods that are not all relevant and some of them return `number`, which is [not a semantically relevant type](./2024-05-21_primitive-integer-types.md). + + +### Option 3 - Define a class to represent Bytes A `Bytes` class is defined with a very specific API tailored to operations which are available on the AVM: @@ -75,12 +97,14 @@ class Bytes { ``` -This solution provides great type safety and requires no transpilation to run _correctly_ on Node.js. However, non-primitive types in node have equality checked by reference. Again the `new` keyword feels unnatural. Due to lack of overloading, `+` will not work as expected however concatenations do not require the same understanding of "order of operations" and nesting as numeric operations, so a concat method isn't as unwieldy (but still isn't idiomatic). +This solution provides great type safety and requires no transpilation to run _correctly_ on Node.js. However, non-primitive types in Node.js have equality checked by reference. Again the `new` keyword feels unnatural. 
Due to lack of overloading, `+` will not work as expected however concatenations do not require the same understanding of "order of operations" and nesting as numeric operations, so a `concat` method isn't as unwieldy (but still isn't idiomatic). ```ts const a = new Bytes("Hello") const b = new Bytes("World") +const ab = a.concat(b) + function testValue(x: Bytes) { // No compile error, but will work on reference not value switch(x) { @@ -95,7 +119,7 @@ function testValue(x: Bytes) { To have equality checks behave as expected we would need a transpilation step to replace bytes values in certain expressions with a primitive type. -### Option 3 - Implement bytes as a class but define it as a type + factory +### Option 4 - Implement bytes as a class but define it as a type + factory We can iron out some of the rough edges of using a class by only exposing a factory method for `Bytes` and a resulting type `bytes`. This removes the need for the `new` keyword and lets us use a 'primitive looking' type alias (`bytes` versus `Bytes` - much like `string` and `String`). We can use [tagged templates](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates) to improve the user experience of multipart concat expressions in lieu of having the `+` operator. 
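As a rough illustration of the factory plus tagged template idea described in Option 4, the sketch below shows how a single `Bytes` function could act both as a plain factory and as a template tag, and why a value comparison helper is still needed when running on Node.js. `BytesValue` and `bytesEqual` are hypothetical names introduced only for this example and the real implementation may well differ.

```ts
// Hypothetical sketch of the "type + factory" idea; BytesValue and bytesEqual are
// illustrative names, not part of Algorand TypeScript.
class BytesValue {
  readonly raw: Uint8Array

  constructor(raw: Uint8Array) {
    this.raw = raw
  }

  // Returns a new value; the operands are never mutated.
  concat(other: BytesValue): BytesValue {
    const out = new Uint8Array(this.raw.length + other.raw.length)
    out.set(this.raw, 0)
    out.set(other.raw, this.raw.length)
    return new BytesValue(out)
  }
}

// 'Primitive looking' alias, so signatures read `bytes` rather than `BytesValue`.
type bytes = BytesValue

const utf8 = new TextEncoder()

// Factory usable as a plain call or as a template tag, so `new` is never needed.
function Bytes(value: string): bytes
function Bytes(strings: TemplateStringsArray, ...parts: bytes[]): bytes
function Bytes(value: string | TemplateStringsArray, ...parts: bytes[]): bytes {
  if (typeof value === 'string') {
    return new BytesValue(utf8.encode(value))
  }
  // Interleave the literal segments with the interpolated bytes values.
  const strings = value
  let result = new BytesValue(utf8.encode(strings[0]))
  parts.forEach((part, i) => {
    result = result.concat(part).concat(new BytesValue(utf8.encode(strings[i + 1])))
  })
  return result
}

// Stand-in for the value equality that `===` cannot provide for objects.
function bytesEqual(a: bytes, b: bytes): boolean {
  return a.raw.length === b.raw.length && a.raw.every((byte, i) => byte === b.raw[i])
}

const hello = Bytes('Hello')
const world = Bytes('World')
const joined = Bytes`${hello} and ${world}`

console.log(joined.raw.length) // 15 ("Hello and World")
console.log(Bytes('Hello') === hello) // false: compared by reference
console.log(bytesEqual(Bytes('Hello'), hello)) // true: compared by value
```

The reference comparison on the second-to-last line is the behaviour a transpilation step would need to replace for `bytes` to behave like a value type.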
diff --git a/docs/architecture-decisions/2024-05-21_primitive-integer-types.md b/docs/architecture-decisions/2024-05-21_primitive-integer-types.md index 7a1a1ddb..3d0ff22d 100644 --- a/docs/architecture-decisions/2024-05-21_primitive-integer-types.md +++ b/docs/architecture-decisions/2024-05-21_primitive-integer-types.md @@ -5,7 +5,7 @@ - **Deciders**: Alessandro Cappellato (Algorand Foundation), Joe Polny (Algorand Foundation), Rob Moore (MakerX) - **Date created**: 2024-05-21 - **Date decided**: N/A -- **Date updated**: 2024-05-22 +- **Date updated**: 2024-05-31 ## Context @@ -30,7 +30,7 @@ Algorand Python has specific [UInt64 and BigUint types](https://algorandfoundati - Support uint64 and biguint AVM types - Use idiomatic TypeScript expressions for numeric expressions, including mathematical operators (`+`, `-`, `*`, `/`, etc.) -- Semantic compatibility when executing on Node.js (e.g. in unit tests) and AVM +- Semantic compatibility between AVM execution and TypeScript execution (e.g. in unit tests) ## Principles @@ -40,15 +40,15 @@ Algorand Python has specific [UInt64 and BigUint types](https://algorandfoundati ## Options -### Option 1 - Direct use of native EcmaScript types +### Option 1 - Native types EcmaScript's `number` type is ill-suited to representing either AVM type reliably as it does not have the safe range to cover the full range of a uint64. Being a floating point number, it would also require truncating after division. EcmaScript's `bigint` is a better fit for both types but does not underflow when presented with a negative number, nor does it overflow at any meaningful limit for the AVM types. -If we solved the over/under flow checking with transpilation we still face an issue that `uint64` and `biguint` would not have discrete types and thus, we would have no type safety against accidentally passing a `biguint` to a method that expects a `uint64` and vice versa. 
+If we solved the over/under flow checking with a custom TypeScript transformer we still face an issue that `uint64` and `biguint` would not have discrete types for the compiler to know the difference between them and also we would have no type safety against accidentally passing a `biguint` to a method that expects a `uint64` and vice versa. -### Option 2 - Define classes to represent the AVM types +### Option 2 - Wrapper classes A `UInt64` and `BigUint` class could be defined which make use of `bigint` internally to perform maths operations and check for over or under flows after each op. @@ -70,22 +70,22 @@ class UInt64 { ``` -This solution provides the ultimate in type safety and semantic/syntactic compatibility, and requires no transpilation to run _correctly_ on Node.js. The semantics should be obvious to anyone familiar with Object Oriented Programming. The downside is that neither EcmaScript nor TypeScript support operator overloading which results in more verbose and unwieldy math expressions. +This solution provides the ultimate in type safety and semantic/syntactic compatibility, and requires no custom TypeScript transformer to run _correctly_ on Node.js. The semantics should be obvious to anyone familiar with Object Oriented Programming. The downside is that neither EcmaScript nor TypeScript support operator overloading which results in more verbose and unwieldy math expressions. The lack of idiomatic TypeScript mathematical operators is a deal breaker that rules this option out. 
```ts const a = UInt64(500n) const b = Uint64(256) -// Not supported (a compile error in TS, unhelpful behaviour in ES) +// Not supported (a compile error in TS) const c1 = a + b // Works, but is verbose and unwieldy for more complicated expressions and isn't idiomatic TypeScript const c2 = a.add(b) ``` -### Option 3 - Use tagged/branded number types +### Option 3 - Branded `bigint` -TypeScript allows you to intersect primitive types with a simple interface to brand a value in a way which is incompatible with another primitive branded with a different value within the type system. +TypeScript allows you to intersect primitive types with a simple interface to brand a value in a way which is incompatible with another primitive branded with a different value within the type system. In this option the base type that is branded is `bigint`, which aligns to the discussion in Option 1 about the logical type to represent `uint64` and `biguint`. ```ts // Constructors @@ -97,34 +97,37 @@ type uint64 = bigint & { __type?: 'uint64' } type biguint = bigint & { __type?: 'biguint' } -const a: uint64 = 323n // Declare with type annotation +const a: uint64 = 323n // Declare with type annotation and raw `bigint` literal const b = UInt64(12n) // Declare with factory +const b2 = UInt64(12) // Factory could also take `number` literals (compiler could check they aren't negative and are integers) // c1 type is `bigint`, but we can mandate a type hint with the compiler (c2) const c1 = a + b const c2: uint64 = a + b + // No TypeScript type error, but semantically ambiguous - is a+b performed as a biguint op or a uint64 one and then converted? // (We could detect this as a compiler error though) const c3: biguint = a + b // Type error on b: Argument of type 'uint64' is not assignable to parameter of type 'biguint'. Nice! test(a, b) - function test(x: uint64, y: biguint) { // ... 
} ``` -This solution looks most like natural TypeScript / EcmaScript and results in math expressions that are much easier to read. The factory methods mimic native equivalents and should be familiar to existing developers. +This solution looks like normal TypeScript and results in math expressions that are much easier to read. The factory methods (e.g. `UInt64(4n)`) mimic native equivalents and should be familiar to existing developers. The drawbacks of this solution are: - - Less implicit type safety as TypeScript will infer the type of any binary math expression to be the base numeric type (`number`). A type annotation will be required where ever an identifier is declared and additional type checking will be required by the compiler to catch instances of assigning one numeric type to the other. - - In order to have 'run on Node.js' semantics of a `uint64` or `biguint` match 'run on the AVM', a transpiler will be required to wrap numeric operations in logic that checks for over and under flows. + - Less implicit type safety for branded types as TypeScript will infer the type of any binary math expression to be the base numeric type (a type annotation will be required where ever an identifier is declared, and the compiler will need to enforce this) + - In order to have TypeScript execution semantics of a `uint64` or `biguint` match the AVM, a custom TypeScript transformer will be required to wrap numeric operations in logic that checks for over and under flows line-by-line; this is straightforward to write though and has been successfully spiked out + - Additional type checking will be required by the compiler to catch instances of assigning one numeric type to the other (accidental implicit assignment) e.g. assigning a `uint64` value to `biguint`. 
+ -TEALScript uses a similar approach to this, but uses `number` as the underlying type rather than `bigint`, which has the aforementioned downside of not being able to safely represent a 64-bit unsigned integer, but does has the advantage of being directly compatible with raw numbers (e.g. `3` rather than `3n`) and JavaScript prototype methods that return `number` like `array.length`. The requirement for semantic compatibility dictates that we need to use `bigint` rather than `number` since it's the correct type to represent the data and we will be able to create wrapper classes for things like arrays that have more explicitly typed methods for things like length. +### Option 4 Explicitly tagged brand types -A variation of the above with non-optional `__type` tags would prevent accidental implicit assignment errors, but require explicit casting on all ops and any methods that return a `number` such as the `length` method on arrays. +A variation of option 3 with non-optional `__type` tags would prevent accidental implicit assignment errors when assigning between (say) `uint64` and `biguint`, but require explicit casting on all ops and any methods that return the base type. ```ts declare function Uint64(v): uint64 @@ -145,13 +148,29 @@ c2 = Uint64(a + b) // ok c2 = (a + b) as uint64 // ok ``` -This introduces a degree of type safety with the in-built TypeScript type system at the expense of legibility. +This introduces a degree of type safety with the in-built TypeScript type system at the significant expense of legibility and writability. + + +### Option 5 Branded `number` (TEALScript approach) + +TEALScript uses a similar approach to option 3, but uses `number` as the underlying type rather than `bigint`. This has the advantage of being directly compatible with casting raw numbers (e.g. `const x: uint64 = 3` rather than `const x: uint64 = 3n`). 
+Furthermore, any JavaScript prototype methods that return `number` like `array.length` can similarly be used directly and cast to `uint64` rather than wrapping in a factory method (e.g. `const x: uint64 = [1, 2, 3].length` rather than `const x = UInt64([1, 2, 3].length)`). It's not currently clear if any such methods will be exposed within the stub types that emerge in Algorand TypeScript though; if option 1 isn't chosen then ideally we would want to avoid exposing `number` within Algorand TypeScript altogether. Key prototypes that have `number` include `string` (see [Primitive bytes and strings](./2024-05-21_primitive-bytes-and-strings.md)) and array (but TypeScript allows you to define wrapper classes that support [iteration](https://www.typescriptlang.org/docs/handbook/iterators-and-generators.html) [and](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_syntax) [spreading](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol/isConcatSpreadable) so we can likely avoid `Array` prototype). + +If `number` is used as the base brand type for `uint64` and `bigint` is used as the base brand type for `biguint` (a type that TEALScript doesn't implement, so not a breaking change) then accidental implicit assignment errors are prevented by the TypeScript type system. + +A key issue with using `number` as the base type is that per option 1, it's semantically a floating point number, not an integer. It is possible for the compiler to check for and disallow non-integer constant literals though, which would prevent a non-integer value appearing outside of division. 
A custom TypeScript transformer will need to wrap division operations to allow the result to be truncated as an integer; this is a violation of the semantic compatibility principle, but given a branded type would be used rather than `number` (the fact the base type is `number` is largely hidden from the developer) it probably doesn't violate the principle of least surprise and may be considered an acceptable compromise. + +The other problem with use of `number` as the base brand type is that you will lose precision and get linting errors when representing a number greater than 53-bits as a constant literal e.g. `const x: uint64 = 9007199254740992`. It *may* be possible for a custom TypeScript transformer to get the value before precision is lost (needs investigation) and then disable that particular linting tool, but that is a fairly clear violation of semantic compatibility. The workaround would have to be that the compiler detects numbers > `Number.MAX_SAFE_INTEGER` and complains and instead you would have to use the factory syntax with a `bigint` constant literal e.g. `const x = UInt64(9007199254740992n)`. ## Preferred option -TBD +Either option 3 or option 5 depending on comfort level in using a floating point number type as the base type for `uint64`, requiring extra compiler checks & more complex custom transformers to overcome this, and not being able to cleanly represent very large integers as a constant literal vs lack of TypeScript protection against accidental implicit assignment of `uint64` and `biguint` (but can be checked by the compiler), and needing to avoid prototype methods that return `number` (although this matches semantic compatibility so may be a good idea anyway). + +Option 3 is also a breaking change for TEALScript, which would require `number` literals to either be suffixed with the `bigint` suffix (`n`) or be wrapped in a `UInt64()` factory call. 
+ +Option 1 and 2 are excluded because they don't meet the requirements of semantic compatibility and least surprise. Option 4 is excluded, because the resulting syntax is impractical. ## Selected option From 0ad42fe727eb56ca4e6b6bea9fc306baab4b4d8e Mon Sep 17 00:00:00 2001 From: Tristan Menzel Date: Fri, 31 May 2024 16:13:55 -0700 Subject: [PATCH 6/8] docs: Add additional cons for using bigint as the base type --- .../2024-05-21_primitive-integer-types.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/architecture-decisions/2024-05-21_primitive-integer-types.md b/docs/architecture-decisions/2024-05-21_primitive-integer-types.md index 3d0ff22d..11f9e50c 100644 --- a/docs/architecture-decisions/2024-05-21_primitive-integer-types.md +++ b/docs/architecture-decisions/2024-05-21_primitive-integer-types.md @@ -123,6 +123,8 @@ The drawbacks of this solution are: - Less implicit type safety for branded types as TypeScript will infer the type of any binary math expression to be the base numeric type (a type annotation will be required where ever an identifier is declared, and the compiler will need to enforce this) - In order to have TypeScript execution semantics of a `uint64` or `biguint` match the AVM, a custom TypeScript transformer will be required to wrap numeric operations in logic that checks for over and under flows line-by-line; this is straightforward to write though and has been successfully spiked out - Additional type checking will be required by the compiler to catch instances of assigning one numeric type to the other (accidental implicit assignment) e.g. assigning a `uint64` value to `biguint`. 
+ - Literals will require an `n` suffix + - `bigint` cannot be used to index an object/array (only `number | string | symbol`) ### Option 4 Explicitly tagged brand types From 4390ae21e9db4b927f2f44a8a15928b19df30607 Mon Sep 17 00:00:00 2001 From: Tristan Menzel Date: Mon, 3 Jun 2024 12:11:26 -0700 Subject: [PATCH 7/8] docs: Add examples of string type separate to the bytes type and include additional notes about semantic compatability when working with the native EcmaScript string --- .../2024-05-21_primitive-bytes-and-strings.md | 61 +++++++++++++++++-- 1 file changed, 55 insertions(+), 6 deletions(-) diff --git a/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md b/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md index 595181e0..cb0b3e57 100644 --- a/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md +++ b/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md @@ -48,7 +48,9 @@ const b2 = new Uint8Array([1, 2, 3, 4]) const b3 = b1 + b1 ``` -Whilst binary data is often a representation of a utf-8 string, it is not always - so direct use of the string type is not a natural fit. It doesn't allow us to represent alternative encodings (b16/b64) and the existing api surface is very 'string' centric. Much of the api would also be expensive to implement on the AVM leading to a bunch of 'dead' methods hanging off the type (or a significant amount of work implementing all the methods). +Whilst binary data is often a representation of a utf-8 string, it is not always - so direct use of the string type is not a natural fit. It doesn't allow us to represent alternative encodings (b16/b64) and the existing api surface is very 'string' centric. Much of the api would also be expensive to implement on the AVM leading to a bunch of 'dead' methods hanging off the type (or a significant amount of work implementing all the methods). 
The signatures of these methods also use `number` which is [not a semantically relevant type](./2024-05-21_primitive-integer-types.md). + +Achieving semantic compatibility with EcmaScript's `String` type would also be very expensive as it uses utf-16 encoding underneath whilst an ABI string is utf-8 encoded. A significant number of ops (and program size) would be required to convert between the two. If we were to ignore this and use utf-8 at runtime, apis such as `.length` would return different results. For example `"😄".length` in ES returns `2` whilst utf-8 encoding would yield `1` codepoint or `4` bytes; indexing and slicing would similarly yield different results. The Uint8Array type is fit for purpose as an encoding mechanism but the API is not as friendly as it could be for writing declarative contracts. The `new` keyword feels unnatural for something that is ostensibly a primitive type. The fact that it is mutable also complicates the implementation the compiler produces for the AVM. @@ -71,12 +73,12 @@ To differentiate between ABI `string` and AVM `byteslice`, a branded type, `byte Additional functions can be used when wanting to have string literals of a specific encoding represent a string or byteslice. -The downside of this approach is that the string prototype has a huge range of methods that are not all relevant and some of them return `number`, which is [not a semantically relevant type](./2024-05-21_primitive-integer-types.md). +The downsides of using `string` are listed in Option 1. 
### Option 3 - Define a class to represent Bytes -A `Bytes` class is defined with a very specific API tailored to operations which are available on the AVM: +A `Bytes` class and `Str` (Name TBD) class are defined with a very specific API tailored to operations which are available on the AVM: ```ts class Bytes { @@ -95,6 +97,10 @@ class Bytes { /* etc */ } +class Str { + /* implementation */ +} + ``` This solution provides great type safety and requires no transpilation to run _correctly_ on Node.js. However, non-primitive types in Node.js have equality checked by reference. Again the `new` keyword feels unnatural. Due to lack of overloading, `+` will not work as expected however concatenations do not require the same understanding of "order of operations" and nesting as numeric operations, so a `concat` method isn't as unwieldy (but still isn't idiomatic). @@ -102,7 +108,7 @@ This solution provides great type safety and requires no transpilation to run _c ```ts const a = new Bytes("Hello") const b = new Bytes("World") - +const c = new Str("Example string") const ab = a.concat(b) function testValue(x: Bytes) { @@ -121,10 +127,24 @@ To have equality checks behave as expected we would need a transpilation step to ### Option 4 - Implement bytes as a class but define it as a type + factory -We can iron out some of the rough edges of using a class by only exposing a factory method for `Bytes` and a resulting type `bytes`. This removes the need for the `new` keyword and lets us use a 'primitive looking' type alias (`bytes` versus `Bytes` - much like `string` and `String`). We can use [tagged templates](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates) to improve the user experience of multipart concat expressions in lieu of having the `+` operator. +We can iron out some of the rough edges of using a class by only exposing a factory method for `Bytes`/`Str` and a resulting type `bytes`/`str`. 
This removes the need for the `new` keyword and lets us use a 'primitive looking' type alias (`bytes` versus `Bytes`, `str` versus `Str` - much like `string` and `String`). We can use [tagged templates](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates) to improve the user experience of multipart concat expressions in lieu of having the `+` operator. ```ts +export type bytes = { + readonly length: uint64 + + at(i: Uint64Compat): bytes + + concat(other: BytesCompat): bytes +} & symbol + +export function Bytes(value: TemplateStringsArray, ...replacements: BytesCompat[]): bytes +export function Bytes(value: BytesCompat): bytes +export function Bytes(value: BytesCompat | TemplateStringsArray, ...replacements: BytesCompat[]): bytes { + /* implementation */ +} + const a = Bytes("Hello") const b = Bytes.fromHex("ABFF") const c = Bytes.fromBase64("...") @@ -136,11 +156,40 @@ function testValue(x: bytes, y: bytes): bytes { return Bytes`${x} and ${y}` } +const f = Str`Example string` + ``` +Whilst we still can't accept string literals on their own, the tagged template is almost as concise. + +Having `bytes` and `str` behave like a primitive value type (value equality) whilst not _actually_ being a primitive is not strictly semantically compatible with EcmaScript; however, the lowercase type names (plus factory with no `new` keyword) communicate the intention of it being a primitive value type and there is an existing precedent of introducing new value types to the language in a similar pattern (`bigint` and `BigInt`). Essentially - if EcmaScript were to have a primitive bytes type, this is most likely what it would look like. + ## Preferred option -TBD +Option 3 can be excluded because the requirement for a `new` keyword feels unnatural for representing a primitive value type. + +Option 1 and 2 are not preferred as they make maintaining semantic compatibility with EcmaScript impractical. 
+ +Option 4 gives us the most natural feeling api whilst still giving us full control over the api surface. It doesn't support the `+` operator, but supports interpolation and `.concat` which gives us most of what `+` provides other than augmented assignment (ie. `+=`). + +We should select an appropriate name for the type representing an AVM string. It should not conflict with the semantically incompatible EcmaScript type `string`. + - `str`/`Str`: + - ✅ Short + - ✅ obvious what it is + - ✅ obvious equivalent in ABI types + - ❌ NOT obvious how it differs from EcmaScript `string` + - `utf8`/`Utf8`: + - ✅ Short + - ✅ reasonably obvious what it is + - 🤔 less obvious equivalent in ABI types + - ✅ obvious how it differs to `string` + - `utf8string`/`Utf8String` + - ❌ Verbose + - ✅ obvious equivalent in ABI types + - ✅ very obvious what it is + - ✅ obvious how it differs to `string` + + ## Selected option From 23ef10cf10b67cac604ed8e88621bc84abcdcf0d Mon Sep 17 00:00:00 2001 From: Tristan Menzel Date: Fri, 7 Jun 2024 10:01:40 -0700 Subject: [PATCH 8/8] docs: Add selected option --- .../2024-05-21_primitive-bytes-and-strings.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md b/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md index cb0b3e57..38a9b24f 100644 --- a/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md +++ b/docs/architecture-decisions/2024-05-21_primitive-bytes-and-strings.md @@ -193,4 +193,4 @@ We should select an appropriate name for the type representing an AVM string. It ## Selected option -TBD +Option 4 has been selected as the best option \ No newline at end of file
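To make the incompatibility between EcmaScript's `string` and a utf-8 encoded AVM string concrete, the following minimal sketch compares the three notions of "length" mentioned in Option 1. The `measure` helper is a hypothetical name used only for illustration; it relies solely on standard built-ins (`String.prototype.length`, `Array.from` and `TextEncoder`).

```ts
// Illustrative helper (hypothetical name); it only uses standard built-ins.
const utf8Encoder = new TextEncoder()

function measure(text: string): { utf16Units: number; codePoints: number; utf8Bytes: number } {
  return {
    utf16Units: text.length, // what EcmaScript's `.length` reports
    codePoints: Array.from(text).length, // string iteration walks code points
    utf8Bytes: utf8Encoder.encode(text).length, // size of the utf-8 encoded bytes
  }
}

console.log(measure('abc')) // { utf16Units: 3, codePoints: 3, utf8Bytes: 3 }
console.log(measure('😄')) // { utf16Units: 2, codePoints: 1, utf8Bytes: 4 }

// Slicing by utf-16 unit can split a character: this yields a lone surrogate.
console.log('😄'.slice(0, 1) === '😄') // false
```

For ASCII-only input all three values agree, which is why the difference is easy to miss in simple unit tests.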