docs: ADR for arc4 types #6

Open · wants to merge 4 commits into main

Conversation

tristanmenzel (Contributor):

No description provided.


Cons:
- Several types will exist twice (string/uint64/bool etc) and this could be confusing to new developers who aren't fully across arc4
-
Contributor:

The other con is that developers need to have some sort of awareness of arc4 encoding and decoding (but it's made as easy for them as possible, and if they use standard types like uint64 then they don't need to know about it since it's handled for them)
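For context, the encoding knowledge being hidden from users of standard types is just the ARC-4 wire format. A plain TypeScript sketch of what "handled for them" means at the byte level (illustration only, not part of any proposed API; the helper names are invented, but the layout follows ARC-4, which encodes a uint64 as 8 big-endian bytes):

    function encodeUint64(value: bigint): Uint8Array {
      // ARC-4 uint64: 8 bytes, big-endian
      const out = new Uint8Array(8)
      new DataView(out.buffer).setBigUint64(0, value, false) // false = big-endian
      return out
    }

    function decodeUint64(bytes: Uint8Array): bigint {
      return new DataView(bytes.buffer, bytes.byteOffset, 8).getBigUint64(0, false)
    }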

Contributor:

Yes, I think this is the main con here. Even for developers that are familiar with ARC4, it's easy to lose track of where ARC4 types are being used and where "native" types are being used. We saw this problem a lot with Beaker, where people struggled to perform simple operations like incrementing a counter in state. There have also been similar problems with Algorand Python when adding numbers.

Contributor:

Where Algorand Python landed on this is that you prefer native types everywhere you can: full functionality and performance, with encoding/decoding taken care of for you.

For the more complex types where there is no native equivalent, you have to reach for an arc4 type and be more cognizant of encoding/decoding.

I feel like this is a fair balance and the result is that basic stuff works well and is intuitive (strings and uints).
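Purely as an illustration of that split in TypeScript terms (every name here is invented; nothing below is an agreed API): native types wherever an equivalent exists, and an explicit ARC-4 wrapper only where one doesn't.

    // Native types: full math/string support, encoding handled by the framework.
    type uint64 = bigint
    type str = string

    // No native equivalent for e.g. a 128-bit unsigned integer, so the developer
    // reaches for an explicit wrapper and is aware it carries encoded bytes
    // (16 big-endian bytes per the ARC-4 spec).
    class UintN128 {
      constructor(readonly bytes: Uint8Array) {}
    }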

Contributor:

> I feel like this is a fair balance and the result is that basic stuff works well and is intuitive (strings and uints).

I think the problem is that many developers view arrays and objects as "basic stuff" as well.

For me the main question to answer is: as a developer, when do I want to work directly on encoded bytes? The only answer I can think of is when I want to hyper-optimize something that requires direct manipulation of the bytes. Thus, I think it makes sense for the "default" experience to be all native types, but to have a way to encode/decode raw bytes for these edge cases, rather than forcing all users to think about encoding/decoding when they don't need to.
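A sketch of that escape-hatch idea in plain TypeScript (helper names invented; the byte layout is from the ARC-4 spec, where a dynamic uint64[] is a 2-byte big-endian length prefix followed by 8 big-endian bytes per element). Everyday code would never call these; only the hyper-optimising user would:

    function encodeUint64Array(values: bigint[]): Uint8Array {
      const out = new Uint8Array(2 + values.length * 8)
      const view = new DataView(out.buffer)
      view.setUint16(0, values.length, false)                          // length prefix
      values.forEach((v, i) => view.setBigUint64(2 + i * 8, v, false)) // elements
      return out
    }

    function decodeUint64Array(bytes: Uint8Array): bigint[] {
      const view = new DataView(bytes.buffer, bytes.byteOffset)
      const length = view.getUint16(0, false)
      return Array.from({ length }, (_, i) => view.getBigUint64(2 + i * 8, false))
    }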

Contributor:

Part of the context here is that there is an intent to implement native, optimised arrays and structs in the puya compiler, and their encoding will be different to ARC-4, so having a way to differentiate between the two is useful. It's hard to visualise that right now though, for sure, while ARC-4 arrays are the only option.

Contributor:

But why should that impact the user? Of course we want the runtime representation to be as optimized as possible. In what scenarios would the user care about how it's represented?

tristanmenzel and others added 3 commits June 7, 2024 11:37
Co-authored-by: Rob Moore (MakerX) <rob.moore@makerx.com.au>

Option 1 can be excluded as it doesn't allow us to optimise the in-memory representation of types independently of what is prescribed by the arc4 encoding spec.

Option 2 is feasible, however puya would need to implement math operators on sized uints smaller than 64 bits, and that effort would largely be a waste of time as this math would be more expensive than 64-bit math. TealScript does not face this problem as it does not attempt to maintain semantic compatibility. This option also does not allow power users to deal directly with encoded data.
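To make the cost point concrete, a plain TypeScript illustration (not compiler output; names invented) of why math on sized uints below 64 bits is strictly more work than uint64 math:

    function addUint64(a: bigint, b: bigint): bigint {
      return a + b                      // one add; the AVM's 64-bit add already errors on overflow
    }

    function addUint8(a: bigint, b: bigint): bigint {
      const sum = a + b                 // the same underlying add...
      if (sum > 255n) {                 // ...plus an extra range check on every operation
        throw new Error('uint8 overflow')
      }
      return sum
    }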
Contributor:

> TealScript does not face this problem as it does not attempt to maintain semantic compatibility.

I'm confused by this statement. It uses uint64 for small uints, but I don't see how that relates to semantic compatibility.

> This option also does not allow power users to deal directly with encoded data.

Sure it does; for example, in TEALScript there is a rawBytes function which will give you the encoded data.

Contributor Author:

TealScript allows a uint<N> expression to exceed the bounds of an n-bit number and only performs bounds checking when it's assigned to a uint<N> variable again. (Side note: is this checking new? I thought it previously only did it at serialization boundaries.) This is not semantically compatible with the source language, where a number can't temporarily exceed its safe limits without ill effects (in the case of floating point math it would be loss of precision rather than an overflow error).

E.g. this should cause an overflow error, but it does not:

    const x: uint8 = 255;
    const y: uint8 = 255;
    // x + y evaluates to 510, which exceeds the uint8 maximum of 255; strict
    // uint8 semantics would overflow here, but the check only happens on assignment.
    const z: uint8 = x + y - x;

For your second point: from memory, TealScript truncates the length header from dynamic arrays of uints and only restores it at serialization boundaries, so rawBytes(x) on a dynamic array will not match the raw bytes actually passed in. The fact that it's even close to the actual raw bytes is merely a coincidence of TealScript's in-memory representation being close to the arc4 encoding. This will NOT be the case in Puya, meaning there won't be a zero/near-zero cost method of accessing the raw bytes of an ABI param unless we implement a separate type to represent it as such. Of course one can always access application args directly to get the raw bytes, but by that time you've already paid the cost of decoding the data into our in-memory representation unless we have a separate type which can be used in the method args which represents the data in its encoded format.
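A plain TypeScript illustration of the representation gap described above (helper names invented; it assumes the runtime form of a dynamic uint64[] drops the 2-byte ARC-4 length prefix, as described for TealScript). Getting the "raw bytes" back from such a runtime form is a re-encode, not a free read:

    function stripLengthPrefix(encoded: Uint8Array): Uint8Array {
      return encoded.subarray(2)                      // runtime form: elements only
    }

    function restoreLengthPrefix(runtime: Uint8Array): Uint8Array {
      const out = new Uint8Array(2 + runtime.length)
      // element count = byte length / 8, since each uint64 element is 8 bytes
      new DataView(out.buffer).setUint16(0, runtime.length / 8, false)
      out.set(runtime, 2)
      return out
    }

In Puya the in-memory form is expected to diverge further from the encoding, so the equivalent round trip would cost correspondingly more.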

Contributor:

> TealScript allows a uint<N> expression to exceed the bounds of an n-bit number and only performs bounds checking when it's assigned to a uint<N> variable again. (Side note: is this checking new? I thought it previously only did it at serialization boundaries.)

Correct, when it actually does the checking has changed over time, but right now it is uint64 up until it's put in a const, used in an array, logged, used in a method/event arg, etc. More fancy optimization could be done here, but this seems to be sufficient.

> This is not semantically compatible with the source language, where a number can't temporarily exceed its safe limits without ill effects (in the case of floating point math it would be loss of precision rather than an overflow error).

I suppose my confusion comes from the fact that uint8 has no semantic meaning because it's not a regular TypeScript type. We can have it behave however we'd like, thus I don't see the meaning behind "semantically compatible with the source language."

> so rawBytes(x) on a dynamic array will not match the raw bytes actually passed in.

It does in fact return the encoded bytes. I realize that the name is a bit misleading and I've been thinking about changing it to encodedBytes and perhaps having another function do what you are describing (just return the runtime representation).

> Of course one can always access application args directly to get the raw bytes, but by that time you've already paid the cost of decoding the data into our in-memory representation unless we have a separate type which can be used in the method args which represents the data in its encoded format.

This can simply be a compiler optimization. For example, if an argument is only used in its encoded form, don't bother decoding.
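One way to picture "only decode when needed", sketched here in plain TypeScript as a runtime wrapper rather than as an actual compiler pass (the class and member names are invented for illustration):

    class LazyUint64ArrayArg {
      private decoded: bigint[] | null = null

      constructor(readonly encoded: Uint8Array) {}

      // Code paths that only need the encoded form read this and pay no decode cost.
      get rawBytes(): Uint8Array {
        return this.encoded
      }

      // Decoding (2-byte length prefix, then 8 big-endian bytes per element,
      // per ARC-4) happens on first access only, then the result is cached.
      get values(): bigint[] {
        if (this.decoded !== null) return this.decoded
        const view = new DataView(this.encoded.buffer, this.encoded.byteOffset)
        const length = view.getUint16(0, false)
        const values = Array.from({ length }, (_, i) => view.getBigUint64(2 + i * 8, false))
        this.decoded = values
        return values
      }
    }

A compiler can do better than this by proving at compile time that the decoded form is never needed, but the effect on cost is the same idea.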

Contributor Author:

It doesn't matter if uint8 is a native JavaScript or TypeScript type. uint8 in itself has semantic meaning, and that meaning should be respected by the compiler (by erroring, wrapping, or clamping on overflow). Obviously we can implement this in the compiler, but there is a cost, and that cost may not be worth it when we could just guide devs to use uint64 math.
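For clarity on what "respected by the compiler" could mean, a plain TypeScript sketch of the three overflow behaviours mentioned (function names invented, illustration only):

    const UINT8_MAX = 255n

    function addUint8Erroring(a: bigint, b: bigint): bigint {
      const sum = a + b
      if (sum > UINT8_MAX) throw new Error('uint8 overflow')
      return sum
    }

    function addUint8Wrapping(a: bigint, b: bigint): bigint {
      return (a + b) % (UINT8_MAX + 1n)   // wraps modulo 256
    }

    function addUint8Clamping(a: bigint, b: bigint): bigint {
      const sum = a + b
      return sum > UINT8_MAX ? UINT8_MAX : sum
    }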
