docs: ADR for arc4 types #6

Open · wants to merge 4 commits into main

Conversation

tristanmenzel (Contributor):

No description provided.


Cons:
- Several types will exist twice (string/uint64/bool etc) and this could be confusing to new developers who aren't fully across arc4
-
Contributor:

The other con is that developers need to have some sort of awareness of arc4 encoding and decoding (but it's made as easy for them as possible, and if they use standard types like uint64 then they don't need to know about it since it's handled for them)
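For context, the encoding knowledge being hidden from users of standard types is just the ARC-4 wire format. A plain TypeScript sketch of what "handled for them" means at the byte level (illustration only, not part of any proposed API; the helper names are invented, but the layout follows ARC-4, which encodes a uint64 as 8 big-endian bytes):

    function encodeUint64(value: bigint): Uint8Array {
      // ARC-4 uint64: 8 bytes, big-endian
      const out = new Uint8Array(8)
      new DataView(out.buffer).setBigUint64(0, value, false) // false = big-endian
      return out
    }

    function decodeUint64(bytes: Uint8Array): bigint {
      return new DataView(bytes.buffer, bytes.byteOffset, 8).getBigUint64(0, false)
    }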

Contributor:

Yes, I think this is the main con here. Even for developers that are familiar with ARC4, it's easy to lose track of where ARC4 types are being used and where "native" types are being used. We saw this problem a lot with Beaker, where people struggled to perform simple operations like incrementing a counter in state. There have also been similar problems with Algorand Python when adding numbers.

Contributor:

Where Algorand Python landed on this is that you prefer native types everywhere you can: full functionality and performance, with encoding/decoding taken care of for you.

For the more complex types where there is no native equivalent, you have to reach for an arc4 type and be more cognizant of encoding/decoding.

I feel like this is a fair balance and the result is that basic stuff works well and is intuitive (strings and uints).
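Purely as an illustration of that split in TypeScript terms (every name here is invented; nothing below is an agreed API): native types wherever an equivalent exists, and an explicit ARC-4 wrapper only where one doesn't.

    // Native types: full math/string support, encoding handled by the framework.
    type uint64 = bigint
    type str = string

    // No native equivalent for e.g. a 128-bit unsigned integer, so the developer
    // reaches for an explicit wrapper and is aware it carries encoded bytes
    // (16 big-endian bytes per the ARC-4 spec).
    class UintN128 {
      constructor(readonly bytes: Uint8Array) {}
    }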

Contributor:

> I feel like this is a fair balance and the result is that basic stuff works well and is intuitive (strings and uints).

I think the problem is that many developers view arrays and objects as "basic stuff" as well.

For me the main question to answer is: as a developer, when do I want to work directly on encoded bytes? The only answer I can think of is when I want to hyper-optimize something that requires direct manipulation of the bytes. Thus, I think it makes sense for the "default" experience to be all native types, but to have a way to encode/decode raw bytes for these edge cases, rather than forcing all users to think about encoding/decoding when they don't need to.
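A sketch of that escape-hatch idea in plain TypeScript (helper names invented; the byte layout is from the ARC-4 spec, where a dynamic uint64[] is a 2-byte big-endian length prefix followed by 8 big-endian bytes per element). Everyday code would never call these; only the hyper-optimising user would:

    function encodeUint64Array(values: bigint[]): Uint8Array {
      const out = new Uint8Array(2 + values.length * 8)
      const view = new DataView(out.buffer)
      view.setUint16(0, values.length, false)                          // length prefix
      values.forEach((v, i) => view.setBigUint64(2 + i * 8, v, false)) // elements
      return out
    }

    function decodeUint64Array(bytes: Uint8Array): bigint[] {
      const view = new DataView(bytes.buffer, bytes.byteOffset)
      const length = view.getUint16(0, false)
      return Array.from({ length }, (_, i) => view.getBigUint64(2 + i * 8, false))
    }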

Contributor:

Part of the context here is that there is an intent to implement native, optimised arrays and structs in the puya compiler, and their encoding will be different to ARC-4, so having a way to differentiate between the two is useful. It's hard to visualise that right now though, for sure, while ARC-4 arrays are the only option.

Contributor:

But why should that impact the user? Of course we want the runtime representation to be as optimized as possible. In what scenarios would the user care about how it's represented?

tristanmenzel and others added 3 commits June 7, 2024 11:37
Co-authored-by: Rob Moore (MakerX) <rob.moore@makerx.com.au>

Option 1 can be excluded as it doesn't allow us to optimise the in-memory representation of types independently of what is prescribed by the arc4 encoding spec.

Option 2 is feasible, however puya would need to implement math operators on sized uints smaller than 64 bits, and that effort would largely be a waste of time as this math would be more expensive than 64-bit math. TealScript does not face this problem as it does not attempt to maintain semantic compatibility. This option also does not allow power users to deal directly with encoded data.
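To make the cost point concrete, a plain TypeScript illustration (not compiler output; names invented) of why math on sized uints below 64 bits is strictly more work than uint64 math:

    function addUint64(a: bigint, b: bigint): bigint {
      return a + b                      // one add; the AVM's 64-bit add already errors on overflow
    }

    function addUint8(a: bigint, b: bigint): bigint {
      const sum = a + b                 // the same underlying add...
      if (sum > 255n) {                 // ...plus an extra range check on every operation
        throw new Error('uint8 overflow')
      }
      return sum
    }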
Contributor:

> TealScript does not face this problem as it does not attempt to maintain semantic compatibility.

I'm confused by this statement. It uses uint64 for small uints, but I don't see how that relates to semantic compatibility.

> This option also does not allow power users to deal directly with encoded data.

Sure it does; for example, in TEALScript there is a rawBytes function which will give you the encoded data.

Contributor Author:

TealScript allows a uint<N> expression to exceed the bounds of an n-bit number and only performs bounds checking when it's assigned to a uint<N> variable again. (Side note: is this checking new? I thought it previously only did it at serialization boundaries.) This is not semantically compatible with the source language, where a number can't temporarily exceed its safe limits without ill effects (in the case of floating point math it would be loss of precision rather than an overflow error).

E.g. this should cause an overflow error, but it does not:

    const x: uint8 = 255;
    const y: uint8 = 255;
    // x + y evaluates to 510, which exceeds the uint8 maximum of 255; strict
    // uint8 semantics would overflow here, but the check only happens on assignment.
    const z: uint8 = x + y - x;

For your second point: from memory, TealScript truncates the length header from dynamic arrays of uints and only restores it at serialization boundaries, so rawBytes(x) on a dynamic array will not match the raw bytes actually passed in. The fact that it's even close to the actual raw bytes is merely a coincidence of TealScript's in-memory representation being close to the arc4 encoding. This will NOT be the case in Puya, meaning there won't be a zero/near-zero cost method of accessing the raw bytes of an ABI param unless we implement a separate type to represent it as such. Of course one can always access application args directly to get the raw bytes, but by that time you've already paid the cost of decoding the data into our in-memory representation unless we have a separate type which can be used in the method args which represents the data in its encoded format.
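A plain TypeScript illustration of the representation gap described above (helper names invented; it assumes the runtime form of a dynamic uint64[] drops the 2-byte ARC-4 length prefix, as described for TealScript). Getting the "raw bytes" back from such a runtime form is a re-encode, not a free read:

    function stripLengthPrefix(encoded: Uint8Array): Uint8Array {
      return encoded.subarray(2)                      // runtime form: elements only
    }

    function restoreLengthPrefix(runtime: Uint8Array): Uint8Array {
      const out = new Uint8Array(2 + runtime.length)
      // element count = byte length / 8, since each uint64 element is 8 bytes
      new DataView(out.buffer).setUint16(0, runtime.length / 8, false)
      out.set(runtime, 2)
      return out
    }

In Puya the in-memory form is expected to diverge further from the encoding, so the equivalent round trip would cost correspondingly more.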

Contributor:

> TealScript allows a uint<N> expression to exceed the bounds of an n-bit number and only performs bounds checking when it's assigned to a uint<N> variable again. (Side note: is this checking new? I thought it previously only did it at serialization boundaries.)

Correct, when it actually does the checking has changed over time, but right now it is uint64 up until it's put in a const, used in an array, logged, used in a method/event arg, etc. More fancy optimization could be done here, but this seems to be sufficient.

> This is not semantically compatible with the source language, where a number can't temporarily exceed its safe limits without ill effects (in the case of floating point math it would be loss of precision rather than an overflow error).

I suppose my confusion comes from the fact that uint8 has no semantic meaning because it's not a regular TypeScript type. We can have it behave however we'd like, thus I don't see the meaning behind "semantically compatible with the source language."

> so rawBytes(x) on a dynamic array will not match the raw bytes actually passed in.

It does in fact return the encoded bytes. I realize that the name is a bit misleading and I've been thinking about changing it to encodedBytes and perhaps having another function do what you are describing (just return the runtime representation).

> Of course one can always access application args directly to get the raw bytes, but by that time you've already paid the cost of decoding the data into our in-memory representation unless we have a separate type which can be used in the method args which represents the data in its encoded format.

This can simply be a compiler optimization. For example, if an argument is only used in its encoded form, don't bother decoding.
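One way to picture "only decode when needed", sketched here in plain TypeScript as a runtime wrapper rather than as an actual compiler pass (the class and member names are invented for illustration):

    class LazyUint64ArrayArg {
      private decoded: bigint[] | null = null

      constructor(readonly encoded: Uint8Array) {}

      // Code paths that only need the encoded form read this and pay no decode cost.
      get rawBytes(): Uint8Array {
        return this.encoded
      }

      // Decoding (2-byte length prefix, then 8 big-endian bytes per element,
      // per ARC-4) happens on first access only, then the result is cached.
      get values(): bigint[] {
        if (this.decoded !== null) return this.decoded
        const view = new DataView(this.encoded.buffer, this.encoded.byteOffset)
        const length = view.getUint16(0, false)
        const values = Array.from({ length }, (_, i) => view.getBigUint64(2 + i * 8, false))
        this.decoded = values
        return values
      }
    }

A compiler can do better than this by proving at compile time that the decoded form is never needed, but the effect on cost is the same idea.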

Contributor Author:

It doesn't matter if uint8 is a native JavaScript or TypeScript type. uint8 in itself has semantic meaning, and that meaning should be respected by the compiler (by erroring, wrapping, or clamping on overflow). Obviously we can implement this in the compiler, but there is a cost, and that cost may not be worth it when we could just guide devs to use uint64 math.
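For clarity on what "respected by the compiler" could mean, a plain TypeScript sketch of the three overflow behaviours mentioned (function names invented, illustration only):

    const UINT8_MAX = 255n

    function addUint8Erroring(a: bigint, b: bigint): bigint {
      const sum = a + b
      if (sum > UINT8_MAX) throw new Error('uint8 overflow')
      return sum
    }

    function addUint8Wrapping(a: bigint, b: bigint): bigint {
      return (a + b) % (UINT8_MAX + 1n)   // wraps modulo 256
    }

    function addUint8Clamping(a: bigint, b: bigint): bigint {
      const sum = a + b
      return sum > UINT8_MAX ? UINT8_MAX : sum
    }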
