Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Represent small values as single bytes #4929

Merged
merged 8 commits into from
Nov 8, 2023
Merged

Conversation

IGI-111
Copy link
Contributor

@IGI-111 IGI-111 commented Aug 9, 2023

Description

This change leverages the SB/LB instructions to change the memory representation of all small enough values to make them fit in a single byte instead of a full word.

The type size and passing calculations have been changed to align elements of structs and enums to full words.

Structs and data section entries are filled with right-padding to align their elements to words. Enums are still left-padded. The Data section generation has been refactored to allow for these two padding modes.

Arrays and slices contain no inner padding, byte sequences will now be properly consecutive and packed. Though, as a whole, they may be right padded in certain circumstances to maintain word alignment.

Direct usages of LW/SW have been changed to LB/SB where appropriate.

The LWDataId virtual instruction has been changed to LoadDataId to better represent the fact that it can load both word and byte sized values.

Checklist

  • I have linked to any relevant issues.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation where relevant (API docs, the reference, and the Sway book).
  • I have added tests that prove my fix is effective or that my feature works.
  • I have added (or requested a maintainer to add) the necessary Breaking* or New Feature labels where relevant.
  • I have done my best to ensure that my PR adheres to the Fuel Labs Code Review Standards.
  • I have requested a review from the relevant team or maintainers.

@IGI-111 IGI-111 added compiler: ir IRgen and sway-ir including optimization passes breaking May cause existing user code to break. Requires a minor or major release. compiler: codegen Everything to do with IR->ASM, register allocation, etc. labels Aug 9, 2023
@IGI-111 IGI-111 requested review from vaivaswatha and a team August 9, 2023 10:52
@IGI-111 IGI-111 self-assigned this Aug 9, 2023
@IGI-111 IGI-111 force-pushed the IGI-111/int-memory-repr branch 2 times, most recently from fddc699 to 0bdf48e Compare August 9, 2023 12:28
@IGI-111 IGI-111 marked this pull request as draft August 9, 2023 17:12
@IGI-111 IGI-111 force-pushed the IGI-111/int-memory-repr branch 2 times, most recently from 65d5718 to 9dca3bd Compare August 22, 2023 15:41
@xunilrj xunilrj force-pushed the IGI-111/int-memory-repr branch 2 times, most recently from f923ebf to 3c12320 Compare October 11, 2023 13:03
@xunilrj
Copy link
Contributor

xunilrj commented Oct 12, 2023

The most important changes made by this PR:

1 - u8, bool and unit are 1 byte now; That means that [u8;3], for example, is 3 bytes.
2 - "Virtual" loads are now realized to lb or lw depending on the case of course.
3 - When a 1-byte value is inside a struct it is left aligned (padding on the right). For example:

struct A { a: u8, b: u64 }
let a = A { a: 1, b: 2 } // memory: 10000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000010

4 - Type checking/inference on arrays changed a little bit. Now the first item of the array does not necessarily start as Unknown, but with the array element type, if it is known. This fixes a problem with let a: [u8; 3] = [1, 2, 3] where the type checking would decide for [Numeric; 3], fallback to [u64; 3] later and then complain that types do not match.

5 - StorageVec offset was calculated in words. THis generates problems with types smaller than 8 bytes and now we add padding in these cases.

All the fuels-rs changes are here: FuelLabs/fuels-rs#1163

@xunilrj xunilrj force-pushed the IGI-111/int-memory-repr branch 3 times, most recently from 636eaa5 to d56632a Compare October 19, 2023 11:47
@xunilrj xunilrj force-pushed the IGI-111/int-memory-repr branch 2 times, most recently from c37e200 to 4bbf431 Compare October 24, 2023 18:10
@xunilrj xunilrj marked this pull request as ready for review October 24, 2023 19:25
@xunilrj xunilrj mentioned this pull request Oct 25, 2023
ironcev
ironcev previously approved these changes Oct 30, 2023
Copy link
Member

@ironcev ironcev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! I have a feeling finding all the places in code that need to be touched by this change wasn't trivial at all.

Speaking of which, my wish is to see #5227 implemented soon 😉

And afterwards same support for u16 and u32 😄

(And afterwards shuffling struct types to get optimized data layout 😄)

@ironcev
Copy link
Member

ironcev commented Nov 2, 2023

@xunilrj Well, after pulling the latest change it looks to me that we were doing the same thing in parallel 😄 I was also onto removing all the size_bytes_round_up_to_word_alignment calls and packing the things while working on #5227 on top of this PR. I'll wait until this PR is merged, because as it looks now, a lot of clean up will already be done here which is great. Forget it, just got confused when switching between the branches 😄 But the question below still holds.

One question. In the sway-core/src/asm_generation/fuel/functions.rs line 799 this is the way how the var_size_words is calculated for string arrays:

TypeContent::StringArray(n) => size_bytes_round_up_to_word_alignment!(n),

Is this the right formula for getting var_size_words for string arrays? The way I understand it, for the string "123" n will be 3 and we will and up with 8 words.

@xunilrj
Copy link
Contributor

xunilrj commented Nov 2, 2023

Is this the right formula for getting var_size_words for string arrays? The way I understand it, for the string "123" n will be 3 and we will and up with 8 words.

That is a nice question. I think this is wrong. But we have older tests expecting string arrays to be padded. My idea was to fix this when doing the layout issue you are working at the moment.

@ironcev
Copy link
Member

ironcev commented Nov 2, 2023

Ok I'll then take a closer look at the padding logic for string arrays and see if something needs to be changed.

@ironcev
Copy link
Member

ironcev commented Nov 4, 2023

Storage definitely makes heavy assumptions on memory layout.
I think I stumbled upon a slight issue in storage.rs line 229.

let value_size_in_words = ir_type_size_in_bytes(context, ty) / 8;
let constant_size_in_words = ir_type_size_in_bytes(context, &constant.ty) / 8;

This code has assumed that the raw type sizes are word aligned. Our new u8, bool, and unit break this assumption.
If I interpret right the code that handles unions/enums, if we had e.g.

enum E {
    A: u8,
    B: u64,
}

for variant A, the value_size_in_words would be 1 and the constant_size_in_words zero, and the remaining code would do an unnecessary word of padding and the word of u8 data
on top of it.

I am fixing this on the go and extending test while refactoring access to memory layout information.

@xunilrj
Copy link
Contributor

xunilrj commented Nov 4, 2023

for variant A, the value_size_in_words would be 1 and the constant_size_in_words zero

Yes, but I don't remember a place that deals with variant size alone. ir_type_size_in_bytes foe enums returns 8 bytes (tag) plus max(variant size). Where the variant is right aligned.

enum E {
    A: u8,    // 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 AA
    B: u64,  // 00 00 00 00 00 00 00 01 BB BB BB BB BB BB BB BB
}

@ironcev
Copy link
Member

ironcev commented Nov 4, 2023

Yes, but I don't remember a place that deals with variant size alone. ir_type_size_in_bytes foe enums returns 8 bytes (tag) plus max(variant size). Where the variant is right aligned.

It looks to me that storage::serialize_to_word might be such a place. Enums are structs with two fields, one for the tag and one (union) for the variant. The size_in_bytes doesn't have special handling for enums (and I would leave it so), and for union, it will return the word aligned size of the biggest variant without tag included.

So I assume that serialize_to_word, when it encounters an enum, it will first serialize the tag when serializing struct, and then recursively proceed with serializing the union part, but without the tag.

Unfortunately I have issues running locally the tests that require contract deployment so I cannot immediately test the described case locally and on the build machine I got an error indeed with the changed test, but saying that the contract cannot be found?! Will let you know about the test outcome once I manage to run it.

@ironcev
Copy link
Member

ironcev commented Nov 5, 2023

In the end, there were two issues in serializing constants into storage slots.

One was the one explained above. In case of enums with one byte variants mixed with (multi) word variants the one byte variants were not properly padded (there was an additional shift to the right as commented above.)

The second issue was not only in enums but in general. u8s and bools were not properly converted to BE u64. They were positioned as the most significant byte instead as least significant.

Interestingly, the existing tests didn't catch the issue, because the storage access test is actually not calling the contract methods, only one, and the basic storage test that is very detailed didn't have constants in the storage statement that would reproduce the issue.

Note that the issue is only in initializing the storage via constants. Reading and writing from and into the storage works fine afterwards. That's why the existing tests pass.

I've pushed the fix together with the test suit.

@xunilrj Can you please triple-check the logic? 😄

Also, the existing storage tests should be extended to check structs, but I've already started doing that in the refactoring of the access to the memory layout information. So I propose to do it there and leave the existing storage tests in this PR as they are.

@ironcev
Copy link
Member

ironcev commented Nov 6, 2023

As it goes, fixed one bug introduced another 😄 It turned out we do have testing of storage initialization within the SDK tests (which I, shame on me, never run locally so far). These tests also need to be extended to cover one-byte values in enums.

Long story short, it looks like we will need a logic here similar to one we have in initializing data section, where the outer context (struct or enum) decides on the padding (left or right) of the single byte values. It has nothing to do with endianess as wrongly pointed above.

Let me now triple-check that changed logic on my own before pushing 😄, extend the tests etc. and provide the fix.

Copy link
Member

@ironcev ironcev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storage initialization still needs to be fixed as pointed in the comment above.

@IGI-111 IGI-111 requested a review from a team November 7, 2023 03:53
@IGI-111 IGI-111 enabled auto-merge (squash) November 8, 2023 12:04
@IGI-111 IGI-111 disabled auto-merge November 8, 2023 12:04
@xunilrj xunilrj merged commit 95481fc into master Nov 8, 2023
31 checks passed
@xunilrj xunilrj deleted the IGI-111/int-memory-repr branch November 8, 2023 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking May cause existing user code to break. Requires a minor or major release. compiler: codegen Everything to do with IR->ASM, register allocation, etc. compiler: ir IRgen and sway-ir including optimization passes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants