Improving slice #5110
Labels
language feature
Core language features visible to end users
Comments
This was referenced Oct 3, 2023
This was referenced Nov 29, 2023
IGI-111 added a commit that referenced this issue on Jul 5, 2024
## Description

This PR implements `match` for string slices, including a radix trie optimization, and is a task of #5110.

For example, a simple `match` like

```
fn return_match_on_str_slice(param: str) -> u64 {
    match param {
        "get_a" => { 1u64 },
        "get_a_b" => { 2u64 },
        "get_b" => { 3u64 },
        _ => { 1000u64 },
    }
}
```

will generate code following this logic:

```
let packed_string = "get_a_b"
if str.len() == 5
    if str[0..4] == "get_" at packed_string[0]
        if str[4..5] == "b" at packed_string[6]
            return branch 2
        if str[4..5] == "a" at packed_string[4]
            return branch 0
        return wildcard branch
    return wildcard branch
if str.len() == 7
    if str[0..7] == "get_a_b" at packed_string[0]
        return branch 1
    return wildcard branch
return wildcard branch
```

In logical terms, this boils down to checking the length and an `O(N)` check on the string, although the bytecode will be more complex because of all the branches.

Another interesting optimization is the "packed string literal", which coalesces all match-arm string slices into a single string. In the case above, since one of the arms contains all the strings needed for the other comparisons, we create just one string literal, saving a lot of bytes in the data section.

The section below describes how `rustc` deals with this desugaring. I think these choices make more sense for us for two reasons:

1. Avoiding testing common prefixes multiple times will spend less gas in general (needs more testing);
2. Packing all strings will decrease the data section size.
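To make the decision tree above concrete, here is a minimal Rust sketch of the same logic. It is only an illustration of the technique, not the code the Sway compiler emits: the `PACKED` constant and the use of byte-slice comparisons are assumptions made for this example, and the return values are the ones from the Sway arms.

```rust
/// Hypothetical packed literal: every arm can be rebuilt from pieces of it.
/// "get_a"   = PACKED[0..5]
/// "get_b"   = PACKED[0..4] followed by PACKED[6..7]
/// "get_a_b" = PACKED[0..7]
const PACKED: &[u8] = b"get_a_b";

fn return_match_on_str_slice(param: &str) -> u64 {
    let s = param.as_bytes();
    match s.len() {
        5 => {
            // The shared prefix "get_" is tested once for both 5-byte arms.
            if s[0..4] != PACKED[0..4] {
                return 1000; // wildcard
            }
            if s[4..5] == PACKED[6..7] {
                return 3; // "get_b"
            }
            if s[4..5] == PACKED[4..5] {
                return 1; // "get_a"
            }
            1000 // wildcard
        }
        7 => {
            // Only one arm has length 7, so a single full comparison is enough.
            if s[0..7] == PACKED[0..7] {
                2 // "get_a_b"
            } else {
                1000 // wildcard
            }
        }
        _ => 1000, // no arm has this length
    }
}

fn main() {
    for input in ["get_a", "get_a_b", "get_b", "get_c", "other"] {
        println!("{input} -> {}", return_match_on_str_slice(input));
    }
}
```

The length check comes first so that every later slice comparison is known to be in bounds, which is also why the pseudocode above can index the input without further checks.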
This is the bytecode generated in this case:

```
fn return_match_on_str_slice(param: str) -> u64 {
    match param {
        "get_a" => { 1u64 },
        "get_a_b" => { 2u64 },
        "get_b" => { 3u64 },
        _ => { 1000u64 },
    }
}
@ /home/xunilrj/github/sway/test/src/e2e_vm_tests/test_programs/should_pass/language/match_expressions_all/src/main.sw:22:1
0x0000017c PSHL 0xf ;; [149, 0, 0, 15]
0x00000180 PSHH 0x80000 ;; [150, 8, 0, 0]
0x00000184 MOVE R59 $sp ;; [26, 236, 80, 0]
0x00000188 CFEI 0x90 ;; [145, 0, 0, 144]
0x0000018c MOVE $writable R58 ;; [26, 67, 160, 0]
0x00000190 MOVE R19 R62 ;; [26, 79, 224, 0]
match param {
    "get_a" => { 1u64 },
    "get_a_b" => { 2u64 },
    "get_b" => { 3u64 },
    _ => { 1000u64 },
}
@ /home/xunilrj/github/sway/test/src/e2e_vm_tests/test_programs/should_pass/language/match_expressions_all/src/main.sw:23:5
0x00000194 ADDI R17 R59 0x80 ;;
0x00000198 MOVI R18 0x10 ;;
0x0000019c MCP R17 $writable R18 ;;
0x000001a0 MOVI R17 0x7 ;; 0x7 = "get_a_b".len()
@ <autogenerated>:1:1
0x000001a4 LW $writable R59 0x11 ;; R59 + 0x11 = a.len()
0x000001a8 EQ $writable $writable R17 ;; a.len() == 0x7
0x000001ac JNZF $writable $zero 0x3c ;; if true jump to 2a0
0x000001b0 MOVI R17 0x5 ;; we have two arms with length equal to 0x5
0x000001b4 LW $writable R59 0x11 ;; R59 + 0x11 = a.len()
0x000001b8 EQ $writable $writable R17 ;; a.len() == 0x5
0x000001bc MOVI R17 0x3e8 ;; 0x3e8 = 1000 (wildcard return value)
0x000001c0 JNZF $writable $zero 0x1 ;; if true jump to 1c8
0x000001c4 JMPF $zero 0x35 ;; if false jump to 29c (will return R17)
0x000001c8 LW $writable R63 0x3 ;; R63 = start of data section, will load 13c
0x000001cc ADD $writable $writable $pc ;; $writable = 0x308 = packed strings
0x000001d0 ADDI R17 R59 0x20 ;;
0x000001d4 SW R59 $writable 0x4 ;; R59 + 0x4 = packed strings
0x000001d8 MOVI $writable 0x7 ;;
0x000001dc SW R59 $writable 0x5 ;; R59 + 0x5 = 0x7
0x000001e0 ADDI $writable R59 0x30 ;;
0x000001e4 MOVI R18 0x10 ;;
0x000001e8 MCP $writable R17 R18 ;; R59 + 0x30 = R59 + 0x20
0x000001ec MOVI R18 0x4 ;; 0x4 = "get_".len()
0x000001f0 LW $writable R59 0x10 ;;
0x000001f4 ADDI $writable $writable 0x0 ;;
0x000001f8 LW R17 R59 0x6 ;; R17 = a.ptr()
0x000001fc ADDI R17 R17 0x0 ;;
0x00000200 MEQ $writable $writable R17 R18 ;; a[0..4] = packed[0..4]
0x00000204 MOVI R17 0x3e8 ;; 0x3e8 = 1000 (wildcard return value)
0x00000208 JNZF $writable $zero 0x1 ;; if true jump to 210
0x0000020c JMPF $zero 0x23 ;; if false jump to 29c (will return R17)
....
.data_section:
0x00000300 .bytes as hex ([]), len i0, as ascii ""
0x00000300 .word i18446744073709486084, as hex be bytes ([FF, FF, FF, FF, FF, FF, 00, 04])
0x00000308 .bytes as hex ([67, 65, 74, 5F, 61, 5F, 62]), len i7, as ascii "get_a_b"
0x00000310 .word i500, as hex be bytes ([00, 00, 00, 00, 00, 00, 01, F4])
0x00000318 .word i316, as hex be bytes ([00, 00, 00, 00, 00, 00, 01, 3C])
0x00000320 .word i244, as hex be bytes ([00, 00, 00, 00, 00, 00, 00, F4])
0x00000328 .word i176, as hex be bytes ([00, 00, 00, 00, 00, 00, 00, B0])
0x00000330 .word i100, as hex be bytes ([00, 00, 00, 00, 00, 00, 00, 64])
```

## How `rustc` desugars `match`

For comparison, this is the generated ASM, with comments on how Rust tackles this. First, this is the function used:

```
#[inline(never)]
fn f(a: &str) -> u64 {
    match a {
        "get_method" => 0,
        "get_tokens" => 1,
        "get_something_else" => 2,
        "get_tokens_2" => 3,
        "clear" => 4,
        "get_m" => 5,
        _ => 6,
    }
}
```

This is the LLVM IR generated. There is a match on the length of each string slice arm. The valid range is (5, 18); everything outside of it goes to the wildcard match arm. This range will be important later.
```
define internal fastcc noundef i64 @example::f::hdb860bcd6d383112(ptr noalias nocapture noundef nonnull readonly align 1 %a.0, i64 noundef %a.1) unnamed_addr {
start:
  switch i64 %a.1, label %bb13 [
    i64 10, label %"_ZN73_$LT$$u5b$A$u5d$$u20$as$u20$core..slice..cmp..SlicePartialEq$LT$B$GT$$GT$5equal17h510120b4d3581de7E.exit"
    i64 18, label %"_ZN73_$LT$$u5b$A$u5d$$u20$as$u20$core..slice..cmp..SlicePartialEq$LT$B$GT$$GT$5equal17h510120b4d3581de7E.exit30"
    i64 12, label %"_ZN73_$LT$$u5b$A$u5d$$u20$as$u20$core..slice..cmp..SlicePartialEq$LT$B$GT$$GT$5equal17h510120b4d3581de7E.exit35"
    i64 5, label %"_ZN73_$LT$$u5b$A$u5d$$u20$as$u20$core..slice..cmp..SlicePartialEq$LT$B$GT$$GT$5equal17h510120b4d3581de7E.exit40"
  ]
```

This is how `f` is called:

```
mov rbx, qword ptr [rsp + 32]
mov r14, qword ptr [rsp + 40]
mov rsi, qword ptr [rsp + 48] <- length of the string slice
mov rdi, r14 <- ptr to string slice
call _ZN4main1f17h126a5dfd4e318ebcE
```

This is the body of `f`. `ja .LBB8_12` jumps to a simple return that returns `EAX` as 6, which is the wildcard return value. The clever part is that when `RSI` is smaller than 5, `add rsi, -5` makes it negative, which wraps into a huge unsigned integer and therefore also triggers `JA` (which stands for `Jump Above`). This effectively jumps whenever the slice length is outside the expected range, which is (5, 18).

After that, it uses a jump table indexed by the string length minus 5. For every invalid string length, the jump address is `LBB8_12`, still returning `EAX` as 6.
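The wrapping range check described above can be sketched in Rust. This is only an illustration of the trick, written under the assumption that the length is an unsigned machine word; `jump_table_index` is a hypothetical helper name, not something from `rustc` or the generated code.

```rust
/// Hypothetical helper mirroring `add rsi, -5; cmp rsi, 13; ja .LBB8_12`.
/// Returns the jump table index for a slice length, or `None` when the
/// length is outside 5..=18 and the wildcard arm must be taken.
fn jump_table_index(len: usize) -> Option<usize> {
    // For len < 5 the subtraction wraps around to a huge unsigned value,
    // so one unsigned comparison rejects both "too short" and "too long".
    let shifted = len.wrapping_sub(5);
    if shifted > 13 {
        None
    } else {
        Some(shifted)
    }
}

fn main() {
    for len in [0usize, 4, 5, 10, 12, 18, 19] {
        println!("len {len} -> {:?}", jump_table_index(len));
    }
}
```

A single unsigned comparison replaces the two signed comparisons `len < 5 || len > 18`, which is why only one conditional jump guards the jump table.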
```
_ZN4main1f17h126a5dfd4e318ebcE:
    .cfi_startproc
    mov eax, 6
    add rsi, -5
    cmp rsi, 13
    ja .LBB8_12
    lea rcx, [rip + .LJTI8_0]
    movsxd rdx, dword ptr [rcx + 4*rsi]
    add rdx, rcx
    jmp rdx
```

```
.LBB8_12:
    ret
```

This is the jump table used:

```
.LJTI8_0:
    .long .LBB8_9-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_2-.LJTI8_0 <- 5th entry is length = 10 (remember we add -5 to the length)
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_8-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_6-.LJTI8_0
```

The interesting entry is entry 5, which covers two strings: "get_method" and "get_tokens". Here we can see that `rustc` actually compares the complete string slice twice, even though the two strings share a common prefix.

```
.LBB8_2:
    movabs rcx, 7526752397670245735=6874656D5F746567="htem_teg" (inverted "get_meth")
    xor rcx, qword ptr [rdi]
    movzx edx, word ptr [rdi + 8]
    xor rdx, 25711=646F="do" (inverted "od")
    or rdx, rcx
    je .LBB8_3
    movabs rcx, 7308057365947114855=656B6F745F746567="ekot_teg" (inverted "get_toke")
    xor rcx, qword ptr [rdi]
    movzx edx, word ptr [rdi + 8]
    xor rdx, 29550=736E="sn" (inverted "ns")
    or rdx, rcx
    je .LBB8_5
```

```
.LBB8_3:
    xor eax, eax <- returns 0
    ret
```

```
.LBB8_5:
    mov eax, 1 <- returns 1
    ret
```

This is comparable to what `clang` is doing: rust-lang/rust#61961

## Code and Bytecode

This PR also implements source code printing when printing bytecode. For now this is only enabled for tests.
It generates something like:

```
match param {
    "get_a" => { 1u64 },
    "get_a_b" => { 2u64 },
    "get_b" => { 3u64 },
    _ => { 1000u64 },
}
@ /home/xunilrj/github/sway/test/src/e2e_vm_tests/test_programs/should_pass/language/match_expressions_all/src/main.sw:23:5
0x00000194 ADDI R17 R59 0x80 ;;
0x00000198 MOVI R18 0x10 ;;
0x0000019c MCP R17 $writable R18 ;;
0x000001a0 MOVI R17 0x7 ;; 0x7 = "get_a_b".len()
@ <autogenerated>:1:1
0x000001a4 LW $writable R59 0x11 ;; R59 + 0x11 = a.len()
0x000001a8 EQ $writable $writable R17 ;; a.len() == 0x7
```

As we can see, this is not great, but it is helpful nonetheless. We can (should?) improve this by better "carrying" spans through all transformations and lowerings.

## Checklist

- [x] I have linked to any relevant issues.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have updated the documentation where relevant (API docs, the reference, and the Sway book).
- [ ] If my change requires substantial documentation changes, I have [requested support from the DevRel team](https://github.com/FuelLabs/devrel-requests/issues/new/choose)
- [ ] I have added tests that prove my fix is effective or that my feature works.
- [ ] I have added (or requested a maintainer to add) the necessary `Breaking*` or `New Feature` labels where relevant.
- [ ] I have done my best to ensure that my PR adheres to [the Fuel Labs Code Review Standards](https://github.com/FuelLabs/rfcs/blob/master/text/code-standards/external-contributors.md).
- [ ] I have requested a review from the relevant team or maintainers.

---------

Co-authored-by: Joshua Batty <joshpbatty@gmail.com>
Co-authored-by: IGI-111 <igi-111@protonmail.com>
IGI-111 pushed a commit that referenced this issue on Aug 5, 2024
## Description

Part of #5110. This PR implements support for the new slice syntax: `&[T]`. The old syntax is still supported and will be deprecated in a future PR.

## Checklist

- [x] I have linked to any relevant issues.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have updated the documentation where relevant (API docs, the reference, and the Sway book).
- [ ] If my change requires substantial documentation changes, I have [requested support from the DevRel team](https://github.com/FuelLabs/devrel-requests/issues/new/choose)
- [x] I have added tests that prove my fix is effective or that my feature works.
- [x] I have added (or requested a maintainer to add) the necessary `Breaking*` or `New Feature` labels where relevant.
- [x] I have done my best to ensure that my PR adheres to [the Fuel Labs Code Review Standards](https://github.com/FuelLabs/rfcs/blob/master/text/code-standards/external-contributors.md).
- [x] I have requested a review from the relevant team or maintainers.
IGI-111 pushed a commit that referenced this issue on Aug 7, 2024
## Description

This PR is part of #5110 and introduces two new intrinsics: `__slice` and `__elem_at`.

`__slice` allows the creation of slices by slicing arrays or other slices, whilst `__elem_at` returns a reference to an item inside the slice or the array.

## Out of bounds checks

These intrinsics do not generate any runtime checks; those must be done manually, when and where appropriate. They do, however, perform a complete static analysis of all indices to avoid runtime buffer overflows when possible.

This means that a buffer overflow when reading/writing is still possible at runtime, and what happens in that case is "undefined behaviour".

## Empty Array

This PR also solves a problem with empty arrays. Previously, empty arrays such as `let a = []` were being type-checked as `[Never; 0]`, which meant that any code after them was marked as dead.

Now we correctly type-check them as `[Unknown; 0]` and return a friendlier error.

```
4 |
5 |     // Empty array
6 |     let a = [];
  |             ^^ Type must be known at this point
7 | }
  |
____
```

## Check of constants inside fns

This PR also solves a problem with `const` expressions inside `fns` not being checked. For example, we do not allow slices in constants, but we were only checking globals. Now we also check constants inside functions, methods, etc.

## Small improvements for our e2e

We can now `dbg` inside our e2e harness and get results like the ones below. One needs to include the lib `test/src/e2e_vm_tests/utils` and call `something.dbg()` or `something.dbgln()`. There is no magic, and structs/enums will need to manually implement the `Dbg` trait.

This is only to facilitate the debugging of our e2e tests.

![image](https://github.com/user-attachments/assets/2f25c50e-b7b3-4199-8bf4-699473919e6c)

## Checklist

- [ ] I have linked to any relevant issues.
- [ ] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have updated the documentation where relevant (API docs, the reference, and the Sway book).
- [ ] If my change requires substantial documentation changes, I have [requested support from the DevRel team](https://github.com/FuelLabs/devrel-requests/issues/new/choose)
- [ ] I have added tests that prove my fix is effective or that my feature works.
- [ ] I have added (or requested a maintainer to add) the necessary `Breaking*` or `New Feature` labels where relevant.
- [ ] I have done my best to ensure that my PR adheres to [the Fuel Labs Code Review Standards](https://github.com/FuelLabs/rfcs/blob/master/text/code-standards/external-contributors.md).
- [ ] I have requested a review from the relevant team or maintainers.
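As a rough analogy for the "Out of bounds checks" section of this PR description, the sketch below shows in Rust what "doing the check manually" around an unchecked element access can look like. It is not Sway and does not use the `__elem_at` intrinsic; `elem_at_checked` is a hypothetical name chosen for this example, and Rust's `get_unchecked` merely stands in for an access that performs no runtime check.

```rust
/// Hypothetical wrapper: performs the bounds check by hand and only then
/// uses an access that has no runtime check of its own.
fn elem_at_checked(items: &[u64], index: usize) -> Option<&u64> {
    if index < items.len() {
        // SAFETY: `index` was compared against the slice length just above.
        Some(unsafe { items.get_unchecked(index) })
    } else {
        // Without this branch, an out-of-range index would be undefined behaviour.
        None
    }
}

fn main() {
    let items = [10u64, 20, 30];
    println!("{:?}", elem_at_checked(&items, 1));
    println!("{:?}", elem_at_checked(&items, 3));
}
```

The design point is the same in both languages: the unchecked primitive stays cheap, and callers decide where a runtime check is actually needed.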
esdrubal pushed a commit that referenced this issue on Aug 13, 2024
esdrubal pushed a commit that referenced this issue on Aug 13, 2024
The PR #4996 introduced string slices, but they are still very limited, and the ergonomics of needing `__to_str_array` to convert from literals to string arrays is horrible.

This issue will track everything that is necessary to improve the string slices:

- `raw_slice` error message #5145
- String `str` and `str[N]`
- `match` for string slices #6202
- `__slice` and `__slice_elem` intrinsic (Slice/Array intrinsics: `__slice` and `__elem_at` #6282)
- `AbiEncode`/`AbiDecode` will need to wait for trait calls on refs.
- `Index` trait for arrays and slices

And the last step would be to remove string arrays entirely from the language. Which means:

- removing all string arrays intrinsic
- Add support for escape codes in string literals #4993