[mono] Unsafe.[Write/Read]Unaligned doesn't unroll 64B blocks #106822

Open
matouskozak opened this issue Aug 22, 2024 · 1 comment


matouskozak commented Aug 22, 2024

In dotnet/perf-autofiling-issues#33182, we discovered that MonoJIT generates two memcpy calls for

[StructLayout(LayoutKind.Sequential, Size = 64)]
public struct Block64 {}

Unsafe.WriteUnaligned(ref dest, Unsafe.ReadUnaligned<Block64>(ref src));

il_seq_point intr il: 0x0
il_seq_point il: 0x1
load_membase R37 <- [fp + 0x18]
add_imm R38 <- fp [32]
move R40 <- R38
move R41 <- R37
iconst R42 <- [64]
voidcall [void string:memcpy (byte*,byte*,int)] [r0 <- R40] [r1 <- R41] [r2 <- R42] clobbers: c
il_seq_point il: 0x8, nonempty-stack
add_imm R43 <- fp [32]
load_membase R44 <- [fp + 0x10]
move R46 <- R44
move R47 <- R43
iconst R48 <- [64]
voidcall [void string:memcpy (byte*,byte*,int)] [r0 <- R46] [r1 <- R47] [r2 <- R48] clobbers: c
il_seq_point il: 0xd, nonempty-stack
il_seq_point il: 0xd
il_seq_point il: 0xe

In this scenario, MonoJIT takes the "safe" path in mini_emit_memcpy_internal, because the check size / align > MAX_INLINE_COPIES passes, and emits a memcpy call instead of using mini_emit_memcpy, which handles the copy by unrolling it.
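To make the check concrete, here is a minimal, standalone sketch of the size / align test quoted above. It is not the Mono source: the helper name takes_memcpy_call_path is hypothetical, the value 16 for MAX_INLINE_COPIES is an assumption that should be verified against the Mono sources, and it assumes the unaligned access is treated as having an alignment of 1.

```c
#include <stdbool.h>

/* Assumption: threshold value; verify against the Mono JIT sources. */
#define MAX_INLINE_COPIES 16

/*
 * Hypothetical helper that models only the size / align test from the issue.
 * Returns true when the copy would go through the memcpy call ("safe" path),
 * false when it would be small enough to unroll inline.
 */
bool takes_memcpy_call_path(int size, int align)
{
    /* Copies cannot use a granule larger than the known alignment, so the
       number of moves needed is size / align. */
    return size / align > MAX_INLINE_COPIES;
}
```

Under these assumptions, takes_memcpy_call_path(64, 1) is true (64 single-byte moves exceed the threshold), while takes_memcpy_call_path(64, 8) is false (eight 8-byte moves fit), which is consistent with the 64-byte unaligned copy falling back to a memcpy call.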

In comparison, Unsafe.As<byte, Block64>(ref dest) = Unsafe.As<byte, Block64>(ref src); leads to a single memcpy call:

il_seq_point intr il: 0x0
il_seq_point il: 0x1
il_seq_point il: 0x7, nonempty-stack
il_seq_point il: 0xd, nonempty-stack
load_membase R39 <- [fp + 0x10]
nop
load_membase R40 <- [fp + 0x18]
nop
iconst R41 <- [64]
voidcall [void string:memcpy (byte*,byte*,int)] [r0 <- R39] [r1 <- R40] [r2 <- R41] clobbers: c
il_seq_point il: 0x17
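
For contrast with the memcpy calls in the dumps, an unrolled 64-byte copy could look roughly like the C sketch below. This is only an illustration of the idea, not what mini_emit_memcpy emits: the function name copy64_unrolled is hypothetical, and the fixed-size 8-byte memcpy calls stand in for the unaligned 8-byte loads and stores a JIT would emit directly.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical illustration: copy exactly 64 bytes as eight 8-byte moves.
 * All loads happen before any store, matching "read the value, then write it".
 * Fixed-size memcpy is used so the accesses stay valid for unaligned pointers.
 */
void copy64_unrolled(void *dest, const void *src)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dest;
    uint64_t w0, w1, w2, w3, w4, w5, w6, w7;

    /* Eight 8-byte loads from the (possibly unaligned) source. */
    memcpy(&w0, s + 0, 8);
    memcpy(&w1, s + 8, 8);
    memcpy(&w2, s + 16, 8);
    memcpy(&w3, s + 24, 8);
    memcpy(&w4, s + 32, 8);
    memcpy(&w5, s + 40, 8);
    memcpy(&w6, s + 48, 8);
    memcpy(&w7, s + 56, 8);

    /* Eight 8-byte stores to the (possibly unaligned) destination. */
    memcpy(d + 0, &w0, 8);
    memcpy(d + 8, &w1, 8);
    memcpy(d + 16, &w2, 8);
    memcpy(d + 24, &w3, 8);
    memcpy(d + 32, &w4, 8);
    memcpy(d + 40, &w5, 8);
    memcpy(d + 48, &w6, 8);
    memcpy(d + 56, &w7, 8);
}
```

The point of the sketch is that a fixed, small number of wide moves avoids the call overhead of a general memcpy for a size known at JIT time.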

This is causing serious regressions on MonoJIT (dotnet/perf-autofiling-issues#33182 and more). Fixing it would improve over 400 microbenchmarks (dotnet/perf-autofiling-issues#41406 (comment)).


Tagging subscribers to this area: @lambdageek, @steveisok
See info in area-owners.md if you want to be subscribed.
