A faster s2b function #637

Merged
merged 4 commits into from
Aug 19, 2019


zhangyunhao116
Contributor

@zhangyunhao116 zhangyunhao116 commented Aug 18, 2019

A faster s2b function

The new function uses the return value's stack space to store the final slice header directly, without a temporary struct on the stack. When the Go compiler does not apply deeper optimizations such as inlining, s2bFast is roughly 100% faster; when it does, s2bFast is still 5%~15% faster. s2bFast always returns the same result as s2b, so the two functions are identical from the caller's perspective. The assembly shows the difference: the new version needs less stack space and has no locals.

Environment: go1.12.7 darwin/amd64
(The go code)

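func s2b(s string) []byte {
	// View the string's header, then build a temporary slice header from it.
	sh := (*StringHeader)(unsafe.Pointer(&s))
	bh := SliceHeader{
		Data: sh.Data,
		Len:  sh.Len,
		Cap:  sh.Len,
	}
	// Reinterpret the temporary header as a []byte and return a copy of it.
	return *(*[]byte)(unsafe.Pointer(&bh))
}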

func s2bFast(s string) (b []byte) {
	bh := (*SliceHeader)(unsafe.Pointer(&b))
	sh := *(*StringHeader)(unsafe.Pointer(&s))
	bh.Data = sh.Data
	bh.Len = sh.Len
	bh.Cap = sh.Len
	return b
}
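
A quick way to check the claim that both versions look identical to the caller is to run them side by side. The sketch below assumes that `StringHeader` and `SliceHeader` in the snippets above are the types from the `reflect` package; the `sameSlice` helper, the `main` function, and the sample inputs are hypothetical and only there for illustration.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// s2b builds a temporary SliceHeader on the stack and reinterprets it as []byte.
func s2b(s string) []byte {
	sh := (*reflect.StringHeader)(unsafe.Pointer(&s))
	bh := reflect.SliceHeader{
		Data: sh.Data,
		Len:  sh.Len,
		Cap:  sh.Len,
	}
	return *(*[]byte)(unsafe.Pointer(&bh))
}

// s2bFast writes the header fields directly into the named result b.
func s2bFast(s string) (b []byte) {
	bh := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	sh := *(*reflect.StringHeader)(unsafe.Pointer(&s))
	bh.Data = sh.Data
	bh.Len = sh.Len
	bh.Cap = sh.Len
	return b
}

// sameSlice (illustrative helper) reports whether a and b have the same
// length and capacity and share the same backing array.
func sameSlice(a, b []byte) bool {
	if len(a) != len(b) || cap(a) != cap(b) {
		return false
	}
	if len(a) == 0 {
		return true
	}
	return &a[0] == &b[0]
}

func main() {
	for _, s := range []string{"", "111", "hello, world"} {
		fmt.Printf("%q: identical=%v\n", s, sameSlice(s2b(s), s2bFast(s)))
	}
}
```

Both functions return a slice that shares memory with the string, so the result must be treated as read-only; writing to it can crash the program or silently modify a string that is supposed to be immutable.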

(In ASM)

"".s2b STEXT nosplit size=88 args=0x28 locals=0x20
0x0000 00000 (main.go:33) TEXT "".s2b(SB), NOSPLIT|ABIInternal, $32-40
0x0000 00000 (main.go:33) SUBQ $32, SP
0x0004 00004 (main.go:33) MOVQ BP, 24(SP)
0x0009 00009 (main.go:33) LEAQ 24(SP), BP
0x000e 00014 (main.go:33) FUNCDATA $0, gclocals·9fad110d66c97cf0b58d28cccea80b12(SB)
0x000e 00014 (main.go:33) FUNCDATA $1, gclocals·7d2d5fca80364273fb07d5820a76fef4(SB)
0x000e 00014 (main.go:33) FUNCDATA $3, gclocals·ebb0e8ce1793da18f0378b883cb3e122(SB)
0x000e 00014 (main.go:33) FUNCDATA $4, "".s2b.stkobj(SB)
0x000e 00014 (main.go:35) PCDATA $2, $0
0x000e 00014 (main.go:35) PCDATA $0, $0
0x000e 00014 (main.go:35) XORPS X0, X0
0x0011 00017 (main.go:35) MOVUPS X0, "".bh(SP)
0x0015 00021 (main.go:35) MOVQ $0, "".bh+16(SP)
0x001e 00030 (main.go:36) MOVQ "".s+40(SP), AX
0x0023 00035 (main.go:36) MOVQ AX, "".bh(SP)
0x0027 00039 (main.go:37) MOVQ "".s+48(SP), AX
0x002c 00044 (main.go:37) MOVQ AX, "".bh+8(SP)
0x0031 00049 (main.go:38) PCDATA $0, $1
0x0031 00049 (main.go:38) MOVQ "".s+48(SP), CX
0x0036 00054 (main.go:38) MOVQ CX, "".bh+16(SP)
0x003b 00059 (main.go:40) PCDATA $2, $1
0x003b 00059 (main.go:40) MOVQ "".bh(SP), DX
0x003f 00063 (main.go:40) PCDATA $2, $0
0x003f 00063 (main.go:40) PCDATA $0, $2
0x003f 00063 (main.go:40) MOVQ DX, "".~r1+56(SP)
0x0044 00068 (main.go:40) MOVQ AX, "".~r1+64(SP)
0x0049 00073 (main.go:40) MOVQ CX, "".~r1+72(SP)
0x004e 00078 (main.go:40) MOVQ 24(SP), BP
0x0053 00083 (main.go:40) ADDQ $32, SP
0x0057 00087 (main.go:40) RET
0x0000 48 83 ec 20 48 89 6c 24 18 48 8d 6c 24 18 0f 57 H.. H.l$.H.l$..W
0x0010 c0 0f 11 04 24 48 c7 44 24 10 00 00 00 00 48 8b ....$H.D$.....H.
0x0020 44 24 28 48 89 04 24 48 8b 44 24 30 48 89 44 24 D$(H..$H.D$0H.D$
0x0030 08 48 8b 4c 24 30 48 89 4c 24 10 48 8b 14 24 48 .H.L$0H.L$.H..$H
0x0040 89 54 24 38 48 89 44 24 40 48 89 4c 24 48 48 8b .T$8H.D$@H.L$HH.
0x0050 6c 24 18 48 83 c4 20 c3 l$.H.. .

"".s2bFast STEXT nosplit size=43 args=0x28 locals=0x0
0x0000 00000 (main.go:43) TEXT "".s2bV1(SB), NOSPLIT|ABIInternal, $0-40
0x0000 00000 (main.go:43) FUNCDATA $0, gclocals·39d1b96ca581879f548ad2c8aeb3a5fe(SB)
0x0000 00000 (main.go:43) FUNCDATA $1, gclocals·7d2d5fca80364273fb07d5820a76fef4(SB)
0x0000 00000 (main.go:43) FUNCDATA $3, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (main.go:43) FUNCDATA $4, "".s2bV1.stkobj(SB)
0x0000 00000 (main.go:43) PCDATA $2, $0
0x0000 00000 (main.go:43) PCDATA $0, $1
0x0000 00000 (main.go:43) MOVQ $0, "".b+24(SP)
0x0009 00009 (main.go:43) XORPS X0, X0
0x000c 00012 (main.go:43) MOVUPS X0, "".b+32(SP)
0x0011 00017 (main.go:45) MOVQ "".s+16(SP), AX
0x0016 00022 (main.go:45) PCDATA $0, $2
0x0016 00022 (main.go:45) MOVQ "".s+8(SP), CX
0x001b 00027 (main.go:46) MOVQ CX, "".b+24(SP)
0x0020 00032 (main.go:47) MOVQ AX, "".b+32(SP)
0x0025 00037 (main.go:48) MOVQ AX, "".b+40(SP)
0x002a 00042 (main.go:49) RET

(Benchmark code)

func Benchmarks2b(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s2b("111")
	}
}

func Benchmarks2bFast(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s2bFast("111")
	}
}

Benchmark result (all optimizations enabled)

goos: darwin
goarch: amd64
pkg: main/utils
Benchmarks2b-8        2000000000    0.29 ns/op
Benchmarks2bFast-8    2000000000    0.26 ns/op

Benchmark result (inlining disabled for the benchmark, to simulate the unoptimized case)

goos: darwin
goarch: amd64
pkg: main/utils
Benchmarks2b-8        500000000     3.48 ns/op
Benchmarks2bFast-8    2000000000    1.56 ns/op
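
The post does not say how inlining was disabled. One way to reproduce a non-inlined measurement is sketched below. The `//go:noinline` directives, the `sink` variable, the `benchS2b`/`benchS2bFast` names, and the use of `testing.Benchmark` from a `main` function are all assumptions for illustration, not the setup behind the numbers above; `StringHeader` and `SliceHeader` are again assumed to be the `reflect` types.

```go
package main

import (
	"fmt"
	"reflect"
	"testing"
	"unsafe"
)

//go:noinline
func s2b(s string) []byte {
	sh := (*reflect.StringHeader)(unsafe.Pointer(&s))
	bh := reflect.SliceHeader{
		Data: sh.Data,
		Len:  sh.Len,
		Cap:  sh.Len,
	}
	return *(*[]byte)(unsafe.Pointer(&bh))
}

//go:noinline
func s2bFast(s string) (b []byte) {
	bh := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	sh := *(*reflect.StringHeader)(unsafe.Pointer(&s))
	bh.Data = sh.Data
	bh.Len = sh.Len
	bh.Cap = sh.Len
	return b
}

// sink stores every result in a package-level variable so the compiler
// cannot discard the calls as dead code.
var sink []byte

func benchS2b(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = s2b("111")
	}
}

func benchS2bFast(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = s2bFast("111")
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	fmt.Println("s2b:    ", testing.Benchmark(benchS2b))
	fmt.Println("s2bFast:", testing.Benchmark(benchS2bFast))
}
```

Alternatively, ordinary `go test` benchmarks can be run with `-gcflags=-l`, which disables inlining for the compiled packages. Absolute timings will differ by machine and Go version.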

Collaborator

@erikdubbelboer erikdubbelboer left a comment


Good find, just one question.

bytesconv.go
@zhangyunhao116
Contributor Author

zhangyunhao116 commented Aug 19, 2019

I thought so too at first, but judging by the assembly the Go compiler generates, keeping sh as a value, *(*reflect.StringHeader)(unsafe.Pointer(&s)), is cheaper than keeping it as a pointer. The compiler probably applies some optimizations in this case. Let's look at the assembly.
(Go code)

func s2bV1(s string) (b []byte) {
	bh := (*SliceHeader)(unsafe.Pointer(&b))
	sh := *(*StringHeader)(unsafe.Pointer(&s))
	bh.Data = sh.Data
	bh.Len = sh.Len
	bh.Cap = sh.Len
	return b
}

func s2bV2(s string) (b []byte) {
	bh := (*SliceHeader)(unsafe.Pointer(&b))
	sh := (*StringHeader)(unsafe.Pointer(&s))
	bh.Data = sh.Data
	bh.Len = sh.Len
	bh.Cap = sh.Len
	return b
}

(In ASM, without GC code)

"".s2bV1 STEXT nosplit size=43 args=0x28 locals=0x0
0x0000 00000 (t_main2.go:18) TEXT "".s2bV1(SB), NOSPLIT|ABIInternal, $0-40
0x0011 00017 (t_main2.go:20) MOVQ "".s+16(SP), AX
0x0016 00022 (t_main2.go:20) MOVQ "".s+8(SP), CX
0x001b 00027 (t_main2.go:21) MOVQ CX, "".b+24(SP)
0x0020 00032 (t_main2.go:22) MOVQ AX, "".b+32(SP)
0x0025 00037 (t_main2.go:23) MOVQ AX, "".b+40(SP)
0x002a 00042 (t_main2.go:24) RET

"".s2bV2 STEXT nosplit size=48 args=0x28 locals=0x0
0x0000 00000 (t_main2.go:27) TEXT "".s2bV2(SB), NOSPLIT|ABIInternal, $0-40
0x0011 00017 (t_main2.go:30) MOVQ "".s+8(SP), AX
0x0016 00022 (t_main2.go:30) MOVQ AX, "".b+24(SP)
0x001b 00027 (t_main2.go:31) MOVQ "".s+16(SP), AX
0x0020 00032 (t_main2.go:31) MOVQ AX, "".b+32(SP)
0x0025 00037 (t_main2.go:32) MOVQ "".s+16(SP), AX
0x002a 00042 (t_main2.go:32) MOVQ AX, "".b+40(SP)
0x002f 00047 (t_main2.go:33) RET

We can see that version one uses two registers (AX and CX) and loads the string length only once, while version two uses only AX and has to reload the length from the stack. As a result V1 needs fewer instructions: 5 MOVQ instructions versus 6 for V2. (V1 is 43 bytes and V2 is 48 bytes; this is the only difference.)

@erikdubbelboer erikdubbelboer merged commit c5413ff into valyala:master Aug 19, 2019
@erikdubbelboer
Collaborator

Interesting. Thanks!
