-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cni: fix data-race on lazy init by ensureExec(). #82
Conversation
Before this patch, the package doesn't init CNIConfig.exec field. It is lazy init by CNIConfig.ensureExec() during AddNetworkList. After enhancement [1] , there are always more than two goroutines(one for loopback and other one for eth0) to call ensureExec(). Go runtime doesn't ensure that value-assignment is atomic, and then it is possible to use nil and panic, like [2]. > Programs that modify data being simultaneously accessed by multiple > goroutines must serialize such access. > > from https://golang.org/ref/mem Advice section. I think it should be fixed in github.com/containernetworking/cni library but we need to delivery containerd release 1.6 version in time. I would like to init CNIConfig like what the `ensureExec()` [3] does instead of lazy init. ------------ How to reproduce it: ```golang package main import "sync" type Exec interface { FindInPath(v int) } type rawExec struct { v int } func (r *rawExec) FindInPath(v int) { r.v += v } type cniConfig struct { exec Exec } func (c *cniConfig) addNetwork(v int) { if c.exec == nil { c.exec = &rawExec{} } c.exec.FindInPath(v) } func main() { c := &cniConfig{} for i := 0; i < 100000; i++ { c.exec = nil // reset var wg sync.WaitGroup for j := 0; j <= 10; j++ { wg.Add(1) go func(k int) { defer wg.Done() c.addNetwork(k) }(j) } wg.Wait() } } ``` And run it ```bash ➜ /tmp go version go version go1.17.3 linux/amd64 ➜ /tmp go run main.go panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x455a40] goroutine 264932 [running]: main.(*rawExec).FindInPath(0x0, 0x0) /tmp/main.go:14 main.(*cniConfig).addNetwork(...) /tmp/main.go:25 main.main.func1(0x0) /tmp/main.go:40 +0xb6 created by main.main /tmp/main.go:37 +0xb8 exit status 2 ➜ /tmp ``` Root Cause: The panic meesage is `main.(*rawExec).FindInPath(0x0, 0x0)` which means that the interface `exec` var has the underly type `rawExec` but its value is zero `FindInPath(0x0`, like what [2] mentioned. ```golang-assembly // go tool compile -S ./main.go // "".(*cniConfig).addNetwork STEXT size=148 args=0x10 locals=0x20 funcid=0x0 ... ... # (AX) is cniConfig address -> checkout c.exec == nil 0x0014 00020 (./main.go:22) CMPQ (AX), $0 0x0018 00024 (./main.go:22) JNE 95 # if c.exec == nil # # Since the go interface is struct iface # # https://github.com/golang/go/blob/release-branch.go1.17/src/runtime/runtime2.go#L202 # # type iface struct { # tab *itab # data unsafe.Pointer # } # # The value-assignment will be two instructions(one for tab and # the other one for data. 0x001a 00026 (./main.go:21) MOVQ AX, "".c+40(SP) 0x001f 00031 (./main.go:25) MOVQ BX, ""..autotmp_4+16(SP) # # AX will be rawExec type information # # type."".rawExec SRODATA size=120 # ... # rel 24+8 t=1 runtime.memequal64·f+0 # rel 32+8 t=1 runtime.gcbits.+0 # rel 40+4 t=5 type..namedata.*main.rawExec-+0 # rel 44+4 t=5 type.*"".rawExec+0 # rel 48+8 t=1 type..importpath."".+0 # rel 56+8 t=1 type."".rawExec+96 # rel 80+4 t=5 type..importpath."".+0 # rel 96+8 t=1 type..namedata.v-+0 # rel 104+8 t=1 type.int+0 # 0x0024 00036 (./main.go:23) LEAQ type."".rawExec(SB), AX 0x002b 00043 (./main.go:23) PCDATA $1, $0 0x002b 00043 (./main.go:23) CALL runtime.newobject(SB) # # CX will be itab and get FindInPath TEXT address. # # go.itab.*"".rawExec,"".Exec SRODATA dupok size=32 # ... # rel 0+8 t=1 type."".Exec+0 # rel 8+8 t=1 type.*"".rawExec+0 # rel 24+8 t=-32767 "".(*rawExec).FindInPath+0 # 0x0030 00048 (./main.go:23) LEAQ go.itab.*"".rawExec,"".Exec(SB), CX 0x0037 00055 (./main.go:23) MOVQ "".c+40(SP), DX 0x003c 00060 (./main.go:23) MOVQ CX, (DX) 0x003f 00063 (./main.go:23) PCDATA $0, $-2 0x003f 00063 (./main.go:23) CMPL runtime.writeBarrier(SB), $0 0x0046 00070 (./main.go:23) JNE 78 0x0048 00072 (./main.go:23) MOVQ AX, 8(DX) 0x004c 00076 (./main.go:23) JMP 87 0x004e 00078 (./main.go:23) LEAQ 8(DX), DI 0x0052 00082 (./main.go:23) CALL runtime.gcWriteBarrier(SB) 0x0057 00087 (./main.go:25) PCDATA $0, $-1 0x0057 00087 (./main.go:25) MOVQ DX, AX 0x005a 00090 (./main.go:25) MOVQ ""..autotmp_4+16(SP), BX # if c.exec != nil and try to call FindInPath # # (AX) is cniConfig # 0x005f 00095 (./main.go:25) MOVQ (AX), CX # # 8(AX) is rawExec address (receiver) # 0x0062 00098 (./main.go:25) MOVQ 8(AX), AX # # 24(CX) is the FindInPath TEXT address. # 0x0066 00102 (./main.go:25) MOVQ 24(CX), CX 0x006a 00106 (./main.go:25) PCDATA $1, $1 0x006a 00106 (./main.go:25) CALL CX 0x006c 00108 (./main.go:26) MOVQ 24(SP), BP 0x0071 00113 (./main.go:26) ADDQ $32, SP 0x0075 00117 (./main.go:26) RET ... ... ``` Since there are multiple goroutines to check `c.exec == nil`, if the lazy-init only assigns iface.tab and the FindInPath TEXT address, interface's value is still nil and the FindInPath will use the interface's value as method receiver. It will panic if try to read the value from receiver. Reference: [1] containerd#76 [2] moby/buildkit#2589 (comment) [3] https://github.com/containernetworking/cni/blob/1694fd7b57e0176a98a12823a5ffc03337fdc152/libcni/api.go#L182 Fixes: containerd#76 Signed-off-by: Wei Fu <fuweid89@gmail.com>
Codecov Report
@@ Coverage Diff @@
## main #82 +/- ##
==========================================
+ Coverage 44.24% 45.08% +0.84%
==========================================
Files 9 9
Lines 391 397 +6
==========================================
+ Hits 173 179 +6
Misses 184 184
Partials 34 34
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @fuweid great catch |
Follow-up: containerd#82 Signed-off-by: Wei Fu <fuweid89@gmail.com>
Follow-up: containerd#82 Signed-off-by: Wei Fu <fuweid89@gmail.com>
Before this patch, the package doesn't init CNIConfig.exec field. It is
lazy init by CNIConfig.ensureExec() during AddNetworkList. After
enhancement [1] , there are always more than two goroutines(one for
loopback and other one for eth0) to call ensureExec(). Go runtime
doesn't ensure that value-assignment is atomic, and then it is possible
to use nil and panic, like [2].
I think it should be fixed in github.com/containernetworking/cni library
but we need to delivery containerd release 1.6 version in time. I would
like to init CNIConfig like what the
ensureExec()
[3] does instead of lazyinit.
How to reproduce it:
And run it
Root Cause:
The panic meesage is
main.(*rawExec).FindInPath(0x0, 0x0)
which meansthat the interface
exec
var has the underly typerawExec
but itsvalue is zero
FindInPath(0x0
, like what [2] mentioned.Since there are multiple goroutines to check
c.exec == nil
, if thelazy-init only assigns iface.tab and the FindInPath TEXT address,
interface's value is still nil and the FindInPath will use the
interface's value as method receiver. It will panic if try to read the
value from receiver.
Reference:
[1] #76
[2] moby/buildkit#2589 (comment)
[3] https://github.com/containernetworking/cni/blob/1694fd7b57e0176a98a12823a5ffc03337fdc152/libcni/api.go#L182
Fixes: #76
Signed-off-by: Wei Fu fuweid89@gmail.com