Occasional containerd crashes on CI #4154

Closed
mxpv opened this issue Apr 4, 2020 · 3 comments

@mxpv
Member

mxpv commented Apr 4, 2020

I started seeing this while migrating the integration tests to GitHub Actions (1), but then noticed that it also happens quite often on Travis (1, 2, 3, 4).

time="2020-04-03T20:10:14.618255755Z" level=debug msg="(*service).Write started" ref=c1-commiterror-state
time="2020-04-03T20:10:14.624627962Z" level=debug msg="(*service).Write started" ref=c1-commiterror-state
time="2020-04-03T20:10:15.182502071Z" level=error msg="(*service).Write failed" error="rpc error: code = Unavailable desc = ref ContentClient-n1/1/c1-commiterror-state locked: unavailable" ref=c1-commiterror-state
time="2020-04-03T20:10:15.186321757Z" level=error msg="(*service).Write failed" error="rpc error: code = FailedPrecondition desc = unexpected commit digest sha256:af9f4d33a30760b1d8a0297164ac7bcc2d1a8b1bdaf7880e2ce2fdfb96edec65, expected sha256:f7358a28d3925ed49329484588a910278afbfe778319abb24db99b793a671202: failed precondition" ref=c1-commiterror-state

...

time="2020-04-03T15:26:32.535416150Z" level=debug msg="(*service).Write started" expected="sha256:208de85e6cfa2a86b205fd3e1fa66763ad3df81588b5a1b47a0b77e78a67e3cd" ref="manifest-sha256:208de85e6cfa2a86b205fd3e1fa66763ad3df81588b5a1b47a0b77e78a67e3cd" total=528
time="2020-04-03T15:26:32.535612486Z" level=debug msg="(*service).Write started" expected="sha256:05e348797cb2a37dc91ba34fffbfcc64192097c113a1fc49b0b6bf047d96c81f" ref="manifest-sha256:05e348797cb2a37dc91ba34fffbfcc64192097c113a1fc49b0b6bf047d96c81f" total=527
runtime: nelems=11 nalloc=2 previous allocCount=1 nfreed=65535
fatal error: sweep increased allocation count

runtime stack:
runtime.throw(0x556b83439af9, 0x20)
	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/runtime/panic.go:774 +0x74
runtime.(*mspan).sweep(0x7fc116738460, 0x7fc116738400, 0x556b823c5f00)
	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/runtime/mgcsweep.go:328 +0x8cc
runtime.(*mcentral).uncacheSpan(0x556b84b21c38, 0x7fc116738460)
	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/runtime/mcentral.go:197 +0x7b
runtime.(*mcache).releaseAll(0x7fc11676d008)
...

and lots of errors like:

--- FAIL: TestContainerPTY (0.00s)
    container_linux_test.go:468: failed to dial "/run/containerd-test/containerd.sock": connection error: desc = "transport: error while dialing: dial unix /run/containerd-test/containerd.sock: connect: connection refused"
--- FAIL: TestContainerAttach (0.00s)
    container_linux_test.go:545: failed to dial "/run/containerd-test/containerd.sock": connection error: desc = "transport: error while dialing: dial unix /run/containerd-test/containerd.sock: connect: connection refused"
--- FAIL: TestShimInCgroup (0.00s)
    container_linux_test.go:134: failed to dial "/run/containerd-test/containerd.sock": connection error: desc = "transport: error while dialing: dial unix /run/containerd-test/containerd.sock: connect: connection refused"
--- FAIL: TestTaskUpdate (0.00s)
    container_linux_test.go:56: failed to dial "/run/containerd-test/containerd.sock": connection error: desc = "transport: error while dialing: dial unix /run/containerd-test/containerd.sock: connect: connection refused"
...

These crashes might be related to the unexpected commit digest problems: #3974
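
For readers unfamiliar with the "unexpected commit digest" message in the first log excerpt: an error of that shape is what a check reports when the hash recomputed over the written bytes differs from the digest the client expected. The sketch below only illustrates that idea. `digestOf` and `verifyDigest` are hypothetical names, and this is not containerd's actual commit code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digestOf returns the "sha256:<hex>" digest string for data.
func digestOf(data []byte) string {
	sum := sha256.Sum256(data)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// verifyDigest recomputes the digest of data and compares it with expected.
// Hypothetical helper for illustration only.
func verifyDigest(data []byte, expected string) error {
	if actual := digestOf(data); actual != expected {
		return fmt.Errorf("unexpected commit digest %s, expected %s", actual, expected)
	}
	return nil
}

func main() {
	payload := []byte("layer contents")
	expected := digestOf(payload)

	// Unmodified data passes the check.
	fmt.Println("intact:", verifyDigest(payload, expected))

	// A single flipped byte is enough to make the check fail.
	corrupted := append([]byte(nil), payload...)
	corrupted[0] ^= 0xff
	fmt.Println("corrupted:", verifyDigest(corrupted, expected))
}
```

Any change to the bytes between write and commit, whatever its cause, would trip a check like this.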

@mxpv
Member Author

mxpv commented Apr 5, 2020

A bit more info, captured with GODEBUG=gccheckmark=1:

2020-04-05T05:15:42.9497835Z time="2020-04-05T05:15:37.532756809Z" level=debug msg="(*service).Write started" expected="sha256:da64c92de9087c1e5628296d0034c579c8b73d4ebe47444228c4a96d54e3eb2c" ref="manifest-sha256:da64c92de9087c1e5628296d0034c579c8b73d4ebe47444228c4a96d54e3eb2c" total=528
2020-04-05T05:15:42.9498204Z runtime: marking free object 0xc000a7e820 found at *(0xc0000cee00+0x128)
2020-04-05T05:15:42.9498369Z base=0xc0000cee00 s.base()=0xc0000ce000 s.limit=0xc0000d0000 s.spanclass=50 s.elemsize=512 s.state=mSpanInUse
2020-04-05T05:15:42.9498510Z  *(base+0) = 0x0
2020-04-05T05:15:42.9498624Z  *(base+8) = 0x0
2020-04-05T05:15:42.9498742Z  *(base+16) = 0xc000a7e6e0
2020-04-05T05:15:42.9498853Z  *(base+24) = 0x50
2020-04-05T05:15:42.9498968Z  *(base+32) = 0x50
2020-04-05T05:15:42.9499085Z  *(base+40) = 0xc000a7e730
2020-04-05T05:15:42.9499200Z  *(base+48) = 0x0
2020-04-05T05:15:42.9499313Z  *(base+56) = 0x0
2020-04-05T05:15:42.9499411Z  *(base+64) = 0x0
2020-04-05T05:15:42.9499525Z  *(base+72) = 0x0
2020-04-05T05:15:42.9499637Z  *(base+80) = 0xc000a7e730
2020-04-05T05:15:42.9499752Z  *(base+88) = 0x50
2020-04-05T05:15:42.9499841Z  *(base+96) = 0x50
2020-04-05T05:15:42.9500080Z  *(base+104) = 0xc000a7e780
2020-04-05T05:15:42.9500212Z  *(base+112) = 0x0
2020-04-05T05:15:42.9500326Z  *(base+120) = 0x0
2020-04-05T05:15:42.9500440Z  *(base+128) = 0x0
2020-04-05T05:15:42.9500541Z  *(base+136) = 0x0
2020-04-05T05:15:42.9500657Z  *(base+144) = 0xc000a7e780
2020-04-05T05:15:42.9501806Z  *(base+152) = 0x50
2020-04-05T05:15:42.9501929Z  *(base+160) = 0x50
2020-04-05T05:15:42.9502046Z  *(base+168) = 0xc000a7e7d0
2020-04-05T05:15:42.9502149Z  *(base+176) = 0x0
2020-04-05T05:15:42.9502264Z  *(base+184) = 0x0
2020-04-05T05:15:42.9502378Z  *(base+192) = 0x0
2020-04-05T05:15:42.9502490Z  *(base+200) = 0x0
2020-04-05T05:15:42.9502603Z  *(base+208) = 0xc0005c2230
2020-04-05T05:15:42.9502686Z  *(base+216) = 0x50
2020-04-05T05:15:42.9502886Z  *(base+224) = 0x50
2020-04-05T05:15:42.9503007Z  *(base+232) = 0x0
2020-04-05T05:15:42.9503101Z  *(base+240) = 0x0
2020-04-05T05:15:42.9503277Z  *(base+248) = 0x0
2020-04-05T05:15:42.9503845Z  *(base+256) = 0x0
2020-04-05T05:15:42.9503962Z  *(base+264) = 0x0
2020-04-05T05:15:42.9504057Z  *(base+272) = 0xc000a7e7d0
2020-04-05T05:15:42.9504145Z  *(base+280) = 0x50
2020-04-05T05:15:42.9504390Z  *(base+288) = 0x50
2020-04-05T05:15:42.9504572Z  *(base+296) = 0xc000a7e820 <==
2020-04-05T05:15:42.9504691Z  *(base+304) = 0x0
2020-04-05T05:15:42.9504814Z  *(base+312) = 0x0
2020-04-05T05:15:42.9504915Z  *(base+320) = 0x0
2020-04-05T05:15:42.9505010Z  *(base+328) = 0x0
2020-04-05T05:15:42.9505212Z  *(base+336) = 0x0
2020-04-05T05:15:42.9505334Z  *(base+344) = 0x0
2020-04-05T05:15:42.9505451Z  *(base+352) = 0x0
2020-04-05T05:15:42.9505531Z  *(base+360) = 0x0
2020-04-05T05:15:42.9505625Z  *(base+368) = 0x0
2020-04-05T05:15:42.9505823Z  *(base+376) = 0x0
2020-04-05T05:15:42.9505944Z  *(base+384) = 0x0
2020-04-05T05:15:42.9506025Z  *(base+392) = 0x0
2020-04-05T05:15:42.9506214Z  *(base+400) = 0x0
2020-04-05T05:15:42.9506332Z  *(base+408) = 0x0
2020-04-05T05:15:42.9506445Z  *(base+416) = 0x0
2020-04-05T05:15:42.9506896Z  *(base+424) = 0x0
2020-04-05T05:15:42.9507122Z  *(base+432) = 0x0
2020-04-05T05:15:42.9507245Z  *(base+440) = 0x0
2020-04-05T05:15:42.9507362Z  *(base+448) = 0x0
2020-04-05T05:15:42.9507473Z  *(base+456) = 0x0
2020-04-05T05:15:42.9507572Z  *(base+464) = 0x0
2020-04-05T05:15:42.9507686Z  *(base+472) = 0x0
2020-04-05T05:15:42.9507780Z  *(base+480) = 0x0
2020-04-05T05:15:42.9507980Z  *(base+488) = 0x0
2020-04-05T05:15:42.9508102Z  *(base+496) = 0x0
2020-04-05T05:15:42.9508183Z  *(base+504) = 0x0
2020-04-05T05:15:42.9508389Z obj=0xc000a7e820 s.base()=0xc000a7e000 s.limit=0xc000a7fee0 s.spanclass=45 s.elemsize=416 s.state=mSpanInUse
2020-04-05T05:15:42.9508529Z  *(obj+0) = 0x0
2020-04-05T05:15:42.9508645Z  *(obj+8) = 0x0
2020-04-05T05:15:42.9508761Z  *(obj+16) = 0x0
2020-04-05T05:15:42.9508843Z  *(obj+24) = 0x0
2020-04-05T05:15:42.9509036Z  *(obj+32) = 0x0
2020-04-05T05:15:42.9509158Z  *(obj+40) = 0x0
2020-04-05T05:15:42.9509254Z  *(obj+48) = 0x0
2020-04-05T05:15:42.9509430Z  *(obj+56) = 0x0
2020-04-05T05:15:42.9509683Z  *(obj+64) = 0x0
2020-04-05T05:15:42.9509788Z  *(obj+72) = 0x0
2020-04-05T05:15:42.9510072Z  *(obj+80) = 0x0
2020-04-05T05:15:42.9510192Z  *(obj+88) = 0x0
2020-04-05T05:15:42.9510272Z  *(obj+96) = 0x0
2020-04-05T05:15:42.9510461Z  *(obj+104) = 0x0
2020-04-05T05:15:42.9510583Z  *(obj+112) = 0x0
2020-04-05T05:15:42.9510725Z  *(obj+120) = 0x0
2020-04-05T05:15:42.9510806Z  *(obj+128) = 0x0
2020-04-05T05:15:42.9510902Z  *(obj+136) = 0x0
2020-04-05T05:15:42.9510999Z  *(obj+144) = 0x0
2020-04-05T05:15:42.9511095Z  *(obj+152) = 0x0
2020-04-05T05:15:42.9511174Z  *(obj+160) = 0x0
2020-04-05T05:15:42.9511270Z  *(obj+168) = 0x0
2020-04-05T05:15:42.9511366Z  *(obj+176) = 0x0
2020-04-05T05:15:42.9511461Z  *(obj+184) = 0x0
2020-04-05T05:15:42.9511541Z  *(obj+192) = 0x0
2020-04-05T05:15:42.9511740Z  *(obj+200) = 0x0
2020-04-05T05:15:42.9512183Z  *(obj+208) = 0x0
2020-04-05T05:15:42.9512305Z  *(obj+216) = 0x0
2020-04-05T05:15:42.9512386Z  *(obj+224) = 0x0
2020-04-05T05:15:42.9512581Z  *(obj+232) = 0x0
2020-04-05T05:15:42.9512822Z  *(obj+240) = 0x0
2020-04-05T05:15:42.9512937Z  *(obj+248) = 0x0
2020-04-05T05:15:42.9513072Z  *(obj+256) = 0x0
2020-04-05T05:15:42.9513151Z  *(obj+264) = 0x0
2020-04-05T05:15:42.9513245Z  *(obj+272) = 0x0
2020-04-05T05:15:42.9513340Z  *(obj+280) = 0x0
2020-04-05T05:15:42.9513432Z  *(obj+288) = 0x0
2020-04-05T05:15:42.9513513Z  *(obj+296) = 0x0
2020-04-05T05:15:42.9513713Z  *(obj+304) = 0x0
2020-04-05T05:15:42.9513832Z  *(obj+312) = 0x0
2020-04-05T05:15:42.9513944Z  *(obj+320) = 0x0
2020-04-05T05:15:42.9514061Z  *(obj+328) = 0x0
2020-04-05T05:15:42.9514160Z  *(obj+336) = 0x0
2020-04-05T05:15:42.9514273Z  *(obj+344) = 0x0
2020-04-05T05:15:42.9514387Z  *(obj+352) = 0x0
2020-04-05T05:15:42.9514499Z  *(obj+360) = 0x0
2020-04-05T05:15:42.9514596Z  *(obj+368) = 0x0
2020-04-05T05:15:42.9514716Z  *(obj+376) = 0x0
2020-04-05T05:15:42.9514810Z  *(obj+384) = 0x0
2020-04-05T05:15:42.9514905Z  *(obj+392) = 0x0
2020-04-05T05:15:42.9515107Z  *(obj+400) = 0x0
2020-04-05T05:15:42.9515212Z  *(obj+408) = 0x0
2020-04-05T05:15:42.9515335Z fatal error: marking free object
2020-04-05T05:15:42.9515424Z 
2020-04-05T05:15:42.9515509Z runtime stack:
2020-04-05T05:15:42.9515705Z runtime.throw(0x5555ffd252ee, 0x13)
2020-04-05T05:15:42.9515849Z 	/usr/local/go1.14/src/runtime/panic.go:1114 +0x74 fp=0x7fa97a0fad18 sp=0x7fa97a0face8 pc=0x5555fed22ba4
2020-04-05T05:15:42.9515993Z runtime.greyobject(0xc000a7e820, 0xc0000cee00, 0x128, 0x7fa9a26d6f30, 0xc000051e98, 0x5)
2020-04-05T05:15:42.9516142Z 	/usr/local/go1.14/src/runtime/mgcmark.go:1432 +0x422 fp=0x7fa97a0fad48 sp=0x7fa97a0fad18 pc=0x5555fed0e5b2
2020-04-05T05:15:42.9516266Z runtime.scanobject(0xc0000cee00, 0xc000051e98)
2020-04-05T05:15:42.9516402Z 	/usr/local/go1.14/src/runtime/mgcmark.go:1275 +0x2b5 fp=0x7fa97a0fadd8 sp=0x7fa97a0fad48 pc=0x5555fed0dda5
2020-04-05T05:15:42.9516544Z runtime.gcDrainN(0xc000051e98, 0xfae0, 0x19)
2020-04-05T05:15:42.9516666Z 	/usr/local/go1.14/src/runtime/mgcmark.go:1126 +0x131 fp=0x7fa97a0fae08 sp=0x7fa97a0fadd8 pc=0x5555fed0d871
2020-04-05T05:15:42.9516888Z runtime.gcAssistAlloc1(0xc000598c00, 0xfae0)
2020-04-05T05:15:42.9517301Z 	/usr/local/go1.14/src/runtime/mgcmark.go:531 +0xf7 fp=0x7fa97a0fae58 sp=0x7fa97a0fae08 pc=0x5555fed0c2d7
2020-04-05T05:15:42.9517410Z runtime.gcAssistAlloc.func1()
2020-04-05T05:15:42.9517624Z 	/usr/local/go1.14/src/runtime/mgcmark.go:442 +0x35 fp=0x7fa97a0fae78 sp=0x7fa97a0fae58 pc=0x5555fed51695
2020-04-05T05:15:42.9517765Z runtime.systemstack(0x0)
2020-04-05T05:15:42.9517934Z 	/usr/local/go1.14/src/runtime/asm_amd64.s:370 +0x63 fp=0x7fa97a0fae80 sp=0x7fa97a0fae78 pc=0x5555fed54673
2020-04-05T05:15:42.9518048Z runtime.mstart()
2020-04-05T05:15:42.9518247Z 	/usr/local/go1.14/src/runtime/proc.go:1041 fp=0x7fa97a0fae88 sp=0x7fa97a0fae80 pc=0x5555fed28760
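
The trace above was captured with `GODEBUG=gccheckmark=1`, a Go runtime setting that verifies the garbage collector's concurrent mark phase with a second mark pass while the world is stopped. One way to reproduce such a run is a small wrapper that launches a binary with the variable set. This is a sketch under assumptions: `withGODEBUG` is a hypothetical helper, and the binary to run is whatever is passed on the command line.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withGODEBUG returns a copy of env with GODEBUG set to setting, replacing
// any GODEBUG entry that was already present. Hypothetical helper.
func withGODEBUG(env []string, setting string) []string {
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if !strings.HasPrefix(kv, "GODEBUG=") {
			out = append(out, kv)
		}
	}
	return append(out, "GODEBUG="+setting)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: gcdebug <binary> [args...]")
		return
	}

	// Run the given binary with GC mark verification enabled and pass its
	// output through unchanged so the runtime diagnostics stay visible.
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Env = withGODEBUG(os.Environ(), "gccheckmark=1")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "command failed:", err)
	}
}
```

Setting the variable directly in the shell or CI configuration has the same effect. The wrapper is only convenient when the environment is assembled programmatically.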

@mxpv
Member Author

mxpv commented Apr 5, 2020

I was able to track down the faulty commit: it looks like this problem started appearing with a83927d.

@mxpv
Member Author

mxpv commented Apr 5, 2020

Confirmed: etcd-io/bbolt#214

@fuweid fuweid closed this as completed in 3968fb0 Apr 5, 2020
fahedouch pushed a commit to fahedouch/containerd that referenced this issue Aug 7, 2020
This reverts commit fb9e3d9.

Fixes: containerd#4154

Signed-off-by: Maksym Pavlenko <makpav@amazon.com>
tussennet pushed a commit to tussennet/containerd that referenced this issue Sep 11, 2020
This reverts commit fb9e3d9.

Fixes: containerd#4154

Signed-off-by: Maksym Pavlenko <makpav@amazon.com>