ECS Agent crashes with: fatal error: concurrent map writes #707

Closed
rickard-von-essen-iz opened this issue Feb 13, 2017 · 3 comments

rickard-von-essen-iz commented Feb 13, 2017

fatal error: concurrent map writes

goroutine 1498949 [running]:
runtime.throw(0x930930, 0x15)
	/usr/local/go/src/runtime/panic.go:566 +0x95 fp=0xc420511670 sp=0xc420511650
runtime.mapassign1(0x891f80, 0xc420d7dd70, 0xc420353e90, 0xc4205117a0)
	/usr/local/go/src/runtime/hashmap.go:458 +0x8ef fp=0xc420511758 sp=0xc420511670
github.com/aws/amazon-ecs-agent/agent/engine.(*dockerImageManager).removeUnusedImages(0xc42010c580)
	/go/src/github.com/aws/amazon-ecs-agent/agent/engine/docker_image_manager.go:282 +0xc0 fp=0xc4205117b8 sp=0xc420511758
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc4205117c0 sp=0xc4205117b8
created by github.com/aws/amazon-ecs-agent/agent/engine.(*dockerImageManager).performPeriodicImageCleanup
	/go/src/github.com/aws/amazon-ecs-agent/agent/engine/docker_image_manager.go:271 +0x11e

goroutine 1 [semacquire, 74 minutes]:
sync.runtime_Semacquire(0xc42000b984)
	/usr/local/go/src/runtime/sema.go:47 +0x30
sync.(*Mutex).Lock(0xc42000b980)
	/usr/local/go/src/sync/mutex.go:85 +0xd0
github.com/aws/amazon-ecs-agent/agent/vendor/github.com/cihub/seelog.Infof(0x9391e6, 0x24, 0xc42152f5d0, 0x1, 0x1)
	/go/src/github.com/aws/amazon-ecs-agent/agent/vendor/github.com/cihub/seelog/log.go:216 +0x31
github.com/aws/amazon-ecs-agent/agent/acs/handler.startSession(0x7f939a89c078, 0xc42000a218, 0xc4201c20c0, 0x5a, 0xc4201960a0, 0xc42009ec80, 0xc420246a10, 0xb733c0, 0xc4200e41c0, 0xb71a20, ...)
	/go/src/github.com/aws/amazon-ecs-agent/agent/acs/handler/acs_handler.go:185 +0x42e
github.com/aws/amazon-ecs-agent/agent/acs/handler.StartSession(0x7f939a89c078, 0xc42000a218, 0xc4201c20c0, 0x5a, 0xc4201960a0, 0xc42009ec80, 0xc420246a10, 0xb733c0, 0xc4200e41c0, 0xb71a20, ...)
	/go/src/github.com/aws/amazon-ecs-agent/agent/acs/handler/acs_handler.go:155 +0x183
main._main(0x0)
	/go/src/github.com/aws/amazon-ecs-agent/agent/agent.go:286 +0x34bb
main.main()
	/go/src/github.com/aws/amazon-ecs-agent/agent/agent.go:62 +0x22

goroutine 5 [semacquire, 27015 minutes]:
sync.runtime_notifyListWait(0xc420013090, 0x0)
	/usr/local/go/src/runtime/sema.go:267 +0x122
sync.(*Cond).Wait(0xc420013080)
	/usr/local/go/src/sync/cond.go:57 +0x80
github.com/aws/amazon-ecs-agent/agent/vendor/github.com/cihub/seelog.(*asyncLoopLogger).processItem(0xc420050600, 0x0)
	/go/src/github.com/aws/amazon-ecs-agent/agent/vendor/github.com/cihub/seelog/behavior_asynclooplogger.go:50 +0x97
github.com/aws/amazon-ecs-agent/agent/vendor/github.com/cihub/seelog.(*asyncLoopLogger).processQueue(0xc420050600)
	/go/src/github.com/aws/amazon-ecs-agent/agent/vendor/github.com/cihub/seelog/behavior_asynclooplogger.go:63 +0x33
created by github.com/aws/amazon-ecs-agent/agent/vendor/github.com/cihub/seelog.newAsyncLoopLogger
	/go/src/github.com/aws/amazon-ecs-agent/agent/vendor/github.com/cihub/seelog/behavior_asynclooplogger.go:40 +0x96

ECS init: 1.13.1

@samuelkarp
Contributor

@rickard-von-essen-iz Is this the full set of stacks that you saw? The first one (goroutine 1498949 [running]) looks unrelated to the other two (goroutine 1 [semacquire, 74 minutes] and goroutine 5 [semacquire, 27015 minutes]). For this kind of error, we'd generally expect to see two goroutines both marked [running].

Based on the stack for goroutine 1498949 [running], the map write at docker_image_manager.go:282 in removeUnusedImages looks to be involved, and imageManager.imageStatesConsideredForDeletion really needs a sync.RWMutex around it.

@rickard-von-essen-iz
Author

Sorry, I trimmed the trace a bit too much. Full stack trace. Ping me if you want a full day's log.

vsiddharth added a commit to vsiddharth/amazon-ecs-agent that referenced this issue Mar 17, 2017
This patch inspects the cleanup and resolves the inherent concurrent map write
issue reported. A unit test has been added to increase confidence in the fix.

Fixes aws#707

Signed-off-by: Vinothkumar Siddharth <sidvin@amazon.com>
vsiddharth added a commit to vsiddharth/amazon-ecs-agent that referenced this issue Mar 22, 2017
vsiddharth added a commit to vsiddharth/amazon-ecs-agent that referenced this issue Mar 28, 2017
vsiddharth added a commit to vsiddharth/amazon-ecs-agent that referenced this issue Mar 28, 2017
vsiddharth added a commit to vsiddharth/amazon-ecs-agent that referenced this issue Mar 28, 2017
vsiddharth added a commit to vsiddharth/amazon-ecs-agent that referenced this issue May 23, 2017
adnxn pushed a commit to adnxn/amazon-ecs-agent that referenced this issue May 25, 2017
@samuelkarp samuelkarp added this to the 1.14.2 milestone Jun 5, 2017
@adnxn adnxn closed this as completed in 3eb2ca8 Jun 7, 2017
@samuelkarp
Contributor

Released in v1.14.2.

jwerak pushed a commit to appuri/amazon-ecs-agent that referenced this issue Jun 8, 2017
petderek pushed a commit to petderek/amazon-ecs-agent that referenced this issue Jun 22, 2017