Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Saving endpoint to store fails in windows under concurrent load #1950

Closed
msabansal opened this issue Sep 22, 2017 · 15 comments · Fixed by #2268 or moby/moby#37843
Closed

Saving endpoint to store fails in windows under concurrent load #1950

msabansal opened this issue Sep 22, 2017 · 15 comments · Fixed by #2268 or moby/moby#37843

Comments

@msabansal
Copy link
Contributor

We are trying to create a lot of parallel containers on windows. As part of the workflow multiple endpoints are being created and saved simultaneously. When this happens some container creation fails as docker is not able to save the endpoint to store:

time="2017-09-12T19:53:30.792343000-07:00" level=debug msg="FIXME: Got an API for which error does not match any expected type!!!: failed to create endpoint eager_newton on network nat: failed to save endpoint 53944a9 to store: failed to update store for object type *windows.hnsEndpoint: open C:\docker/network/files/local-kv.db.lock: The process cannot access the file cause it is being used by another process." error_type=types.internal module=api

I am waiting for a repro machine to track this. Probably only happens on slow machines but creating an issue for tracking. This is also causing some noise in docker CI aswell.

@friism
Copy link

friism commented Oct 13, 2017

cc @mavenugo

@msabansal
Copy link
Contributor Author

A coworker hit another instance of this when doing 10 parallel docker run over ~1000 iterations.

C:\Windows\system32\docker.exe: Error response from daemon: failed to update the store state of sandbox: failed to
update store for object type *libnetwork.sb State: open C:\ProgramData\docker/network/files/local-kv.db.lock: Access is
denied.
+ CategoryInfo : NotSpecified: (C:\Windows\syst...cess is denied.:String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError
+ PSComputerName : localhost

@rvarna
Copy link

rvarna commented Nov 9, 2017

We have hit the same issue on an Azure VM running "Windows Server 2016 with Containers". We were trying to only create 5 containers at a time. Looks like the 5th container creation worked but starting the container failed with:

failed to create endpoint gracious_pasteur on network <our_network>: failed to save endpoint aef7b83 to store: failed to update store for object type *windows.hnsEndpoint: open C:\ProgramData\docker/network/files/local-kv.db.lock: The process cannot access the file because it is being used by another process."}

@ghost
Copy link

ghost commented Nov 10, 2017

It seems somehow obvious that this is related, identical repro and endpoint object. My thoughts would be that another endpoint type wasn't handled properly here elsewhere and is causing continued issues. Original Moby bug / merged to libnetworking:

https://github.com/docker/libnetwork/pull/987/files

I have a couple of speculations but I'm new at playing with docker and have more limited networking experience than many, so please correct me if I'm making a blatant error or if the guess is wrong.

I noticed a few other places where an endpoint (which inherit from the same class in WIndows) is warned about but not deleted or otherwise cleaned up. Unfortunately I can't tell if this is intentional in some cases.

The changes made caused a missing missing endpoint in the KV store to throw a fatal log message, but elsewhere line on line 201 the same endpoint deletion failure is considered a warning. I'm guessing this is for the fall-through to the cleanup code, but uses of the KV store without a matching entry to remove (addition is fine, I'm guessing), or a program termination may be dangerous.

Allowing warnings through on endpoints failures when KV stores are later used to look for them may be causing a failure of the file to be unlocked. Like I said, I'm new, so I'd hope the storage DB was thread safe. Windows should have an effectively unlimited file handle limit so that doesn't seem likely, although the behavior would be vaguely similar.

@ghost
Copy link

ghost commented Nov 10, 2017

On second thought the ErrorF and warn EPs is probably unsafe too, the underlying windows EP object may not be deleted before the process is killed or the KV store sent nil in the case of WarnF... and that's usually what's happening when the warnings occur.

@JMesser81
Copy link

@mavenugo - restarting this conversation... this issue has hit us recently on a few different occasions where endpoints are not being cleaned up correctly. Do we know what the next steps are in this investigation?
cc @pradipd

@JMesser81
Copy link

cc @daschott ^^

@lowenna
Copy link
Member

lowenna commented Sep 10, 2018

I did a little investigation on this as I was able to repro reasonably consistently. Here's a couple of interesting goroutines when close() fails as in-use. Will try to figure out what's going on, but it looks like a legitimate race - one thread closing the database, and another is clearly in the middle of a write.


goroutine 364 [running]:
panic(0x1942d20, 0x1f3c4d0)
        C:/Go/src/runtime/panic.go:556 +0x2d9 fp=0xc000d38d18 sp=0xc000d38c88 pc=0x42f579
github.com/docker/docker/vendor/github.com/boltdb/bolt.(*DB).close(0xc000ffda40, 0x1000001a4, 0xdf8475800)
        e:/go/src/github.com/docker/docker/vendor/github.com/boltdb/bolt/db.go:442 +0x275 fp=0xc000d38d90 sp=0xc000d38d18 pc=0xed1c35
github.com/docker/docker/vendor/github.com/boltdb/bolt.Open(0xc000caa150, 0x24, 0xc0000001a4, 0xc000d38e68, 0xc000d38ed0, 0xc000d38e88, 0x42e649)
        e:/go/src/github.com/docker/docker/vendor/github.com/boltdb/bolt/db.go:188 +0x57c fp=0xc000d38e30 sp=0xc000d38d90 pc=0xed0dac
github.com/docker/docker/vendor/github.com/docker/libkv/store/boltdb.(*BoltDB).getDBhandle(0xc0008ec0a0, 0x1d81d50, 0xc0008ec0e4, 0xd43579)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libkv/store/boltdb/boltdb.go:113 +0x9a fp=0xc000d38e98 sp=0xc000d38e30 pc=0x111e95a
github.com/docker/docker/vendor/github.com/docker/libkv/store/boltdb.(*BoltDB).AtomicDelete(0xc0008ec0a0, 0xc001147c70, 0x69, 0xc001196f00, 0x0, 0x0, 0x0)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libkv/store/boltdb/boltdb.go:324 +0xb4 fp=0xc000d38f20 sp=0xc000d38e98 pc=0x111f6d4
github.com/docker/docker/vendor/github.com/docker/libnetwork/datastore.(*datastore).DeleteObjectAtomic(0xc00089a780, 0x1f7c6a0, 0xc000c50600, 0x0, 0x0)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/datastore/datastore.go:623 +0x24a fp=0xc000d38fa0 sp=0xc000d38f20 pc=0xd45f5a
github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/windows.(*driver).storeDelete(0xc00092bbf0, 0x1f7c6a0, 0xc000c50600, 0x24, 0x0)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/windows/windows_store.go:124 +0x73 fp=0xc000d39028 sp=0xc000d38fa0 pc=0x10f7973
github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/windows.(*driver).DeleteEndpoint(0xc00092bbf0, 0xc000a96440, 0x40, 0xc000ac86c0, 0x40, 0x0, 0xc000d39160)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/windows/windows.go:733 +0x21e fp=0xc000d390e8 sp=0xc000d39028 pc=0x10f539e
github.com/docker/docker/vendor/github.com/docker/libnetwork.(*endpoint).deleteEndpoint(0xc000abe580, 0xc000d39200, 0xc000d39268, 0x0)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/endpoint.go:889 +0x192 fp=0xc000d391b8 sp=0xc000d390e8 pc=0x1142ad2
github.com/docker/docker/vendor/github.com/docker/libnetwork.(*endpoint).Delete(0xc000679b80, 0x1f82000, 0x0, 0x0)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/endpoint.go:860 +0x500 fp=0xc000d39308 sp=0xc000d391b8 pc=0x1142680
github.com/docker/docker/vendor/github.com/docker/libnetwork.(*sandbox).delete(0xc0009c8000, 0xc000178600, 0x2f010b0, 0xc000140680)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/sandbox.go:229 +0x18e fp=0xc000d394f0 sp=0xc000d39308 pc=0x1163b2e
github.com/docker/docker/vendor/github.com/docker/libnetwork.(*sandbox).Delete(0xc0009c8000, 0xc000038dc0, 0x40)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/sandbox.go:185 +0x37 fp=0xc000d39520 sp=0xc000d394f0 pc=0x1163977
github.com/docker/docker/daemon.(*Daemon).releaseNetwork(0xc00017c000, 0xc00067ad80)
        e:/go/src/github.com/docker/docker/daemon/container_operations.go:1006 +0x4fa fp=0xc000d396c0 sp=0xc000d39520 pc=0x163346a
github.com/docker/docker/daemon.(*Daemon).Cleanup(0xc00017c000, 0xc00067ad80)
        e:/go/src/github.com/docker/docker/daemon/start.go:217 +0x57 fp=0xc000d39820 sp=0xc000d396c0 pc=0x16759d7
github.com/docker/docker/daemon.(*Daemon).ProcessEvent(0xc00017c000, 0xc000038dc0, 0x40, 0x1ce4cad, 0x4, 0xc000038dc0, 0x40, 0x1ce4d95, 0x4, 0xc000013a00003140, ...)
        e:/go/src/github.com/docker/docker/daemon/monitor.go:87 +0x5bc fp=0xc000d39af0 sp=0xc000d39820 pc=0x165cffc
github.com/docker/docker/libcontainerd.(*client).reapProcess.func1()
        e:/go/src/github.com/docker/docker/libcontainerd/client_local_windows.go:1338 +0x449 fp=0xc000d39f70 sp=0xc000d39af0 pc=0x14568a9
github.com/docker/docker/libcontainerd.(*queue).append.func1(0xc000038d00, 0x0, 0xc000972820, 0xc000d29020, 0xc000144710, 0xc000038dc0, 0x40)
        e:/go/src/github.com/docker/docker/libcontainerd/queue.go:26 +0x3f fp=0xc000d39fa8 sp=0xc000d39f70 pc=0x1456e7f
runtime.goexit()
        C:/Go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc000d39fb0 sp=0xc000d39fa8 pc=0x45e071
created by github.com/docker/docker/libcontainerd.(*queue).append
        e:/go/src/github.com/docker/docker/libcontainerd/queue.go:22 +0x18b


		
goroutine 362 [runnable]:
github.com/docker/docker/vendor/github.com/boltdb/bolt.(*meta).write(0xc000c62d40, 0xc00125d000)
        e:/go/src/github.com/docker/docker/vendor/github.com/boltdb/bolt/db.go:1015 +0x2b6 fp=0xc000c3aca0 sp=0xc000c3ac98 pc=0xed3ba6
github.com/docker/docker/vendor/github.com/boltdb/bolt.(*Tx).writeMeta(0xc000c73340, 0x0, 0x0)
        e:/go/src/github.com/docker/docker/vendor/github.com/boltdb/bolt/tx.go:547 +0x96 fp=0xc000c3ad00 sp=0xc000c3aca0 pc=0xedc156
github.com/docker/docker/vendor/github.com/boltdb/bolt.(*Tx).Commit(0xc000c73340, 0x0, 0x0)
        e:/go/src/github.com/docker/docker/vendor/github.com/boltdb/bolt/tx.go:221 +0x4ae fp=0xc000c3ae40 sp=0xc000c3ad00 pc=0xed9ffe
github.com/docker/docker/vendor/github.com/boltdb/bolt.(*DB).Update(0xc0011e6960, 0xc000c3af08, 0x0, 0x0)
        e:/go/src/github.com/docker/docker/vendor/github.com/boltdb/bolt/db.go:618 +0xef fp=0xc000c3ae90 sp=0xc000c3ae40 pc=0xed25ef
github.com/docker/docker/vendor/github.com/docker/libkv/store/boltdb.(*BoltDB).AtomicPut(0xc000144820, 0xc0011e1960, 0x64, 0xc000a85c60, 0xc, 0x10, 0xc00120cf30, 0x0, 0xc000c3af00, 0x0, ...)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libkv/store/boltdb/boltdb.go:371 +0x233 fp=0xc000c3af70 sp=0xc000c3ae90 pc=0x111fac3
github.com/docker/docker/vendor/github.com/docker/libnetwork/datastore.(*datastore).PutObjectAtomic(0xc000123ec0, 0x1f7c520, 0xc000fd4d80, 0x0, 0x0)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/datastore/datastore.go:415 +0x256 fp=0xc000c3b050 sp=0xc000c3af70 pc=0xd44996
github.com/docker/docker/vendor/github.com/docker/libnetwork.(*controller).updateToStore(0xc000110200, 0x1f7c520, 0xc000fd4d80, 0x1f7c520, 0xc000fd4d80)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/store.go:237 +0x8d fp=0xc000c3b0d8 sp=0xc000c3b050 pc=0x1176a0d
github.com/docker/docker/vendor/github.com/docker/libnetwork.(*endpointCnt).atomicIncDecEpCnt(0xc000fd4d80, 0x1d81d00, 0xc0001df320, 0xc0001df320)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/endpoint_cnt.go:161 +0x1c0 fp=0xc000c3b188 sp=0xc000c3b0d8 pc=0x1145fa0
github.com/docker/docker/vendor/github.com/docker/libnetwork.(*endpointCnt).DecEndpointCnt(0xc000fd4d80, 0xc000fd4d80, 0x0)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/endpoint_cnt.go:181 +0x37 fp=0xc000c3b1b8 sp=0xc000c3b188 pc=0x1146307
github.com/docker/docker/vendor/github.com/docker/libnetwork.(*endpoint).Delete(0xc000678160, 0x1f82000, 0x0, 0x0)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/endpoint.go:866 +0x55c fp=0xc000c3b308 sp=0xc000c3b1b8 pc=0x11426dc
github.com/docker/docker/vendor/github.com/docker/libnetwork.(*sandbox).delete(0xc0009dad80, 0xc000856300, 0x2f010b0, 0xc00063fad0)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/sandbox.go:229 +0x18e fp=0xc000c3b4f0 sp=0xc000c3b308 pc=0x1163b2e
github.com/docker/docker/vendor/github.com/docker/libnetwork.(*sandbox).Delete(0xc0009dad80, 0xc000039c40, 0x40)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/sandbox.go:185 +0x37 fp=0xc000c3b520 sp=0xc000c3b4f0 pc=0x1163977
github.com/docker/docker/daemon.(*Daemon).releaseNetwork(0xc00017c000, 0xc0005bad80)
        e:/go/src/github.com/docker/docker/daemon/container_operations.go:1006 +0x4fa fp=0xc000c3b6c0 sp=0xc000c3b520 pc=0x163346a
github.com/docker/docker/daemon.(*Daemon).Cleanup(0xc00017c000, 0xc0005bad80)
        e:/go/src/github.com/docker/docker/daemon/start.go:217 +0x57 fp=0xc000c3b820 sp=0xc000c3b6c0 pc=0x16759d7
github.com/docker/docker/daemon.(*Daemon).ProcessEvent(0xc00017c000, 0xc000039c40, 0x40, 0x1ce4cad, 0x4, 0xc000039c40, 0x40, 0x1ce4d95, 0x4, 0xc000013a000017ec, ...)
        e:/go/src/github.com/docker/docker/daemon/monitor.go:87 +0x5bc fp=0xc000c3baf0 sp=0xc000c3b820 pc=0x165cffc
github.com/docker/docker/libcontainerd.(*client).reapProcess.func1()
        e:/go/src/github.com/docker/docker/libcontainerd/client_local_windows.go:1338 +0x449 fp=0xc000c3bf70 sp=0xc000c3baf0 pc=0x14568a9
github.com/docker/docker/libcontainerd.(*queue).append.func1(0xc000039c00, 0x0, 0xc000972690, 0xc000075860, 0xc000144710, 0xc000039c40, 0x40)
        e:/go/src/github.com/docker/docker/libcontainerd/queue.go:26 +0x3f fp=0xc000c3bfa8 sp=0xc000c3bf70 pc=0x1456e7f
runtime.goexit()
        C:/Go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc000c3bfb0 sp=0xc000c3bfa8 pc=0x45e071
created by github.com/docker/docker/libcontainerd.(*queue).append
        e:/go/src/github.com/docker/docker/libcontainerd/queue.go:22 +0x18b

@lowenna
Copy link
Member

lowenna commented Sep 10, 2018

Bit more debugging - I can see a pattern where the database is being double unlocked: Edit: Actually, this is OK. It's just part of cleanup which calls a close in the case of failed attempt to lock the database.

2018-09-10 12:04:38.793045 I | JOHNDEBUG-200-locking C:\control/network/files/local-kv.db.lock
2018-09-10 12:04:38.794047 I | JOHNDEBUG-200-unlocking C:\control/network/files/local-kv.db.lock
2018-09-10 12:04:38.794047 I | JOHNDEBUG-200-locking C:\control/network/files/local-kv.db.lock
2018-09-10 12:04:38.795042 I | JOHNDEBUG-200-unlocking C:\control/network/files/local-kv.db.lock

The next reference to goroutine 200 is the one which fails:

2018-09-10 12:04:41.373128 I | JOHNDEBUG-200-unlocking C:\control/network/files/local-kv.db.lock
2018-09-10 12:04:41.373128 I | JJH: unlock failed: C:\control/network/files/local-kv.db.lock
2018-09-10 12:04:41.376128 I | goroutine 200 [running]:

goroutine 200 [running]:
panic(0x1942d20, 0x1f3c590)
        C:/Go/src/runtime/panic.go:556 +0x2d9 fp=0xc000a1b0b0 sp=0xc000a1b020 pc=0x42f579
github.com/docker/docker/vendor/github.com/boltdb/bolt.(*DB).close(0xc000d5dc20, 0x1000001a4, 0xdf8475800)
        e:/go/src/github.com/docker/docker/vendor/github.com/boltdb/bolt/db.go:444 +0x368 fp=0xc000a1b148 sp=0xc000a1b0b0 pc=0xed2108
github.com/docker/docker/vendor/github.com/boltdb/bolt.Open(0xc0007d42d0, 0x24, 0x1a4, 0xc000a1b220, 0x69, 0xc000a1b240, 0x44426e)
        e:/go/src/github.com/docker/docker/vendor/github.com/boltdb/bolt/db.go:188 +0x57c fp=0xc000a1b1e8 sp=0xc000a1b148 pc=0xed118c
github.com/docker/docker/vendor/github.com/docker/libkv/store/boltdb.(*BoltDB).getDBhandle(0xc000d2a4b0, 0x8, 0x8, 0xc000a60a00)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libkv/store/boltdb/boltdb.go:113 +0x9a fp=0xc000a1b250 sp=0xc000a1b1e8 pc=0x111ee2a
github.com/docker/docker/vendor/github.com/docker/libkv/store/boltdb.(*BoltDB).AtomicPut(0xc000d2a4b0, 0xc0008c8700, 0x69, 0xc000d2f0e0, 0x1db, 0x1e0, 0x0, 0x0, 0xc000a60900, 0x0, ...)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libkv/store/boltdb/boltdb.go:366 +0x103 fp=0xc000a1b330 sp=0xc000a1b250 pc=0x111fe63
github.com/docker/docker/vendor/github.com/docker/libnetwork/datastore.(*datastore).PutObjectAtomic(0xc000a3ef40, 0x1f7c760, 0xc000cfe780, 0x0, 0x0)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/datastore/datastore.go:415 +0x256 fp=0xc000a1b410 sp=0xc000a1b330 pc=0xd44996
github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/windows.(*driver).storeUpdate(0xc000a8fa10, 0x1f7c760, 0xc000cfe780, 0x6, 0x0)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/windows/windows_store.go:110 +0x68 fp=0xc000a1b498 sp=0xc000a1b410 pc=0x10f7c18
github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/windows.(*driver).CreateEndpoint(0xc000a8fa10, 0xc000996c80, 0x40, 0xc0006a1180, 0x40, 0x1f6d8e0, 0xc000cf45a0, 0xc0009e0960, 0x2, 0x2)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/drivers/windows/windows.go:706 +0x82c fp=0xc000a1b648 sp=0xc000a1b498 pc=0x10f4c6c
github.com/docker/docker/vendor/github.com/docker/libnetwork.(*network).addEndpoint(0xc00080c700, 0xc000cf6420, 0xc000a1b7d0, 0xc000cf6420)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/network.go:1127 +0x15f fp=0xc000a1b730 sp=0xc000a1b648 pc=0x1154e7f
github.com/docker/docker/vendor/github.com/docker/libnetwork.(*network).createEndpoint(0xc00080c700, 0xc000183191, 0xc, 0xc00097c2a0, 0x4, 0x4, 0x0, 0x0, 0x0, 0x0)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/network.go:1211 +0x599 fp=0xc000a1b810 sp=0xc000a1b730 pc=0x11559c9
github.com/docker/docker/vendor/github.com/docker/libnetwork.(*network).CreateEndpoint(0xc00080d880, 0xc000183191, 0xc, 0xc00097c2a0, 0x4, 0x4, 0x0, 0x0, 0x0, 0x0)
        e:/go/src/github.com/docker/docker/vendor/github.com/docker/libnetwork/network.go:1153 +0x193 fp=0xc000a1b8c0 sp=0xc000a1b810 pc=0x11551c3
github.com/docker/docker/daemon.(*Daemon).connectToNetwork(0xc000828000, 0xc0007fb680, 0x1ce42fc, 0x3, 0xc000c86c00, 0xc0000c0800, 0x0, 0x0)
        e:/go/src/github.com/docker/docker/daemon/container_operations.go:741 +0x35a fp=0xc000a1ba30 sp=0xc000a1b8c0 pc=0x163110a
github.com/docker/docker/daemon.(*Daemon).allocateNetwork(0xc000828000, 0xc0007fb680, 0x0, 0x0)
        e:/go/src/github.com/docker/docker/daemon/container_operations.go:532 +0xa04 fp=0xc000a1bcd0 sp=0xc000a1ba30 pc=0x16305e4
github.com/docker/docker/daemon.(*Daemon).initializeNetworking(0xc000828000, 0xc0007fb680, 0x0, 0x0)
        e:/go/src/github.com/docker/docker/daemon/container_operations.go:945 +0x1bf fp=0xc000a1bd28 sp=0xc000a1bcd0 pc=0x163311f
github.com/docker/docker/daemon.(*Daemon).containerStart(0xc000828000, 0xc0007fb680, 0x0, 0x0, 0x0, 0x0, 0x3039383736353401, 0x0, 0x0)
        e:/go/src/github.com/docker/docker/daemon/start.go:150 +0x2d1 fp=0xc000a1be50 sp=0xc000a1bd28 pc=0x1675471
github.com/docker/docker/daemon.(*Daemon).restore.func2(0xc000659e10, 0xc000828000, 0xc000756cc0, 0xc0007fb680, 0xc00007a900)
        e:/go/src/github.com/docker/docker/daemon/daemon.go:399 +0x2b4 fp=0xc000a1bfb8 sp=0xc000a1be50 pc=0x167fc74
runtime.goexit()
        C:/Go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc000a1bfc0 sp=0xc000a1bfb8 pc=0x45e071
created by github.com/docker/docker/daemon.(*Daemon).restore
        e:/go/src/github.com/docker/docker/daemon/daemon.go:379 +0x9cb

@lowenna
Copy link
Member

lowenna commented Sep 10, 2018

I understand the access denied error now, if not the "process cannot access the file because...." error. It's in github.com/boltdb/bolt/bolt_windows.go, function funlock().

Looks like there's a race between closing the lockfile and deleting it. If another goroutine comes in between these two steps, the openfile in flock() could fail with Access denied - exactly what we're seeing. @dineshgovindasamy FYI.

@lowenna
Copy link
Member

lowenna commented Sep 10, 2018

Note above is for the Access is denied. After making a local fix, so far (not extensively tested), I haven't hit that again. Once I've validated, I'll submit the fix upstream to boltdb. However, I can now repro the original error which is The process cannot access the file because it is being used by another process once the first fix is applied. That one needs more investigation still.

@lowenna
Copy link
Member

lowenna commented Sep 10, 2018

I'm really confused about the "cannot access the file". Questioning if defender is the culprit? Temporarily adding an exclusion to the folder containing network/files/local-kv.db.lock to see if still repros with that.

Nope - not that. Still repros.

@lowenna
Copy link
Member

lowenna commented Sep 10, 2018

Think I have it. By removing the lock file entirely, and locking the database instead using -1..0 byte range as the lock. So far so good in validation.

@lowenna
Copy link
Member

lowenna commented Sep 11, 2018

Opened an issue in etcd-io/bbolt#121 for upstream.

@lowenna
Copy link
Member

lowenna commented Sep 11, 2018

Fix submitted at etcd-io/bbolt#122

lowenna pushed a commit to microsoft/docker that referenced this issue Sep 13, 2018
Signed-off-by: John Howard <jhoward@microsoft.com>

This also adds go.etcd.io/bbolt as boltdb/bolt is no longer
maintained, and we need etcd-io/bbolt#122 which
was merged in https://github.com/etcd-io/bbolt/releases/tag/v1.3.1-etcd.8
in order to fix moby/libnetwork#1950.

Note that I can't entirely remove boltdb/bolt as it is still used by
other components. Still need to work my way through them.... These include
containerd/containerd (containerd/containerd#2634),
docker/swarmkit; moby/buildkit. And probably more....
thaJeztah pushed a commit to thaJeztah/docker that referenced this issue Sep 14, 2018
Signed-off-by: John Howard <jhoward@microsoft.com>

This also adds go.etcd.io/bbolt as boltdb/bolt is no longer
maintained, and we need etcd-io/bbolt#122 which
was merged in https://github.com/etcd-io/bbolt/releases/tag/v1.3.1-etcd.8
in order to fix moby/libnetwork#1950.

Note that I can't entirely remove boltdb/bolt as it is still used by
other components. Still need to work my way through them.... These include
containerd/containerd (containerd/containerd#2634),
docker/swarmkit; moby/buildkit. And probably more....

(cherry picked from commit 1a6e260)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this issue Sep 14, 2018
Signed-off-by: John Howard <jhoward@microsoft.com>

This also adds go.etcd.io/bbolt as boltdb/bolt is no longer
maintained, and we need etcd-io/bbolt#122 which
was merged in https://github.com/etcd-io/bbolt/releases/tag/v1.3.1-etcd.8
in order to fix moby/libnetwork#1950.

Note that I can't entirely remove boltdb/bolt as it is still used by
other components. Still need to work my way through them.... These include
containerd/containerd (containerd/containerd#2634),
docker/swarmkit; moby/buildkit. And probably more....
Upstream-commit: 1a6e2609ead86144b75067bfe5154dad5e42d5cf
Component: engine
dims pushed a commit to dims/sys that referenced this issue Oct 29, 2020
Signed-off-by: John Howard <jhoward@microsoft.com>

This also adds go.etcd.io/bbolt as boltdb/bolt is no longer
maintained, and we need etcd-io/bbolt#122 which
was merged in https://github.com/etcd-io/bbolt/releases/tag/v1.3.1-etcd.8
in order to fix moby/libnetwork#1950.

Note that I can't entirely remove boltdb/bolt as it is still used by
other components. Still need to work my way through them.... These include
containerd/containerd (containerd/containerd#2634),
docker/swarmkit; moby/buildkit. And probably more....
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants