Performance issue with persisting driver endpoint to store #1374
We are also affected by this on docker 1.12.1. The time between provisioning a host and seeing the first errors appears to be very short. @coolljt0725, I'm interested in the stress app/script. Is it open source?
@coolljt0725 I was not able to reproduce the problem. I believe it is related to disk speed (I am running on a fairly empty SSD), disk load, and possibly free disk space and fragmentation. Somebody has in fact reported that the stale data issue is easily reproducible on old spinning hard drives. So far we have not heard many such complaints, and since the issue seems related to slow disks, it does not seem a strong enough reason for a pervasive code rework. Also, there will likely be issues with the other approach as well, a few of which were already raised in the #1135 comments, so in my opinion we should hold off on this and revisit it later.
We're running into this issue a lot. We have an image with a … Our workaround for now is to not make that directory a volume; however, we incur a runtime performance penalty as a result. Is there any other workaround for this issue, or any progress on fixing it?
@aboch
@coolljt0725 Given that libnetwork does not create the bolt interface with the persistent connection option, it currently can't set the transient timeout. I think it is fine to just change the default in the libkv project, as you are doing in your PR.
@coolljt0725 Then we can control it via libnetwork in https://github.com/docker/libnetwork/blob/release/v0.8/datastore/datastore.go#L136. |
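To make the transient timeout concrete, here is a minimal Python sketch of the non-persistent connection pattern, under the assumption that every store operation must first acquire an exclusive lock on the database file and gives up after a configurable timeout. `TransientStore`, `transient_timeout` and `StoreTimeoutError` are hypothetical names, not the libkv or libnetwork API, and a `threading.Lock` stands in for the bolt file lock.

```python
import threading


class StoreTimeoutError(Exception):
    """Raised when the (simulated) database file lock is not acquired in time."""


class TransientStore:
    """Hypothetical store that 'opens' the database for every operation.

    A threading.Lock stands in for the exclusive file lock taken when a
    bolt database is opened; this illustrates the pattern, not libkv itself.
    """

    def __init__(self, transient_timeout: float = 10.0):
        self.transient_timeout = transient_timeout  # seconds to wait for the lock
        self._file_lock = threading.Lock()
        self._data = {}

    def _open(self, key: str) -> None:
        # Wait at most transient_timeout seconds, then fail like a timed-out open.
        if not self._file_lock.acquire(timeout=self.transient_timeout):
            raise StoreTimeoutError(f"timeout waiting for store lock on {key!r}")

    def put(self, key: str, value: bytes) -> None:
        self._open(key)
        try:
            self._data[key] = value
        finally:
            # Releasing the lock models closing the database after the operation.
            self._file_lock.release()

    def get(self, key: str) -> bytes:
        self._open(key)
        try:
            return self._data[key]
        finally:
            self._file_lock.release()
```

In this model, a longer `transient_timeout` makes contended callers wait instead of failing, which is why making the value configurable from the caller matters.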
@aboch
That's a good idea. 👍 |
Closed by #1546 |
To support container live restore, we persist driver endpoints to the store, which is an approach that works for every network driver. But persisting endpoints to the store causes a performance issue: it takes more time to run a container, and the situation is worse when containers are started in parallel. Here are some test results using
https://github.com/crosbymichael/docker-stress.
The stress.json is:

docker 1.11.2 with live restore (we backported the live restore patch):

If we increase the number of concurrent stress workers, there are a lot of timeout errors.

docker 1.11.2 without live restore:

There is a significant performance decrease with live restore.
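A back-of-the-envelope model may help explain why adding concurrent workers produces timeout errors rather than just slower runs. It rests on simplifying assumptions, not on the measurements above: all workers hit the store at the same moment, the store serializes them in FIFO order, each holds the store lock for a fixed time, and a worker fails once its wait exceeds the timeout. `count_timeouts` is a hypothetical helper.

```python
def count_timeouts(workers: int, hold_ms: int, timeout_ms: int) -> int:
    """Count how many of `workers` simultaneous writers would time out.

    Simplified model: writers are served one at a time in FIFO order,
    each holds the store lock for hold_ms, and a writer fails if the
    time it spends queued behind earlier writers exceeds timeout_ms.
    """
    timeouts = 0
    for position in range(workers):
        wait_ms = position * hold_ms  # time queued behind earlier writers
        if wait_ms > timeout_ms:
            timeouts += 1
    return timeouts
```

With `hold_ms=500` and `timeout_ms=2000`, 4 workers see no timeouts while 10 workers see 5. A slower disk raises `hold_ms`, so the same worker count crosses the threshold sooner, which is consistent with the slow-disk observations in this thread.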
Persisting driver endpoints to the store also introduces some consistency issues, so I suggest we reconsider the
reconstruct-endpoint approach: if an endpoint can be reconstructed, we avoid persisting it. I still think the less we persist to the store, the better.
@aboch @mavenugo WDYT?
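A minimal Python sketch contrasting the two approaches, under loudly stated assumptions: a dict stands in for the driver's persistent store, `live_endpoints` is a hypothetical view of the endpoint state a driver could rediscover from the host after a daemon restart, and all class, key and field names are illustrative rather than libnetwork's actual datastore schema. It shows that persisting adds a store write to every endpoint create and delete, and that a lost delete (for example one that hit a store timeout) leaves a stale record which reconstruction would not produce.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Set


@dataclass
class Endpoint:
    # Hypothetical subset of the state a network driver keeps per endpoint.
    id: str
    network_id: str
    ifname: str
    ipv4: str


KEY_PREFIX = "driver/endpoint/"  # hypothetical key layout


def endpoint_key(ep: Endpoint) -> str:
    return f"{KEY_PREFIX}{ep.network_id}/{ep.id}"


# Persisting approach: write on every create/delete, read back on restart.
def create_endpoint(store: Dict[str, str], ep: Endpoint) -> None:
    # Extra store write on the container start path.
    store[endpoint_key(ep)] = json.dumps(asdict(ep))


def delete_endpoint(store: Dict[str, str], ep: Endpoint) -> None:
    # If this write is lost, the record stays behind and goes stale.
    store.pop(endpoint_key(ep), None)


def restore_from_store(store: Dict[str, str]) -> List[Endpoint]:
    return [Endpoint(**json.loads(v)) for k, v in store.items() if k.startswith(KEY_PREFIX)]


# Reconstruct approach: derive endpoints from live host state, persist nothing.
def restore_by_reconstruction(live_endpoints: Dict[str, Endpoint]) -> List[Endpoint]:
    return list(live_endpoints.values())


def stale_ids(store: Dict[str, str], live_endpoints: Dict[str, Endpoint]) -> Set[str]:
    """Endpoint IDs the persisting approach would restore but that no longer exist."""
    persisted = {ep.id for ep in restore_from_store(store)}
    return persisted - set(live_endpoints)
```

The sketch assumes away the hard part of reconstruction, namely having an authoritative live view for every driver; as noted earlier in the thread, that approach has issues of its own.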