error creating overlay mount to /var/lib/docker/overlay2/<...>/merged: device or resource busy #711

Closed
2 of 3 tasks
llebout opened this issue Jun 30, 2019 · 65 comments

Comments

@llebout

llebout commented Jun 30, 2019

  • This is a bug report
  • This is a feature request
  • I searched existing issues before opening this one

Expected behavior

No errors

Actual behavior

error creating overlay mount to /var/lib/docker/overlay2/af8c5d19cde2039cf3b4c3b340b960bac2b2f0504b0b291f6c63c2d5175ea3ba/merged: device or resource busy

dmesg:
[55383.848708] overlayfs: lowerdir is in-use as upperdir/workdir

( https://gitlab.com/freedesktop-sdk/infrastructure/freedesktop-sdk-docker-images/-/jobs/242656806 )
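To tell this failure apart from other "device or resource busy" errors, one can look for the overlayfs line quoted above in the kernel log. A minimal sketch; the function name `overlay_in_use_reported` is made up for illustration, and the match string is simply the dmesg text from this report:

```shell
# Succeeds (exit 0) if the kernel log text on stdin contains the overlayfs
# "in-use" complaint quoted in this report, and fails otherwise.
overlay_in_use_reported() {
    grep -q 'overlayfs: lowerdir is in-use as upperdir/workdir'
}

# Typical use on an affected host (reading dmesg may require root):
#   dmesg | overlay_in_use_reported && echo "kernel rejected an overlay mount"
```

If the line is absent, the busy error probably has a different cause than the one discussed in this thread.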

Steps to reproduce the behavior

Install Fedora 30 and update the kernel to 5.1.15-300.fc30 (latest at time of creation of this issue)

Then as root:

git clone https://gitlab.com/freedesktop-sdk/infrastructure/freedesktop-sdk-docker-images.git
cd freedesktop-sdk-docker-images
docker build .
# Then docker push somewhere...

Note: kernels without this commit are not affected. The commit appears to add stricter sanity checks on how overlayfs is used, and Docker seems to use overlayfs in a way those checks reject. I worked around the issue by booting an older kernel.

Output of docker version:

Client:
 Version:         1.13.1
 API version:     1.26
 Package version: docker-1.13.1-67.git1185cfd.fc30.ppc64le
 Go version:      go1.12.2
 Git commit:      1185cfd/1.13.1
 Built:           Mon Apr 22 17:47:08 2019
 OS/Arch:         linux/ppc64le

Server:
 Version:         1.13.1
 API version:     1.26 (minimum version 1.12)
 Package version: docker-1.13.1-67.git1185cfd.fc30.ppc64le
 Go version:      go1.12.2
 Git commit:      1185cfd/1.13.1
 Built:           Mon Apr 22 17:47:08 2019
 OS/Arch:         linux/ppc64le
 Experimental:    false

Output of docker info:

Containers: 5
 Running: 1
 Paused: 0
 Stopped: 4
Images: 28
Server Version: 1.13.1
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: journald
Cgroup Driver: systemd
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Authorization: rhel-push-plugin
Swarm: inactive
Runtimes: oci runc
Default Runtime: oci
Init Binary: /usr/libexec/docker/docker-init-current
containerd version:  (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: N/A (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
init version: N/A (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  WARNING: You're not using the default seccomp profile
  Profile: /etc/docker/seccomp.json
 selinux
Kernel Version: 5.1.15-300.fc30.ppc64le
Operating System: Fedora 30 (Server Edition)
OSType: linux
Architecture: ppc64le
Number of Docker Hooks: 2
CPUs: 32
Total Memory: 31.93 GiB
Name: localhost.localdomain
ID: UU6X:5ECG:AJ3L:NBNZ:LUXT:XTMT:X4FK:NX7S:DLZN:EHEI:AEAX:FZJC
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: true
Registries: docker.io (secure), registry.fedoraproject.org (secure), quay.io (secure), registry.access.redhat.com (secure), registry.centos.org (secure), docker.io (secure)

Additional environment details (AWS, VirtualBox, physical, etc.)
QEMU/KVM

@dougvj

dougvj commented Jul 2, 2019

Simple patch against linux 5.2-rc7 to disable the check for a temporary workaround
https://gist.github.com/dougvj/e866760bb41b1d43aab79130ba1be1b3

@amrhassan

I'm getting around this by temporarily switching the storage-driver to overlay rather than the default overlay2 (see docs here).

@CruCo

CruCo commented Aug 1, 2019

I'm having the same issue but during push of an image to a private registry.
As @amrhassan suggests, temporarily changing the storage-driver to overlay works. But overlay is deprecated and will be removed so this issue should be fixed.

Running on Arch Linux
~/ uname -r
5.2.5-arch1-1-ARCH
Output of sudo docker version

Client:
 Version:           19.03.1-ce
 API version:       1.40
 Go version:        go1.12.7
 Git commit:        74b1e89e8a
 Built:             Sat Jul 27 21:08:50 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.1-ce
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.7
  Git commit:       74b1e89e8a
  Built:            Sat Jul 27 21:08:28 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.2.7.m
  GitCommit:        85f6aa58b8a3170aec9824568f7a31832878b603.m
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info

Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 19.03.1-ce
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 85f6aa58b8a3170aec9824568f7a31832878b603.m
 runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.2.5-arch1-1-ARCH
 Operating System: Arch Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.42GiB
 Name: nelo
 ID: A4RW:J5TK:QQC2:4T5M:JZXR:QBU6:B3VU:P7IJ:JTSK:DSFW:KDYN:M7WT
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

@futile

futile commented Aug 1, 2019

I experienced this issue when pushing to a private registry as well, and am working around it.

@markuman

markuman commented Aug 1, 2019

We hit the same error when pushing to a registry on Amazon Linux 2 with kernel-ng (sudo amazon-linux-extras install kernel-ng)

Linux sharedrunner 4.19.58-21.57.amzn2.x86_64 #1 SMP Thu Jul 11 07:59:25 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

@xtrasimplicity

I'm using a private registry (built into GitLab) and have the same issue. The work-arounds listed above work for me. :)

sudo systemctl stop docker
sudo dockerd -s overlay
# Rebuild your image, and push.

@markuman

markuman commented Aug 2, 2019

Hm, but this fix will not survive a reboot, so the systemd unit file must be patched:

diff -ru home/ec2-user/docker.service /usr/lib/systemd/system/docker.service
--- home/ec2-user/docker.service	2019-08-02 08:44:04.910157580 +0200
+++ /usr/lib/systemd/system/docker.service	2019-08-02 08:44:16.698134581 +0200
@@ -14,7 +14,7 @@
 # the default is not to use systemd for cgroups because the delegate issues still
 # exists and systemd currently does not support the cgroup feature set required
 # for containers run by docker
-ExecStart=/usr/bin/dockerd $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_ADD_RUNTIMES
+ExecStart=/usr/bin/dockerd -s overlay $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_ADD_RUNTIMES
 ExecReload=/bin/kill -s HUP $MAINPID
 # Having non-zero Limit*s causes performance problems due to accounting overhead
 # in the kernel. We recommend using cgroups to do container-local accounting.

sudo systemctl daemon-reload && sudo systemctl restart docker

I will observe it to see if it fixes the issue here as well.

But I think overlay2 is what most people want, and this patch disables the use of overlay2.
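A file under /usr/lib/systemd/system belongs to the package and is likely to be replaced on the next package update, so a systemd drop-in is one alternative to patching it. The following is only a sketch: the function name `write_overlay_dropin` is hypothetical, and the ExecStart line is copied from the diff above, so it assumes the same unit file layout (other distributions use different variables).

```shell
# Write a drop-in that overrides ExecStart for docker.service.
# $1: drop-in directory, normally /etc/systemd/system/docker.service.d
write_overlay_dropin() {
    local dir="$1"
    mkdir -p "$dir" || return 1
    # The empty "ExecStart=" clears the packaged command line first; for a
    # non-oneshot service systemd rejects a unit with two ExecStart lines.
    cat > "$dir/override.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -s overlay $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_ADD_RUNTIMES
EOF
}
```

Run as root against the real directory, then apply it with `systemctl daemon-reload` and `systemctl restart docker` as in the comment above. Deleting the drop-in restores the packaged behaviour.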

@nanonyme

nanonyme commented Aug 2, 2019

Can't that switch be passed through $OPTIONS? Where is it defined?

@markuman

markuman commented Aug 2, 2019

Sure. I guess it's /etc/docker/daemon.json
See https://docs.docker.com/v17.09/engine/admin/systemd/

@maetthu

maetthu commented Aug 2, 2019

Sure. I guess it's /etc/docker/daemon.json

It is. This is what I'm using as a workaround:

{
  "storage-driver": "overlay",
  "features": {"buildkit": true}
}

@csullivannet

I'm able to work around by setting only storage-driver in /etc/docker/daemon.json:

{
  "storage-driver": "overlay"
}

As a word of warning, changing the storage driver will make all of your images and containers disappear (volumes don't seem to be affected). They're still there, but you have to switch back to overlay2 to see them again.
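Because images and containers are stored per driver, it helps to confirm which driver the daemon is actually using before concluding that anything is lost. A small sketch, assuming the `docker` CLI is on the PATH; the `DOCKER` variable and the function name are illustrative additions so the command can be swapped out for a dry run:

```shell
# DOCKER can be overridden (for example DOCKER=echo) to see the command
# without contacting a daemon.
DOCKER="${DOCKER:-docker}"

# Print the storage driver the running daemon reports, e.g. overlay or overlay2.
active_storage_driver() {
    "$DOCKER" info --format '{{.Driver}}'
}
```

Calling `active_storage_driver` after each restart shows whether the change in /etc/docker/daemon.json took effect.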

@llebout
Author

llebout commented Aug 4, 2019

Let me note that the issue always happened while pushing images and never while building. I was mistaken with my test case in the first post. Updated.

@julgonmej

With buildkit enabled it happens when building larger images (> 1GB?) too.

@choopm

choopm commented Aug 5, 2019

I've experienced this as well when using GitLab CI with a docker-in-docker configuration.
Using --storage-driver=overlay as suggested by #711 (comment) helps, thanks!

5.1.15-arch1-1-ARCH
Docker version 18.09.6-ce, build 481bc77156

dockerfiles:
  stage: pre-build
  script:
    - docker login -u "gitlab-ci-token" -p ${CI_JOB_TOKEN} ${CI_REGISTRY}
    - docker pull $HEAD_IMAGE || docker pull $LATEST_IMAGE || true
    - docker build --tag $HEAD_IMAGE dockerfiles/
    - docker push $HEAD_IMAGE

docker push $HEAD_IMAGE
The push refers to repository [registry.xxxx/build]
fa2aa1697b3f: Preparing
efb57bb8d2cd: Pushing
..
fa2aa1697b3f: Retrying in 20 seconds
efb57bb8d2cd: Retrying in 20 seconds
..
efb57bb8d2cd: Pushing
error creating overlay mount to /var/lib/docker/overlay2/63db9c807e3f1db9cb8123f17874493d125d348d344d5870fdfb35b462adc6e3/merged: device or resource busy

@hugendudel

hugendudel commented Aug 12, 2019

Downgrading docker version from

docker-1:19.03.1

to

docker-1:18.09.8

solved the problem in my case.

@breimers

breimers commented Aug 13, 2019

I am getting this error with Docker version 18.09.7, build 2d0083d but it seems that I can get a successful build/push after 3-5 attempts.

It seems to successfully push 1-2 layers per attempt so YMMV.
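Since each attempt appears to get a layer or two further, the manual re-runs can be wrapped in a retry loop. This is a generic sketch rather than anything Docker provides; the `retry` function and its argument order are invented for illustration:

```shell
# Run a command up to $1 times, pausing $2 seconds between attempts.
# Returns 0 as soon as one attempt succeeds, 1 if every attempt fails.
retry() {
    local attempts="$1" delay="$2" n=1
    shift 2
    while [ "$n" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $n of $attempts failed" >&2
        # Do not sleep after the final attempt.
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}
```

For example, `retry 5 20 docker push "$HEAD_IMAGE"` would try the push up to five times with 20 seconds in between. It hides the symptom only; the underlying mount error still occurs.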

@EtienneBruines

EtienneBruines commented Aug 14, 2019

To continue on the comment by @breimers:

Because it successfully pushes 1-2 layers per attempt, we -- as a workaround -- improved stability by setting:

--max-concurrent-uploads=1

on the docker daemon on the pushing side. If you have multiple clients this might be less-than-ideal, but when only the CI/CD pipeline pushes to docker, this might be a viable workaround. (Yes, it is slower this way.)

This way, no errors are thrown at all, and it won't have to wait in-between retries.

@TEFEdotCC

Same problem here, but only on one of two machines. Both run Arch Linux with the latest patches and use an M.2 SSD. The PCs differ in two main ways: PC 1 has no monitor connected, and the processors differ (listed below the kernel version).

# uname -r
5.2.5-arch1-1-ARCH

PC 1: Intel(R) Celeron(R) CPU G3930 @ 2.90GHz
PC 2: AMD Ryzen 5 2600X Six-Core Processor

On PC 1 the bug occurs heavily.
On PC 2 the bug does not occur.

@mdgomes

mdgomes commented Aug 19, 2019

Alternatively, and until this issue is fixed, one can set the maximum number of parallel uploads to 1 to bypass the issue without changing to overlay.

Edit /etc/docker/daemon.json and set the max-concurrent-uploads variable.

{
    "max-concurrent-uploads": 1
}
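An existing /etc/docker/daemon.json may already hold other keys (such as the storage-driver or features settings shown earlier in this thread), and overwriting it would silently drop them. The sketch below therefore only creates the file when none exists; the function name `write_daemon_json` is hypothetical, and merging into an existing file is left to the reader:

```shell
# Create a daemon.json containing only max-concurrent-uploads.
# $1: target path, normally /etc/docker/daemon.json
# Refuses to touch a file that already exists so other settings are kept.
write_daemon_json() {
    local target="$1"
    if [ -e "$target" ]; then
        echo "refusing to overwrite existing $target; add the key by hand" >&2
        return 1
    fi
    mkdir -p "$(dirname "$target")" || return 1
    printf '{\n    "max-concurrent-uploads": 1\n}\n' > "$target"
}
```

Run it as root for the real path and restart the Docker daemon afterwards so the setting is picked up.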

@skrat

skrat commented Aug 19, 2019

Having this issue when pushing. Docker version 18.09.7-ce, build 2d0083d657 @ Arch Linux

@hellojukay

Me too, on Manjaro with Docker version 19.03.

I fixed it with:

sudo systemctl stop docker
# WARNING: this deletes all local images, containers, and volumes
sudo rm -rf /var/lib/docker
sudo vim /etc/docker/daemon.json
{
  "storage-driver": "overlay"
}
sudo systemctl start docker

it works.

@lucky-suman

I am using Docker version 1.13.1 and was facing the same issue.
I restarted Docker and then pushed the image, and it worked fine for me.

@esauvisky

esauvisky commented Aug 27, 2019

The solution proposed by @mdgomes is definitely the least intrusive one. No need to switch away from overlay2, no need to rebuild images, and no need to delete or hide all your images and containers. It's much slower, but in my opinion it is the best tradeoff for now.

$ sudo systemctl stop docker
$ sudo nano /etc/docker/daemon.json
{
  "max-concurrent-uploads": 1
}
$ sudo systemctl start docker
$ docker push [...]

@chineerulz

chineerulz commented Sep 6, 2019

While changing the storage driver to overlay fixes the issue, is this good practice? Especially since overlay causes excessive inode consumption (particularly as the number of images grows).

@botzill

botzill commented Oct 25, 2019

Hi.

Having the same issue, any updates?

@nightah

nightah commented Oct 28, 2019

After upgrading, the only time the issue appeared for me was at first, when a prune had not yet been run and cached layers were still present, per #711 (comment).

@hrw

hrw commented Nov 1, 2019

Debian 'buster' with default distro kernel + docker ce 19.03.4 fails:

INFO:kolla.common.utils.nova-base:Building started at 2019-11-01 09:51:16.368898
INFO:kolla.common.utils.nova-base:Directory /tmp/tmpfwvSiv/kolla-2019-11-01_09-46-01_kYr2TM/docker/nova/nova-base/additions already exist. Skipping.
INFO:kolla.common.utils.nova-base:Step 1/11 : FROM 127.0.0.1:4000/lokolla/debian-source-openstack-base:change_692520
INFO:kolla.common.utils.nova-base: ---> c3fdcca97957
INFO:kolla.common.utils.nova-base:Step 2/11 : LABEL maintainer="Kolla Project (https://launchpad.net/kolla)" name="nova-base" build-date="20191101"
ERROR:kolla.common.utils.nova-base:Error'd with the following message
ERROR:kolla.common.utils.nova-base:error creating overlay mount to /var/lib/docker/overlay2/9557e325e12034da6c072854e68b6c0c9cfc4fd33e5711305d5a007f06c989f8-init/merged: device or resource busy

It makes CI runs on Debian unusable ;(

@IngCr3at1on

Yeah, this doesn't seem fully resolved from my perspective; I've got a k8s cluster here that was built with Docker version 19.03.4, build 9013bf583a (never had a previous version installed at all) presenting this error on Ubuntu 18.04 with Linux 4.15.0-66-generic.

Because the Docker version was always 19.03.4, running docker system prune -af has no effect.

@AbhinayGupta741

$ vim /etc/docker/daemon.json
{
"max-concurrent-uploads": 1
}
$ service docker restart

Posting this in case it helps someone. I had the same issue: Docker was not able to push images to GCR. The above solved my issue.

@hrw

hrw commented Nov 14, 2019

@AbhinayGupta741 nice to know. But the problem exists during building as well.

@yoctozepto

hrw, I think Abhinay's comment might as well help us because we are building and pushing at the same time. We need to delay pushing too.

@IngCr3at1on

$ vim /etc/docker/daemon.json
{
"max-concurrent-uploads": 1
}
$ service docker restart

This was mentioned repeatedly throughout this thread; not sure why there's a need to add it again... This is not a solution but rather a cheap work-around.

Question for the maintainers:
How can you say this is fixed if people are still reporting this issue after a system prune?

@yoctozepto

So, in the end, we did not need max-concurrent-uploads set to 1, but we needed to stop building and pushing at the same time:
kolla: https://review.opendev.org/694243
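The idea of not building and pushing at the same time can be sketched as two separate phases. This is a generic illustration and not the kolla change itself; the function name, the `tag=context-dir` argument format, and the `DOCKER` override are all assumptions made for the example:

```shell
# DOCKER can be overridden (for example DOCKER=echo) for a dry run.
DOCKER="${DOCKER:-docker}"

# Each argument is a hypothetical "tag=context-dir" pair.
build_then_push() {
    local spec
    # Phase 1: build everything; this script starts no push during builds.
    for spec in "$@"; do
        "$DOCKER" build --tag "${spec%%=*}" "${spec#*=}" || return 1
    done
    # Phase 2: push only after every build has finished.
    for spec in "$@"; do
        "$DOCKER" push "${spec%%=*}" || return 1
    done
}
```

A dry run such as `DOCKER=echo build_then_push img-a=dirA img-b=dirB` prints the commands in order. Note that this only orders work within one script; other jobs talking to the same daemon can still overlap with it.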

openstack-gerrit pushed a commit to openstack/kolla-ansible that referenced this issue Nov 25, 2019
This fixes Debian job failures during image building.

See docker/for-linux#711
for upstream details.

Change-Id: Icf3ffb261605ffe5d8f2618c2ed4cb97db97dd49
openstack-gerrit pushed a commit to openstack/openstack that referenced this issue Nov 25, 2019
* Update kolla-ansible from branch 'master'
  - Merge "CI/Debian: Push images after building"
  - CI/Debian: Push images after building
    
    This fixes Debian job failures during image building.
    
    See docker/for-linux#711
    for upstream details.
    
    Change-Id: Icf3ffb261605ffe5d8f2618c2ed4cb97db97dd49
openstack-gerrit pushed a commit to openstack/kolla-ansible that referenced this issue Nov 28, 2019
This fixes Debian job failures during image building.

See docker/for-linux#711
for upstream details.

Change-Id: Icf3ffb261605ffe5d8f2618c2ed4cb97db97dd49
(cherry picked from commit 6ab144a)
@daliborfilus

Guys, this is still happening...

error creating overlay mount to /var/lib/docker/overlay2/e3003381a4b9d9ef5f527fb36eeb14b98709bedc5a5535f11c42011fae171cd2/merged: device or resource busy

$ docker -v
Docker version 19.03.5, build 633a0ea838
$ uname -r
4.19.0-6-amd64

@petr-fischer

Oh yes, we have the same problem as @noice: when our GitLab CI builds Docker images, this problem occurs on a daily basis. Is using the deprecated overlay driver instead of overlay2 the official solution? Really? Thanks!

@choopm

choopm commented Jan 23, 2020

@noice @petr-fischer
Try #711 (comment) and report back if it's still happening.

@daliborfilus

I already tried to do docker system prune (without --all --force). But I'll try it again with them.

(Btw, somewhere around these issues it was mentioned that this should affect only Linux >= 5.x, but I have this problem on 4.19.0. I don't know if this is the same core issue.)

@petr-fischer

petr-fischer commented Jan 23, 2020

@choopm Unfortunately, it is still happening, especially when several (but different!) Docker images are being built in parallel (in GitLab CI). The error occurs randomly in one container out of 12. If I manually restart the build job that hit the overlay2 error, it then succeeds.

@daliborfilus

daliborfilus commented Jan 24, 2020

@choopm Can confirm: it happened to me again yesterday, and again just a few seconds ago.

Around 3 docker build commands were running at the same time, and this is the result:

 ---> a4d65b2fc73e
Step 9/19 : RUN ["crystal", "spec", "--error-on-warnings", "--no-debug"]
error creating overlay mount to /var/lib/docker/overlay2/209a7dfb23bbf26404576987d2b3e1e847f3c6afcc328b201afe75f6d8de2639/merged: device or resource busy
ERROR: Job failed: exit code 1
# docker -v
Docker version 19.03.5, build 633a0ea838
# uname -r
4.19.0-6-amd64
# lsb_release -a
Distributor ID:	Debian
Description:	Debian GNU/Linux 10 (buster)
Release:	10
Codename:	buster

Can we reopen this issue please? Or should we create a new one? This issue clearly isn't resolved yet...

@choopm

choopm commented Jan 24, 2020

Our systems have been stable since we started using --max-concurrent-uploads=1 for our docker:18.09-dind CI setup. However, after poking around with our build system today I was able to reproduce it by using DOCKER_BUILDKIT=1 and parallel builds.

I suggest opening a new issue that describes your setup in depth and includes an example the devs can use to reproduce it. I'm not affiliated with moby in any way, just trying to help and keep the noise down on this issue.

@omert08

omert08 commented May 29, 2020

Hey guys, I faced this problem when using an external SD card for my Docker application (running on arm64v8).
How did I solve it?
I updated Docker to version 19.03.9, reformatted the SD card from NTFS to ext4, and edited /etc/docker/daemon.json as below.

{
    "storage-driver":"overlay2"
}

It works now !

@saixin

saixin commented Apr 23, 2021

After rebooting the system, I hit this issue. I changed SELinux on CentOS (/etc/selinux/config) from disabled to permissive and rebooted again. It works!

@Kattyi

Kattyi commented May 12, 2021

Hi guys, this is still happening:

docker -v
Docker version 20.10.6, build 370c289
uname -r
5.4.0-1048-aws
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.5 LTS
Release:        18.04
Codename:       bionic
