zfs driver: dataset is busy causes orphaned layers #2005

Open
JakeCooper opened this issue Jul 6, 2024 · 4 comments
Issue Description

While running with the ZFS storage driver, roughly one out of every 50 containers fails at cleanup with the following error:

cleaning up storage: removing container 8941b7366dfbe5deb66554f86be2f1c931e510e1d770ed00bdd49c56aaa46c18 root filesystem: 1 error occurred:
	* deleting layer "7d18f701724d11385711e8e452835438c4c6f8b23c4b91ace208ef51e88005bb": exit status 1: "/usr/sbin/zfs destroy -r podman/7d18f701724d11385711e8e452835438c4c6f8b23c4b91ace208ef51e88005bb" => cannot destroy 'podman/7d18f701724d11385711e8e452835438c4c6f8b23c4b91ace208ef51e88005bb': dataset is busy
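For anyone hitting this before a fix lands in the zfs driver: the error is usually transient, since something (a lingering mount, or a process scanning the filesystem) briefly pins the dataset. A minimal retry sketch, assuming a plain POSIX shell; the `retry` helper is hypothetical and not part of podman or ZFS:

```shell
#!/bin/sh
# Retry a command a few times with a short delay. "dataset is busy"
# from `zfs destroy` often clears once whatever briefly pinned the
# dataset (a mount, an open file) goes away.
retry() {
    max=5
    i=1
    while [ "$i" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $i/$max failed: $*" >&2
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Example (requires ZFS; the dataset name is illustrative):
# retry zfs destroy -r podman/<layer-id>
```

To find what is pinning a dataset, `mount | grep <dataset>` shows lingering mounts, and `lsof +D <mountpoint>` shows processes with open files under it.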

Steps to reproduce the issue


  1. Use the native ZFS storage driver
  2. Deploy roughly 50-100 different containers
  3. Roughly one in 50 container removals fails with the error above

Describe the results you received

The `dataset is busy` error shown above, leaving the layer orphaned.

Describe the results you expected

No error

podman info output

host:
  arch: amd64
  buildahVersion: 1.36.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: Unknown
    path: /usr/local/bin/conmon
    version: 'conmon version 2.1.12, commit: e8896631295ccb0bfdda4284f1751be19b483264-dirty'
  cpuUtilization:
    idlePercent: 66.22
    systemPercent: 11.5
    userPercent: 22.28
  cpus: 32
  databaseBackend: sqlite
  distribution:
    codename: bookworm
    distribution: debian
    version: "12"
  eventLogger: journald
  freeLocks: 65277
  hostname: production-stacker-178
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.1.0-13-cloud-amd64
  linkmode: dynamic
  logDriver: journald
  memFree: 24015925248
  memTotal: 270471868416
  networkBackend: cni
  networkBackendInfo:
    backend: cni
    dns: {}
  ociRuntime:
    name: crun
    package: Unknown
    path: /usr/local/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: true
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 5217h 56m 41.00s (Approximately 217.38 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries: {}
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 249
    paused: 0
    running: 209
    stopped: 40
  graphDriverName: zfs
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 354307145728
  graphRootUsed: 7023755264
  graphStatus:
    Compression: "off"
    Parent Dataset: podman
    Parent Quota: "no"
    Space Available: "347290451968"
    Space Used By Parent: "688870408192"
    Zpool: podman
    Zpool Health: ONLINE
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 603
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.1.1
  Built: 1717640166
  BuiltTime: Thu Jun  6 02:16:06 2024
  GitCommit: ""
  GoVersion: go1.21.11
  Os: linux
  OsArch: linux/amd64
  Version: 5.1.1

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details

Additional information

@JakeCooper JakeCooper changed the title zfs driver: dataset is busy zfs driver: dataset is busy causes orphaned layers Jul 6, 2024
@Luap99 Luap99 transferred this issue from containers/podman Jul 8, 2024
Luap99 (Member) commented Jul 8, 2024

We recommend using overlayfs in general. I am not sure if there is anyone actively working on the zfs driver currently.

JakeCooper (Author) commented

Does that mean there's no way to limit ephemeral storage with Podman in production?

cgwalters (Contributor) commented

> Does that mean there's no way to limit ephemeral storage with Podman in production?

One thing you can do is run your containers with --read-only and, for any persistence, bind mount only external host volumes that are themselves size-limited.

Another path, AFAIK, is XFS plus quotas; see e.g. containers/podman#21193.
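A concrete sketch of the --read-only approach (the volume name, mount paths, sizes, and image are illustrative; the flags themselves are standard podman):

```shell
# Create a size-capped tmpfs-backed named volume for the one writable path:
podman volume create --opt type=tmpfs --opt device=tmpfs --opt o=size=256m app-data

# Run with an immutable root, a small writable /tmp, and the capped volume;
# the container then cannot grow its ephemeral storage beyond the caps:
podman run --rm --read-only --tmpfs /tmp:rw,size=64m \
    -v app-data:/var/lib/app \
    alpine sh -c 'echo ok'
```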

JakeCooper (Author) commented

It seems only overlayfs is maintained, as mentioned by a core contributor in #2004 (comment).

So, I doubt XFS+quotas will be any better (not to mention the inherent downsides of XFS)
