Checkpoint of a new restored container fails #2379

Closed
obsidian0215 opened this issue Apr 2, 2024 · 4 comments

Comments

@obsidian0215

Description

Podman/CRIU fails to checkpoint a container that was restored with --import and --name (similar to containers/podman#13672).
How can I checkpoint the new container?

Steps to reproduce the issue:

  1. Create container
    podman run -d --name looper busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
  2. Checkpoint container with --export
    podman container checkpoint --export ch1.tar.gz looper
  3. Restore container checkpoint with --import and --name
    podman container restore --import ch1.tar.gz --name looper2
  4. Checkpoint the new container
    podman container checkpoint looper2 --export ch2.tar.gz

Describe the results you received:

ERRO[0000] container is not destroyed                   
ERRO[0000] criu failed: type NOTIFY errno 0
log file: /var/lib/containers/storage/overlay-containers/868a180ab534c95b938e6cdc481f0df0ee6032ca47399deb5d458fea2628407d/userdata/dump.log 
Error: `/usr/bin/runc checkpoint --image-path /var/lib/containers/storage/overlay-containers/868a180ab534c95b938e6cdc481f0df0ee6032ca47399deb5d458fea2628407d/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/868a180ab534c95b938e6cdc481f0df0ee6032ca47399deb5d458fea2628407d/userdata 868a180ab534c95b938e6cdc481f0df0ee6032ca47399deb5d458fea2628407d` failed: exit status 1
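
The error above points at a dump.log file, and the actual cause is usually in its "Error (" lines. A minimal sketch for pulling those lines out, assuming a POSIX shell; the function name criu_errors is made up for this illustration and is not part of CRIU, runc, or Podman:

```shell
# criu_errors LOGFILE
# Print every line of a CRIU log that contains the literal text "Error ("
# together with its line number. Returns 2 if the file cannot be read and
# 1 (from grep) if no such line exists.
criu_errors() {
    log="$1"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
    # -F treats the pattern as a fixed string, -n prefixes line numbers.
    grep -nF 'Error (' "$log"
}
```

Usage would look like `criu_errors /var/lib/containers/storage/overlay-containers/<container-id>/userdata/dump.log`, with the container ID taken from the error message. On the CRIU log included further down in this report, this would surface the lines from criu/seize.c, compel/src/lib/infect.c, and criu/cr-dump.c.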

Describe the results you expected:
Checkpointing the looper2 container should create a new checkpoint in ch2.tar.gz.

Additional information you deem important (e.g. issue happens only occasionally):
output of podman version:

Client:       Podman Engine
Version:      4.3.1
API Version:  4.3.1
Go Version:   go1.19.8
Built:        Thu Jan 1 08:00:00 1970
OS/Arch:      linux/amd64

output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.28.2
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon_2.1.6+ds1-1_amd64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.6, commit: unknown'
  cpuUtilization:
    idlePercent: 18.33
    systemPercent: 9.95
    userPercent: 71.73
  cpus: 8
  distribution:
    codename: bookworm
    distribution: debian
    version: "12"
  eventLogger: journald
  hostname: debian-obsidian
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.1.0-18-amd64
  linkmode: dynamic
  logDriver: journald
  memFree: 433684480
  memTotal: 8290725888
  networkBackend: netavark
  ociRuntime:
    name: runc
    package: Unknown
    path: /usr/sbin/runc
    version: |-
      runc version 1.1.12
      commit: v1.1.12-0-g51d5e946
      spec: 1.0.2-dev
      go: go1.20.13
      libseccomp: 2.5.4
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: true
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns_1.2.0-1_amd64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.4
  swapFree: 1005469696
  swapTotal: 1022357504
  uptime: 1h 27m 32.00s (Approximately 0.04 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries: {}
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 0
    stopped: 2
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 40947412992
  graphRootUsed: 14612869120
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.3.1
  Built: 0
  BuiltTime: Thu Jan  1 08:00:00 1970
  GitCommit: ""
  GoVersion: go1.19.8
  Os: linux
  OsArch: linux/amd64
  Version: 4.3.1

output of uname -a:

Linux debian-obsidian 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux

CRIU logs and information:

CRIU full dump/restore logs:

(00.000000) Unable to get $HOME directory, local configuration file will not be used.
(00.000136) Version: 3.17.1 (gitid 0)
(00.000154) Running on debian-obsidian Linux 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64
(00.000160) Would overwrite RPC settings with values from /etc/criu/runc.conf
(00.000208) Loaded kdat cache from /run/criu.kdat
(00.000281) Hugetlb size 2 Mb is supported but cannot get dev's number
(00.000329) Hugetlb size 1024 Mb is supported but cannot get dev's number
(00.000456) ========================================
(00.000468) Dumping processes (pid: 42253)
(00.000473) ========================================
(00.000499) rlimit: RLIMIT_NOFILE unlimited for self
(00.000523) Running pre-dump scripts
(00.000529) 	RPC
(00.001023) irmap: Searching irmap cache in work dir
(00.001263) No irmap-cache image
(00.001277) irmap: Searching irmap cache in parent
(00.001296) No parent images directory provided
(00.001303) irmap: No irmap cache
(00.001352) cpu: x86_family 6 x86_vendor_id GenuineIntel x86_model_id 13th Gen Intel(R) Core(TM) i7-1370P
(00.001369) cpu: fpu: xfeatures_mask 0x205 xsave_size 2696 xsave_size_max 2696 xsaves_size 840
(00.001401) cpu: fpu: x87 floating point registers     xstate_offsets      0 / 0      xstate_sizes    160 / 160   
(00.001409) cpu: fpu: AVX registers                    xstate_offsets    576 / 576    xstate_sizes    256 / 256   
(00.001415) cpu: fpu: Protection Keys User registers   xstate_offsets   2688 / 832    xstate_sizes      8 / 8     
(00.001421) cpu: fpu:1 fxsr:1 xsave:1 xsaveopt:1 xsavec:1 xgetbv1:1 xsaves:1
(00.001789) cg-prop: Parsing controller "cpu"
(00.001806) cg-prop: 	Strategy "replace"
(00.001815) cg-prop: 	Property "cpu.shares"
(00.001820) cg-prop: 	Property "cpu.cfs_period_us"
(00.001826) cg-prop: 	Property "cpu.cfs_quota_us"
(00.001831) cg-prop: 	Property "cpu.rt_period_us"
(00.001835) cg-prop: 	Property "cpu.rt_runtime_us"
(00.001840) cg-prop: Parsing controller "memory"
(00.001845) cg-prop: 	Strategy "replace"
(00.001849) cg-prop: 	Property "memory.limit_in_bytes"
(00.001854) cg-prop: 	Property "memory.memsw.limit_in_bytes"
(00.001858) cg-prop: 	Property "memory.swappiness"
(00.001863) cg-prop: 	Property "memory.soft_limit_in_bytes"
(00.001867) cg-prop: 	Property "memory.move_charge_at_immigrate"
(00.001872) cg-prop: 	Property "memory.oom_control"
(00.001876) cg-prop: 	Property "memory.use_hierarchy"
(00.001880) cg-prop: 	Property "memory.kmem.limit_in_bytes"
(00.001885) cg-prop: 	Property "memory.kmem.tcp.limit_in_bytes"
(00.001889) cg-prop: Parsing controller "cpuset"
(00.001894) cg-prop: 	Strategy "replace"
(00.001899) cg-prop: 	Property "cpuset.cpus"
(00.001903) cg-prop: 	Property "cpuset.mems"
(00.001907) cg-prop: 	Property "cpuset.memory_migrate"
(00.001912) cg-prop: 	Property "cpuset.cpu_exclusive"
(00.001916) cg-prop: 	Property "cpuset.mem_exclusive"
(00.001920) cg-prop: 	Property "cpuset.mem_hardwall"
(00.001925) cg-prop: 	Property "cpuset.memory_spread_page"
(00.001929) cg-prop: 	Property "cpuset.memory_spread_slab"
(00.001934) cg-prop: 	Property "cpuset.sched_load_balance"
(00.001938) cg-prop: 	Property "cpuset.sched_relax_domain_level"
(00.001943) cg-prop: Parsing controller "blkio"
(00.001947) cg-prop: 	Strategy "replace"
(00.001952) cg-prop: 	Property "blkio.weight"
(00.001957) cg-prop: Parsing controller "freezer"
(00.001961) cg-prop: 	Strategy "replace"
(00.001966) cg-prop: Parsing controller "perf_event"
(00.001970) cg-prop: 	Strategy "replace"
(00.001975) cg-prop: Parsing controller "net_cls"
(00.001980) cg-prop: 	Strategy "replace"
(00.001984) cg-prop: 	Property "net_cls.classid"
(00.001988) cg-prop: Parsing controller "net_prio"
(00.001993) cg-prop: 	Strategy "replace"
(00.001998) cg-prop: 	Property "net_prio.ifpriomap"
(00.002002) cg-prop: Parsing controller "pids"
(00.002007) cg-prop: 	Strategy "replace"
(00.002011) cg-prop: 	Property "pids.max"
(00.002015) cg-prop: Parsing controller "devices"
(00.002020) cg-prop: 	Strategy "replace"
(00.002024) cg-prop: 	Property "devices.list"
(00.002106) Preparing image inventory (version 1)
(00.002232) Add pid ns 1 pid 42312
(00.002262) Add net ns 2 pid 42312
(00.002285) Add ipc ns 3 pid 42312
(00.002307) Add uts ns 4 pid 42312
(00.002328) Add time ns 5 pid 42312
(00.002358) Add mnt ns 6 pid 42312
(00.002386) Add user ns 7 pid 42312
(00.002414) Add cgroup ns 8 pid 42312
(00.002421) cg: Dumping cgroups for 42312
(00.002459) cg:  `- New css ID 1
(00.002465) cg:     `- [] -> [/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-3717053b-cf26-4520-9ad4-9884aa06de13.scope] [0]
(00.002471) cg: Set 1 is criu one
(00.002534) Error (criu/seize.c:911): Neither a cgroupv1 (freezer.state) or cgroupv2 (cgroup.freeze) control file found.
(00.002576) Unlock network
(00.002596) Unfreezing tasks into 1
(00.002607) 	Unseizing 42253 into 1
(00.002621) Error (compel/src/lib/infect.c:356): Unable to detach from 42253: No such process
(00.002642) Error (criu/cr-dump.c:2053): Dumping FAILED.
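
The first error in this log says that neither freezer.state (cgroup v1) nor cgroup.freeze (cgroup v2) was found. One way to see what CRIU is looking at is to check a cgroup directory for those two files by hand. The following is a rough sketch, not CRIU's own logic; freezer_kind is a hypothetical helper name:

```shell
# freezer_kind DIR
# Report which freeze control file exists in a cgroup directory:
#   "v2"   if DIR/cgroup.freeze exists (cgroup v2)
#   "v1"   if DIR/freezer.state exists (cgroup v1 freezer controller)
#   "none" otherwise (returns 1), which matches the situation in the log
freezer_kind() {
    dir="$1"
    if [ -e "$dir/cgroup.freeze" ]; then
        echo v2
    elif [ -e "$dir/freezer.state" ]; then
        echo v1
    else
        echo none
        return 1
    fi
}
```

The directory to pass is the container's cgroup under /sys/fs/cgroup; the path relative to that root can be read from /proc/<pid>/cgroup for the container's init process. Note that the root of a cgroup v2 hierarchy itself has no cgroup.freeze file, so the check is only meaningful for a child cgroup.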

Output of `criu --version`:

Version: 3.17.1

Output of `criu check --all`:

Looks good.

output of criu check --all: (criu 3.19)

Looks good but some kernel features are missing which, depending on your process tree, may cause dump or restore failure.

(It didn't print any details about which features are missing.)

Additional environment details:

output of kernel config: kernel-config.txt

@adrianreber
Member

Works for me with Podman 4.9.3 and CRIU 3.19 on Fedora with cgroup v1.

There is a cgroup v2 patch in runc that has not made it into a release yet and that might be necessary on a v2 system (opencontainers/runc#3546).
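
Since the outcome seems to depend on the cgroup version, it helps to confirm which one a host is running (the podman info output in the report shows cgroupVersion: v2). A small sketch, assuming the usual mount point /sys/fs/cgroup; cgroup_version is a made-up helper name:

```shell
# cgroup_version [ROOT]
# Print "v2" when ROOT (default /sys/fs/cgroup) is a unified cgroup v2
# hierarchy, detected by a cgroup.controllers file at its top level,
# and "v1" otherwise. A hybrid setup is reported as "v1".
cgroup_version() {
    root="${1:-/sys/fs/cgroup}"
    if [ -f "$root/cgroup.controllers" ]; then
        echo v2
    else
        echo v1
    fi
}
```

Calling `cgroup_version` with no argument checks the running host. `stat -fc %T /sys/fs/cgroup` is an alternative check, printing cgroup2fs on a v2 host.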

@obsidian0215
Author

I tried it and it worked well with criu 3.18 + podman 3.4.1 + runc 1.1.12 (with cgroup v1).

So for now the workaround is to use cgroup v1. (Will runc 1.2.0 release the patch for v2?)

@adrianreber
Member

Will runc 1.2.0 release the patch for v2?

I don't know.

If your problem is solved, please close the ticket.

@obsidian0215
Author

OK, thank you.
