
container stats not working properly on zfs backend, zfs/graph/graph: no such file or directory #5820

Closed
cnfatal opened this issue Apr 25, 2022 · 6 comments · Fixed by #5821
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@cnfatal
Contributor

cnfatal commented Apr 25, 2022

What happened?

Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.398054959+08:00" level=error msg="Unable to get disk usage for container aa945de4f85e56e0a0f476db6e7c1d7509a8e44b94e0e643c66336d77417c255: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.398209369+08:00" level=error msg="Unable to get disk usage for container 47b6b32496c02d54e0ea4eacd4dc9c9b4e6dbbae6225aeff79b0e212d7a30a94: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.398348689+08:00" level=error msg="Unable to get disk usage for container 4dea061240a57dd9f3bd882a1154df4dff91f82dea93fb0e904e77af6585cc64: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.398490934+08:00" level=error msg="Unable to get disk usage for container b6071d59badbcaecc02f448bfae22af3d7057d9cf74629f8ea944d3bc3feae7a: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.398659647+08:00" level=error msg="Unable to get disk usage for container 2d6800719377ce96443f7addf49b82c0bb3d9ed4bdfdc89c0f5c84c00cc226cf: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.398819006+08:00" level=error msg="Unable to get disk usage for container e48771f3cbfb8c88724ab5120c2376afdb661034d6615c34fade1777498b58fe: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.398955391+08:00" level=error msg="Unable to get disk usage for container 36aac99925056cdba2e922666539163e467d9d6e8bac923ff12a1b642ed7f593: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.399089699+08:00" level=error msg="Unable to get disk usage for container 66b1c4324a610ee932984549ef90e7636b7cb21a4e117c08c3737cf108312c25: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.399226041+08:00" level=error msg="Unable to get disk usage for container 70956ec4858aac13e4e3c8d7ae937a56267756691c61fb863d09f402de3ada19: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.399359220+08:00" level=error msg="Unable to get disk usage for container d5e40627dd2a491bac876f74f4d4229c752383bce10ad5c58bd88279b4ed66d7: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.399504962+08:00" level=error msg="Unable to get disk usage for container ccc81be0a509d86fbd59c770ab290d9fb94e2512cdcebab23ed7261734c5d457: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.399644069+08:00" level=error msg="Unable to get disk usage for container 4688ef29b68ecc2a215f844b6ec6b4ace480309979c5704c464140ce5f90a46d: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.399779916+08:00" level=error msg="Unable to get disk usage for container d0416de7e0d3178db944b0091c7f4de870cc644cb79fe9c605445a4467e41f15: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.399918646+08:00" level=error msg="Unable to get disk usage for container 48224ef19fa780bfc284e58eef0a6d69593014e0854ab51a61ceae22ed96b407: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.400053576+08:00" level=error msg="Unable to get disk usage for container 974e1b51443659418084f6c0d4baded4ff138457e7092261cfb6cb382741b459: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.400186061+08:00" level=error msg="Unable to get disk usage for container 235da4898ba2aa0a0c7bb154df1c894e744dcd124981afa7f265bffb9228a595: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.400330504+08:00" level=error msg="Unable to get disk usage for container b4239d8f422e80f9b3e53642434561c8e6f2021d162832e9b267dcb165357988: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.400465128+08:00" level=error msg="Unable to get disk usage for container 8b90a51ea464fd84d1d3b2f1b9ca2d326ecf741eaeb581b153834c3d12ca2a6a: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.400601205+08:00" level=error msg="Unable to get disk usage for container 9d33fbb5d0ad23a11697687e3a1bde5f7d31e52ae88e7a5a8fb2695496f8f172: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"
Apr 25 20:28:46 user-desktop crio[1271281]: time="2022-04-25 20:28:46.400736646+08:00" level=error msg="Unable to get disk usage for container 54b31137ed5c952c8ca3856d39535c8fb96f8e22e1543ec692e20ce28849d821: lstat /var/lib/containers/storage/zfs/graph/graph: no such file or directory"

This is caused by:

id := filepath.Base(filepath.Dir(container.MountPoint()))

When using the zfs backend, container.MountPoint() has a value like /var/lib/containers/storage/zfs/graph/40f155bca81dd456f379f2a349b707c1b14e87bc5fed94e65182fc4a640a088e

so id is always "graph", which produces the errors above.

It works fine with the overlay driver, where the container mount point looks like /var/lib/containers/storage/overlay/679961db0ea4e298b51261b14c5b5a30252ce5425c4abcda84180f800611acac/merged
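The difference between the two path layouts can be sketched with a small, self-contained Go program (the helper name idFromMountPoint is illustrative, mirroring the quoted line from CRI-O):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// idFromMountPoint mirrors the problematic line from the report:
//   id := filepath.Base(filepath.Dir(container.MountPoint()))
// i.e. "take the name of the mount point's parent directory".
func idFromMountPoint(mountPoint string) string {
	return filepath.Base(filepath.Dir(mountPoint))
}

func main() {
	// overlay: .../overlay/<id>/merged — the parent directory IS the ID.
	overlay := "/var/lib/containers/storage/overlay/679961db0ea4e298b51261b14c5b5a30252ce5425c4abcda84180f800611acac/merged"
	fmt.Println(idFromMountPoint(overlay))

	// zfs: .../zfs/graph/<id> — the parent directory is always "graph".
	zfs := "/var/lib/containers/storage/zfs/graph/40f155bca81dd456f379f2a349b707c1b14e87bc5fed94e65182fc4a640a088e"
	fmt.Println(idFromMountPoint(zfs))
}
```

The first call prints the container ID, the second always prints "graph", which is exactly why the lstat on .../zfs/graph/graph fails.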


I'd be glad to open a PR if needed.

What did you expect to happen?

In the case above, id was expected to be 519dff473ffe68973c81c754dcf4ef643b6d0305085f3317ffe096d2249c1976.

How can we reproduce it (as minimally and precisely as possible)?

A CRI-O runtime using the zfs storage driver.

Anything else we need to know?

No response

CRI-O and Kubernetes version

$  crio --version
crio version 1.23.2
Version:          1.23.2
GitCommit:        c0b2474b80fd0844b883729bda88961bed7b472b
GitTreeState:     clean
BuildDate:        2022-04-14T15:23:11Z
GoVersion:        go1.17.5
Compiler:         gc
Platform:         linux/amd64
Linkmode:         dynamic
BuildTags:        apparmor, exclude_graphdriver_devicemapper, containers_image_ostree_stub, seccomp
SeccompEnabled:   true
AppArmorEnabled:  false
$ kubectl version 
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:49:13Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server 10.12.32.11:6443 was refused - did you specify the right host or port?
$ sudo crio-status c
[crio]
  root = "/var/lib/containers/storage"
  runroot = "/run/containers/storage"
  storage_driver = "zfs"
  log_dir = "/var/log/crio/pods"
  version_file = "/var/run/crio/version"
  version_file_persist = "/var/lib/crio/version"
  clean_shutdown_file = "/var/lib/crio/clean.shutdown"
  internal_wipe = true
  [crio.api]
    grpc_max_send_msg_size = 83886080
    grpc_max_recv_msg_size = 83886080
    listen = "/var/run/crio/crio.sock"
    stream_address = "127.0.0.1"
    stream_port = "0"
    stream_enable_tls = false
    stream_tls_cert = ""
    stream_tls_key = ""
    stream_tls_ca = ""
    stream_idle_timeout = ""
  [crio.runtime]
    seccomp_use_default_when_empty = true
    no_pivot = false
    selinux = false
    log_to_journald = false
    drop_infra_ctr = true
    read_only = false
    conmon_env = ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"]
    hooks_dir = ["/usr/share/containers/oci/hooks.d"]
    default_capabilities = ["CHOWN", "DAC_OVERRIDE", "FSETID", "FOWNER", "SETGID", "SETUID", "SETPCAP", "NET_BIND_SERVICE", "KILL"]
    allowed_devices = ["/dev/fuse"]
    cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
    device_ownership_from_security_context = false
    default_runtime = "runc"
    decryption_keys_path = "/etc/crio/keys/"
    conmon = "/usr/bin/conmon"
    conmon_cgroup = "system.slice"
    seccomp_profile = ""
    apparmor_profile = "crio-default"
    blockio_config_file = ""
    irqbalance_config_file = "/etc/sysconfig/irqbalance"
    rdt_config_file = ""
    cgroup_manager = "systemd"
    default_mounts_file = ""
    container_exits_dir = "/var/run/crio/exits"
    container_attach_socket_dir = "/var/run/crio"
    bind_mount_prefix = ""
    uid_mappings = ""
    minimum_mappable_uid = -1
    gid_mappings = ""
    minimum_mappable_gid = -1
    log_level = "info"
    log_filter = ""
    namespaces_dir = "/var/run"
    pinns_path = "/usr/bin/pinns"
    pids_limit = 1024
    log_size_max = -1
    ctr_stop_timeout = 30
    separate_pull_cgroup = ""
    infra_ctr_cpuset = ""
    [crio.runtime.runtimes]
      [crio.runtime.runtimes.runc]
        runtime_config_path = ""
        runtime_path = "/usr/lib/cri-o-runc/sbin/runc"
        runtime_type = "oci"
        runtime_root = "/run/runc"
        DisallowedAnnotations = ["cpu-quota.crio.io", "irq-load-balancing.crio.io", "io.containers.trace-syscall", "io.kubernetes.cri-o.TrySkipVolumeSELinuxLabel", "io.kubernetes.cri-o.cgroup2-mount-hierarchy-rw", "io.kubernetes.cri-o.ShmSize", "cpu-load-balancing.crio.io", "io.kubernetes.cri.rdt-class", "io.kubernetes.cri-o.userns-mode", "io.kubernetes.cri-o.UnifiedCgroup", "io.kubernetes.cri-o.Devices"]
  [crio.image]
    default_transport = "docker://"
    global_auth_file = ""
    pause_image = "registry.k8s.io/pause:3.6"
    pause_image_auth_file = ""
    pause_command = "/pause"
    signature_policy = ""
    image_volumes = "mkdir"
    big_files_temporary_dir = ""
  [crio.network]
    cni_default_network = ""
    network_dir = "/etc/cni/net.d/"
    plugin_dirs = ["/opt/cni/bin/"]
  [crio.metrics]
    enable_metrics = false
    metrics_collectors = ["operations", "operations_latency_microseconds_total", "operations_latency_microseconds", "operations_errors", "image_pulls_by_digest", "image_pulls_by_name", "image_pulls_by_name_skipped", "image_pulls_failures", "image_pulls_successes", "image_pulls_layer_size", "image_layer_reuse", "containers_oom_total", "containers_oom", "processes_defunct", "operations_total", "operations_latency_seconds", "operations_latency_seconds_total", "operations_errors_total", "image_pulls_bytes_total", "image_pulls_skipped_bytes_total", "image_pulls_failure_total", "image_pulls_success_total", "image_layer_reuse_total", "containers_oom_count_total"]
    metrics_port = 9090
    metrics_socket = ""
    metrics_cert = ""
    metrics_key = ""
  [crio.tracing]
    enable_tracing = false
    tracing_endpoint = "0.0.0.0:4317"
    tracing_sampling_rate_per_million = 0
  [crio.stats]
    stats_collection_period = 0

OS version

# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 21.10"
NAME="Ubuntu"
VERSION_ID="21.10"
VERSION="21.10 (Impish Indri)"
VERSION_CODENAME=impish
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=impish
$ uname -a
Linux user-desktop 5.13.0-30-generic #33-Ubuntu SMP Fri Feb 4 17:03:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Additional environment details (AWS, VirtualBox, physical, etc.)

@cnfatal cnfatal added the kind/bug Categorizes issue or PR as related to a bug. label Apr 25, 2022
@haircommander
Member

haircommander commented Apr 25, 2022

oh darn. I had hoped we could avoid switching based on storage driver. @nalind @giuseppe can you think of a storage driver agnostic way to get the ID of mountpoint from the path?

@giuseppe
Member

zfs backend is not really supported, any reason for using it?

@cnfatal
Contributor Author

cnfatal commented Apr 25, 2022

I'm running a k8s cluster on my Ubuntu workstation, which uses zfs as the root file system. But that's not the point.

@nalind
Collaborator

nalind commented Apr 26, 2022

oh darn. I had hoped we could avoid switching based on storage driver. @nalind @giuseppe can you think of a storage driver agnostic way to get the ID of mountpoint from the path?

There's no guaranteed relationship between the two, and it would be best for CRI-O not to make any assumptions about one. If you have the container's ID, its github.com/containers/storage.Container record tracks the ID of the read-write layer, which is what I think is being sought here.
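The driver-agnostic approach suggested above can be sketched with a minimal mock. The field names ID and LayerID loosely mirror the Container record in github.com/containers/storage, but the store type and rwLayerID helper here are illustrative, not that library's API:

```go
package main

import "fmt"

// containerRecord is a minimal stand-in for the container record kept by
// the storage library: it maps a container ID to its read-write layer ID.
type containerRecord struct {
	ID      string // container ID
	LayerID string // ID of the container's read-write layer
}

// store is a toy lookup table standing in for the real container store.
type store struct {
	containers map[string]containerRecord
}

// rwLayerID resolves the read-write layer ID from the container record,
// with no dependency on how any particular driver lays out mount paths.
func (s *store) rwLayerID(containerID string) (string, bool) {
	c, ok := s.containers[containerID]
	if !ok {
		return "", false
	}
	return c.LayerID, true
}

func main() {
	s := &store{containers: map[string]containerRecord{
		"aa945de4": {ID: "aa945de4", LayerID: "519dff47"},
	}}
	if layerID, ok := s.rwLayerID("aa945de4"); ok {
		fmt.Println(layerID)
	}
}
```

The point of the design is that the caller starts from the container ID it already has, rather than reverse-engineering an ID out of a driver-specific mount path.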

@morsik

morsik commented May 27, 2024

For future reference for others:

CRI-O/Kubernetes works correctly with ZFS >=2.2 where overlayfs support was correctly implemented. You may need to verify your distribution's repository, as in Debian 12 you need to enable bookworm-backports (both main and contrib) to be able to install 2.2. Otherwise you'll end up using 2.1 which won't work with overlayfs.

@kwilczynski
Member

@morsik, thank you for the follow up! Appreciated.
