helm 0.12.2 - nfd-worker logs permission denied on selinux and gfd #325

Open
RichardSufliarsky opened this issue Jul 29, 2022 · 1 comment
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

@RichardSufliarsky

1. Issue or feature description

I deployed nvdp (nvidia-device-plugin) via Helm chart v0.12.2 with GFD enabled and no other changes to values.yaml. The cluster runs RHEL 8 with SELinux enabled.
No nvidia.com/xxx labels are added to the Kubernetes worker node. I have a workaround, described in "2. Steps to reproduce the issue". Please advise how to do this the right way.

nfd-worker logs "permission denied" errors (with no corresponding sealert messages in the system logs) on /host-sys/fs/selinux/enforce and /etc/kubernetes/node-feature-discovery/features.d//gfd.
As a result, only NFD labels are added to the node, and no GFD labels.

I0729 12:08:26.342300       1 nfd-worker.go:155] Node Feature Discovery Worker v0.11.0
I0729 12:08:26.342435       1 nfd-worker.go:156] NodeName: 'gpu003.lab.cortical.io'
I0729 12:08:26.343127       1 nfd-worker.go:423] configuration file "/etc/kubernetes/node-feature-discovery/nfd-worker.conf" parsed
I0729 12:08:26.343265       1 nfd-worker.go:461] worker (re-)configuration successfully completed
I0729 12:08:26.343326       1 base.go:127] connecting to nfd-master at nvdp-node-feature-discovery-master:8080 ...
I0729 12:08:26.343376       1 component.go:36] [core]parsed scheme: ""
I0729 12:08:26.343389       1 component.go:36] [core]scheme "" not registered, fallback to default scheme
I0729 12:08:26.343416       1 component.go:36] [core]ccResolverWrapper: sending update to cc: {[{nvdp-node-feature-discovery-master:8080  <nil> 0 <nil>}] <nil> <nil>}
I0729 12:08:26.343430       1 component.go:36] [core]ClientConn switching balancer to "pick_first"
I0729 12:08:26.343439       1 component.go:36] [core]Channel switches to new LB policy "pick_first"
I0729 12:08:26.343494       1 component.go:36] [core]Subchannel Connectivity change to CONNECTING
I0729 12:08:26.343538       1 component.go:36] [core]Subchannel picks a new address "nvdp-node-feature-discovery-master:8080" to connect
I0729 12:08:26.343994       1 component.go:36] [core]Channel Connectivity change to CONNECTING
I0729 12:08:26.346260       1 component.go:36] [core]Subchannel Connectivity change to READY
I0729 12:08:26.346296       1 component.go:36] [core]Channel Connectivity change to READY
W0729 12:08:26.361713       1 kernel.go:145] failed to detect the status of selinux: open /host-sys/fs/selinux/enforce: permission denied
E0729 12:08:26.361921       1 local.go:87] unable to access /etc/kubernetes/node-feature-discovery/features.d/: lstat /etc/kubernetes/node-feature-discovery/features.d//gfd: permission denied
I0729 12:08:26.436158       1 nfd-worker.go:472] starting feature discovery...
I0729 12:08:26.436813       1 nfd-worker.go:484] feature discovery completed
I0729 12:08:26.436833       1 nfd-worker.go:565] sending labeling request to nfd-master

2. Steps to reproduce the issue

Deploy via Helm and check the nfd-worker pod logs.

I checked the node-feature-discovery-worker DaemonSet definition; the worker container has this security context:

  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop: [ "ALL" ]
    readOnlyRootFilesystem: true
    runAsNonRoot: true

When I edit it to add privileged: true (and remove allowPrivilegeEscalation: false), it works and the nvidia.com/xxx node labels are added:

  securityContext:
    capabilities:
      drop: [ "ALL" ]
    privileged: true
    runAsNonRoot: true
    readOnlyRootFilesystem: true
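
To keep this workaround across `helm upgrade` instead of editing the DaemonSet by hand, the same security context could probably be set through a values override. This is only a sketch under assumptions: it assumes the nvdp chart v0.12.2 nests the node-feature-discovery subchart under the `nfd` key and that the subchart reads `worker.securityContext`; check both against the chart's values.yaml. The file name is hypothetical.

```yaml
# Hypothetical override file, e.g. nfd-selinux-values.yaml.
# ASSUMPTION: the node-feature-discovery subchart is configured under "nfd".
nfd:
  worker:
    securityContext:
      # Helm merges this map with the subchart defaults, so the default
      # allowPrivilegeEscalation: false would otherwise survive, and
      # Kubernetes rejects privileged: true combined with it set to false.
      allowPrivilegeEscalation: true
      capabilities:
        drop: [ "ALL" ]
      privileged: true
      readOnlyRootFilesystem: true
      runAsNonRoot: true
```

The file would be passed to `helm upgrade` with `-f`. Setting allowPrivilegeEscalation to true is behaviorally the same as omitting it once privileged is true; either way the container still runs privileged, so this remains a workaround rather than a narrowly scoped fix.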

Additional information that might help better understand your environment and reproduce the bug:

RHEL 8.5
SELinux enabled (container-selinux installed)
Kubernetes 1.24.2
CRI-O 1.24.1

I added the following SELinux policy rules so that nfd-worker and nvidia-device-plugin run without generating sealert messages:
allow container_t kubernetes_file_t:dir read;
allow container_t container_runtime_t:unix_stream_socket connectto;
allow container_t container_runtime_tmpfs_t:file { open read };
allow container_t xserver_misc_device_t:chr_file { getattr ioctl open read write };

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

github-actions bot added the lifecycle/stale label on Feb 28, 2024.