
Error during unshare(CLONE_NEWUSER): Invalid argument error parsing PID (RHEL8) #4087

Closed
faandg opened this issue Jul 1, 2022 · 12 comments

faandg commented Jul 1, 2022

Description

Steps to reproduce the issue:

  1. Install the container-tools dnf module.
  2. As an unprivileged user with user namespaces enabled, run any buildah command other than buildah version.
  3. The command fails, although this setup worked fine for a long time before it stopped working.

Describe the results you received:

buildah info
Error during unshare(CLONE_NEWUSER): Invalid argument
ERRO[0000] error parsing PID "": strconv.Atoi: parsing "": invalid syntax
ERRO[0000] (unable to determine exit status)

Describe the results you expected:
buildah info should run successfully and print output such as:

buildah info
{
    "host": {
        "CgroupVersion": "v1",
        "Distribution": {
            "distribution": "\"rhel\"",
            "version": "8.5"
        },
        "MemFree": 4091355136,
        "MemTotal": 8117874688,
        "OCIRuntime": "runc",
        "SwapFree": 2184429568,
        "SwapTotal": 2222977024,
        "arch": "amd64",
        "cpus": 4,
        "hostname": "REDACTED",
        "kernel": "4.18.0-348.23.1.el8_5.x86_64",
        "os": "linux",
        "rootless": true,
        "uptime": "2h 39m 42.55s (Approximately 0.08 days)"
    },
    "store": {
        "ContainerStore": {
            "number": 0
        },
        "GraphDriverName": "overlay",
        "GraphOptions": [
            "overlay.mount_program=/usr/bin/fuse-overlayfs",
            "overlay.mount_program=/usr/bin/fuse-overlayfs"
        ],
        "GraphRoot": "/var/lib/jenkins/.local/share/containers/storage",
        "GraphStatus": {
            "Backing Filesystem": "xfs",
            "Native Overlay Diff": "false",
            "Supports d_type": "true",
            "Using metacopy": "false"
        },
        "ImageStore": {
            "number": 8
        },
        "RunRoot": "/run/user/1003"
    }
}

Output of rpm -q buildah or apt list buildah:

buildah-1.23.1-2.module+el8.5.0+13436+9c05b4ba.x86_6

Output of buildah version:

Version:         1.23.1
Go Version:      go1.16.7
Image Spec:      1.0.1-dev
Runtime Spec:    1.0.2-dev
CNI Spec:        0.4.0
libcni Version:  v0.8.1
image Version:   5.16.0
Git Commit:
Built:           Tue Nov 23 13:34:32 2021
OS/Arch:         linux/amd64
BuildPlatform:   linux/amd64

Output of podman version if reporting a podman build issue:

Version:      3.4.2
API Version:  3.4.2
Go Version:   go1.16.7
Built:        Thu Jan 13 11:15:49 2022
OS/Arch:      linux/amd64

Output of cat /etc/*release:

NAME="Red Hat Enterprise Linux"
VERSION="8.5 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.5"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.5 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.5
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.5"
Red Hat Enterprise Linux release 8.5 (Ootpa)
Red Hat Enterprise Linux release 8.5 (Ootpa)

Output of uname -a:

Linux REDACTED 4.18.0-348.23.1.el8_5.x86_64 #1 SMP Tue Apr 12 11:20:32 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

Output of cat /etc/containers/storage.conf:

[storage]
driver = "overlay"
runroot = "/run/containers/storage"
graphroot = "/var/lib/containers/storage"

[storage.options]
additionalimagestores = [
]

[storage.options.overlay]
mountopt = "nodev,metacopy=on"

[storage.options.thinpool]

(comments and hostnames removed)

Additional info:

  • buildah bud fails, while podman build works
  • /usr/bin/newuidmap = cap_setuid+ep
  • /usr/bin/newgidmap = cap_setgid+ep
  • unprivileged_userns_clone is unset

cat /etc/subuid
jenkins:296608:65536

cat /etc/subgid
jenkins:296608:65536

grep CONFIG_USER_NS /boot/config-$(uname -r)
CONFIG_USER_NS=y

capsh --print

Current: =
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore
Ambient set =
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
 secure-no-ambient-raise: no (unlocked)
uid=1003(jenkins)
gid=1003(jenkins)
groups=1003(jenkins)
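For reference, the UID/GID maps that buildah prints in its debug output later in this thread follow directly from the /etc/subuid and /etc/subgid entries above plus the user's own UID 1003. Container ID 0 maps to the user itself, and container IDs 1 through 65536 map onto the subordinate range. The minimal Go sketch below shows that derivation. IDMap, parseSubIDLine and rootlessMaps are hypothetical names chosen to mirror the field names in the log; they are not buildah's API.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// IDMap is a hypothetical type whose field names mirror buildah's debug log
// ({ContainerID:... HostID:... Size:...}); it is not buildah's own type.
type IDMap struct {
	ContainerID int
	HostID      int
	Size        int
}

// parseSubIDLine parses one "name:start:count" entry from /etc/subuid or /etc/subgid.
func parseSubIDLine(line string) (name string, start, count int, err error) {
	parts := strings.Split(strings.TrimSpace(line), ":")
	if len(parts) != 3 {
		return "", 0, 0, fmt.Errorf("malformed entry %q", line)
	}
	if start, err = strconv.Atoi(parts[1]); err != nil {
		return "", 0, 0, err
	}
	if count, err = strconv.Atoi(parts[2]); err != nil {
		return "", 0, 0, err
	}
	return parts[0], start, count, nil
}

// rootlessMaps builds the usual rootless layout: container ID 0 is the user's
// own host ID, and container IDs 1..count map onto the subordinate range.
func rootlessMaps(hostID, start, count int) []IDMap {
	return []IDMap{
		{ContainerID: 0, HostID: hostID, Size: 1},
		{ContainerID: 1, HostID: start, Size: count},
	}
}

func main() {
	// Values taken from this issue: uid 1003 and the jenkins /etc/subuid entry.
	_, start, count, err := parseSubIDLine("jenkins:296608:65536")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", rootlessMaps(1003, start, count))
	// prints [{ContainerID:0 HostID:1003 Size:1} {ContainerID:1 HostID:296608 Size:65536}]
}
```

Because the derived maps match what buildah logs, the subordinate ID configuration itself looks consistent.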

We have another node with the same setup where buildah works fine, and I'm not sure where else to look at this point.
Please let me know if you need any additional information.

rhatdan (Member) commented Jul 1, 2022

It looks like you are running it inside some kind of container, or somewhere the unshare syscall is not available.

faandg (Author) commented Jul 1, 2022

That's the weird part: I've seen many similar issues in this GitHub repository and in the Red Hat documentation that point to exactly that cause, but this is running directly on a RHEL VM. Also, commands such as unshare -U seem to work.

Is there any additional information I can provide to verify this?

rhatdan (Member) commented Jul 2, 2022

Does buildah unshare echo hi work? What about buildah unshare cat /proc/self/uid_map?
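When buildah unshare works, /proc/self/uid_map read inside it should list the namespace's mappings. Each line has three columns: first ID inside the namespace, first ID outside it, and range length (see user_namespaces(7)). Outside any user namespace the file typically shows a single identity range starting at 0. For this user, one would expect ranges matching the maps buildah logs (0 → 1003 length 1, 1 → 296608 length 65536). The Go sketch below parses that format. Mapping and parseIDMap are illustrative names, not part of any library.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Mapping is one line of /proc/<pid>/uid_map or gid_map: the first ID inside
// the namespace, the first ID outside it, and the length of the range.
type Mapping struct {
	Inside, Outside, Length uint64
}

// parseIDMap parses the three whitespace-separated columns described in
// user_namespaces(7), skipping blank lines.
func parseIDMap(content string) ([]Mapping, error) {
	var out []Mapping
	for _, line := range strings.Split(content, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue
		}
		if len(fields) != 3 {
			return nil, fmt.Errorf("unexpected line %q", line)
		}
		var v [3]uint64
		for i, f := range fields {
			n, err := strconv.ParseUint(f, 10, 32)
			if err != nil {
				return nil, err
			}
			v[i] = n
		}
		out = append(out, Mapping{Inside: v[0], Outside: v[1], Length: v[2]})
	}
	return out, nil
}

func main() {
	data, err := os.ReadFile("/proc/self/uid_map")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	maps, err := parseIDMap(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, m := range maps {
		fmt.Printf("inside %d -> outside %d (length %d)\n", m.Inside, m.Outside, m.Length)
	}
}
```

To compare, build the program with go build and run the resulting binary both directly and via buildah unshare.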

faandg (Author) commented Jul 4, 2022

Unfortunately, both commands return the same error. Here is the output with --log-level=debug:

$ buildah --log-level=debug unshare cat /proc/self/uid_map
DEBU[0000] running [buildah-in-a-user-namespace --log-level=debug unshare cat /proc/self/uid_map] with environment [LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.m4a=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.oga=01;36:*.opus=01;36:*.spx=01;36:*.xspf=01;36: LANG=en_US.UTF-8 HISTCONTROL=ignoredups HOSTNAME=REDACTED JAVA_HOME=/usr/lib/jvm/java-11-openjdk which_declare=declare -f CLASSPATH=.:/usr/lib/jvm/java-11-openjdk/jre/lib:/usr/lib/jvm/java-11-openjdk/lib:/usr/lib/jvm/java-11-openjdk/lib/tools.jar USER=jenkins PWD=/var/lib/jenkins HOME=/var/lib/jenkins XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop 
MAIL=/var/spool/mail/jenkins SHELL=/bin/bash TERM=xterm SHLVL=1 LOGNAME=jenkins PATH=/var/lib/jenkins/.local/bin:/var/lib/jenkins/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/lib/jvm/java-11-openjdk/bin:/usr/local/bin: HISTSIZE=1000 LESSOPEN=||/usr/bin/lesspipe.sh %s BASH_FUNC_which%%=() {  ( alias;
 eval ${which_declare} ) | /usr/bin/which --tty-only --read-alias --read-functions --show-tilde --show-dot "$@"
} _=/usr/bin/buildah TMPDIR=/var/tmp _CONTAINERS_USERNS_CONFIGURED=1 BUILDAH_ISOLATION=rootless], UID map [{ContainerID:0 HostID:1003 Size:1} {ContainerID:1 HostID:296608 Size:65536}], and GID map [{ContainerID:0 HostID:1003 Size:1} {ContainerID:1 HostID:296608 Size:65536}]
Error during unshare(CLONE_NEWUSER): Invalid argument
ERRO[0000] error parsing PID "": strconv.Atoi: parsing "": invalid syntax
ERRO[0000] (unable to determine exit status)
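The "error parsing PID" line appears to be a follow-on symptom rather than a separate bug. The re-executed child (buildah-in-a-user-namespace in the log) fails unshare(CLONE_NEWUSER) and exits without reporting its PID, so the parent ends up parsing an empty string. The minimal Go sketch below reproduces only the message format. The pipe-based PID handshake is an assumption about buildah's internals (we believe this logic lives in containers/storage's unshare package), and parseChildPID is a hypothetical name.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseChildPID mimics a parent reading the PID its re-executed child reported
// over a pipe. If the child died before writing anything, the parent is left
// with an empty string and Atoi fails.
func parseChildPID(reported string) (int, error) {
	pid, err := strconv.Atoi(reported)
	if err != nil {
		return 0, fmt.Errorf("error parsing PID %q: %w", reported, err)
	}
	return pid, nil
}

func main() {
	// Failure path: the child exited after unshare() failed, so nothing was reported.
	if _, err := parseChildPID(""); err != nil {
		fmt.Println(err)
		// prints: error parsing PID "": strconv.Atoi: parsing "": invalid syntax
	}
	// Success path for comparison.
	pid, _ := parseChildPID("4242")
	fmt.Println(pid) // prints: 4242
}
```

In other words, the first line of output (the unshare failure) is the one worth investigating. The PID error follows from it.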

faandg (Author) commented Jul 5, 2022

The issue was resolved by reinstalling the container-tools dnf module.

faandg closed this as completed Jul 5, 2022
faandg reopened this Jul 6, 2022
faandg (Author) commented Jul 6, 2022

Please ignore my previous comment: I ran the diagnosis as the wrong user, and the issue is not resolved.

ranjithrajaram (Contributor) commented

Do you have a monitoring agent such as Dynatrace on the node (with Dynatrace modules configured to load via /etc/ld.so.preload)? If so, can you try disabling the preload and checking again?

faandg (Author) commented Jul 15, 2022

@rhatdan I opened a case with Red Hat support, and the root cause is library injection by the Dynatrace OneAgent (its full-stack monitoring feature):

$ cat /etc/ld.so.preload
/lib64/liboneagentproc.so

$ ldd `which buildah`
        linux-vdso.so.1 (0x00007ffc8ff55000)
        /lib64/liboneagentproc.so (0x00007f0ff8657000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f0ff8437000)
        libgpgme.so.11 => /lib64/libgpgme.so.11 (0x00007f0ff81e7000)
        libassuan.so.0 => /lib64/libassuan.so.0 (0x00007f0ff7fd3000)
        libgpg-error.so.0 => /lib64/libgpg-error.so.0 (0x00007f0ff7db2000)
        libseccomp.so.2 => /lib64/libseccomp.so.2 (0x00007f0ff7b93000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f0ff798f000)
        libdevmapper.so.1.02 => /lib64/libdevmapper.so.1.02 (0x00007f0ff7735000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f0ff7370000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f0ffa7d6000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f0ff7146000)
        libsepol.so.1 => /lib64/libsepol.so.1 (0x00007f0ff6e95000)
        libudev.so.1 => /lib64/libudev.so.1 (0x00007f0ff6bfe000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f0ff687c000)
        libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f0ff65f8000)
        libmount.so.1 => /lib64/libmount.so.1 (0x00007f0ff639e000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f0ff6186000)
        libblkid.so.1 => /lib64/libblkid.so.1 (0x00007f0ff5f33000)
        libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f0ff5d2b000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f0ff5b23000)

This seems to conflict only sometimes.
Would you like me to provide the case number?
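To check other nodes for the same injection, for example as a pre-flight step on the Jenkins agents, a small Go sketch can read /etc/ld.so.preload and flag suspicious entries. ld.so(8) documents the file as a whitespace-separated list of shared objects. The extra handling of '#' comments and ':' separators reflects our understanding of glibc's loader and should be treated as an assumption. Matching on "oneagent" is only a guess based on the library path shown above, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"unicode"
)

// preloadEntries splits /etc/ld.so.preload content into object paths.
// ld.so(8) documents a whitespace-separated list; stripping '#' comments and
// also splitting on ':' follows glibc's loader behaviour as we understand it.
func preloadEntries(content string) []string {
	var out []string
	for _, line := range strings.Split(content, "\n") {
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		out = append(out, strings.FieldsFunc(line, func(r rune) bool {
			return r == ':' || unicode.IsSpace(r)
		})...)
	}
	return out
}

// matching returns the entries whose path contains needle.
func matching(entries []string, needle string) []string {
	var out []string
	for _, e := range entries {
		if strings.Contains(e, needle) {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	data, err := os.ReadFile("/etc/ld.so.preload")
	if os.IsNotExist(err) {
		fmt.Println("no /etc/ld.so.preload on this host")
		return
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	entries := preloadEntries(string(data))
	fmt.Printf("%d preloaded object(s): %v\n", len(entries), entries)
	// "oneagent" is a guess based on /lib64/liboneagentproc.so from this issue.
	if hits := matching(entries, "oneagent"); len(hits) > 0 {
		fmt.Println("possible Dynatrace OneAgent injection:", hits)
	}
}
```

An entry here only shows that the library is force-loaded into dynamically linked programs. Whether it actually interferes with buildah appears to depend on the agent's monitoring settings.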

ranjithrajaram (Contributor) commented

You may have to follow up with Dynatrace to understand why it is blocking the unshare syscall.

faandg (Author) commented Jul 15, 2022

The case contains an strace, but I don't have the expertise to troubleshoot syscalls, so at this point I can't determine whether the fix is needed in buildah or in the Dynatrace OneAgent. That's why I thought I'd inform both parties.
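One plausible mechanism, not confirmed in this thread: the unshare(2) man page states that CLONE_NEWUSER requires a single-threaded caller (the flag implies CLONE_THREAD) and that the call fails with EINVAL otherwise. Suppose a library forced in through /etc/ld.so.preload starts a thread inside buildah's re-executed child before that child calls unshare. The call would then fail with exactly the "Invalid argument" reported here. The Go sketch below reports the current thread count from /proc/self/status and then attempts the same call. The Go runtime already runs several threads by the time main starts, so on a typical host the call fails with EINVAL, which illustrates the restriction. threadCount is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
	"syscall"
)

// threadCount returns the value of the "Threads:" field in /proc/<pid>/status content.
func threadCount(status string) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "Threads:") {
			return strconv.Atoi(strings.TrimSpace(strings.TrimPrefix(line, "Threads:")))
		}
	}
	return 0, fmt.Errorf("no Threads: field found")
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	n, err := threadCount(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("threads in this process:", n)

	// The Go runtime already runs several threads here, so on a typical host
	// this returns EINVAL ("invalid argument"). Under a seccomp filter or in a
	// locked-down container it may return EPERM instead.
	if err := syscall.Unshare(syscall.CLONE_NEWUSER); err != nil {
		fmt.Println("unshare(CLONE_NEWUSER) failed:", err)
	} else {
		fmt.Println("unshare(CLONE_NEWUSER) succeeded")
	}
}
```

The same restriction is presumably why buildah performs this step very early in its re-executed child. If the hypothesis holds, the strace should show a thread being created (a clone call) in that child before the failing unshare only when deep monitoring is enabled.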

rhatdan (Member) commented Jul 15, 2022

It looks like you are running buildah in a rootless environment, which triggers it to call unshare to set up the user namespace. Have you tried running it as root inside the container?

faandg (Author) commented Jul 18, 2022

It seems you missed the latest comments, but no matter: the issue is resolved.

For anyone experiencing the same issue:

  • Check whether a Dynatrace agent is installed.
  • Check for entries in /etc/ld.so.preload (cat /etc/ld.so.preload).
  • Check whether ldd $(which buildah) shows a Dynatrace library being loaded.
  • In Dynatrace, check whether deep monitoring is enabled for process groups such as buildah. It causes them to fail with "Error during unshare(CLONE_NEWUSER)". For now, disable deep monitoring for them, as it does not appear to be compatible with buildah.
