Syscall fchmodat2 allows to create suid binaries #10424
Comments
fyi i think there is possibly an ongoing private security issue related to this, in case you are wondering about lack of response.
The
That would be much appreciated, yes :)
Regarding
In case people other than me try to use this: it seems that it can't (at least not easily) be used for that, because it can only create empty setuid files (rather useless), and writing anything to them drops the setuid bit.
With Linux kernel >=6.6 & glibc 2.39 a `fchmodat2(2)` is available that isn't filtered away by the libseccomp sandbox.

Being able to use this to bypass that restriction has surprising results for some builds such as lxc[1]:

> With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2,
> which slips through https://github.com/NixOS/nix/blob/9b88e5284608116b7db0dbd3d5dd7a33b90d52d7/src/libstore/build/local-derivation-goal.cc#L1650-L1663.
> The fixupPhase then uses fchmodat, which fails.
> With older kernel or glibc, setting the suid bit fails in the
> install phase, which is not treated as fatal, and then the
> fixup phase does not try to set it again.

Please note that there are still ways to bypass this sandbox[2] and this is mostly a fix for the breaking builds.

This change works by checking if seccomp is new enough to support fchmodat2 (this is the case with the combination glibc 2.39, libseccomp 2.5.5 & Linux 6.6 from nixos-unstable). Since the project's flake uses 23.11, the code isn't compiled into the Nix built by this flake.

I tested the change by adding the diff below as patch to `pkgs/tools/package-management/nix/common.nix` & then built a VM from the following config using my dirty nixpkgs master:

    {
      vm = { pkgs, ... }: {
        virtualisation.writableStore = true;
        virtualisation.memorySize = 8192;
        virtualisation.diskSize = 12 * 1024;
        nix.package = pkgs.nixVersions.nix_2_21;
      };
    }

The original issue can be triggered via

    nix build -L github:nixos/nixpkgs/d6dc19adbda4fd92fe9a332327a8113eaa843894#lxc \
      --extra-experimental-features 'nix-command flakes'

however the problem disappears with this patch applied.

Closes NixOS#10424

[1] NixOS/nixpkgs#300635 (comment)
[2] NixOS/nixpkgs#300635 (comment)
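The "checking if seccomp is new enough" approach can be pictured as a build-time check on whether the installed headers know the syscall at all. The following is only a minimal sketch under that assumption, not the code from the patch; the function names `buildTimeFchmodat2` and `filterCompiledIn` are hypothetical.

```cpp
#include <optional>

// <sys/syscall.h> is Linux-specific, so only include it when it exists.
#if __has_include(<sys/syscall.h>)
#  include <sys/syscall.h>
#endif

// Hypothetical helper: the fchmodat2 syscall number as known to the headers
// this file is compiled against, or nullopt if they predate the syscall.
constexpr std::optional<long> buildTimeFchmodat2()
{
#ifdef __NR_fchmodat2
    return __NR_fchmodat2;
#else
    return std::nullopt;
#endif
}

// Hypothetical helper: with a purely header-based check, a filter rule for
// fchmodat2 can only be compiled in when the number is known at build time.
constexpr bool filterCompiledIn()
{
    return buildTimeFchmodat2().has_value();
}
```

When built against older headers, `filterCompiledIn()` is false and no rule for the new syscall exists, which matches the remark above that the code isn't compiled into a Nix built from the 23.11-based flake.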
With Linux kernel >=6.6 & glibc 2.39 a `fchmodat2(2)` is available that isn't filtered away by the libseccomp sandbox.

Being able to use this to bypass that restriction has surprising results for some builds such as lxc[1]:

> With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2,
> which slips through https://github.com/NixOS/nix/blob/9b88e5284608116b7db0dbd3d5dd7a33b90d52d7/src/libstore/build/local-derivation-goal.cc#L1650-L1663.
> The fixupPhase then uses fchmodat, which fails.
> With older kernel or glibc, setting the suid bit fails in the
> install phase, which is not treated as fatal, and then the
> fixup phase does not try to set it again.

Please note that there are still ways to bypass this sandbox[2] and this is mostly a fix for the breaking builds.

This change works by creating a syscall filter for the `fchmodat2` syscall (number 452 on most systems). The problem is that glibc 2.39 and seccomp 2.5.5 are needed to have the correct syscall number available via `__NR_fchmodat2` / `__SNR_fchmodat2`, but this flake is still on nixpkgs 23.11. To have this change everywhere and not dependent on the glibc this package is built against, I added a header "fchmodat2-compat.hh" that sets the syscall number based on the architecture. On most platforms it's 452 according to glibc, with a few exceptions:

    $ rg --pcre2 'define __NR_fchmodat2 (?!452)'
    sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h
    58:#define __NR_fchmodat2 1073742276

    sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h
    67:#define __NR_fchmodat2 6452

    sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h
    62:#define __NR_fchmodat2 5452

    sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h
    70:#define __NR_fchmodat2 4452

    sysdeps/unix/sysv/linux/alpha/arch-syscall.h
    59:#define __NR_fchmodat2 562

I tested the change by adding the diff below as patch to `pkgs/tools/package-management/nix/common.nix` & then built a VM from the following config using my dirty nixpkgs master:

    {
      vm = { pkgs, ... }: {
        virtualisation.writableStore = true;
        virtualisation.memorySize = 8192;
        virtualisation.diskSize = 12 * 1024;
        nix.package = pkgs.nixVersions.nix_2_21;
      };
    }

The original issue can be triggered via

    nix build -L github:nixos/nixpkgs/d6dc19adbda4fd92fe9a332327a8113eaa843894#lxc \
      --extra-experimental-features 'nix-command flakes'

however the problem disappears with this patch applied.

Closes NixOS#10424

[1] NixOS/nixpkgs#300635 (comment)
[2] NixOS/nixpkgs#300635 (comment)
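The per-architecture numbers listed in the commit message could be encoded roughly as follows. This is a hedged sketch, not the contents of the actual "fchmodat2-compat.hh": the macro name `SKETCH_NR_FCHMODAT2`, the function `fchmodat2Number`, and the choice of compiler macros used to detect each ABI are all assumptions made for illustration.

```cpp
// Hypothetical compat sketch: pick the fchmodat2 syscall number from the
// target architecture instead of relying on the C library headers.
#if defined(__alpha__)
#  define SKETCH_NR_FCHMODAT2 562
#elif defined(__x86_64__) && defined(__ILP32__) // x32 ABI (assumed detection)
#  define SKETCH_NR_FCHMODAT2 1073742276
#elif defined(__mips__) && defined(_ABIO32) && _MIPS_SIM == _ABIO32 // mips32
#  define SKETCH_NR_FCHMODAT2 4452
#elif defined(__mips__) && defined(_ABI64) && _MIPS_SIM == _ABI64 // mips64/n64
#  define SKETCH_NR_FCHMODAT2 5452
#elif defined(__mips__) && defined(_ABIN32) && _MIPS_SIM == _ABIN32 // mips64/n32
#  define SKETCH_NR_FCHMODAT2 6452
#else
#  define SKETCH_NR_FCHMODAT2 452 // the common case per the listing above
#endif

// The x32 value is the generic number plus the x32 syscall bit (0x40000000).
static_assert(0x40000000 + 452 == 1073742276, "x32 number is bit 30 plus 452");

// Hypothetical accessor so callers do not depend on the macro directly.
constexpr long fchmodat2Number()
{
    return SKETCH_NR_FCHMODAT2;
}
```

A filter rule can then refer to `fchmodat2Number()` regardless of which glibc the package is built against, which is the stated goal of the compat header.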
The same commit message also appears with the trailer `(cherry picked from commit ba68045)`.
Previously, system call filtering (to prevent builders from storing files with setuid/setgid permission bits or extended attributes) was performed using a blocklist. While this looks simple at first, it actually carries significant security and maintainability risks: after all, the kernel may add new syscalls to achieve the same functionality one is trying to block, and it can even be hard to actually add the syscall to the blocklist when building against a C library that doesn't know about it yet. For a recent demonstration of this happening in practice to Nix, see the introduction of fchmodat2 [0] [1].

The allowlist approach does not share the same drawback. While it does require a rather large list of harmless syscalls to be maintained in the codebase, failing to update this list (and roll out the update to all users) in time has rather benign effects; at worst, very recent programs that already rely on new syscalls will fail with an error the same way they would on a slightly older kernel that doesn't support them yet. Most importantly, no unintended new ways of performing dangerous operations will be silently allowed.

Another possible drawback is reduced system call performance due to the larger filter created by the allowlist requiring more computation [2]. However, this issue has not convincingly been demonstrated yet in practice, for example in systemd or various browsers. To the contrary, it has been measured that the actual filter constructed here has approximately the same overhead as a very simple filter blocking only one system call.

This commit tries to keep the behavior as close to unchanged as possible. The system call list is in line with libseccomp 2.5.5 and glibc 2.39, which are the latest versions at the point of writing. Since libseccomp 2.5.5 is already a requirement and distributions shipping it together with older versions of glibc are mostly not a thing any more, this should not lead to additional build failures.

[0] NixOS/nixpkgs#300635
[1] NixOS/nix#10424
[2] flatpak/flatpak#4462 (comment)

Change-Id: I541be3ea9b249bcceddfed6a5a13ac10b11e16ad
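The difference between the two approaches can be illustrated with a toy model that ignores real seccomp machinery and syscall arguments entirely. It is a sketch under stated assumptions: the names `Verdict`, `blocklistVerdict`, and `allowlistVerdict` are hypothetical, and the x86_64 syscall numbers are used purely as example data.

```cpp
#include <set>

enum class Verdict { Allow, Deny };

// x86_64 syscall numbers, used only as illustrative data.
constexpr long sysRead = 0;
constexpr long sysWrite = 1;
constexpr long sysChmod = 90;
constexpr long sysFchmod = 91;
constexpr long sysOpenat = 257;
constexpr long sysFchmodat = 268;
constexpr long sysFchmodat2 = 452; // added in Linux 6.6

// Blocklist model: every syscall passes unless it is explicitly listed.
Verdict blocklistVerdict(const std::set<long> & blocked, long syscallNr)
{
    return blocked.count(syscallNr) ? Verdict::Deny : Verdict::Allow;
}

// Allowlist model: every syscall is refused unless it is explicitly listed.
Verdict allowlistVerdict(const std::set<long> & allowed, long syscallNr)
{
    return allowed.count(syscallNr) ? Verdict::Allow : Verdict::Deny;
}

// Example lists written before fchmodat2 existed.
const std::set<long> oldBlocklist{sysChmod, sysFchmod, sysFchmodat};
const std::set<long> oldAllowlist{sysRead, sysWrite, sysOpenat};
```

With lists written before the new syscall existed, `blocklistVerdict(oldBlocklist, sysFchmodat2)` yields `Verdict::Allow`, whereas `allowlistVerdict(oldAllowlist, sysFchmodat2)` yields `Verdict::Deny`. In this model a stale allowlist fails closed, which is the property the commit message argues for.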
Describe the bug
To quote from NixOS/nixpkgs#300635 (comment):
As @lf- points out, the use of the seccomp sandbox against creating suid binaries is questionable (NixOS/nixpkgs#300635 (comment)) since the restriction can be bypassed easily by using the `open(2)` syscall for the exact same purpose and the permissions will be wiped later on.

What I'd like to achieve is that we get a fix for that scenario to make sure people aren't running into the problem when upgrading to 24.05. The builds from nixpkgs appear fixed (or actually, worked around), but a lot of people will also build custom stuff.
If you agree on removing the chmod restriction, that's one way forward.
If that requires further discussion, I'll file a patch to add fchmodat2 to nix/src/libstore/build/local-derivation-goal.cc (lines 1650 to 1663 in 8b16cce).
cc @NixOS/nix-team
cc @vcunat
cc @lf-
Priorities
Add 👍 to issues you find important.