SIGSEGV (signal 11 error) with Go 1.22.0 and Ubuntu 20.04 / Debian 10 #2677
Comments
This...
Indicates that there is a segmentation fault. Please check the output of the
I believe this is probably related to an issue reported against Go 1.22.0 (golang/go#65625) that is also affecting other container runtime projects (incus / runc). Please try building Singularity with Go 1.21.7 from https://go.dev/dl/ instead.
The bug itself is not in Go - it's in glibc, but this is difficult to avoid:
dmesg message. Will try building with a different version of Go.
@gregorex333 - thanks. I believe the dmesg output there confirms it is the same issue.
You seem correct. Changing the Go installation and reinstalling has allowed the build to complete with exit status 0.
VERBOSE [U=0,P=283260] Full() Build complete: /home/giovannini/sandbox/ubuntu
Adapted from: opencontainers/runc#4247
Execution of a container using a PID namespace can fail on certain versions of glibc when Singularity is built with Go 1.22. This is due to Go 1.22 performing calls using pthread_self which, from glibc 2.25, is not updated for the current TID on clone.
Fixes sylabs#2677
-----
Original runc explanation:
Since glibc 2.25, the thread-local cache of the current TID is no longer updated in the child when calling clone(2). This results in very unfortunate behaviour when Go does pthread calls using pthread_self(), which has the wrong TID stored.
The "simple" solution is to forcefully overwrite this cached value. Unfortunately (and unsurprisingly), the layout of "struct pthread" is strictly private and can change without warning.
Luckily, glibc (currently) uses CLONE_CHILD_CLEARTID for all forks (with the child_tid set to the cached &PTHREAD_SELF->tid), meaning that as long as runc is using glibc, when "runc init" is spawned the child process will have a pointer directly to the cached value we want to change. With CONFIG_CHECKPOINT_RESTORE=y kernels on Linux 3.5 and later, we can simply use prctl(PR_GET_TID_ADDRESS). For older kernels we need to memory scan the TLS structure (pthread_self() returns a pointer to the start of the structure so we can "just" scan it for a field containing the current TID and assume that it is the correct field).
Obviously this is all very horrific, and if you are reading this in the future, it almost certainly has caused some horrific bug that I did not foresee. Sorry about that. As far as I can tell, there is no other workable solution that doesn't also depend on the CLONE_CHILD_CLEARTID behaviour of glibc in some way. We cannot "just" do a re-exec after clone(2) for security reasons.
Fixes opencontainers/runc#4233
Signed-off-by: Aleksa Sarai cyphar@cyphar.com
Just noting that this will hopefully be solved in Go 1.22.4 by a backport of https://go-review.googlesource.com/c/go/+/587919
See
* https://github.com/sylabs/singularity/releases/tag/v4.1.3
* sylabs/singularity#2677
* golang/go#65625
* https://dev.arvados.org/issues/21705#note-13
Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>
This is addressed by Go 1.22.4 - tested and confirmed.
Version of Singularity
Singularity-CE 4.1.0
(also tried CE 4.1.1 with same result)
Describe the bug
Fails to mount tmpfs or ramfs for "build --sandbox", mainly with this error: signal number 11
"VERBOSE [U=0,P=1] wait_child() rpc server interrupted by signal number 11
FATAL [U=0,P=9662] Master() container creation failed: mount tmpfs->/usr/local/var/singularity/mnt/session error: while mounting tmpfs: can't mount tmpfs filesystem to /usr/local/var/singularity/mnt/session: read unix @->@: read: connection reset by peer
: exit status 255"
To Reproduce
Reset my dual-boot Linux hard drive back to its initial conditions for Ubuntu focal 20.04.6 LTS
Installed a small number of programs needed for using the image to be built from my custom definition
Full list of commands / packages in attached file:
full_reproduction.odt
Install Go 1.22.0
Install dependencies listed at https://docs.sylabs.io/guides/main/admin-guide/installation.html for Ubuntu
Install Singularity into /usr/local with:
./mconfig &&
make -C ./builddir &&
sudo make -C ./builddir install
Run build --sandbox on my custom definition file with a library for ubuntu:20.04 as its base
AS WELL AS on this basic test case library:
sudo singularity -d build --sandbox ubuntu/ library://ubuntu
Fails to mount with above error
Also tried with and without the env variables and config file settings listed below
With or without these env variables:
export SINGULARITY_TMPDIR=/home/giovannini/sandbox/temp/tmp
export SINGULARITY_CACHEDIR=/home/giovannini/sandbox/temp/cache
(also tried with or without sudo -E)
Also tried with "mount fs" set to tmpfs and ramfs.
All fail to mount at the same point.
Expected behavior
Expected to mount tmpfs/ramfs to create a sandbox folder OR the .sif from my definition
OS / Linux Distribution
Ubuntu focal 20.04.6 LTS
Installation Method
Install Singularity into /usr/local with:
./mconfig &&
make -C ./builddir &&
sudo make -C ./builddir install
using your GitHub release source file for 4.1.0 (and previously 4.1.1 before resetting my OS)
Additional context
The output of mount and cat /proc/self/mountinfo, along with the build config, is also in the attached file:
mount_build_info.odt
DEBUG
Example Debug when attempting to sandbox my definition file.
Same point of failure for the other basic test case.
with or without these variables set
export SINGULARITY_TMPDIR=/home/giovannini/sandbox/temp/tmp
export SINGULARITY_CACHEDIR=/home/giovannini/sandbox/temp/cache
Debug Log.odt