Standardize forwarding crashes to containers #102

bdrung · 2024-05-06T16:03:45Z

There are different crash dump handlers available, such as systemd-coredump and Apport. When a process crashes inside a container, the crash dump handler on the host receives the crash and needs to forward it into the container. This crash forwarding works if the same handler is present on the host and in the container (e.g. systemd-coredump on the host and systemd-coredump in the container, or Apport on the host and Apport in the container). If the crash dump handler in the container differs from the handler on the host, the forwarding will not work (see "systemd-coredump handler does not forward the crash to the container" for an example).

To make forwarding crashes work in all of these scenarios, please standardize the way crashes are forwarded to containers. I suggest specifying the location of a socket in the container and how the needed information (like the crashed process ID) is sent to that socket.
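A minimal sketch of what such a socket message could look like, assuming a newline-terminated JSON object sent over a Unix stream socket. The function names, the field names (`pid`, `signal`, `executable`), and the JSON framing are all hypothetical and are not what Apport or systemd-coredump use today. `socket.socketpair()` stands in for the host-to-container connection so the example runs on its own.

```python
import json
import socket


def encode_crash_message(pid: int, signal_number: int, executable: str) -> bytes:
    """Serialize crash metadata as one newline-terminated JSON object.

    The field names are hypothetical; no existing handler defines them.
    """
    message = {"pid": pid, "signal": signal_number, "executable": executable}
    return json.dumps(message).encode("utf-8") + b"\n"


def decode_crash_message(data: bytes) -> dict:
    """Parse a message produced by encode_crash_message."""
    return json.loads(data.decode("utf-8"))


def demo() -> dict:
    # socketpair() stands in for the socket shared between host and container.
    host_end, container_end = socket.socketpair()
    with host_end, container_end:
        host_end.sendall(encode_crash_message(1234, 11, "/usr/bin/example"))
        # Signal end-of-message so the receiver's read loop terminates.
        host_end.shutdown(socket.SHUT_WR)
        received = b""
        while chunk := container_end.recv(4096):
            received += chunk
    return decode_crash_message(received)
```

A real specification would also have to settle how the core file itself is transferred and which PID namespace the process ID refers to, since the host and the container see different PIDs for the same process.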

bluca · 2024-05-06T16:09:32Z

needs to forward the crash into the container

Citation needed. In most cases, that's actually not true, the container might not even exist anymore when the crashdump is received.

bdrung · 2024-05-06T16:14:48Z

needs to forward the crash into the container

Citation needed. In most cases, that's actually not true, the container might not even exist anymore when the crashdump is received.

One example: The test case mentioned in the bug description in https://bugs.launchpad.net/ubuntu/+source/apport/+bug/2063349. Another example: autopkgtest runners on Ubuntu armhf. Do you have examples where containers are destroyed when one process crashes inside?

bluca · 2024-05-06T16:25:32Z

Anything single process, and anything that is closed or upgraded after the crash. Containers are ephemeral and volatile by definition. What's the point in doing this forwarding at all?

schopin-pro · 2024-05-06T16:47:13Z

Without getting into the weeds of why this is useful, one could just note that at least two crash dump handlers (Apport and systemd-coredump) grew that capability independently, which IMHO is a good indication that there is an actual need here.

I agree that the single-process container is a common pattern, but it's not the only use case for containers. So, assuming the container survives the crash and has a crash handler installed, that handler will get much more out of the crash dump than the host's handler, since it knows the details of the container. Think of running an Ubuntu container on a Fedora host.

bluca · 2024-05-06T17:21:28Z

Think of running an Ubuntu container on a Fedora host.

Have you seen https://systemd.io/ELF_PACKAGE_METADATA/ ? I should probably move that here.

I've already mentioned this to @enr0n: please consider enabling that spec distro-wide in Ubuntu, so that the host can get all the information from a crash in the guest without any need for communication, simply by parsing the core file. Fedora already implements it, so if you try the opposite (crash a Fedora guest on an Ubuntu host), coredumpctl on the host will give you a lot of info.

In fact several packages in Debian/Ubuntu already use it, including all systemd ones, so if you crash any of those they'll already contain the info. This is done on a package-by-package opt-in basis, for a distro-wide debhelper change see: https://salsa.debian.org/debian/debhelper/-/merge_requests/98 (unfortunately going nowhere in Debian due to dpkg politics, but this shouldn't be a problem for you)
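To illustrate what "simply by parsing the core file" can mean, here is a rough sketch of reading the package metadata note. It assumes the layout described in the linked spec as I understand it (an ELF note with owner `FDO`, type `0xcafe1a7e`, and a NUL-terminated JSON payload) and little-endian encoding. It works on a raw note blob rather than a full core file. `build_note` and `parse_package_notes` are hypothetical helpers, and the package values are made up.

```python
import json
import struct

FDO_PACKAGE_METADATA_TYPE = 0xCAFE1A7E  # note type, per the linked spec
FDO_OWNER = b"FDO\x00"  # note owner, including the terminating NUL


def _pad4(length: int) -> int:
    """Round up to the 4-byte alignment used inside ELF notes."""
    return (length + 3) & ~3


def build_note(payload: dict) -> bytes:
    """Build a little-endian ELF note carrying the JSON payload (for the demo)."""
    desc = json.dumps(payload).encode("utf-8") + b"\x00"
    header = struct.pack("<III", len(FDO_OWNER), len(desc), FDO_PACKAGE_METADATA_TYPE)
    name = FDO_OWNER.ljust(_pad4(len(FDO_OWNER)), b"\x00")
    return header + name + desc.ljust(_pad4(len(desc)), b"\x00")


def parse_package_notes(blob: bytes) -> list[dict]:
    """Walk a buffer of ELF notes and decode every package metadata note."""
    results = []
    offset = 0
    while offset + 12 <= len(blob):
        namesz, descsz, note_type = struct.unpack_from("<III", blob, offset)
        offset += 12
        name = blob[offset:offset + namesz]
        offset += _pad4(namesz)
        desc = blob[offset:offset + descsz]
        offset += _pad4(descsz)
        if name == FDO_OWNER and note_type == FDO_PACKAGE_METADATA_TYPE:
            results.append(json.loads(desc.rstrip(b"\x00").decode("utf-8")))
    return results


# Hypothetical package values, for illustration only.
note = build_note({"type": "deb", "name": "hello", "version": "1.0-1"})
packages = parse_package_notes(note)
```

In practice a tool would first locate the note segments in the core file with an ELF library instead of receiving the blob directly.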

bdrung · 2024-05-06T20:27:52Z

I have seen https://systemd.io/ELF_PACKAGE_METADATA/ and it has been on my todo wish list for a long time. Thanks for the pointer to https://salsa.debian.org/debian/debhelper/-/merge_requests/98. I'll read the discussion there. If there are no technical reasons against the proposed implementation, we could carry this delta in Ubuntu to add the ELF metadata by default.

bluca · 2024-05-06T20:53:04Z

That would be very nice, thanks

bdrung · 2024-05-06T21:11:07Z

I read enough for today. @bluca since you submitted https://salsa.debian.org/debian/debhelper/-/merge_requests/98 are you willing to submit against dpkg-buildflags? Since we are already carrying some changes for dpkg in Ubuntu, I doubt that this additional delta will be a problem.

bluca · 2024-05-07T00:08:51Z

Yes I can look into that in the next few days

bluca · 2024-05-11T01:45:20Z

@bdrung here's a PR: https://code.launchpad.net/~bluca/ubuntu/+source/dpkg/+git/dpkg/+merge/465957
tested with a package build on noble, seems to work as intended

poettering · 2024-05-27T16:32:23Z

There are different crash dump handlers available, such as systemd-coredump and Apport. When a process crashes inside a container, the crash dump handler on the host receives the crash and needs to forward it into the container. This crash forwarding works if the same handler is present on the host and in the container (e.g. systemd-coredump on the host and systemd-coredump in the container, or Apport on the host and Apport in the container). If the crash dump handler in the container differs from the handler on the host, the forwarding will not work (see "systemd-coredump handler does not forward the crash to the container" for an example).

To make forwarding crashes work in all of these scenarios, please standardize the way crashes are forwarded to containers. I suggest specifying the location of a socket in the container and how the needed information (like the crashed process ID) is sent to that socket.

Let me ask one question: wouldn't a nicer option be to switch to systemd-coredump as the backend for Apport?

I mean, that's what everyone else ended up doing, for example RH's abrt: they let systemd-coredump do the initial dirty work and then hook into it at a later step. Is there anything that the coredump collection logic in Apport can do that systemd-coredump cannot?

I mean, systemd-coredump has really nice features, such as sandboxing, backtrace extraction, and container forwarding.

To me it appears like a much simpler approach.

I mean, I can very much understand why you want that, in particular for closing the loop in CIs and suchlike, so that they can get access to their own crashes. But I am a bit reluctant to commit to a generic API for this, as we tend to interpret certain things (rlimit_core) a bit differently from others, and hence our handlers get called differently from others. I think we can commit to compat between differently versioned containers and hosts to some degree, but I am a bit conservative about committing to more than that on this interface.

bluca · 2024-05-27T18:15:05Z

@bdrung here's a PR: https://code.launchpad.net/~bluca/ubuntu/+source/dpkg/+git/dpkg/+merge/465957 tested with a package build on noble, seems to work as intended

@bdrung any update on that MR?

bdrung · 2024-06-14T11:17:49Z

Let me ask one question: wouldn't a nicer option be to switch to systemd-coredump as the backend for Apport?

Apport in Ubuntu 24.04 gained support for using systemd-coredump as backend: https://discourse.ubuntu.com/t/apport-2-28-0-gained-systemd-coredump-integration/44910

I mean, I can very much understand why you want that, in particular for closing the loop in CIs and suchlike, so that they can get access to their own crashes. But I am a bit reluctant to commit to a generic API for this, as we tend to interpret certain things (rlimit_core) a bit differently from others, and hence our handlers get called differently from others. I think we can commit to compat between differently versioned containers and hosts to some degree, but I am a bit conservative about committing to more than that on this interface.

The problem with switching Apport's backend from Apport to systemd-coredump is backward/forward compatibility. Let's assume we have the following basic installations:

old host: Apport, no systemd-coredump
old container: Apport, no systemd-coredump
new host: systemd-coredump as crash handler, Apport with systemd-coredump backend
new container: systemd-coredump and Apport with systemd-coredump backend installed

host	container	current outcome
old	old	works via Apport socket
old	new	works via Apport socket
new	old	does not work due to missing systemd-coredump in container
new	new	works via systemd-coredump socket

So we need a solution for a new host with old containers. Changing the default setup of the old containers to include systemd-coredump is too invasive. Instead, we could modify Apport so that it can read the crash information sent by systemd-coredump on the host via a socket. That's what this ticket is about.
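The matrix above follows one rule: forwarding works only when the handler that is active on the host is also installed in the container. A small sketch of that rule, assuming the "old" and "new" setups described in this comment. The dictionary layout and the `forwarding_outcome` name are made up for illustration.

```python
# Hypothetical models of the two setups described above.
OLD = {"handler": "Apport", "installed": {"Apport"}}
NEW = {"handler": "systemd-coredump", "installed": {"Apport", "systemd-coredump"}}


def forwarding_outcome(host: dict, container: dict) -> str:
    """Return the expected outcome when a process crashes inside the container.

    The host's active crash handler tries to hand the crash to the same
    handler inside the container; if it is not installed there, forwarding fails.
    """
    handler = host["handler"]
    if handler in container["installed"]:
        return f"works via {handler} socket"
    return f"does not work due to missing {handler} in container"


for host_name, host in (("old", OLD), ("new", NEW)):
    for container_name, container in (("old", OLD), ("new", NEW)):
        print(host_name, container_name, forwarding_outcome(host, container), sep="\t")
```

Under this model, the proposal in this ticket amounts to adding a handler-independent socket to the old container, so that the new host no longer depends on systemd-coredump being installed there.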
