Standardize forwarding crashes to containers #102

Open
bdrung opened this issue May 6, 2024 · 13 comments

Comments

@bdrung

bdrung commented May 6, 2024

Several crash dump handlers are available, such as systemd-coredump and Apport. When a process crashes inside a container, the crash dump handler on the host receives the crash and needs to forward it into the container. This forwarding works if the same handler is present on the host and in the container (e.g. systemd-coredump on both, or Apport on both). If the handler in the container differs from the one on the host, the forwarding does not work (see "systemd-coredump handler does not forward the crash to the container" for an example).

To make crash forwarding work in all of these scenarios, please standardize the way crashes are forwarded to containers. I suggest specifying the location of a socket in the container and how the needed information (such as the crashed process ID) is sent to that socket.
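
To make the suggestion concrete, here is a minimal sketch of what "sending the needed information to a socket" could look like. Everything in it is an assumption for illustration: the field names (`pid`, `signal`, `executable`), the newline-terminated JSON framing, and the helper names are hypothetical and are not the wire format used by systemd-coredump or Apport. A `socketpair()` stands in for the host handler connecting to a Unix socket at an agreed path inside the container.

```python
import json
import socket


def encode_crash_notice(pid: int, signal_number: int, executable: str) -> bytes:
    """Serialize crash metadata as one newline-terminated JSON object.

    The field names are hypothetical; a real standard would define them.
    """
    message = {"pid": pid, "signal": signal_number, "executable": executable}
    return json.dumps(message).encode("utf-8") + b"\n"


def decode_crash_notice(data: bytes) -> dict:
    """Parse a crash notice produced by encode_crash_notice()."""
    return json.loads(data.decode("utf-8"))


def demo() -> dict:
    """Send one crash notice from a 'host' end to a 'container' end."""
    # socketpair() replaces a real connect() to a socket path in the container.
    host_end, container_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with host_end, container_end:
        host_end.sendall(encode_crash_notice(1234, 11, "/usr/bin/example"))
        host_end.shutdown(socket.SHUT_WR)  # signal end of message
        received = b""
        while chunk := container_end.recv(4096):
            received += chunk
    return decode_crash_notice(received)
```

A real design would also have to transfer the core dump itself (for example as a file descriptor passed over the same Unix socket) and agree on whose PID namespace the process ID refers to; the sketch leaves both out.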

@bluca
Member

bluca commented May 6, 2024

> needs to forward the crash into the container

Citation needed. In most cases that's actually not true: the container might not even exist anymore when the crash dump is received.

@bdrung
Author

bdrung commented May 6, 2024

> > needs to forward the crash into the container
>
> Citation needed. In most cases that's actually not true: the container might not even exist anymore when the crash dump is received.

One example: The test case mentioned in the bug description in https://bugs.launchpad.net/ubuntu/+source/apport/+bug/2063349. Another example: autopkgtest runners on Ubuntu armhf. Do you have examples where containers are destroyed when one process crashes inside?

@bluca
Member

bluca commented May 6, 2024

Anything single process, and anything that is closed or upgraded after the crash. Containers are ephemeral and volatile by definition. What's the point in doing this forwarding at all?

@schopin-pro

Without getting into the weeds of why this is useful, one could just note that at least two crash dump handlers (Apport and systemd-coredump) grew that capability independently, which IMHO is a good indication that there is an actual need here.

I agree that the single-process container is a common pattern, but it's not the only use case for containers. So, assuming the container survives the crash and has a crash handler installed, that handler will get much more out of the crash dump than the host's handler, since it knows the details of the container. Think of running an Ubuntu container on a Fedora host.

@bluca
Member

bluca commented May 6, 2024

> Think of running an Ubuntu container on a Fedora host.

Have you seen https://systemd.io/ELF_PACKAGE_METADATA/ ? I should probably move that here.

I've already mentioned this to @enr0n: please consider enabling that spec distro-wide in Ubuntu, so that the host can get all the information from a crash in the guest without any need for communication, simply by parsing the core file. Fedora already implements it, so if you try the opposite (crashing a Fedora guest on an Ubuntu host), coredumpctl on the host will give you a lot of info.

In fact several packages in Debian/Ubuntu already use it, including all systemd ones, so if you crash any of those they'll already contain the info. This is done on a package-by-package opt-in basis, for a distro-wide debhelper change see: https://salsa.debian.org/debian/debhelper/-/merge_requests/98 (unfortunately going nowhere in Debian due to dpkg politics, but this shouldn't be a problem for you)
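
For readers unfamiliar with the spec, the following is a rough sketch of how a host-side tool could pull package information out of ELF note data. It assumes, as I understand the spec, that the metadata is a JSON payload in an ELF note with owner `FDO` and type `0xcafe1a7e`; it also assumes little-endian data and that the raw note bytes have already been located. `build_note` and `parse_package_notes` are hypothetical helper names, and this is not how coredumpctl itself is implemented.

```python
import json
import struct

FDO_OWNER = b"FDO"
FDO_PACKAGE_NOTE_TYPE = 0xCAFE1A7E  # note type, per my reading of the spec


def _align4(size: int) -> int:
    """Round size up to the 4-byte alignment used between note fields."""
    return (size + 3) & ~3


def build_note(owner: bytes, note_type: int, desc: bytes) -> bytes:
    """Build one ELF note record (used here only to create test input)."""
    name = owner + b"\0"
    header = struct.pack("<III", len(name), len(desc), note_type)
    return (
        header
        + name.ljust(_align4(len(name)), b"\0")
        + desc.ljust(_align4(len(desc)), b"\0")
    )


def parse_package_notes(notes: bytes) -> list[dict]:
    """Return the decoded JSON payload of every package metadata note."""
    results = []
    offset = 0
    while offset + 12 <= len(notes):
        # Note header: name size, descriptor size, type (three 32-bit words).
        namesz, descsz, note_type = struct.unpack_from("<III", notes, offset)
        offset += 12
        name = notes[offset:offset + namesz].rstrip(b"\0")
        offset += _align4(namesz)
        desc = notes[offset:offset + descsz]
        offset += _align4(descsz)
        if name == FDO_OWNER and note_type == FDO_PACKAGE_NOTE_TYPE:
            results.append(json.loads(desc.rstrip(b"\0").decode("utf-8")))
    return results
```

Feeding it a buffer made of an unrelated note followed by a package note with a made-up payload such as `{"type": "deb", "name": "hello", "version": "1.0-1"}` returns a one-element list containing that dictionary. A real tool would use an ELF library to find the note data in the core file or the binaries it references.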

@bdrung
Author

bdrung commented May 6, 2024

I have seen https://systemd.io/ELF_PACKAGE_METADATA/ and it has been on my todo wish list for a long time. Thanks for the pointer to https://salsa.debian.org/debian/debhelper/-/merge_requests/98. I'll read the discussion there. If there are no technical reasons against the proposed implementation, we could carry this delta in Ubuntu to add the ELF metadata by default.

@bluca
Member

bluca commented May 6, 2024

That would be very nice, thanks

@bdrung
Author

bdrung commented May 6, 2024

I have read enough for today. @bluca, since you submitted https://salsa.debian.org/debian/debhelper/-/merge_requests/98, would you be willing to submit it against dpkg-buildflags? Since we are already carrying some changes for dpkg in Ubuntu, I doubt that this additional delta will be a problem.

@bluca
Member

bluca commented May 7, 2024

Yes I can look into that in the next few days

@bluca
Member

bluca commented May 11, 2024

@bdrung here's a PR: https://code.launchpad.net/~bluca/ubuntu/+source/dpkg/+git/dpkg/+merge/465957
tested with a package build on noble, seems to work as intended

@poettering
Collaborator

> Several crash dump handlers are available, such as systemd-coredump and Apport. When a process crashes inside a container, the crash dump handler on the host receives the crash and needs to forward it into the container. This forwarding works if the same handler is present on the host and in the container (e.g. systemd-coredump on both, or Apport on both). If the handler in the container differs from the one on the host, the forwarding does not work (see "systemd-coredump handler does not forward the crash to the container" for an example).
>
> To make crash forwarding work in all of these scenarios, please standardize the way crashes are forwarded to containers. I suggest specifying the location of a socket in the container and how the needed information (such as the crashed process ID) is sent to that socket.

Let me ask one question: wouldn't a nicer option be to switch to systemd-coredump as the backend for Apport?

I mean, that's what everyone else ended up doing, for example RH's abrt: they let systemd-coredump do the initial dirty work and then hook into it at a later step. Is there anything that the coredump collection logic in Apport can do that systemd-coredump cannot do anyway?

I mean, systemd-coredump has really nice features, such as the sandboxing and backtrace extraction and stuff, or the container forwarding.

To me it appears like a much simpler approach.

I mean, I can very much understand why you want that, in particular for closing the loop in CIs and suchlike, so that they can get access to their own crashes. But I am a bit reluctant to commit to a generic API for this, as we tend to interpret certain things (rlimit_core) a bit differently from others, and hence our handlers get called differently from others. I think we can commit to compat between differently versioned containers and hosts to some degree, but I am a bit conservative about committing to more than that on this interface.

@bluca
Member

bluca commented May 27, 2024

> @bdrung here's a PR: https://code.launchpad.net/~bluca/ubuntu/+source/dpkg/+git/dpkg/+merge/465957
> tested with a package build on noble, seems to work as intended

@bdrung any update on that MR?

@bdrung
Author

bdrung commented Jun 14, 2024

> Let me ask one question: wouldn't a nicer option be to switch to systemd-coredump as the backend for Apport?

Apport in Ubuntu 24.04 gained support for using systemd-coredump as backend: https://discourse.ubuntu.com/t/apport-2-28-0-gained-systemd-coredump-integration/44910

> I mean, I can very much understand why you want that, in particular for closing the loop in CIs and suchlike, so that they can get access to their own crashes. But I am a bit reluctant to commit to a generic API for this, as we tend to interpret certain things (rlimit_core) a bit differently from others, and hence our handlers get called differently from others. I think we can commit to compat between differently versioned containers and hosts to some degree, but I am a bit conservative about committing to more than that on this interface.

The problem with switching Apport's backend to systemd-coredump is backward/forward compatibility. Let's assume we have the following basic installations:

  • old host: Apport, no systemd-coredump
  • old container: Apport, no systemd-coredump
  • new host: systemd-coredump as crash handler, Apport with systemd-coredump backend
  • new container: systemd-coredump and Apport with systemd-coredump backend installed

| host | container | current outcome |
|------|-----------|-----------------|
| old  | old       | works via Apport socket |
| old  | new       | works via Apport socket |
| new  | old       | does not work due to missing systemd-coredump in container |
| new  | new       | works via systemd-coredump socket |

So we need a solution for a new host with old containers. Changing the default setup of the old containers to include systemd-coredump is too invasive. We could modify Apport so that it can read the crash information from the host's systemd-coredump via a socket. That's what this ticket is about.
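
The matrix can be restated as a small lookup, which makes the failing combination easy to see. This sketch rests on one reading of the table, stated here as an assumption: the host forwards using only its own handler's socket protocol, and forwarding succeeds only if the container has a handler that accepts that protocol. The dictionary and function names are hypothetical.

```python
# Protocol each kind of host uses when forwarding a crash.
HOST_PROTOCOL = {
    "old": "Apport",
    "new": "systemd-coredump",
}

# Protocols each kind of container is able to receive.
CONTAINER_PROTOCOLS = {
    "old": {"Apport"},
    "new": {"Apport", "systemd-coredump"},
}


def forwarding_outcome(host: str, container: str) -> str:
    """Describe whether crash forwarding works for a host/container pair."""
    protocol = HOST_PROTOCOL[host]
    if protocol in CONTAINER_PROTOCOLS[container]:
        return f"works via {protocol} socket"
    return f"does not work due to missing {protocol} in container"
```

Under this model, teaching Apport in old containers to accept the host's systemd-coredump forwarding amounts to adding `"systemd-coredump"` to `CONTAINER_PROTOCOLS["old"]`, which would turn the one failing row into a working one.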
