test/system: Test a container with an old forward incompatible runtime #1187

debarshiray · 2022-12-05T20:17:29Z

Commit ae43560 had added a test with a similar intention. When the test suite is run on a Fedora Rawhide host, it tests whether the containers for the two previous stable Fedora releases start or not. Fedora N-2 reaches End of Life 4 weeks after Fedora N is released [1]. So, testing the containers for Fedora Rawhide and the two previous stable releases on a Fedora Rawhide host is a decent test of general backwards compatibility.

However, as seen recently [2], this isn't enough to catch some known ABI compatibility issues [3,4]. These involve toolbox binaries built on hosts with newer toolchains that aren't meant to be run against containers with older runtimes. A targeted test is needed to defend against these scenarios.

The fedora-toolbox:34 image has glibc-2.33, which is old enough to be unable to run binaries compiled on Fedora 35 with glibc-2.34 and newer.

[1] https://docs.fedoraproject.org/en-US/releases/

[2] #1180

[3] Commit 6063eb2
#821

[4] Commit 6ad9c63
#529

softwarefactory-project-zuul · 2022-12-05T20:51:28Z

Build succeeded.

✔️ unit-test SUCCESS in 8m 22s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 8m 19s
✔️ system-test-fedora-rawhide SUCCESS in 33m 04s
✔️ system-test-fedora-36 SUCCESS in 11m 00s
✔️ system-test-fedora-35 SUCCESS in 12m 04s

Fedora 32 reached End of Life on 25th May 2021: https://docs.fedoraproject.org/en-US/releases/eol/ That's quite old because right now Fedora 35 is nearing its End of Life. Since the tests are intended for Toolbx, not the Fedora infrastructure, it will be better to use a newer image, because images that are too old can get lost from registry.fedoraproject.org. The fedora-toolbox:34 image can be a drop-in replacement for the fedora-toolbox:32 image for the purposes of this test suite, and has the advantage of being newer. Note that fedora-toolbox:34 is also old enough to test that the toolbox binary runs against it's build-time ABI from the host, and not the Toolbx container's ABI, when it's invoked as the entry point of the container [1,2]. This is important because the subsequent commit will add a test to ensure that. [1] Commit 6063eb2 containers#821 [2] Commit 6ad9c63 containers#529 containers#1187

Commit ae43560 had added a test with a similar intention. When the test suite is run on a Fedora Rawhide host, it tests whether the containers for the two previous stable Fedora releases start or not. Fedora N-2 reaches End of Life four weeks after Fedora N is released. So, testing the containers for Fedora Rawhide and the two previous stable releases on a Fedora Rawhide host is a decent test of general backwards compatibility. However, as seen recently [1], this isn't enough to catch some known ABI compatibility issues [2,3]. These involve toolbox binaries built on hosts with newer toolchains that aren't meant to be run against containers with older runtimes. A targeted test is needed to defend against these scenarios. The fedora-toolbox:34 image has glibc-2.33, which is old enough to be unable to run binaries compiled on Fedora 35 with glibc-2.34 and newer. [1] containers#1180 [2] Commit 6063eb2 containers#821 [3] Commit 6ad9c63 containers#529 https://docs.fedoraproject.org/en-US/releases/ containers#1187

Commit ae43560 had added a test with a similar intention. When the test suite is run on a Fedora Rawhide host, it tests whether the containers for the two previous stable Fedora releases start or not. Fedora N-2 reaches End of Life 4 weeks after Fedora N is released [1]. So, testing the containers for Fedora Rawhide and the two previous stable releases on a Fedora Rawhide host is a decent test of general backwards compatibility. However, as seen recently [2], this isn't enough to catch some known ABI compatibility issues [3,4]. These involve toolbox binaries built on hosts with newer toolchains that aren't meant to be run against containers with older runtimes. A targeted test is needed to defend against these scenarios. The fedora-toolbox:34 image has glibc-2.33, which is old enough to be unable to run binaries compiled on Fedora 35 with glibc-2.34 and newer. [1] https://docs.fedoraproject.org/en-US/releases/ [2] containers#1180 [3] Commit 6063eb2 containers#821 [4] Commit 6ad9c63 containers#529 containers#1187

softwarefactory-project-zuul · 2022-12-07T12:55:14Z

Build succeeded.

✔️ unit-test SUCCESS in 9m 04s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 8m 55s
✔️ system-test-fedora-rawhide SUCCESS in 30m 37s
✔️ system-test-fedora-36 SUCCESS in 12m 26s
✔️ system-test-fedora-35 SUCCESS in 13m 11s

akdev1l · 2022-12-07T14:53:59Z

It seems like you are going to end up playing whack-a-mole with any foreseeable ABI incompatibility

I don't think the binary patching that is being done currently is supported by glibc and it seems likely to introduce all sorts of unintended consequences.

If toolbox is meant to run containerized and support multiple distributions then it should really not depend on the host ABI.

Depending on CLI and POSIX has turned out ok for the sister project distrobox - I don't see why we can't do the same here.

just my 2¢

debarshiray · 2022-12-07T15:07:48Z

It seems like you are going to end up playing whack-a-mole with any
foreseeable ABI incompatibility

I don't think the binary patching that is being done currently is
supported by glibc and it seems likely to introduce all sorts of
unintended consequences.

At least some of the GNU C Library maintainers are aware of this use-case.

If toolbox is meant to run containerized and support multiple
distributions then it should really not depend on the host ABI.

Have you seen this blog post?

Depending on CLI and POSIX has turned out ok for the sister project
distrobox - I don't see why we can't do the same here.

Are you proposing that Toolbx go back to the POSIX shell implementation?

I am reluctant to do that for various reasons, but before we get into that, did you notice that Toolbx actually used to be implemented in POSIX shell?

akdev1l · 2022-12-07T15:29:09Z

Have you seen this blog post?

I have indeed. I think the original argument for the binary patching is very weak tbh.

However, that would prevent us from using glibc’s Name Service Switch to look up usernames and groups, or to resolve host names.

Not sure why toolbx would need to resolve names by itself as the images are pulled via podman pull - and if I remember correctly last time we spoke about this you mentioned the subuid/subgid support but then adding that was complicated by the binary patching itself.

Personally I don't see why toolbx should have network access as all the network interactions ought to be done by podman and not toolbx itself. (if you can see a flaw in this reasoning I would appreciate if you point it out)

Are you proposing that Toolbx go back to the POSIX shell implementation?

No, I am proposing that of trying to use CGO we would just shell out to the respective utilities - the same way it is currently done for podman (https://github.com/containers/toolbox/blob/main/src/cmd/create.go#L473)

For example that would mean using option number 1 in #1180

Attempt to shell out to getsubids and parse the result (or at least check the exit status).

Edit: ideally it would also mean revisiting #832

mhjacks · 2022-12-12T15:51:11Z

As far as subids go, we also have the option of not doing the check at all, and letting podman throw the error if it's going to.

debarshiray · 2022-12-14T14:26:49Z

However, that would prevent us from using glibc’s Name Service Switch to look up usernames and groups, or to resolve host names.

Not sure why toolbx would need to resolve names by itself

Off the top of my head:

To prompt the user about missing updates and containers that have reached End of Life.
To provide better progress reporting about image downloads. There's an inherent tension between keeping Toolbx's UI/UX clean and just being a Podman wrapper. I won't be surprised if we end up pulling the images in-process at some point.

Some of these can be found under the Project Pickle label.

as the images are pulled via podman pull

That's only true if you look at the code base today. It might not hold true in the future.

and if I remember correctly last time we spoke
about this you mentioned the subuid/subgid support but then adding that
was complicated by the binary patching itself.

Umm... I can't remember adding that last part, but I may be misremembering.

Are you proposing that Toolbx go back to the POSIX shell implementation?

No, I am proposing that of trying to use CGO we would just shell
out to the respective utilities - the same way it is currently done for
podman (https://github.com/containers/toolbox/blob/main/src/cmd/create.go#L473)

For example that would mean using option number 1 in #1180

Shelling out (or more correctly, fork-exec) is inherently slower than calling a function within the same process. The need to potentially migrate already forces us to do some, otherwise needless, file operations on every invocation of toolbox(1). If Podman was updated, it will also involve shelling out to podman system migrate. Shelling out getsubids(1) will add to this load.

Currently we have people still complaining about performance; and there is an urge to push Toolbx closer to being the default shell on OSes like Fedora Silverblue. So, performance matters, and we need to be careful about what we add to the common code paths.

Secondly, Toolbx's target audience doesn't want to know anything about containers. They just want an interactive command line environment that's familiar to them. So, keeping the UI/UX clean matters.

We need to somehow balance those two. One option is to just not check for anything at all and let Podman error out, except most users will be facing an error message that's hard to understand. It can be argued that we can assume everybody has users created with a shadow-utils that's new enough to always define the subordinate ID ranges, but then you can't say that about --preserve-fds. Plus, this is only true for the code base today. Nobody knows what other use-case will show up in the future.

In general, it does seem limiting to constrain ourselves to only shelling out to other binaries, parsing their outputs and use exit codes for error handling. Using a static binary or not using CGO eventually leads to cases where Toolbx might need a custom helper binary. How will we manage the compatibility between the different parts then?

mhjacks · 2022-12-14T14:56:44Z

From my perspective, I'm not certain that toolbx is going to add an enormous amount of value to explain podman errors. If a user has a situation where podman is going to error out, they're going to have to deal with containers, pure and simple; toolbx can at best reference a man page; but over time even that may provide different advice than podman's documented error handling.

I'm not sure how common these kinds of cases are - generally, I would expect that anyone using FreeIPA subid's (since they're not the default, or if they are they're new enough that they require some intervention to set up) would be a bit more advanced, since the kinds of users that want FreeIPA need to get more in the weeds about UIDs, GIDs anyway. I'm very leery of penalizing the whole project for what feels like a very niche use case - but at the same time, the current behavior is a blocker for FreeIPA users, because toolbx is erroring when it shouldn't.

To me, that says that toolbx's approach of "hmm, there was an error, run podman {verbosely} to learn more" is the right one. It's forward portable, and backward portable, and it doesn't require toolbx to keep state on those situations.

debarshiray · 2022-12-15T22:05:06Z

Let's not fragment the discussion across too many different places. It will be good to consolidate the discussion around subordinate IDs and enterprise set-ups in #1180

TL;DR: dlopen(3) solves the current problem, worst case we call getsubids(1).

debarshiray marked this pull request as draft December 5, 2022 20:17

debarshiray changed the title ~~test/system: Ensure that binaries are run against their build-time ABI~~ [WIP] test/system: Ensure that binaries are run against their build-time ABI Dec 5, 2022

debarshiray force-pushed the wip/rishi/test-system-ABI-PT_INTERP-RPATH-RUNPATH branch from 5b66aa2 to eee8a47 Compare December 7, 2022 11:58

debarshiray force-pushed the wip/rishi/test-system-ABI-PT_INTERP-RPATH-RUNPATH branch from eee8a47 to 50ff619 Compare December 7, 2022 12:01

debarshiray marked this pull request as ready for review December 7, 2022 12:01

debarshiray changed the title ~~[WIP] test/system: Ensure that binaries are run against their build-time ABI~~ test/system: Ensure that binaries are run against their build-time ABI Dec 7, 2022

debarshiray force-pushed the wip/rishi/test-system-ABI-PT_INTERP-RPATH-RUNPATH branch from 50ff619 to aeb5d8e Compare December 7, 2022 12:05

debarshiray changed the title ~~test/system: Ensure that binaries are run against their build-time ABI~~ test/system: Test a container with an old forward incompatible runtime Dec 7, 2022

debarshiray mentioned this pull request Dec 7, 2022

Enhance subid check by using shadow library function #1180

Closed

debarshiray merged commit aeb5d8e into containers:main Dec 7, 2022

debarshiray deleted the wip/rishi/test-system-ABI-PT_INTERP-RPATH-RUNPATH branch December 7, 2022 15:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test/system: Test a container with an old forward incompatible runtime #1187

test/system: Test a container with an old forward incompatible runtime #1187

debarshiray commented Dec 5, 2022 •

edited

Loading

softwarefactory-project-zuul bot commented Dec 5, 2022

softwarefactory-project-zuul bot commented Dec 7, 2022

akdev1l commented Dec 7, 2022

debarshiray commented Dec 7, 2022

akdev1l commented Dec 7, 2022 •

edited

Loading

mhjacks commented Dec 12, 2022

debarshiray commented Dec 14, 2022 •

edited

Loading

mhjacks commented Dec 14, 2022

debarshiray commented Dec 15, 2022

test/system: Test a container with an old forward incompatible runtime #1187

test/system: Test a container with an old forward incompatible runtime #1187

Conversation

debarshiray commented Dec 5, 2022 • edited Loading

softwarefactory-project-zuul bot commented Dec 5, 2022

softwarefactory-project-zuul bot commented Dec 7, 2022

akdev1l commented Dec 7, 2022

debarshiray commented Dec 7, 2022

akdev1l commented Dec 7, 2022 • edited Loading

mhjacks commented Dec 12, 2022

debarshiray commented Dec 14, 2022 • edited Loading

mhjacks commented Dec 14, 2022

debarshiray commented Dec 15, 2022

debarshiray commented Dec 5, 2022 •

edited

Loading

akdev1l commented Dec 7, 2022 •

edited

Loading

debarshiray commented Dec 14, 2022 •

edited

Loading