Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test/system: Test a container with an old forward incompatible runtime #1187

Conversation

debarshiray
Copy link
Member

@debarshiray debarshiray commented Dec 5, 2022

Commit ae43560 had added a test with a similar intention. When the test suite is run on a Fedora Rawhide host, it tests whether the containers for the two previous stable Fedora releases start or not. Fedora N-2 reaches End of Life 4 weeks after Fedora N is released [1]. So, testing the containers for Fedora Rawhide and the two previous stable releases on a Fedora Rawhide host is a decent test of general backwards compatibility.

However, as seen recently [2], this isn't enough to catch some known ABI compatibility issues [3,4]. These involve toolbox binaries built on hosts with newer toolchains that aren't meant to be run against containers with older runtimes. A targeted test is needed to defend against these scenarios.

The fedora-toolbox:34 image has glibc-2.33, which is old enough to be unable to run binaries compiled on Fedora 35 with glibc-2.34 and newer.

[1] https://docs.fedoraproject.org/en-US/releases/

[2] #1180

[3] Commit 6063eb2
#821

[4] Commit 6ad9c63
#529

@debarshiray debarshiray marked this pull request as draft December 5, 2022 20:17
@debarshiray debarshiray changed the title test/system: Ensure that binaries are run against their build-time ABI [WIP] test/system: Ensure that binaries are run against their build-time ABI Dec 5, 2022
@softwarefactory-project-zuul
Copy link

Build succeeded.

✔️ unit-test SUCCESS in 8m 22s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 8m 19s
✔️ system-test-fedora-rawhide SUCCESS in 33m 04s
✔️ system-test-fedora-36 SUCCESS in 11m 00s
✔️ system-test-fedora-35 SUCCESS in 12m 04s

Fedora 32 reached End of Life on 25th May 2021:
https://docs.fedoraproject.org/en-US/releases/eol/

That's quite old because right now Fedora 35 is nearing its End of Life.

Since the tests are intended for Toolbx, not the Fedora infrastructure,
it will be better to use a newer image, because images that are too old
can get lost from registry.fedoraproject.org.  The fedora-toolbox:34
image can be a drop-in replacement for the fedora-toolbox:32 image for
the purposes of this test suite, and has the advantage of being newer.

Note that fedora-toolbox:34 is also old enough to test that the toolbox
binary runs against it's build-time ABI from the host, and not the
Toolbx container's ABI, when it's invoked as the entry point of the
container [1,2].  This is important because the subsequent commit will
add a test to ensure that.

[1] Commit 6063eb2
    containers#821

[2] Commit 6ad9c63
    containers#529

containers#1187
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Dec 7, 2022
Commit ae43560 had added a test with a similar intention.  When
the test suite is run on a Fedora Rawhide host, it tests whether the
containers for the two previous stable Fedora releases start or not.
Fedora N-2 reaches End of Life four weeks after Fedora N is released.
So, testing the containers for Fedora Rawhide and the two previous
stable releases on a Fedora Rawhide host is a decent test of general
backwards compatibility.

However, as seen recently [1], this isn't enough to catch some known
ABI compatibility issues [2,3].  These involve toolbox binaries built
on hosts with newer toolchains that aren't meant to be run against
containers with older runtimes.  A targeted test is needed to defend
against these scenarios.

The fedora-toolbox:34 image has glibc-2.33, which is old enough to be
unable to run binaries compiled on Fedora 35 with glibc-2.34 and newer.

[1] containers#1180

[2] Commit 6063eb2
    containers#821

[3] Commit 6ad9c63
    containers#529

https://docs.fedoraproject.org/en-US/releases/

containers#1187
@debarshiray debarshiray force-pushed the wip/rishi/test-system-ABI-PT_INTERP-RPATH-RUNPATH branch from 5b66aa2 to eee8a47 Compare December 7, 2022 11:58
debarshiray added a commit to debarshiray/toolbox that referenced this pull request Dec 7, 2022
Commit ae43560 had added a test with a similar intention.  When
the test suite is run on a Fedora Rawhide host, it tests whether the
containers for the two previous stable Fedora releases start or not.
Fedora N-2 reaches End of Life four weeks after Fedora N is released.
So, testing the containers for Fedora Rawhide and the two previous
stable releases on a Fedora Rawhide host is a decent test of general
backwards compatibility.

However, as seen recently [1], this isn't enough to catch some known
ABI compatibility issues [2,3].  These involve toolbox binaries built
on hosts with newer toolchains that aren't meant to be run against
containers with older runtimes.  A targeted test is needed to defend
against these scenarios.

The fedora-toolbox:34 image has glibc-2.33, which is old enough to be
unable to run binaries compiled on Fedora 35 with glibc-2.34 and newer.

[1] containers#1180

[2] Commit 6063eb2
    containers#821

[3] Commit 6ad9c63
    containers#529

https://docs.fedoraproject.org/en-US/releases/

containers#1187
@debarshiray debarshiray force-pushed the wip/rishi/test-system-ABI-PT_INTERP-RPATH-RUNPATH branch from eee8a47 to 50ff619 Compare December 7, 2022 12:01
@debarshiray debarshiray marked this pull request as ready for review December 7, 2022 12:01
@debarshiray debarshiray changed the title [WIP] test/system: Ensure that binaries are run against their build-time ABI test/system: Ensure that binaries are run against their build-time ABI Dec 7, 2022
Commit ae43560 had added a test with a similar intention.  When
the test suite is run on a Fedora Rawhide host, it tests whether the
containers for the two previous stable Fedora releases start or not.
Fedora N-2 reaches End of Life 4 weeks after Fedora N is released [1].
So, testing the containers for Fedora Rawhide and the two previous
stable releases on a Fedora Rawhide host is a decent test of general
backwards compatibility.

However, as seen recently [2], this isn't enough to catch some known
ABI compatibility issues [3,4].  These involve toolbox binaries built
on hosts with newer toolchains that aren't meant to be run against
containers with older runtimes.  A targeted test is needed to defend
against these scenarios.

The fedora-toolbox:34 image has glibc-2.33, which is old enough to be
unable to run binaries compiled on Fedora 35 with glibc-2.34 and newer.

[1] https://docs.fedoraproject.org/en-US/releases/

[2] containers#1180

[3] Commit 6063eb2
    containers#821

[4] Commit 6ad9c63
    containers#529

containers#1187
@debarshiray debarshiray force-pushed the wip/rishi/test-system-ABI-PT_INTERP-RPATH-RUNPATH branch from 50ff619 to aeb5d8e Compare December 7, 2022 12:05
@debarshiray debarshiray changed the title test/system: Ensure that binaries are run against their build-time ABI test/system: Test a container with an old forward incompatible runtime Dec 7, 2022
@softwarefactory-project-zuul
Copy link

Build succeeded.

✔️ unit-test SUCCESS in 9m 04s
✔️ unit-test-migration-path-for-coreos-toolbox SUCCESS in 8m 55s
✔️ system-test-fedora-rawhide SUCCESS in 30m 37s
✔️ system-test-fedora-36 SUCCESS in 12m 26s
✔️ system-test-fedora-35 SUCCESS in 13m 11s

@akdev1l
Copy link

akdev1l commented Dec 7, 2022

It seems like you are going to end up playing whack-a-mole with any foreseeable ABI incompatibility

I don't think the binary patching that is being done currently is supported by glibc and it seems likely to introduce all sorts of unintended consequences.

If toolbox is meant to run containerized and support multiple distributions then it should really not depend on the host ABI.

Depending on CLI and POSIX has turned out ok for the sister project distrobox - I don't see why we can't do the same here.

just my 2¢

@debarshiray debarshiray merged commit aeb5d8e into containers:main Dec 7, 2022
@debarshiray debarshiray deleted the wip/rishi/test-system-ABI-PT_INTERP-RPATH-RUNPATH branch December 7, 2022 15:02
@debarshiray
Copy link
Member Author

It seems like you are going to end up playing whack-a-mole with any
foreseeable ABI incompatibility

I don't think the binary patching that is being done currently is
supported by glibc and it seems likely to introduce all sorts of
unintended consequences.

At least some of the GNU C Library maintainers are aware of this use-case.

If toolbox is meant to run containerized and support multiple
distributions then it should really not depend on the host ABI.

Have you seen this blog post?

Depending on CLI and POSIX has turned out ok for the sister project
distrobox - I don't see why we can't do the same here.

Are you proposing that Toolbx go back to the POSIX shell implementation?

I am reluctant to do that for various reasons, but before we get into that, did you notice that Toolbx actually used to be implemented in POSIX shell?

@akdev1l
Copy link

akdev1l commented Dec 7, 2022

Have you seen this blog post?

I have indeed. I think the original argument for the binary patching is very weak tbh.

However, that would prevent us from using glibc’s Name Service Switch to look up usernames and groups, or to resolve host names.

Not sure why toolbx would need to resolve names by itself as the images are pulled via podman pull - and if I remember correctly last time we spoke about this you mentioned the subuid/subgid support but then adding that was complicated by the binary patching itself.

Personally I don't see why toolbx should have network access as all the network interactions ought to be done by podman and not toolbx itself. (if you can see a flaw in this reasoning I would appreciate if you point it out)

Are you proposing that Toolbx go back to the POSIX shell implementation?

No, I am proposing that of trying to use CGO we would just shell out to the respective utilities - the same way it is currently done for podman (https://github.com/containers/toolbox/blob/main/src/cmd/create.go#L473)

For example that would mean using option number 1 in #1180

Attempt to shell out to getsubids and parse the result (or at least check the exit status).

Edit: ideally it would also mean revisiting #832

@mhjacks
Copy link

mhjacks commented Dec 12, 2022

As far as subids go, we also have the option of not doing the check at all, and letting podman throw the error if it's going to.

@debarshiray
Copy link
Member Author

debarshiray commented Dec 14, 2022

However, that would prevent us from using glibc’s Name Service Switch to look up usernames and groups, or to resolve host names.

Not sure why toolbx would need to resolve names by itself

Off the top of my head:

  • To prompt the user about missing updates and containers that have reached End of Life.
  • To provide better progress reporting about image downloads. There's an inherent tension between keeping Toolbx's UI/UX clean and just being a Podman wrapper. I won't be surprised if we end up pulling the images in-process at some point.

Some of these can be found under the Project Pickle label.

as the images are pulled via podman pull

That's only true if you look at the code base today. It might not hold true in the future.

and if I remember correctly last time we spoke
about this you mentioned the subuid/subgid support but then adding that
was complicated by the binary patching itself.

Umm... I can't remember adding that last part, but I may be misremembering.

Are you proposing that Toolbx go back to the POSIX shell implementation?

No, I am proposing that of trying to use CGO we would just shell
out to the respective utilities - the same way it is currently done for
podman (https://github.com/containers/toolbox/blob/main/src/cmd/create.go#L473)

For example that would mean using option number 1 in #1180

Shelling out (or more correctly, fork-exec) is inherently slower than calling a function within the same process. The need to potentially migrate already forces us to do some, otherwise needless, file operations on every invocation of toolbox(1). If Podman was updated, it will also involve shelling out to podman system migrate. Shelling out getsubids(1) will add to this load.

Currently we have people still complaining about performance; and there is an urge to push Toolbx closer to being the default shell on OSes like Fedora Silverblue. So, performance matters, and we need to be careful about what we add to the common code paths.

Secondly, Toolbx's target audience doesn't want to know anything about containers. They just want an interactive command line environment that's familiar to them. So, keeping the UI/UX clean matters.

We need to somehow balance those two. One option is to just not check for anything at all and let Podman error out, except most users will be facing an error message that's hard to understand. It can be argued that we can assume everybody has users created with a shadow-utils that's new enough to always define the subordinate ID ranges, but then you can't say that about --preserve-fds. Plus, this is only true for the code base today. Nobody knows what other use-case will show up in the future.

In general, it does seem limiting to constrain ourselves to only shelling out to other binaries, parsing their outputs and use exit codes for error handling. Using a static binary or not using CGO eventually leads to cases where Toolbx might need a custom helper binary. How will we manage the compatibility between the different parts then?

@mhjacks
Copy link

mhjacks commented Dec 14, 2022

From my perspective, I'm not certain that toolbx is going to add an enormous amount of value to explain podman errors. If a user has a situation where podman is going to error out, they're going to have to deal with containers, pure and simple; toolbx can at best reference a man page; but over time even that may provide different advice than podman's documented error handling.

I'm not sure how common these kinds of cases are - generally, I would expect that anyone using FreeIPA subid's (since they're not the default, or if they are they're new enough that they require some intervention to set up) would be a bit more advanced, since the kinds of users that want FreeIPA need to get more in the weeds about UIDs, GIDs anyway. I'm very leery of penalizing the whole project for what feels like a very niche use case - but at the same time, the current behavior is a blocker for FreeIPA users, because toolbx is erroring when it shouldn't.

To me, that says that toolbx's approach of "hmm, there was an error, run podman {verbosely} to learn more" is the right one. It's forward portable, and backward portable, and it doesn't require toolbx to keep state on those situations.

@debarshiray
Copy link
Member Author

Let's not fragment the discussion across too many different places. It will be good to consolidate the discussion around subordinate IDs and enterprise set-ups in #1180

TL;DR: dlopen(3) solves the current problem, worst case we call getsubids(1).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants