kola: Add `isolation=readonly|dynamicuser` #3060

cgwalters · 2022-08-31T20:28:24Z

EDIT: I apologize for the tone in this PR that started from me; it was absolutely unnecessary. Again, I apologize.

Part of my battle against duplicative comments in our kola tests.
We have a few tests that write a comment like this:

 # - exclusive: false
 #   - This test doesn't make meaningful changes to the system and
 #     should be able to be combined with other tests.

Of course, this comment is already redundant because the
meaning of the exclusive tag is defined canonically in coreos-assembler
(here) and copy-pasting that into every test that uses it would
be pointlessly verbose.

But - we can do one better. Instead of having a test flag which
is mainly an "I promise not to mutate the system in a way which could
interfere with other tests", let's add a field that enforces this.
Then it doesn't need to be commented; we have a variety of
tests which are just "system inspection" (e.g. query rpmdb) and
run just fine with DynamicUser=yes and hence cannot affect
the system, and hence are inherently isolated from other concurrent
tests.

travier · 2022-09-01T09:06:33Z

This is great!
Edit: Turns out it's not as simple as that.

dustymabe · 2022-09-01T13:40:55Z

docs/kola/external-tests.md

+and `dynamicuser` means `DynamicUser=yes`.  Setting either of these options
+also implies `exclusive: false`.  Use these for tests that are mostly about


I really think it's best if we don't rely on exclusive: false being implicit anywhere. Tests can do things in their butane config that are exclusive and would/could cause other tests to fail.

If you want an isolation and dynamicuser flag that's OK I guess, but I really want to keep exlusive: false explicit.

Hmm, we could also enforce that such tests don't provide an ignition config.

I'm not really opposed to also requiring an explicit exclusive: false as long as we don't force people to copy-paste the rationale for it 😄

Now strengthened this to prevent the test from providing an Ignition config too!

I did also test the testing framework manually:

$ cat > src/config/tests/kola/content-origins/config.bu << 'EOF' variant: fcos version: 1.2.0 EOF $ kola run ext.config.content-origins Error: test ext.config.content-origins specifies isolation=dynamicuser but includes an Ignition config 2022-09-01T14:09:32Z cli: test ext.config.content-origins specifies isolation=dynamicuser but includes an Ignition config $

Note that you still will probably include a comment/rationale for isolation: readonly. I'm not really sure what you gain here.

Why would I add a comment/rationale for that? Would you block coreos/fedora-coreos-config#1949 on that? If so I'm sorry I just disagree, it's self-evidently obvious and doesn't need a comment.

I think the problem is that a lot of the configuration options here aren't self-evident especially to someone who didn't write the test and it's safer to just default to including a rationale for everything to make sure we catch the full context.

I agree with you that if you read the documentation on this one, and then you read the systemd documentation on that option, then it's clear: "this test is completely read-only and doesn't write to the filesystem at all", but that requires... work, we could just state it up front. It doesn't really cost us anything to help the user read/understand what's going on.

Would you block coreos/fedora-coreos-config#1949 on that?

And.. the answer is "no", I'm not that dogmatic. I'd suggest it but if you firmly say "are you going to block this on that?" I wouldn't. Later on, though, if I ever touched the file I'd be like "hmm, wonder why these options aren't documented like all the others?" and I'd add it back.

OK, I think we can continue a debate about the comment schema after landing this.

Depends: coreos/coreos-assembler#3060

tests/kola-ci-self/tests/kola/exclusive-readonly

Part of my war against duplicative comments in our kola tests. We have a few tests that write a comment like this: ``` # - exclusive: false # - This test doesn't make meaningful changes to the system and # should be able to be combined with other tests. ``` Of course, this comment is already redundant because the meaning of the `exclusive` tag is defined canonically in coreos-assembler (here) and copy-pasting that into every test that uses it would be pointlessly verbose. But - we can do one better. Instead of having a test flag which is mainly an "I promise not to mutate the system in a way which could interfere with other tests", let's add a field that *enforces* this. Then it doesn't need to be commented; we have a variety of tests which are just "system inspection" (e.g. query rpmdb) and run just fine with `DynamicUser=yes` and hence *cannot* affect the system, and hence are inherently isolated from other concurrent tests.

dustymabe · 2022-09-01T19:16:11Z

mantle/kola/harness.go

+		if userdata != nil {
+			return fmt.Errorf("test %v specifies isolation=%v but includes an Ignition config", testname, targetMeta.Isolation)
+		}
+		targetMeta.Exclusive = false


Suggested change

targetMeta.Exclusive = false

Still need to drop this bit as discussed in https://github.com/coreos/coreos-assembler/pull/3060/files#r960684553 and https://github.com/coreos/coreos-assembler/pull/3060/files#discussion_r960795802

There's no use case for exclusive: true; isolation=<something> tests right? Hence, it would always be exclusive: false; isolation=<something> which is redundant.

I definitely see this replacing most of our use of exclusive: false and so I hence don't see the value in requiring it to also be specified.

In practice, our test framework is mostly something that needs to be understood by a relatively small set of people. Sure, this is a change that some people would need to learn about later, but it's really not a big change either, and we have docs. And the existing test corpus acts as an example.

I think enhancing the robustness of the test suite outweighs the very short term transition period where we some people may need to learn that isolation implies ➡️ exclusive: false.

Here's another way to look at this - a lot of people I'm sure start writing a new test by copy-pasting an old one. Today it's really easy to copy-paste an exclusive: false test but not think about what that means in your new test and acidentally write a test that does mutate the system in a way likely to cause flakes.

With isolation - we're adding some locks around the gun case that you'd have to bypass to use the footgun 😄

Your proposal would implicitly support people accidentally doing just isolation: dynamicuser and we'd have to remember that there's no use case for that and to later manually add exclusive: false. Which would be another footgun case, although a much less severe one.

I'm just stating and restating and stating again that I don't want to imply exclusive: false.

If someone added a test that was likely to cause flakes because it's run with other tests we'd catch that today because we automatically re-run failed tests on their own and we'd see that the re-run passed.

I get what you were trying to do with this, but the model we developed doesn't apply 1:1 with what you wanted it to be.

If we change anything what I would prefer is for non-exclusive tests to default to ProtectSystem=strict and then allow adding a tag (or I guess it could be a toplevel option) to disable that (for example to those two tests that you found that don't comply). Then a user has to think about it when they choose to override that default behavior.

I already wrote up a comment in the other thread before I saw Dusty's reply proposing essentially the same thing. Moving it here for completeness:

I think the isolation functionality is potentially useful, but if the goal is to reduce boilerplate comments, the proposed flags are counterproductive. Tests have to opt in to this functionality by writing something like "isolation": "dynamicuser", and if someone wants to find out what that means, they can look it up in cosa docs and learn that it means DynamicUser=yes. Okay, now they have to find an option in systemd docs, which isn't easy unless they know about systemd.directives(7). And why does the test use that? What are the implications? Why would I select dynamicuser rather than readonly, and why can't I select both? (dynamicuser implies readonly, but that's more spelunking in systemd docs.)

So let's figure out what we actually want here, without recreating systemd security option spaghetti. Proposal: for tests that have exclusive: false and do not provide an Ignition/Butane config, automatically set DynamicUser=yes unless a second option is specified, something like restrict: false. That way we default to maximum isolation unless the test says that it wants less; we'll easily notice that case because the test will fail. If there are known cases where we want ProtectSystem=strict without DynamicUser=yes, we can provide an option for that too, or make the first one multi-valued. Maybe in the Ignition + exclusive: false case we'd want to require an explicit restrict: false, to make it clear that protections are being disabled.

I'm just stating and restating and stating again that I don't want to imply exclusive: false.

Your seems to overlap with bgilbert's, so replying below:

and if someone wants to find out what that means, they can look it up in cosa docs and learn that it means DynamicUser=yes. Okay, now they have to find an option in systemd docs, which isn't easy unless they know about systemd.directives(7).

Right, but the developers on the team already have to read the external-tests.md documentation to understand much at all. Needing that isn't...isolated...to this change.

Proposal: for tests that have exclusive: false and do not provide an Ignition/Butane config, automatically set DynamicUser=yes unless a second option is specified, something like restrict: false.

Hmmm...definitely implicit-over-explicit going on here. I think having an ignition/butane config provided turning off isolation would be too magical, and instead we should error out and require isolation: none or so.

Overall I'm absolutely fine with defaulting exclusive: false tests to being isolated...but it will require some ratcheting.

A tricky thing here is that some of our read only tests actually do need to still run as root because they traverse the filesystem and we have some protected directories.

This is why I introduced both isolation: readonly (runs as root, but read only) and isolation: dynamicuser.

Anyways then so then we're agreeing to:

add isolation: none|readonly|dynamicuser to coreos-assembler but don't change its relationship with exclusive

Do a PR to fedora-coreos-config which adds exclusive: false; isolation: none to the tests that need it

Another (or same) PR to fedora-coreos-config which adds exclusive: false; isolation: readonly to tests that need to be root but are still readonly

Change this PR to default exclusive: false to isolation: dynamicuser

I need to more carefully look at the case of external tests with ignition configs and see the impact.

I did see it. It does not address my concerns from the comment #3060 (comment) it was replying to, nor my restatement just now in #3060 (comment). My -1 will remain in place until those concerns are addressed.

Ah...you're proposing s/isolation: none/restrict: false/ basically, right?

I think restrict is slightly better than isolation, but the main point is avoiding dynamicuser and readonly. Instead of dynamicuser, we can have something like restrict: true (implicit because it's the default), and instead of readonly, something like an implicit restrict: true with an explicit run-as-root: true. (I'm open to better naming ideas also.)

Err...I was with you on the first half of the sentence - sure, we can have a boolean instead a tristate (I still don't think a tristate is hard or complicated but whatever, I was mainly on board with just having a boolean). That seems to be what Dusty was arguing for too.

Then in the second half you made up another boolean - and now we have two booleans which is (I hope you'd agree if you step back for a moment) not actually simpler.

For one thing, two booleans is four potential states. The new one of which (restrict: false, run-as-root: true) is redundant. And any future expansion via additional booleans becomes quickly grows the state space.

I think I'd much prefer just a boolean (if that's what we're going down to) alongside the ability for tests to inject arbitrary systemd unit directives, which would likely help solve other things too (e.g. ordering against other units).

Edit: of course, two booleans for this combines with wanting to retain a boolean exclusive: false because now there are other states that seem at best useless (exclusive: true, restrict: true).

The extra boolean isn't simpler for the implementation, but we should prioritize building intuitive directives for the test author. I think distinct flags help with that, but I'm okay with a three-valued enum if we can find sensible names for the values. I see how the arbitrary directive injection could be useful, but using it to override DynamicUser would be pretty awkward, and I don't think encouraging that helps with the goal of simplifying tests.

For completeness, I'll note a couple things. One is that an enum implies that there's only one dimension of hardening, and down the road we might find that we want to compose multiple hardening flags anyway. The other is that systemd has the sort of interlocking options you're arguing against. DynamicUser is a separate option from ProtectSystem, but implies it. (systemd has a gazillion options though, which I agree can be hard to work with.)

I have deadlines I need to focus on, so I'm muting this thread for now. I've tried to fully capture my constraints in #3060 (comment) (but s/flags/options/ if you want to pursue the enum). I'll check back later, which might be after OCP feature freeze, and my -1 stands until then.

bgilbert

Hold pending resolution of #3060 (comment).

openshift-merge-robot · 2022-09-08T00:32:38Z

@cgwalters: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

dustymabe · 2024-04-18T18:44:22Z

Closing this. I don't think it's a priority right now. If we want to implement the suggested changes from the discussion let's start with an issue explaining the proposal.

This comment was marked as outdated.

Sign in to view

cgwalters mentioned this pull request Aug 31, 2022

tests/content-origins: Verify expected vendor for all RPM packages coreos/fedora-coreos-config#1947

Merged

cgwalters force-pushed the exclusive-readonly branch from db1f6a7 to d58af5f Compare September 1, 2022 13:30

cgwalters changed the title ~~kola: Force exclusive: false tests to use ProtectSystem=strict~~ kola: Add isolation=readonly|dynamicuser Sep 1, 2022

cgwalters mentioned this pull request Sep 1, 2022

tests: Lift exclusive: false on tests that actually mutate system coreos/fedora-coreos-config#1948

Closed

dustymabe reviewed Sep 1, 2022

View reviewed changes

cgwalters force-pushed the exclusive-readonly branch from d58af5f to 008fd81 Compare September 1, 2022 14:02

cgwalters added a commit to cgwalters/fedora-coreos-config that referenced this pull request Sep 1, 2022

tests: Demonstrate isolation usage in one test

6ad99df

Depends: coreos/coreos-assembler#3060

cgwalters mentioned this pull request Sep 1, 2022

tests: Demonstrate isolation usage in one test coreos/fedora-coreos-config#1949

Draft

dustymabe reviewed Sep 1, 2022

View reviewed changes

tests/kola-ci-self/tests/kola/exclusive-readonly Show resolved Hide resolved

cgwalters force-pushed the exclusive-readonly branch from 008fd81 to a21e0e4 Compare September 1, 2022 17:52

dustymabe reviewed Sep 1, 2022

View reviewed changes

bgilbert suggested changes Sep 6, 2022

View reviewed changes

openshift-merge-robot added the needs-rebase label Sep 8, 2022

dustymabe closed this Apr 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kola: Add `isolation=readonly|dynamicuser` #3060

kola: Add `isolation=readonly|dynamicuser` #3060

cgwalters commented Aug 31, 2022 •

edited

Loading

This comment was marked as outdated.

travier commented Sep 1, 2022 •

edited

Loading

dustymabe Sep 1, 2022 •

edited

Loading

cgwalters Sep 1, 2022

cgwalters Sep 1, 2022

cgwalters Sep 1, 2022

cgwalters Sep 1, 2022

dustymabe Sep 1, 2022

cgwalters Sep 1, 2022

dustymabe Sep 1, 2022

dustymabe Sep 1, 2022 •

edited

Loading

cgwalters Sep 1, 2022

dustymabe Sep 1, 2022

cgwalters Sep 1, 2022 •

edited

Loading

dustymabe Sep 2, 2022

bgilbert Sep 2, 2022

cgwalters Sep 2, 2022 •

edited

Loading

bgilbert Sep 6, 2022

cgwalters Sep 6, 2022

bgilbert Sep 6, 2022

cgwalters Sep 6, 2022 •

edited

Loading

bgilbert Sep 6, 2022

bgilbert left a comment

openshift-merge-robot commented Sep 8, 2022

dustymabe commented Apr 18, 2024

		and `dynamicuser` means `DynamicUser=yes`. Setting either of these options
		also implies `exclusive: false`. Use these for tests that are mostly about

kola: Add isolation=readonly|dynamicuser #3060

kola: Add isolation=readonly|dynamicuser #3060

Conversation

cgwalters commented Aug 31, 2022 • edited Loading

This comment was marked as outdated.

travier commented Sep 1, 2022 • edited Loading

dustymabe Sep 1, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dustymabe Sep 1, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cgwalters Sep 1, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cgwalters Sep 2, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cgwalters Sep 6, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bgilbert left a comment

Choose a reason for hiding this comment

openshift-merge-robot commented Sep 8, 2022

dustymabe commented Apr 18, 2024

kola: Add `isolation=readonly|dynamicuser` #3060

kola: Add `isolation=readonly|dynamicuser` #3060

cgwalters commented Aug 31, 2022 •

edited

Loading

travier commented Sep 1, 2022 •

edited

Loading

dustymabe Sep 1, 2022 •

edited

Loading

dustymabe Sep 1, 2022 •

edited

Loading

cgwalters Sep 1, 2022 •

edited

Loading

cgwalters Sep 2, 2022 •

edited

Loading

cgwalters Sep 6, 2022 •

edited

Loading