Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kola: Add isolation=readonly|dynamicuser #3060

Closed
wants to merge 1 commit into from

Conversation

cgwalters
Copy link
Member

@cgwalters cgwalters commented Aug 31, 2022

EDIT: I apologize for the tone in this PR that started from me; it was absolutely unnecessary. Again, I apologize.


Part of my battle against duplicative comments in our kola tests.
We have a few tests that write a comment like this:

 # - exclusive: false
 #   - This test doesn't make meaningful changes to the system and
 #     should be able to be combined with other tests.

Of course, this comment is already redundant because the
meaning of the exclusive tag is defined canonically in coreos-assembler
(here) and copy-pasting that into every test that uses it would
be pointlessly verbose.

But - we can do one better. Instead of having a test flag which
is mainly an "I promise not to mutate the system in a way which could
interfere with other tests", let's add a field that enforces this.
Then it doesn't need to be commented; we have a variety of
tests which are just "system inspection" (e.g. query rpmdb) and
run just fine with DynamicUser=yes and hence cannot affect
the system, and hence are inherently isolated from other concurrent
tests.


@cgwalters

This comment was marked as outdated.

@travier
Copy link
Member

travier commented Sep 1, 2022

This is great!
Edit: Turns out it's not as simple as that.

@cgwalters cgwalters changed the title kola: Force exclusive: false tests to use ProtectSystem=strict kola: Add isolation=readonly|dynamicuser Sep 1, 2022
Comment on lines +269 to +270
and `dynamicuser` means `DynamicUser=yes`. Setting either of these options
also implies `exclusive: false`. Use these for tests that are mostly about
Copy link
Member

@dustymabe dustymabe Sep 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think it's best if we don't rely on exclusive: false being implicit anywhere. Tests can do things in their butane config that are exclusive and would/could cause other tests to fail.

If you want an isolation and dynamicuser flag that's OK I guess, but I really want to keep exlusive: false explicit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we could also enforce that such tests don't provide an ignition config.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really opposed to also requiring an explicit exclusive: false as long as we don't force people to copy-paste the rationale for it 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now strengthened this to prevent the test from providing an Ignition config too!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did also test the testing framework manually:

$ cat > src/config/tests/kola/content-origins/config.bu << 'EOF'
variant: fcos
version: 1.2.0
EOF
$ kola run ext.config.content-origins
Error: test ext.config.content-origins specifies isolation=dynamicuser but includes an Ignition config
2022-09-01T14:09:32Z cli: test ext.config.content-origins specifies isolation=dynamicuser but includes an Ignition config
$ 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you still will probably include a comment/rationale for isolation: readonly. I'm not really sure what you gain here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would I add a comment/rationale for that? Would you block coreos/fedora-coreos-config#1949 on that? If so I'm sorry I just disagree, it's self-evidently obvious and doesn't need a comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem is that a lot of the configuration options here aren't self-evident especially to someone who didn't write the test and it's safer to just default to including a rationale for everything to make sure we catch the full context.

I agree with you that if you read the documentation on this one, and then you read the systemd documentation on that option, then it's clear: "this test is completely read-only and doesn't write to the filesystem at all", but that requires... work, we could just state it up front. It doesn't really cost us anything to help the user read/understand what's going on.

Copy link
Member

@dustymabe dustymabe Sep 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you block coreos/fedora-coreos-config#1949 on that?

And.. the answer is "no", I'm not that dogmatic. I'd suggest it but if you firmly say "are you going to block this on that?" I wouldn't. Later on, though, if I ever touched the file I'd be like "hmm, wonder why these options aren't documented like all the others?" and I'd add it back.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think we can continue a debate about the comment schema after landing this.

Part of my war against duplicative comments in our kola tests.
We have a few tests that write a comment like this:

```
 # - exclusive: false
 #   - This test doesn't make meaningful changes to the system and
 #     should be able to be combined with other tests.
```
Of course, this comment is already redundant because the
meaning of the `exclusive` tag is defined canonically in coreos-assembler
(here) and copy-pasting that into every test that uses it would
be pointlessly verbose.

But - we can do one better.  Instead of having a test flag which
is mainly an "I promise not to mutate the system in a way which could
interfere with other tests", let's add a field that *enforces* this.
Then it doesn't need to be commented; we have a variety of
tests which are just "system inspection" (e.g. query rpmdb) and
run just fine with `DynamicUser=yes` and hence *cannot* affect
the system, and hence are inherently isolated from other concurrent
tests.
if userdata != nil {
return fmt.Errorf("test %v specifies isolation=%v but includes an Ignition config", testname, targetMeta.Isolation)
}
targetMeta.Exclusive = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@cgwalters cgwalters Sep 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no use case for exclusive: true; isolation=<something> tests right? Hence, it would always be exclusive: false; isolation=<something> which is redundant.

I definitely see this replacing most of our use of exclusive: false and so I hence don't see the value in requiring it to also be specified.

In practice, our test framework is mostly something that needs to be understood by a relatively small set of people. Sure, this is a change that some people would need to learn about later, but it's really not a big change either, and we have docs. And the existing test corpus acts as an example.

I think enhancing the robustness of the test suite outweighs the very short term transition period where we some people may need to learn that isolation implies ➡️ exclusive: false.

Here's another way to look at this - a lot of people I'm sure start writing a new test by copy-pasting an old one. Today it's really easy to copy-paste an exclusive: false test but not think about what that means in your new test and acidentally write a test that does mutate the system in a way likely to cause flakes.

With isolation - we're adding some locks around the gun case that you'd have to bypass to use the footgun 😄

Your proposal would implicitly support people accidentally doing just isolation: dynamicuser and we'd have to remember that there's no use case for that and to later manually add exclusive: false. Which would be another footgun case, although a much less severe one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just stating and restating and stating again that I don't want to imply exclusive: false.

If someone added a test that was likely to cause flakes because it's run with other tests we'd catch that today because we automatically re-run failed tests on their own and we'd see that the re-run passed.

I get what you were trying to do with this, but the model we developed doesn't apply 1:1 with what you wanted it to be.

If we change anything what I would prefer is for non-exclusive tests to default to ProtectSystem=strict and then allow adding a tag (or I guess it could be a toplevel option) to disable that (for example to those two tests that you found that don't comply). Then a user has to think about it when they choose to override that default behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already wrote up a comment in the other thread before I saw Dusty's reply proposing essentially the same thing. Moving it here for completeness:

I think the isolation functionality is potentially useful, but if the goal is to reduce boilerplate comments, the proposed flags are counterproductive. Tests have to opt in to this functionality by writing something like "isolation": "dynamicuser", and if someone wants to find out what that means, they can look it up in cosa docs and learn that it means DynamicUser=yes. Okay, now they have to find an option in systemd docs, which isn't easy unless they know about systemd.directives(7). And why does the test use that? What are the implications? Why would I select dynamicuser rather than readonly, and why can't I select both? (dynamicuser implies readonly, but that's more spelunking in systemd docs.)

So let's figure out what we actually want here, without recreating systemd security option spaghetti. Proposal: for tests that have exclusive: false and do not provide an Ignition/Butane config, automatically set DynamicUser=yes unless a second option is specified, something like restrict: false. That way we default to maximum isolation unless the test says that it wants less; we'll easily notice that case because the test will fail. If there are known cases where we want ProtectSystem=strict without DynamicUser=yes, we can provide an option for that too, or make the first one multi-valued. Maybe in the Ignition + exclusive: false case we'd want to require an explicit restrict: false, to make it clear that protections are being disabled.

Copy link
Member Author

@cgwalters cgwalters Sep 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just stating and restating and stating again that I don't want to imply exclusive: false.

Your seems to overlap with bgilbert's, so replying below:

and if someone wants to find out what that means, they can look it up in cosa docs and learn that it means DynamicUser=yes. Okay, now they have to find an option in systemd docs, which isn't easy unless they know about systemd.directives(7).

Right, but the developers on the team already have to read the external-tests.md documentation to understand much at all. Needing that isn't...isolated...to this change.

Proposal: for tests that have exclusive: false and do not provide an Ignition/Butane config, automatically set DynamicUser=yes unless a second option is specified, something like restrict: false.

Hmmm...definitely implicit-over-explicit going on here. I think having an ignition/butane config provided turning off isolation would be too magical, and instead we should error out and require isolation: none or so.

Overall I'm absolutely fine with defaulting exclusive: false tests to being isolated...but it will require some ratcheting.

A tricky thing here is that some of our read only tests actually do need to still run as root because they traverse the filesystem and we have some protected directories.

This is why I introduced both isolation: readonly (runs as root, but read only) and isolation: dynamicuser.

Anyways then so then we're agreeing to:

  • add isolation: none|readonly|dynamicuser to coreos-assembler but don't change its relationship with exclusive
  • Do a PR to fedora-coreos-config which adds exclusive: false; isolation: none to the tests that need it
  • Another (or same) PR to fedora-coreos-config which adds exclusive: false; isolation: readonly to tests that need to be root but are still readonly
  • Change this PR to default exclusive: false to isolation: dynamicuser

I need to more carefully look at the case of external tests with ignition configs and see the impact.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did see it. It does not address my concerns from the comment #3060 (comment) it was replying to, nor my restatement just now in #3060 (comment). My -1 will remain in place until those concerns are addressed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah...you're proposing s/isolation: none/restrict: false/ basically, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think restrict is slightly better than isolation, but the main point is avoiding dynamicuser and readonly. Instead of dynamicuser, we can have something like restrict: true (implicit because it's the default), and instead of readonly, something like an implicit restrict: true with an explicit run-as-root: true. (I'm open to better naming ideas also.)

Copy link
Member Author

@cgwalters cgwalters Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err...I was with you on the first half of the sentence - sure, we can have a boolean instead a tristate (I still don't think a tristate is hard or complicated but whatever, I was mainly on board with just having a boolean). That seems to be what Dusty was arguing for too.

Then in the second half you made up another boolean - and now we have two booleans which is (I hope you'd agree if you step back for a moment) not actually simpler.

For one thing, two booleans is four potential states. The new one of which (restrict: false, run-as-root: true) is redundant. And any future expansion via additional booleans becomes quickly grows the state space.

I think I'd much prefer just a boolean (if that's what we're going down to) alongside the ability for tests to inject arbitrary systemd unit directives, which would likely help solve other things too (e.g. ordering against other units).

Edit: of course, two booleans for this combines with wanting to retain a boolean exclusive: false because now there are other states that seem at best useless (exclusive: true, restrict: true).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra boolean isn't simpler for the implementation, but we should prioritize building intuitive directives for the test author. I think distinct flags help with that, but I'm okay with a three-valued enum if we can find sensible names for the values. I see how the arbitrary directive injection could be useful, but using it to override DynamicUser would be pretty awkward, and I don't think encouraging that helps with the goal of simplifying tests.

For completeness, I'll note a couple things. One is that an enum implies that there's only one dimension of hardening, and down the road we might find that we want to compose multiple hardening flags anyway. The other is that systemd has the sort of interlocking options you're arguing against. DynamicUser is a separate option from ProtectSystem, but implies it. (systemd has a gazillion options though, which I agree can be hard to work with.)

I have deadlines I need to focus on, so I'm muting this thread for now. I've tried to fully capture my constraints in #3060 (comment) (but s/flags/options/ if you want to pursue the enum). I'll check back later, which might be after OCP feature freeze, and my -1 stands until then.

Copy link
Contributor

@bgilbert bgilbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hold pending resolution of #3060 (comment).

@openshift-merge-robot
Copy link

@cgwalters: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dustymabe
Copy link
Member

Closing this. I don't think it's a priority right now. If we want to implement the suggested changes from the discussion let's start with an issue explaining the proposal.

@dustymabe dustymabe closed this Apr 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants