Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kola: Add isolation=readonly|dynamicuser #3060

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions docs/kola/external-tests.md
Original file line number Diff line number Diff line change
Expand Up @@ -264,6 +264,13 @@ is simple and is not expected to conflict with other tests, it should be marked
`exclusive: false`. When the `exclusive` key is not provided, tests are marked
`exclusive: true` by default.

The `isolation` key can take two values: `readonly` and `dynamicuser`. These
are thin wrappers for the equivalent systemd options; `readonly` equals `ProtectSystem=strict`,
and `dynamicuser` means `DynamicUser=yes`. Setting either of these options
also implies `exclusive: false`. Use these for tests that are mostly about
Comment on lines +269 to +270
Copy link
Member

@dustymabe dustymabe Sep 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think it's best if we don't rely on exclusive: false being implicit anywhere. Tests can do things in their butane config that are exclusive and would/could cause other tests to fail.

If you want an isolation and dynamicuser flag that's OK I guess, but I really want to keep exlusive: false explicit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we could also enforce that such tests don't provide an ignition config.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really opposed to also requiring an explicit exclusive: false as long as we don't force people to copy-paste the rationale for it 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now strengthened this to prevent the test from providing an Ignition config too!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did also test the testing framework manually:

$ cat > src/config/tests/kola/content-origins/config.bu << 'EOF'
variant: fcos
version: 1.2.0
EOF
$ kola run ext.config.content-origins
Error: test ext.config.content-origins specifies isolation=dynamicuser but includes an Ignition config
2022-09-01T14:09:32Z cli: test ext.config.content-origins specifies isolation=dynamicuser but includes an Ignition config
$ 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that you still will probably include a comment/rationale for isolation: readonly. I'm not really sure what you gain here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would I add a comment/rationale for that? Would you block coreos/fedora-coreos-config#1949 on that? If so I'm sorry I just disagree, it's self-evidently obvious and doesn't need a comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem is that a lot of the configuration options here aren't self-evident especially to someone who didn't write the test and it's safer to just default to including a rationale for everything to make sure we catch the full context.

I agree with you that if you read the documentation on this one, and then you read the systemd documentation on that option, then it's clear: "this test is completely read-only and doesn't write to the filesystem at all", but that requires... work, we could just state it up front. It doesn't really cost us anything to help the user read/understand what's going on.

Copy link
Member

@dustymabe dustymabe Sep 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you block coreos/fedora-coreos-config#1949 on that?

And.. the answer is "no", I'm not that dogmatic. I'd suggest it but if you firmly say "are you going to block this on that?" I wouldn't. Later on, though, if I ever touched the file I'd be like "hmm, wonder why these options aren't documented like all the others?" and I'd add it back.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think we can continue a debate about the comment schema after landing this.

read-only system inspection. If specified, the test *cannot* provide
an Ignition (or butane) config either.

The `conflicts` key takes a list of test names that conflict with this test.
This key can only be specified if `exclusive` is marked `false` since
`exclusive: true` tests are run exclusively in their own VM. At runtime,
Expand Down
31 changes: 30 additions & 1 deletion mantle/kola/harness.go
Original file line number Diff line number Diff line change
Expand Up @@ -826,6 +826,7 @@ type externalTestMeta struct {
AppendKernelArgs string `json:"appendKernelArgs,omitempty"`
AppendFirstbootKernelArgs string `json:"appendFirstbootKernelArgs,omitempty"`
Exclusive bool `json:"exclusive"`
Isolation string `json:"isolation"`
TimeoutMin int `json:"timeoutMin"`
Conflicts []string `json:"conflicts"`
AllowConfigWarnings bool `json:"allowConfigWarnings"`
Expand Down Expand Up @@ -943,6 +944,24 @@ func registerExternalTest(testname, executable, dependencydir string, userdata *
targetMeta = &metaCopy
}

switch targetMeta.Isolation {
case "readonly":
case "dynamicuser":
// These tests cannot provide their own Ignition
if userdata != nil {
return fmt.Errorf("test %v specifies isolation=%v but includes an Ignition config", testname, targetMeta.Isolation)
}
targetMeta.Exclusive = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@cgwalters cgwalters Sep 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no use case for exclusive: true; isolation=<something> tests right? Hence, it would always be exclusive: false; isolation=<something> which is redundant.

I definitely see this replacing most of our use of exclusive: false and so I hence don't see the value in requiring it to also be specified.

In practice, our test framework is mostly something that needs to be understood by a relatively small set of people. Sure, this is a change that some people would need to learn about later, but it's really not a big change either, and we have docs. And the existing test corpus acts as an example.

I think enhancing the robustness of the test suite outweighs the very short term transition period where we some people may need to learn that isolation implies ➡️ exclusive: false.

Here's another way to look at this - a lot of people I'm sure start writing a new test by copy-pasting an old one. Today it's really easy to copy-paste an exclusive: false test but not think about what that means in your new test and acidentally write a test that does mutate the system in a way likely to cause flakes.

With isolation - we're adding some locks around the gun case that you'd have to bypass to use the footgun 😄

Your proposal would implicitly support people accidentally doing just isolation: dynamicuser and we'd have to remember that there's no use case for that and to later manually add exclusive: false. Which would be another footgun case, although a much less severe one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just stating and restating and stating again that I don't want to imply exclusive: false.

If someone added a test that was likely to cause flakes because it's run with other tests we'd catch that today because we automatically re-run failed tests on their own and we'd see that the re-run passed.

I get what you were trying to do with this, but the model we developed doesn't apply 1:1 with what you wanted it to be.

If we change anything what I would prefer is for non-exclusive tests to default to ProtectSystem=strict and then allow adding a tag (or I guess it could be a toplevel option) to disable that (for example to those two tests that you found that don't comply). Then a user has to think about it when they choose to override that default behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already wrote up a comment in the other thread before I saw Dusty's reply proposing essentially the same thing. Moving it here for completeness:

I think the isolation functionality is potentially useful, but if the goal is to reduce boilerplate comments, the proposed flags are counterproductive. Tests have to opt in to this functionality by writing something like "isolation": "dynamicuser", and if someone wants to find out what that means, they can look it up in cosa docs and learn that it means DynamicUser=yes. Okay, now they have to find an option in systemd docs, which isn't easy unless they know about systemd.directives(7). And why does the test use that? What are the implications? Why would I select dynamicuser rather than readonly, and why can't I select both? (dynamicuser implies readonly, but that's more spelunking in systemd docs.)

So let's figure out what we actually want here, without recreating systemd security option spaghetti. Proposal: for tests that have exclusive: false and do not provide an Ignition/Butane config, automatically set DynamicUser=yes unless a second option is specified, something like restrict: false. That way we default to maximum isolation unless the test says that it wants less; we'll easily notice that case because the test will fail. If there are known cases where we want ProtectSystem=strict without DynamicUser=yes, we can provide an option for that too, or make the first one multi-valued. Maybe in the Ignition + exclusive: false case we'd want to require an explicit restrict: false, to make it clear that protections are being disabled.

Copy link
Member Author

@cgwalters cgwalters Sep 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just stating and restating and stating again that I don't want to imply exclusive: false.

Your seems to overlap with bgilbert's, so replying below:

and if someone wants to find out what that means, they can look it up in cosa docs and learn that it means DynamicUser=yes. Okay, now they have to find an option in systemd docs, which isn't easy unless they know about systemd.directives(7).

Right, but the developers on the team already have to read the external-tests.md documentation to understand much at all. Needing that isn't...isolated...to this change.

Proposal: for tests that have exclusive: false and do not provide an Ignition/Butane config, automatically set DynamicUser=yes unless a second option is specified, something like restrict: false.

Hmmm...definitely implicit-over-explicit going on here. I think having an ignition/butane config provided turning off isolation would be too magical, and instead we should error out and require isolation: none or so.

Overall I'm absolutely fine with defaulting exclusive: false tests to being isolated...but it will require some ratcheting.

A tricky thing here is that some of our read only tests actually do need to still run as root because they traverse the filesystem and we have some protected directories.

This is why I introduced both isolation: readonly (runs as root, but read only) and isolation: dynamicuser.

Anyways then so then we're agreeing to:

  • add isolation: none|readonly|dynamicuser to coreos-assembler but don't change its relationship with exclusive
  • Do a PR to fedora-coreos-config which adds exclusive: false; isolation: none to the tests that need it
  • Another (or same) PR to fedora-coreos-config which adds exclusive: false; isolation: readonly to tests that need to be root but are still readonly
  • Change this PR to default exclusive: false to isolation: dynamicuser

I need to more carefully look at the case of external tests with ignition configs and see the impact.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did see it. It does not address my concerns from the comment #3060 (comment) it was replying to, nor my restatement just now in #3060 (comment). My -1 will remain in place until those concerns are addressed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah...you're proposing s/isolation: none/restrict: false/ basically, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think restrict is slightly better than isolation, but the main point is avoiding dynamicuser and readonly. Instead of dynamicuser, we can have something like restrict: true (implicit because it's the default), and instead of readonly, something like an implicit restrict: true with an explicit run-as-root: true. (I'm open to better naming ideas also.)

Copy link
Member Author

@cgwalters cgwalters Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err...I was with you on the first half of the sentence - sure, we can have a boolean instead a tristate (I still don't think a tristate is hard or complicated but whatever, I was mainly on board with just having a boolean). That seems to be what Dusty was arguing for too.

Then in the second half you made up another boolean - and now we have two booleans which is (I hope you'd agree if you step back for a moment) not actually simpler.

For one thing, two booleans is four potential states. The new one of which (restrict: false, run-as-root: true) is redundant. And any future expansion via additional booleans becomes quickly grows the state space.

I think I'd much prefer just a boolean (if that's what we're going down to) alongside the ability for tests to inject arbitrary systemd unit directives, which would likely help solve other things too (e.g. ordering against other units).

Edit: of course, two booleans for this combines with wanting to retain a boolean exclusive: false because now there are other states that seem at best useless (exclusive: true, restrict: true).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra boolean isn't simpler for the implementation, but we should prioritize building intuitive directives for the test author. I think distinct flags help with that, but I'm okay with a three-valued enum if we can find sensible names for the values. I see how the arbitrary directive injection could be useful, but using it to override DynamicUser would be pretty awkward, and I don't think encouraging that helps with the goal of simplifying tests.

For completeness, I'll note a couple things. One is that an enum implies that there's only one dimension of hardening, and down the road we might find that we want to compose multiple hardening flags anyway. The other is that systemd has the sort of interlocking options you're arguing against. DynamicUser is a separate option from ProtectSystem, but implies it. (systemd has a gazillion options though, which I agree can be hard to work with.)

I have deadlines I need to focus on, so I'm muting this thread for now. I've tried to fully capture my constraints in #3060 (comment) (but s/flags/options/ if you want to pursue the enum). I'll check back later, which might be after OCP feature freeze, and my -1 stands until then.

case "":
break
default:
return fmt.Errorf("test %v specifies unknown isolation=%v", testname, targetMeta.Isolation)
}

if userdata == nil {
userdata = conf.EmptyIgnition()
}

warningsAction := conf.FailWarnings
if targetMeta.AllowConfigWarnings {
warningsAction = conf.IgnoreWarnings
Expand Down Expand Up @@ -980,6 +999,16 @@ Environment=KOLA_TEST_EXE=%s
Environment=%s=%s
ExecStart=%s
`, unitName, testname, base, kolaExtBinDataEnv, destDataDir, remotepath)
switch targetMeta.Isolation {
case "readonly":
unit += "ProtectSystem=strict\nPrivateTmp=yes\n"
case "dynamicuser":
unit += "DynamicUser=yes\n"
case "":
break
default:
return fmt.Errorf("test %v specifies unknown isolation=%v", testname, targetMeta.Isolation)
}
if targetMeta.InjectContainer {
if CosaBuild == nil {
return fmt.Errorf("test %v uses injectContainer, but no cosa build found", testname)
Expand Down Expand Up @@ -1091,7 +1120,7 @@ func registerTestDir(dir, testprefix string, children []os.FileInfo) error {
var dependencydir string
var meta externalTestMeta
var err error
userdata := conf.EmptyIgnition()
var userdata *conf.UserData
executables := []string{}
for _, c := range children {
fpath := filepath.Join(dir, c.Name())
Expand Down
9 changes: 9 additions & 0 deletions tests/kola-ci-self/tests/kola/exclusive-readonly
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
#!/bin/bash
# This verifies that isolation=dynamicuser maps to systemd DynamicUser=yes and
# (unlike the default kola model) runs as non-root.
# kola: { "isolation": "dynamicuser" }
dustymabe marked this conversation as resolved.
Show resolved Hide resolved
set -xeuo pipefail

id=$(id -u)
test "$id" '!=' 0
echo "ok dynamicuser"