Initial implementation for #11807 #11813

Merged
merged 7 commits into hashicorp:main on Feb 24, 2022

Conversation

apollo13
Contributor

@apollo13 apollo13 commented Jan 10, 2022

Looks good so far:

➜  nomad git:(main) ✗ ./bin/nomad namespace list                                         
Name     Description               Drivers
default  Default shared namespace  *
test     <none>                    docker,test
➜  nomad git:(main) ✗ ./bin/nomad job plan test.nomad                                    
Error during plan: Unexpected response code: 500 (1 error occurred:
	* used task driver 'lala' in cache[redis] is not allowed in namespace test

@tgross Do you mind giving it an initial quick review -- if I am on the right track I will continue with it.

Refs #11807

Member

@tgross tgross left a comment


Overall this is looking really good @apollo13! I've left some comments based on our internal discussion of the RFC.

Comment on lines 4960 to 4964
// NamespaceCapabilities represents a set of capabilities allowed for this
// namespace, to be checked at job submission time.
type NamespaceCapabilities struct {
	EnabledTaskDrivers []string
}
Member

From our internal discussion of the RFC, I think we're going to want to have both a allowlist and a denylist. So can we add a DisabledTaskDrivers here as well?

Contributor Author

Sure, how will those interact? Exclusive-or, or will the allowlist allow and then the denylist deny from that set?


This feels like the place where Apache's confusing Order allow,deny / Allow from all logic fits. I think it should be: you allow first, and then prune out of that set.

@apollo13
Contributor Author

@tgross is that more in line with what you were thinking? (HCL parsing is weird at best, but it seems to work :D)

@apollo13
Contributor Author

Btw, this could also nicely solve #6554 -- but we'd have to modify the structs a bit.

I.e.:

TaskDrivers = [{
  name: "docker",
  disabled: false,
  options: ["volumes", "privileged"]
}, {name: "raw_exec"} ...]

That said, given how dangerous most drivers can be, I do not see much point in disabled drivers and would just keep an allowlist. I'd really love to fix #6554 with this as well.

Member

@tgross tgross left a comment


Example HCL spec:

name        = "dev"
description = "Namespace for developers"

capabilities {
  enabled_task_drivers  = ["docker", "exec"]
  disabled_task_drivers = ["raw_exec"]
}

Taking it for a spin:

$ nomad namespace apply ./namespace.hcl
Successfully applied namespace specification "dev"!

$ nomad namespace list
Name     Description               EnabledDrivers  DisabledDrivers
default  Default shared namespace  *               <none>
dev      Namespace for developers  docker,exec     raw_exec

$ nomad namespace status dev
Name            = dev
Description     = Namespace for developers
Quota           = <none>
EnabledDrivers  = docker,exec
DisabledDrivers = raw_exec

So far so good!

Member

@tgross tgross left a comment


Hey @apollo13 this is looking really good! To get this out of draft status and moving forward:

  • Let's write some tests.
  • Would you mind adding a changelog file at .changelog/11807.txt? The format looks like this.
  • The changes to the command help text should get mirrored to apply and status docs.

@the-maldridge

So I've had more time to think about this in relation to #6554, and I think a really elegant solution would just be to dispense the task driver plugins multiple times. As I understand it, you can put more or less whatever you want as the HCL block identifier for plugin configs. For drivers that have a concept of lower-privilege modes, you could just dispense the driver twice with two different configs (thinking mostly of container drivers here like docker, podman, and containerd, where the driver is aware of flags that need to be monitored in the config). You could then only allow use of the privileged drivers in specific namespaces.

This approach would also let you do this in a mostly opaque way, since Nomad doesn't need to know which machines or drivers provide privileged access; from its perspective it's just another driver.

@tgross
Member

tgross commented Feb 2, 2022

For drivers that have a concept of lower privilege modes, you could just dispense the driver twice with two different configs (thinking mostly of container drivers here like docker, podman, and containerd, where the driver is aware of flags that need to be monitored in the config). You could then only allow use of the privileged drivers in specific namespaces.

Hm, yeah I like where you're going with that. (That work would be separate from this PR though, so that we can ship this feature independently.)

@the-maldridge

For sure, it would be a distinct item. I wanted to raise it here because I think that once this is merged, #6605 can be solved almost entirely by existing features and functionality.

@apollo13
Contributor Author

apollo13 commented Feb 8, 2022

@tgross Pushed a few more changes. There is one failing test (at least) in https://app.circleci.com/pipelines/github/hashicorp/nomad/19158/workflows/fddc3c0b-1b53-467e-87d5-b6aa50d77b9d/jobs/193573 -- I am not sure how to fix it. One way would be to adjust my error message about a non-existing namespace to match the one upsertJobImpl raises, but then I am not sure if the existing test checks the wanted state…

@apollo13
Contributor Author

apollo13 commented Feb 8, 2022

And now I killed semgrep:

2022-02-08T09:39:30.1063733Z === [ERROR] `/root/.local/bin/semgrep --skip-unknown-extensions --disable-nosem --json --autofix --dryrun --time --timeout-threshold 3 --config .semgrep/ --debug -o /tmp/tmpq_7xg6m2 .changelog/11807.txt nomad/job_endpoint_validators.go nomad/job_endpoint_validators_test.go api/namespace.go command/namespace_apply.go command/namespace_status.go nomad/job_endpoint.go nomad/structs/structs.go website/content/docs/commands/namespace/apply.mdx website/content/docs/commands/namespace/status.mdx` failed with exit code 7
2022-02-08T09:39:30.1065510Z 
2022-02-08T09:39:30.1066447Z This is an internal error, please file an issue at https://github.com/returntocorp/semgrep-action/issues/new/choose

@apollo13 apollo13 marked this pull request as ready for review February 20, 2022 11:59
@apollo13 apollo13 requested a review from tgross February 20, 2022 12:00
Member

@tgross tgross left a comment


This is looking great, @apollo13

And now I killed semgrep:

Weird! We're still experimenting with semgrep a bit so we won't consider that blocking this PR. I'll open an issue with upstream on that run. Edit: already done by @shoenig https://github.com/returntocorp/semgrep-action/issues/505

Member

@tgross tgross left a comment


I've left a few more comments. Thanks for your patience in refining this!

Member

@tgross tgross left a comment


LGTM!

I don't know why the website preview is skipping this build, but I've got our docs folks looking at it. In the meantime, I pulled the PR down locally and looked at the doc page updates and that looks good too.

Thanks for this contribution, @apollo13! This will ship in Nomad 1.3.0.

@tgross tgross merged commit b84f70a into hashicorp:main Feb 24, 2022
@tgross tgross added this to the 1.3.0 milestone Feb 24, 2022
@apollo13
Contributor Author

Wow this is great, thank you so much!

@github-actions

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 28, 2022