Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drivers/exec+java: reduce default set of linux capabilities #10600

Merged
merged 8 commits into from
May 17, 2021

Conversation

shoenig
Copy link
Member

@shoenig shoenig commented May 15, 2021

d817d12

drivers/exec: enable setting allow_caps on exec driver

This PR enables setting allow_caps on the exec driver
plugin configuration, as well as cap_add and cap_drop in
exec task configuration. These options replicate the
functionality already present in the docker task driver.

Important: this change also reduces the default set of
capabilities enabled by the exec driver to match the
default set enabled by the docker driver. Until v1.0.5
the exec task driver would enable all capabilities supported
by the operating system. v1.0.5 removed NET_RAW from that
list of default capabilities, but left may others which
could potentially also be leveraged by compromised tasks.

Important: the "root" user is still special cased when
used with the exec driver. Older versions of Nomad enabled
enabled all capabilities supported by the operating system
for tasks set with the root user. To maintain compatibility
with existing clusters we continue supporting this "feature",
however we maintain support for the legacy set of capabilities
rather than enabling all capabilities now supported on modern
operating systems.

c53a948

drivers/java: enable setting allow_caps on java driver

Enable setting allow_caps on the java task driver plugin, along
with the associated cap_add and cap_drop options in java task
configuration.

7984182

drivers/docker: reuse capabilities plumbing in docker driver

This changeset does not introduce any functional change for the
docker driver, but rather cleans up the implementation around
computing configured capabilities by re-using code written for
the exec/java task drivers.

a71081c

docs: update docs for linux capabilities in exec/java/docker drivers

Update docs for allow_caps, cap_add, cap_drop in exec/java/docker driver
pages. Also update upgrade guide with guidance on new default linux
capabilities for exec and java drivers.

Copy link
Contributor

@notnoop notnoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes and the code LGTM.

I'm uncertain about the timing - whether we have sufficient baking time for 1.1.0 or whether to wait until 1.2.0. While it's backward incompatible, we have plenty of precedence for security related backward incompatible changes in point releases at this point.

//
// Important (!): when adding fields, make sure to update the RPC methods in
// grpcExecutorClient.Launch and grpcExecutorServer.Launch. Number of hours
// spent tracking this down: too many.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 I've been bitten by this too; or when you update the serializer but not deserializer. For these types, it might make sense to have a property test that ensures data round-trips without any loss.

Comment on lines +539 to +540
// when running as root, use the legacy set of system capabilities, so
// that we do not break existing nomad clusters using this "feature"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unfortunate. The good thing is that allowing root in exec tasks is an opt-in behavior. We can also change it and provide the legacy configuration flag.

@@ -494,7 +495,6 @@ CapAmb: 0000000000000000`,

for _, c := range cases {
t.Run(c.user, func(t *testing.T) {
require := require.New(t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This delights me - I dislike require.New(t) - it doesn't really save any keystores and makes it easy to get subtest assertions associated with the wrong t.

Comment on lines +111 to +112
// upper - indicates whether to uppercase and prefix capabilities with CAP_
func (s *Set) Slice(upper bool) []string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these representations (caps_ upper case vs "naked" lower case) have a proper names or distinct usage? It might make sense to have different methods with distinct names? The boolean parameter is hard to decipher on the call sites.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what gets called as input to either docker (lower-case) or libcontainer (upper-case). Log messages and tests use lower case, just because (IMHO) it's easier on the eyes.

We could have two methods - maybe Enums() and ... Args()? Naming the second one is difficult 🙂

SYS_TIME SYS_TTY_CONFIG WAKE_ALARM
```

The capabilities now enabled by default are modeled after Docker default [`linux capabilities`]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line might be a bit misleading/confusing:

  • linux capabilities doesn't seem to link anywhere, and probably needs to?
  • the capabilities listed below don't exactly match what Nomad uses, because it includes NET_RAW. Might be worth re-clarifying this to make it clear to administrators performing an upgrade what capabilities their tasks will have.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The anchor magic at the bottom links linux capabilities to https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities (the interesting bit of which is the table that can't be linked directly to)

Ahhhh nice catch! I copied this list from the old set; the new set does not include NET_RAW

```

which is modeled after the capabilities allowed by [docker by default][docker_caps]
(sans [`NET_RAW`][no_net_raw]). Allows the operator to control which capabilities
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😅 Honestly, I didn't know what "sans" meant. Might be worth considering if this should be something simpler, like "without"?

Suggested change
(sans [`NET_RAW`][no_net_raw]). Allows the operator to control which capabilities
(without [`NET_RAW`][no_net_raw]). Allows the operator to control which capabilities

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😬 You aren't the only - I get confused every time I try to remember the meaning of San Serif fonts :).

This PR enables setting allow_caps on the exec driver
plugin configuration, as well as cap_add and cap_drop in
exec task configuration. These options replicate the
functionality already present in the docker task driver.

Important: this change also reduces the default set of
capabilities enabled by the exec driver to match the
default set enabled by the docker driver. Until v1.0.5
the exec task driver would enable all capabilities supported
by the operating system. v1.0.5 removed NET_RAW from that
list of default capabilities, but left may others which
could potentially also be leveraged by compromised tasks.

Important: the "root" user is still special cased when
used with the exec driver. Older versions of Nomad enabled
enabled all capabilities supported by the operating system
for tasks set with the root user. To maintain compatibility
with existing clusters we continue supporting this "feature",
however we maintain support for the legacy set of capabilities
rather than enabling all capabilities now supported on modern
operating systems.
Enable setting allow_caps on the java task driver plugin, along
with the associated cap_add and cap_drop options in java task
configuration.
This changeset does not introduce any functional change for the
docker driver, but rather cleans up the implementation around
computing configured capabilities by re-using code written for
the exec/java task drivers.
Update docs for allow_caps, cap_add, cap_drop in exec/java/docker driver
pages. Also update upgrade guide with guidance on new default linux
capabilities for exec and java drivers.
The error output being checked depends on the linux caps supported
by the particular operating system. Fix these test cases to just
check that an error did occur.
Looks like we no longer need a package.
Add capabilities to the LaunchRequest proto so that the
capabilities set actually gets plumbed all the way through
to task launch.
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants