Automatically adjust Elastic Agent hostPath permissions #6599
Disable global CA test when running locally. Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
buildkite test this -f p=gke,s=7.17.8
buildkite test this -f p=gke,s=7.17.8
buildkite test this -f p=gke,s=8.6.2
@@ -97,11 +95,7 @@ spec:
  podTemplate:
    spec:
      serviceAccountName: elastic-agent
      hostNetwork: true
Why are we removing hostNetwork: true?
@@ -79,8 +79,6 @@ spec:
    spec:
      serviceAccountName: fleet-server
      automountServiceAccountToken: true
      securityContext:
        runAsUser: 0
I think this is still required for the container's CA bundle to be updated, see:
cloud-on-k8s/pkg/controller/agent/pod.go
Lines 360 to 366 in 379c19a
    // Beats managed by the Elastic Agent don't trust the Elasticsearch CA that Elastic Agent itself is configured
    // to trust. There is currently no way to configure those Beats to trust a particular CA. The intended way to handle
    // it is to allow Fleet to provide Beat output settings, but due to https://github.com/elastic/kibana/issues/102794
    // this is not supported outside of UI. To workaround this limitation the Agent is going to update Pod-wide CA store
    // before starting Elastic Agent.
    cmd := trustCAScript(path.Join(certificatesDir(esAssociation), CAFileName))
    return builder.WithCommand([]string{"/usr/bin/env", "bash", "-c", cmd}), nil
- https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-elastic-agent-fleet-known-limitations.html#k8s-elastic-agent-fleet-known-limitations
I didn't manage to get it working on OpenShift:
pod/elastic-agent-agent-mdgdt 0/1 Init:Error 2 (26s ago) 29s 10.128.2.24 barkbay-ocp-kwdmv-worker-d-84dkv.c.elastic-cloud-dev.internal <none> <none>
pod/elastic-agent-agent-p9vql 0/1 Init:Error 2 (26s ago) 30s 10.131.0.17 barkbay-ocp-kwdmv-worker-c-5q6vm.c.elastic-cloud-dev.internal <none> <none>
pod/elastic-agent-agent-r5dxn 0/1 Init:Error 2 (26s ago) 30s 10.129.2.18 barkbay-ocp-kwdmv-worker-b-6sh5v.c.elastic-cloud-dev.internal <none> <none>
k logs pod/elastic-agent-agent-p9vql -c permissions
chmod: changing permissions of '/usr/share/elastic-agent/state': Permission denied
Setting privileged: true helps the init container to run:
--- a/pkg/controller/agent/volume.go
+++ b/pkg/controller/agent/volume.go
@@ -7,6 +7,7 @@ package agent
import (
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/resource"
+ ptr "k8s.io/utils/pointer"
"github.com/blang/semver/v4"
@@ -57,7 +58,8 @@ func maybeAgentInitContainerForHostpathVolume(spec *agentv1alpha1.AgentSpec, v s
Command: hostPathVolumeInitContainerCommand(),
Name: hostPathVolumeInitContainerName,
SecurityContext: &corev1.SecurityContext{
- RunAsUser: pointer.Int64(0),
+ RunAsUser: pointer.Int64(0),
+ Privileged: ptr.Bool(true),
},
Resources: hostPathVolumeInitContainerResources,
VolumeMounts: []corev1.VolumeMount{
But then the agent container fails to start:
pod/elastic-agent-agent-78bjs 0/1 CrashLoopBackOff 5 (95s ago) 4m36s 10.128.2.25 barkbay-ocp-kwdmv-worker-d-84dkv.c.elastic-cloud-dev.internal <none> <none>
pod/elastic-agent-agent-r8gmq 0/1 CrashLoopBackOff 5 (89s ago) 4m35s 10.131.0.18 barkbay-ocp-kwdmv-worker-c-5q6vm.c.elastic-cloud-dev.internal <none> <none>
pod/elastic-agent-agent-w2ztb 0/1 CrashLoopBackOff 5 (102s ago) 4m36s 10.129.2.20 barkbay-ocp-kwdmv-worker-b-6sh5v.c.elastic-cloud-dev.internal <none> <none>
kubectl logs elastic-agent-agent-r8gmq -n agent -f
Defaulted container "agent" out of: agent, permissions (init)
Error: preparing STATE_PATH(/usr/share/elastic-agent/state) failed: mkdir /usr/share/elastic-agent/state/data: permission denied
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.6/fleet-troubleshooting.html
(the ServiceAccount I'm using for Agent is in the privileged SCC)
pkg/controller/agent/volume.go
Outdated
set -e
if [[ -d /usr/share/elastic-agent/state ]]; then
  chmod g+rw /usr/share/elastic-agent/state
  chgrp 1000 /usr/share/elastic-agent/state
Why 1000?
The
I guess we use
That is correct.
That is odd, as I tested on OpenShift and still have the agent running successfully in my cluster.
I did have to run this:
Maybe it's something to do with OpenShift versions? What version were you running, @barkbay?
I'm going to wipe this out, start fresh, and see if I can replicate what you're seeing.
To be honest I'm a bit surprised that it's possible to change the permissions on the host file system from a container without being privileged because of
I have to admit that I'm not a big fan of depending on such an implementation detail. We should assume that a container can run as any user id.
Good point. I'll work to deduce the group from the configuration so this can run as any user, and I'll update when implementation and testing are complete.
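To make the "deduce the group from the configuration" idea concrete, here is a minimal sketch of how the init container script could be rendered for an arbitrary group instead of a hardcoded 1000. It is not the operator's code: `resolveGroup` and `permissionsScript` are hypothetical names, and the precedence (runAsGroup, then fsGroup, then a fallback) is an assumption about how the group might be picked from the Pod's security context.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveGroup is a hypothetical helper: it picks the group that should own
// the state directory. Assumed precedence: runAsGroup, then fsGroup, then fallback.
func resolveGroup(runAsGroup, fsGroup *int64, fallback int64) int64 {
	if runAsGroup != nil {
		return *runAsGroup
	}
	if fsGroup != nil {
		return *fsGroup
	}
	return fallback
}

// permissionsScript renders the same chmod/chgrp steps as the script under
// review, with the group ID passed in rather than fixed.
// Assumption: statePath contains no characters that need shell quoting.
func permissionsScript(statePath string, gid int64) string {
	lines := []string{
		"set -e",
		fmt.Sprintf("if [[ -d %s ]]; then", statePath),
		fmt.Sprintf("  chmod g+rw %s", statePath),
		fmt.Sprintf("  chgrp %d %s", gid, statePath),
		"fi",
	}
	return strings.Join(lines, "\n")
}

func main() {
	// No group configured in this example, so the fallback (1000) is used.
	gid := resolveGroup(nil, nil, 1000)
	fmt.Println(permissionsScript("/usr/share/elastic-agent/state", gid))
}
```

Whether the fallback should stay at 1000 or be dropped entirely is exactly the open question raised above; the sketch only shows that the script itself does not need to know.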
I did the same test again on a brand new cluster with the same result:
The original resources manifest I used is here: https://gist.github.com/barkbay/e9c240ea1a7333d428e5508a155de66c#file-kubernetes-integration-yaml Note that I adjusted the namespace as resources are usually never deployed in the
As mentioned in one of my previous messages, I suspect
Thanks for the follow-up. I replicated the same behavior late yesterday, after having massive issues with my OpenShift cluster and finally just rebuilding it. I'm still a bit baffled as to why it was working before, but I will move forward with what I'm seeing now.
…econciliation. Detect openshift when adding agent init container to be able to add 'privileged: true' automatically. Run 'chcon' on Agent state directory when running within Openshift. Add bool func to our utils/pointer package. Update tests for new functionality Signed-off-by: Michael Montgomery <mmontg1@gmail.com>
Drive-by comment: Is it a good idea to special case OpenShift here? Would not the same restrictions that we are trying to work around for OpenShift apply to any non-OpenShift cluster as well if it has SELinux set up?
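One way to read the drive-by suggestion is to key the relabeling step on SELinux itself rather than on the platform. The sketch below assumes the check runs somewhere the node's selinuxfs is visible (for example inside the init container); whether `/sys/fs/selinux/enforce` is actually visible there depends on the runtime and has not been verified in this thread. `selinuxPresent` and `initCommands` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// selinuxEnforceFile is where selinuxfs exposes the enforcing flag on Linux
// hosts that have SELinux enabled.
const selinuxEnforceFile = "/sys/fs/selinux/enforce"

// selinuxPresent reports whether the given selinuxfs file exists.
func selinuxPresent(enforceFile string) bool {
	_, err := os.Stat(enforceFile)
	return err == nil
}

// initCommands is a hypothetical helper: it returns the commands the init
// container would run, adding the chcon step only when relabel is true.
func initCommands(statePath string, relabel bool) []string {
	cmds := []string{"chmod g+rw " + statePath}
	if relabel {
		cmds = append(cmds, "chcon -Rt svirt_sandbox_file_t "+statePath)
	}
	return cmds
}

func main() {
	relabel := selinuxPresent(selinuxEnforceFile)
	cmds := initCommands("/usr/share/elastic-agent/state", relabel)
	fmt.Println(strings.Join(cmds, "\n"))
}
```

This only changes what triggers the chcon step; it does not address the concern about relabeling from a privileged container without the user opting in.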
const (
	hostPathVolumeInitContainerName = "permissions"
	chconCmd                        = "chcon -Rt svirt_sandbox_file_t /usr/share/elastic-agent/state"
Using chcon in a privileged container without explicit user consent seems wrong to me from a security point of view.
Yes. Also I'm wondering if the opposite is possible: running More generally, I'm a bit puzzled by the idea of building a feature on something that I considered a "best effort" (using
This is being closed in favor of documenting a DaemonSet that can be used to prepare the agent directory for running Elastic Agent without the need to run as root. This decision was made after discussing the security concerns around automatically managing these permissions without explicit user consent; a DaemonSet that must be applied before running Agent provides that consent.
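For illustration only, the directory preparation such a DaemonSet would have to perform boils down to three steps: create the state directory if needed, hand it to the Agent's group, and make it group-readable and group-writable. The documented DaemonSet is not shown in this thread, so the sketch below is an assumption about its effect, not its contents; `prepareStateDir` and `groupReadWrite` are hypothetical names, and the code assumes a Unix host.

```go
package main

import (
	"fmt"
	"os"
)

// groupReadWrite reports whether the mode grants both read and write to the group.
func groupReadWrite(mode os.FileMode) bool {
	return mode.Perm()&0o060 == 0o060
}

// prepareStateDir creates path if it is missing, sets its group to gid
// (leaving the owner untouched) and adds g+rw to the existing mode.
func prepareStateDir(path string, gid int) error {
	if err := os.MkdirAll(path, 0o770); err != nil {
		return fmt.Errorf("creating %q: %w", path, err)
	}
	// A uid of -1 leaves the owning user unchanged.
	if err := os.Chown(path, -1, gid); err != nil {
		return fmt.Errorf("changing group of %q: %w", path, err)
	}
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	return os.Chmod(path, info.Mode()|0o060)
}

func main() {
	// Demo against a temporary directory, using the current process's group
	// so that it runs without root.
	dir, err := os.MkdirTemp("", "agent-state-")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.RemoveAll(dir)

	if err := prepareStateDir(dir, os.Getgid()); err != nil {
		fmt.Println("error:", err)
		return
	}
	info, err := os.Stat(dir)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("group read/write:", groupReadWrite(info.Mode()))
}
```

On a real node the target would be the hostPath backing /usr/share/elastic-agent/state, the group would be whichever one the Agent runs with, and on SELinux hosts a relabeling step like the chcon discussed above may still be needed.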
closes #6239
closes #6543
Background
Currently, running Elastic Agent with a hostPath volume requires settings that keep the Agent running as root. The only way to avoid this is to configure an emptyDir volume instead of hostPath.
What this proposes
As detailed further in #6239, we want to automatically add an initContainer that maintains the Agent permissions, so that the Agent is not required to run perpetually as root.
If all of the following are true, an initContainer that maintains permissions is automatically added to Elastic Agent:
… emptyDir.
Additional notable change
Removes runAsUser: 0 from all e2e Agent tests as it's unnecessary now.
privileged: true must be enabled for the initContainer, and chcon -Rt svirt_sandbox_file_t /usr/share/elastic-agent/state must be run to manage SELinux permissions properly.
Testing
make e2e-local of all Agent e2e tests using 8.7.0-SNAPSHOT, with all Agent tests that were disabled because of "TestFleet* fail on 8.6.x" (#6331) re-enabled.
TODO