Potentially chown Elastic Agent hostpath data directory #6239

Open
naemono opened this issue Dec 15, 2022 · 10 comments
Labels: >enhancement Enhancement of existing functionality


naemono commented Dec 15, 2022

There have been a number of issues/PRs concerning this issue: #5993, #6147, #6205, #6193.

The following is required when running Elastic Agent with a hostPath:

    podTemplate:
      spec:
        containers:
          - name: agent
            securityContext:
              runAsUser: 0

If not, you get this error:

Error: preparing STATE_PATH(/usr/share/elastic-agent/state) failed: mkdir /usr/share/elastic-agent/state/data: permission denied
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.5/fleet-troubleshooting.html

An initContainer that does the following allows Elastic Agent to work properly without the Agent itself using runAsUser: 0:

      initContainers:
      - command:
        - sh
        - -c
        - chown 1000:1000 /usr/share/elastic-agent/state
        image: docker.elastic.co/beats/elastic-agent:8.5.0
        imagePullPolicy: IfNotPresent
        name: permissions
        securityContext:
          runAsUser: 0

This is more complicated in an environment such as OpenShift, where UIDs are randomized, but it is likely doable.

So the question is: do we pursue this path to make the UX for Elastic Agent more consistent between emptyDir and hostPath?
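
For illustration, since OpenShift runs containers with an arbitrary UID that is always a member of the root group (GID 0), a group-based variant of the init container could avoid depending on a specific UID. This is only a sketch, assuming the same image and state path as above, not a settled design:

      initContainers:
      - command:
        - sh
        - -c
        # grant group write access via the root group instead of chowning to a fixed UID
        - chgrp 0 /usr/share/elastic-agent/state && chmod g+rwx /usr/share/elastic-agent/state
        image: docker.elastic.co/beats/elastic-agent:8.5.0
        imagePullPolicy: IfNotPresent
        name: permissions
        securityContext:
          runAsUser: 0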

Security Note

  • The initContainer still runs as runAsUser: 0.
  • It only runs for a couple of seconds, as opposed to running indefinitely as uid 0, which seems to minimize the window in which a security issue could stem from this.

naemono commented Dec 15, 2022

After discussion, we've decided to take the approach of using an init container to make this user experience better. Since the GID in OpenShift is known, we'll take this approach:

      initContainers:
      - command:
        - sh
        - -c
        - chmod g+w /usr/share/elastic-agent/state && chgrp 1000 /usr/share/elastic-agent/state
        image: docker.elastic.co/beats/elastic-agent:8.5.0
        imagePullPolicy: IfNotPresent
        name: permissions
        securityContext:
          runAsUser: 0

@brsolomon-deloitte

Also related: #6280

@gittihub123

Hi @naemono
I have been stuck with this issue for a couple of days and can't get it working.
We are using OpenShift 4.12 and Argo CD with the Elastic operator on OpenShift.

I followed the official ECK 2.6 Kubernetes documentation and created the required resources.

Worth mentioning is that we implemented the Compliance Operator and have used the CIS operator to harden the platform.

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server-dev
  namespace: elastic-dev
spec:
  version: 8.6.1
  kibanaRef:
    name: kibanadev
  elasticsearchRefs:
  - name: esdev01
  mode: fleet
  fleetServerEnabled: true
  deployment:
    replicas: 1
    podTemplate:
      spec:
        serviceAccountName: elastic-agent
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
---
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent-dev
  namespace: elastic-dev
spec:
  version: 8.6.1
  kibanaRef:
    name: kibanadev
  fleetServerRef:
    name: fleet-server-dev
  mode: fleet
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: elastic-agent
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: elastic-agent
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - pods
  - nodes
  - namespaces
  verbs:
  - get
  - watch
  - list
- apiGroups: ["coordination.k8s.io"]
  resources:
  - leases
  verbs:
  - get
  - create
  - update
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-agent
  namespace: elastic-dev
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: elastic-agent
subjects:
- kind: ServiceAccount
  name: elastic-agent
  namespace: elastic-dev
roleRef:
  kind: Role
  name: elastic-agent
  apiGroup: rbac.authorization.k8s.io

RoleBinding (describe output):

Name:         elastic-agent-rb
Labels:       <none>
Annotations:  <none>
Role:
  Kind:  ClusterRole
  Name:  system:openshift:scc:privileged
Subjects:
  Kind            Name           Namespace
  ----            ----           ---------
  ServiceAccount  elastic-agent  elastic-dev

The hostPath is created on the physical machine, but we are still getting permission denied:

Error: preparing STATE_PATH(/usr/share/elastic-agent/state) failed: mkdir /usr/share/elastic-agent/state/data: permission denied
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.6/fleet-troubleshooting.html


naemono commented Feb 15, 2023

@gittihub123 I'll investigate this and get back to you.


naemono commented Feb 15, 2023

@gittihub123 The below appears to be required in the case of OpenShift:

  deployment:
    replicas: 1
    podTemplate:
      spec:
        containers:
        - name: agent
          securityContext:
            privileged: true  # <-- this is the piece that's required on OpenShift
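
For a privileged container to be admitted on OpenShift, the pod's service account typically also needs the privileged SCC (as done later in this thread for Filebeat). A minimal sketch, assuming the elastic-agent service account and elastic-dev namespace from the manifest above:

    # grant the privileged SCC to the elastic-agent service account
    oc adm policy add-scc-to-user privileged -z elastic-agent -n elastic-dev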

@gittihub123

Hi @naemono
This does not work on an OpenShift cluster, because SELinux blocks it from creating files on the host filesystem.
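
For reference, a common workaround when SELinux denies writes to a hostPath is to relabel the host directory with the container file context on the node. This is only a hedged sketch; the path below is a placeholder for whatever hostPath the Agent state is mounted from:

    # run on the node: relabel the hostPath directory so containers are allowed to write to it
    # (/path/to/agent-state-hostpath is a placeholder, not a real default)
    chcon -R -t container_file_t /path/to/agent-state-hostpath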

The same applies when I try to create a standalone filebeat instance with this configuration.

# CRD to create beats with ECK (Pod(s))
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
   name: panos-filebeat
   namespace: elastic-dev
spec:
  type: filebeat
  version: 8.6.1
  elasticsearchRef:
    name: esdev
  kibanaRef:
    name: kibanadev
  config:
    filebeat.modules:
    - module: panw
      panos:
        enabled: true
        var.syslog_host: 0.0.0.0
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        securityContext:
          privileged: true

Error message

one or more objects failed to apply, reason: admission webhook "elastic-beat-validation-v1beta1.k8s.elastic.co" denied the request: Beat.beat.k8s.elastic.co "panos-filebeat" is invalid: privileged: Invalid value: "privileged": privileged field found in the kubectl.kubernetes.io/last-applied-configuration annotation is unknown. This is often due to incorrect indentation in the manifest.


naemono commented Feb 21, 2023

@gittihub123 Running Agent and/or Beat in an OpenShift environment has many more complexities than running in a standard Kubernetes environment. We document these issues here. We also have some Beats recipes that we use in our e2e tests, which run on a regular basis, here. I just successfully deployed this Beat recipe on an OpenShift 4.9 cluster, following our documentation noted above, specifically:

oc adm policy add-scc-to-user privileged -z filebeat -n elastic

Then I applied this manifest, which worked after a bit of time (Beat pods crash once or twice while users/API keys are being propagated throughout the Elastic Stack):

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: filebeat
spec:
  type: filebeat
  version: 8.6.0
  elasticsearchRef:
    name: testing
  kibanaRef:
    name: kibana
  config:
    filebeat.autodiscover.providers:
    - node: ${NODE_NAME}
      type: kubernetes
      hints.default_config.enabled: "false"
      templates:
      - condition.equals.kubernetes.namespace: log-namespace
        config:
        - paths: ["/var/log/containers/*${data.kubernetes.container.id}.log"]
          type: container
      - condition.equals.kubernetes.labels.log-label: "true"
        config:
        - paths: ["/var/log/containers/*${data.kubernetes.container.id}.log"]
          type: container
    processors:
    - add_cloud_metadata: {}
    - add_host_metadata: {}
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        terminationGracePeriodSeconds: 30
        # dnsPolicy: ClusterFirstWithHostNet
        # hostNetwork: true # Allows to provide richer host metadata
        containers:
        - name: filebeat
          securityContext:
            runAsUser: 0
            # If using Red Hat OpenShift uncomment this:
            privileged: true
          volumeMounts:
          - name: varlogcontainers
            mountPath: /var/log/containers
          - name: varlogpods
            mountPath: /var/log/pods
          - name: varlibdockercontainers
            mountPath: /var/lib/docker/containers
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
        volumes:
        - name: varlogcontainers
          hostPath:
            path: /var/log/containers
        - name: varlogpods
          hostPath:
            path: /var/log/pods
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - namespaces
  - pods
  - nodes
  verbs:
  - get
  - watch
  - list
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: elastic
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: elastic
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
# ---
# My Elasticsearch cluster already existed....
# apiVersion: elasticsearch.k8s.elastic.co/v1
# kind: Elasticsearch
# metadata:
#   name: elasticsearch
# spec:
#   version: 8.6.1
#   nodeSets:
#   - name: default
#     count: 3
#     config:
#       node.store.allow_mmap: false
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
spec:
  version: 8.6.0
  count: 1
  elasticsearchRef:
    name: testing
# ...

Note the difference in daemonSet.podTemplate.spec and where the securityContext is applied:

  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        terminationGracePeriodSeconds: 30
        # dnsPolicy: ClusterFirstWithHostNet
        # hostNetwork: true # Allows to provide richer host metadata
        containers:
        - name: filebeat
          securityContext:
            runAsUser: 0
            # If using Red Hat OpenShift uncomment this:
            privileged: true

@gittihub123

Hi @naemono
Thank you for the explanation. Filebeat works now, but our goal is to implement Elastic Agent and activate different types of modules to collect syslog from outside the cluster: Palo Alto, Cisco FTD, Cisco ASA, etc.

So far, the Elastic Agent is running and is managed by Fleet, but it is only collecting logs and metrics from OpenShift. The Elastic Stack is running in the same namespace, and I have connectivity between all pods (Elasticsearch, Kibana, Fleet & Elastic Agent).

This is my configuration

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: {{ .Values.kibana.name }}
  namespace: {{ .Values.namespace }}
spec:
  http:
    tls:
      certificate:
        secretName: {{ .Values.tls.certificate }}
  config:
    server.publicBaseUrl: "https://XXX.YYY.ZZZ/"
    xpack.fleet.agents.elasticsearch.hosts: ["https://esdev-es-http.elastic-dev.svc:9200"]
    xpack.fleet.agents.fleet_server.hosts: ["https://fleet-server-dev-agent-http.elastic-dev.svc:8220"]
    xpack.fleet.packages:
      - name: system
        version: latest
      - name: elastic_agent
        version: latest
      - name: fleet_server
        version: latest
    xpack.fleet.agentPolicies:
      - name: Fleet Server test
        id: eck-fleet-server
        is_default_fleet_server: true
        namespace: agent
        monitoring_enabled:
          - logs
          - metrics
        package_policies:
        - name: fleet_server-1
          id: fleet_server-1
          package:
            name: fleet_server
      - name: Elastic Agent on ECK policy
        id: eck-agent
        namespace: agent
        monitoring_enabled:
          - logs
          - metrics
        unenroll_timeout: 900
        is_default: true
        package_policies:
          - name: system-1
            id: system-1
            package:
              name: system
          - name: CiscoFTD
            id: CiscoFTD
            package:
              name: Cisco FTD
          - name: palo-alto
            id: palo-alto
            package:
              name: panos
  version: {{ .Values.version }}
  count: {{ .Values.kibana.nodes }}
  elasticsearchRef:
    name: {{ .Values.name }}
  podTemplate:
    spec:
      containers:
      - name: kibana
        resources:
          limits:
            memory: {{ .Values.kibana.resources.limits.memory }}
            cpu: {{ .Values.kibana.resources.limits.cpu }}

I believe the network flow would be something like this, right?

Syslog source (Cisco FTD, Palo Alto, etc.) -> OpenShift route (for example ciscoftd.dev.test.com) -> Elastic Agent SVC (created by me to expose the Elastic Agents) -> elastic-agent pods.

Should this be possible, or should we try to do it another way?

Thanks.


naemono commented Feb 24, 2023

Syslog source (Cisco FTD, Palo Alto, etc.) -> OpenShift route (for example ciscoftd.dev.test.com) -> Elastic Agent SVC (created by me to expose the Elastic Agents) -> elastic-agent pods.

This solution makes sense to me, using a custom TCP agent integration.
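
For illustration, the Service in that flow could expose a syslog port on the Agent DaemonSet pods. This is only a sketch; the Service name, port, protocol, and selector label are assumptions and must match the port configured in the custom TCP/UDP integration and the labels ECK puts on the Agent pods:

    apiVersion: v1
    kind: Service
    metadata:
      name: elastic-agent-syslog        # assumed name
      namespace: elastic-dev
    spec:
      selector:
        agent.k8s.elastic.co/name: elastic-agent-dev   # assumed Agent pod label
      ports:
      - name: syslog
        port: 9001          # must match the port set in the custom TCP integration (assumption)
        targetPort: 9001
        protocol: TCP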


ebuildy commented May 2, 2023

The solution will not work if you use a keystore, because the operator appends an initContainer before the permissions container.
