Potentially chown Elastic Agent hostpath data directory #6239

Open
naemono opened this issue Dec 15, 2022 · 10 comments
Labels: >enhancement Enhancement of existing functionality


naemono commented Dec 15, 2022

There have been a number of issues/PRs concerning this issue: #5993, #6147, #6205, #6193.

The following is required when running Elastic Agent with a hostPath:

    podTemplate:
      spec:
        containers:
          - name: agent
            securityContext:
              runAsUser: 0

If not, you get this error:

Error: preparing STATE_PATH(/usr/share/elastic-agent/state) failed: mkdir /usr/share/elastic-agent/state/data: permission denied
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.5/fleet-troubleshooting.html

An initContainer that does the following allows Elastic Agent to work properly without the Agent itself using runAsUser: 0:

      initContainers:
      - command:
        - sh
        - -c
        - chown 1000:1000 /usr/share/elastic-agent/state
        image: docker.elastic.co/beats/elastic-agent:8.5.0
        imagePullPolicy: IfNotPresent
        name: permissions
        securityContext:
          runAsUser: 0

This is more complicated in an environment such as OpenShift, where UIDs are randomized, but it is likely doable.

So the question is: do we pursue this path to make the UX for Elastic Agent more consistent between emptyDir and hostPath?
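
For illustration, since OpenShift runs containers with an arbitrary UID that is always a member of the root group (GID 0), a group-based variant of the init container could avoid depending on a specific UID. This is only a sketch, assuming the same image and state path as above, not a settled design:

      initContainers:
      - command:
        - sh
        - -c
        # grant group write access via the root group instead of chowning to a fixed UID
        - chgrp 0 /usr/share/elastic-agent/state && chmod g+rwx /usr/share/elastic-agent/state
        image: docker.elastic.co/beats/elastic-agent:8.5.0
        imagePullPolicy: IfNotPresent
        name: permissions
        securityContext:
          runAsUser: 0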

Security Note

  • The initContainer still runs as runAsUser: 0.
  • It only runs for a couple of seconds, as opposed to running indefinitely as uid 0, which seems to minimize the window in which a security issue could stem from this.

naemono commented Dec 15, 2022

After discussion, we've decided to take the approach of using an init container to make this user experience better. Since the GID in OpenShift is known, we'll take this approach:

      initContainers:
      - command:
        - sh
        - -c
        - chmod g+w /usr/share/elastic-agent/state && chgrp 1000 /usr/share/elastic-agent/state
        image: docker.elastic.co/beats/elastic-agent:8.5.0
        imagePullPolicy: IfNotPresent
        name: permissions
        securityContext:
          runAsUser: 0

@brsolomon-deloitte

Also related: #6280

@gittihub123

Hi @naemono
I have been stuck with this issue for a couple of days and can't get it working.
We are using OpenShift 4.12 and Argo CD with the Elastic operator on OpenShift.

I followed the official ECK 2.6 Kubernetes documentation and created the required resources.

Worth mentioning is that we implemented the Compliance Operator and have used the CIS operator to harden the platform.

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server-dev
  namespace: elastic-dev
spec:
  version: 8.6.1
  kibanaRef:
    name: kibanadev
  elasticsearchRefs:
  - name: esdev01
  mode: fleet
  fleetServerEnabled: true
  deployment:
    replicas: 1
    podTemplate:
      spec:
        serviceAccountName: elastic-agent
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
---
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent-dev
  namespace: elastic-dev
spec:
  version: 8.6.1
  kibanaRef:
    name: kibanadev
  fleetServerRef:
    name: fleet-server-dev
  mode: fleet
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: elastic-agent
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: elastic-agent
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - pods
  - nodes
  - namespaces
  verbs:
  - get
  - watch
  - list
- apiGroups: ["coordination.k8s.io"]
  resources:
  - leases
  verbs:
  - get
  - create
  - update
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-agent
  namespace: elastic-dev
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: elastic-agent
subjects:
- kind: ServiceAccount
  name: elastic-agent
  namespace: elastic-dev
roleRef:
  kind: Role
  name: elastic-agent
  apiGroup: rbac.authorization.k8s.io

RoleBinding (describe output):

Name:         elastic-agent-rb
Labels:       <none>
Annotations:  <none>
Role:
  Kind:  ClusterRole
  Name:  system:openshift:scc:privileged
Subjects:
  Kind            Name           Namespace
  ----            ----           ---------
  ServiceAccount  elastic-agent  elastic-dev

The hostPath is created on the physical machine, but we are still getting permission denied:

Error: preparing STATE_PATH(/usr/share/elastic-agent/state) failed: mkdir /usr/share/elastic-agent/state/data: permission denied
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.6/fleet-troubleshooting.html


naemono commented Feb 15, 2023

@gittihub123 I'll investigate this and get back to you.


naemono commented Feb 15, 2023

@gittihub123 The below appears to be required in the case of OpenShift:

  deployment:
    replicas: 1
    podTemplate:
      spec:
        containers:
        - name: agent
          securityContext:
            privileged: true  # <-- this is the piece that's required on OpenShift
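
For a privileged container to be admitted on OpenShift, the pod's service account typically also needs the privileged SCC (as done later in this thread for Filebeat). A minimal sketch, assuming the elastic-agent service account and elastic-dev namespace from the manifest above:

    # grant the privileged SCC to the elastic-agent service account
    oc adm policy add-scc-to-user privileged -z elastic-agent -n elastic-dev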

@gittihub123

Hi @naemono
This does not work on an OpenShift cluster, because SELinux blocks it from creating files on the host filesystem.
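
For reference, a common workaround when SELinux denies writes to a hostPath is to relabel the host directory with the container file context on the node. This is only a hedged sketch; the path below is a placeholder for whatever hostPath the Agent state is mounted from:

    # run on the node: relabel the hostPath directory so containers are allowed to write to it
    # (/path/to/agent-state-hostpath is a placeholder, not a real default)
    chcon -R -t container_file_t /path/to/agent-state-hostpath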

The same applies when I try to create a standalone filebeat instance with this configuration.

# CRD to create beats with ECK (Pod(s))
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
   name: panos-filebeat
   namespace: elastic-dev
spec:
  type: filebeat
  version: 8.6.1
  elasticsearchRef:
    name: esdev
  kibanaRef:
    name: kibanadev
  config:
    filebeat.modules:
    - module: panw
      panos:
        enabled: true
        var.syslog_host: 0.0.0.0
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        securityContext:
          privileged: true

Error message

one or more objects failed to apply, reason: admission webhook "elastic-beat-validation-v1beta1.k8s.elastic.co" denied the request: Beat.beat.k8s.elastic.co "panos-filebeat" is invalid: privileged: Invalid value: "privileged": privileged field found in the kubectl.kubernetes.io/last-applied-configuration annotation is unknown. This is often due to incorrect indentation in the manifest.


naemono commented Feb 21, 2023

@gittihub123 Running Agent and/or Beat in an OpenShift environment has many more complexities than running in a standard Kubernetes environment. We document these issues here. We also have some Beats recipes that we use in our e2e tests, which run on a regular basis, here. I just successfully deployed this Beat recipe on an OpenShift 4.9 cluster, following our documentation noted above, specifically:

oc adm policy add-scc-to-user privileged -z filebeat -n elastic

Then I applied this manifest, which worked after a bit of time (Beat pods crash once or twice while users/API keys are being propagated throughout the Elastic Stack):

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: filebeat
spec:
  type: filebeat
  version: 8.6.0
  elasticsearchRef:
    name: testing
  kibanaRef:
    name: kibana
  config:
    filebeat.autodiscover.providers:
    - node: ${NODE_NAME}
      type: kubernetes
      hints.default_config.enabled: "false"
      templates:
      - condition.equals.kubernetes.namespace: log-namespace
        config:
        - paths: ["/var/log/containers/*${data.kubernetes.container.id}.log"]
          type: container
      - condition.equals.kubernetes.labels.log-label: "true"
        config:
        - paths: ["/var/log/containers/*${data.kubernetes.container.id}.log"]
          type: container
    processors:
    - add_cloud_metadata: {}
    - add_host_metadata: {}
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        terminationGracePeriodSeconds: 30
        # dnsPolicy: ClusterFirstWithHostNet
        # hostNetwork: true # Allows to provide richer host metadata
        containers:
        - name: filebeat
          securityContext:
            runAsUser: 0
            # If using Red Hat OpenShift uncomment this:
            privileged: true
          volumeMounts:
          - name: varlogcontainers
            mountPath: /var/log/containers
          - name: varlogpods
            mountPath: /var/log/pods
          - name: varlibdockercontainers
            mountPath: /var/lib/docker/containers
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
        volumes:
        - name: varlogcontainers
          hostPath:
            path: /var/log/containers
        - name: varlogpods
          hostPath:
            path: /var/log/pods
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - namespaces
  - pods
  - nodes
  verbs:
  - get
  - watch
  - list
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: elastic
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: elastic
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
# ---
# My Elasticsearch cluster already existed....
# apiVersion: elasticsearch.k8s.elastic.co/v1
# kind: Elasticsearch
# metadata:
#   name: elasticsearch
# spec:
#   version: 8.6.1
#   nodeSets:
#   - name: default
#     count: 3
#     config:
#       node.store.allow_mmap: false
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
spec:
  version: 8.6.0
  count: 1
  elasticsearchRef:
    name: testing
# ...

Note the difference in daemonSet.podTemplate.spec and where the securityContext is applied:

  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        terminationGracePeriodSeconds: 30
        # dnsPolicy: ClusterFirstWithHostNet
        # hostNetwork: true # Allows to provide richer host metadata
        containers:
        - name: filebeat
          securityContext:
            runAsUser: 0
            # If using Red Hat OpenShift uncomment this:
            privileged: true

@gittihub123

Hi @naemono
Thank you for the explanation. Filebeat works now, but our goal is to implement Elastic Agent and activate different types of modules to collect syslog from outside the cluster: Palo Alto, Cisco FTD, Cisco ASA, etc.

So far, the Elastic Agent is running and is managed by Fleet, but it is only collecting logs and metrics from OpenShift. The Elastic Stack is running in the same namespace, and I have connectivity between all pods (Elasticsearch, Kibana, Fleet & Elastic Agent).

This is my configuration

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: {{ .Values.kibana.name }}
  namespace: {{ .Values.namespace }}
spec:
  http:
    tls:
      certificate:
        secretName: {{ .Values.tls.certificate }}
  config:
    server.publicBaseUrl: "https://XXX.YYY.ZZZ/"
    xpack.fleet.agents.elasticsearch.hosts: ["https://esdev-es-http.elastic-dev.svc:9200"]
    xpack.fleet.agents.fleet_server.hosts: ["https://fleet-server-dev-agent-http.elastic-dev.svc:8220"]
    xpack.fleet.packages:
      - name: system
        version: latest
      - name: elastic_agent
        version: latest
      - name: fleet_server
        version: latest
    xpack.fleet.agentPolicies:
      - name: Fleet Server test
        id: eck-fleet-server
        is_default_fleet_server: true
        namespace: agent
        monitoring_enabled:
          - logs
          - metrics
        package_policies:
        - name: fleet_server-1
          id: fleet_server-1
          package:
            name: fleet_server
      - name: Elastic Agent on ECK policy
        id: eck-agent
        namespace: agent
        monitoring_enabled:
          - logs
          - metrics
        unenroll_timeout: 900
        is_default: true
        package_policies:
          - name: system-1
            id: system-1
            package:
              name: system
          - name: CiscoFTD
            id: CiscoFTD
            package:
              name: Cisco FTD
          - name: palo-alto
            id: palo-alto
            package:
              name: panos
  version: {{ .Values.version }}
  count: {{ .Values.kibana.nodes }}
  elasticsearchRef:
    name: {{ .Values.name }}
  podTemplate:
    spec:
      containers:
      - name: kibana
        resources:
          limits:
            memory: {{ .Values.kibana.resources.limits.memory }}
            cpu: {{ .Values.kibana.resources.limits.cpu }}

I believe the network flow would be something like this, right?

Syslog source (Cisco FTD, Palo Alto, etc.) -> OpenShift route (for example ciscoftd.dev.test.com) -> Elastic Agent SVC (created by me to expose the Elastic Agents) -> elastic-agent pods.

Should this be possible, or should we try to do it another way?

Thanks.


naemono commented Feb 24, 2023

Syslog source (Cisco FTD, Palo Alto, etc.) -> OpenShift route (for example ciscoftd.dev.test.com) -> Elastic Agent SVC (created by me to expose the Elastic Agents) -> elastic-agent pods.

This solution makes sense to me, using a custom TCP agent integration.
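
For illustration, the Service in that flow could expose a syslog port on the Agent DaemonSet pods. This is only a sketch; the Service name, port, protocol, and selector label are assumptions and must match the port configured in the custom TCP/UDP integration and the labels ECK puts on the Agent pods:

    apiVersion: v1
    kind: Service
    metadata:
      name: elastic-agent-syslog        # assumed name
      namespace: elastic-dev
    spec:
      selector:
        agent.k8s.elastic.co/name: elastic-agent-dev   # assumed Agent pod label
      ports:
      - name: syslog
        port: 9001          # must match the port set in the custom TCP integration (assumption)
        targetPort: 9001
        protocol: TCP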


ebuildy commented May 2, 2023

The solution will not work if you use a keystore, because the operator appends an initContainer before the permissions container.
