Elastic Agent installed packages permission errors #6266

Closed
gertjanvg opened this issue Jan 2, 2023 · 3 comments

Bug Report

We recently migrated from several other services to the ECK-managed Elastic Agent for our monitoring. Everything works as it should, except that our Kubernetes state metrics stopped working after about 30 minutes.
After some digging I found the following error logs from the Agent:

[elastic_agent.metricbeat][error] Failed to list light metricsets for module kubernetes: getting metricsets for module 'kubernetes': loading light module 'kubernetes' definition: loading module configuration from '/usr/share/elastic-agent/data/elastic-agent-0e1a73/install/metricbeat-8.5.3-linux-x86_64/module/kubernetes/module.yml': config file ("/usr/share/elastic-agent/data/elastic-agent-0e1a73/install/metricbeat-8.5.3-linux-x86_64/module/kubernetes/module.yml") must be owned by the user identifier (uid=0) or root

along with nearly identical messages for a bunch more modules not being loaded properly, like rabbitmq, windows, golang, haproxy etc.

It appears that the error coincides with the time at which state-metrics stop working.
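
To get the full list of modules that fail to load, the error lines can be scanned for the module name. This is a minimal sketch that assumes every message follows the same shape as the one quoted above; the function name `affected_modules` is hypothetical and not part of any Elastic tooling.

```python
import re

# Assumption: each error line contains the same phrase as the log quoted above,
# followed by the module name and a colon.
PATTERN = re.compile(r"Failed to list light metricsets for module (\S+?):")


def affected_modules(log_lines):
    """Return the sorted, de-duplicated module names found in matching lines."""
    found = set()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)
```

It could be fed the agent's log output line by line, for example by passing an open file object or `sys.stdin`.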

Next, I opened a shell in the pod to see why. It turns out that the installed packages in the /usr/share/elastic-agent/data/elastic-agent-[hash]/install folder are owned by elastic-agent rather than root:

root@elastic-agent-agent-5h22z:/usr/share/elastic-agent/data/elastic-agent-0e1a73/install# ll
total 77
drwxr-xr-x 10 elastic-agent elastic-agent 10 Dec  6 00:08 ./
drwxrwx---  5 root          root           6 Dec  6 00:08 ../
drwxr-xr-x  2 elastic-agent elastic-agent  8 Dec  5 04:55 apm-server-8.5.3-linux-x86_64/
drwxr-xr-x  2 elastic-agent elastic-agent  9 Dec  6 00:08 cloudbeat-8.5.3-linux-x86_64/
drwxr-xr-x  2 elastic-agent elastic-agent  6 Dec  6 00:08 endpoint-security-8.5.3-linux-x86_64/
drwxr-xr-x  6 elastic-agent elastic-agent 14 Jan  2 12:59 filebeat-8.5.3-linux-x86_64/
drwxr-xr-x  2 elastic-agent elastic-agent  3 Dec  5 05:52 fleet-server-8.5.3-linux-x86_64/
drwxr-xr-x  4 elastic-agent elastic-agent 12 Dec  6 00:08 heartbeat-8.5.3-linux-x86_64/
drwxr-xr-x  6 elastic-agent elastic-agent 14 Jan  2 12:59 metricbeat-8.5.3-linux-x86_64/
drwxr-xr-x  3 elastic-agent elastic-agent 13 Dec  6 00:08 osquerybeat-8.5.3-linux-x86_64/

This is set by default in the official containers to make them compatible with non-root users, but it conflicts with the safety requirements of Metricbeat.
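
The ownership rule stated in the error message (the config file must be owned by the running user or by root) can be checked across the install folder with a short script. This is only a sketch of that one rule, not Metricbeat's actual implementation, which may enforce further permission checks; the function name `badly_owned` and its parameters are hypothetical.

```python
import os


def badly_owned(root_dir, allowed_uids=None, filename="module.yml"):
    """Yield (path, owner_uid) for each matching file with a disallowed owner.

    Assumption: a file passes if it is owned by root (uid 0) or by the
    effective uid of the current process, as the error message describes.
    """
    if allowed_uids is None:
        allowed_uids = {0, os.geteuid()}
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        if filename in filenames:
            path = os.path.join(dirpath, filename)
            uid = os.stat(path).st_uid
            if uid not in allowed_uids:
                yield path, uid
```

Run inside the pod as the same user as the agent and pointed at the install folder shown above, it would list every `module.yml` that the ownership check is expected to reject.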

Could this be related to the fact that Elastic Agent recently changed to require running it with root? #6147

Even though we do not currently use the hostPath, the agent still will not run as a non-root user because of some other permission issue.

Environment

  • Elasticsearch version: 8.5.3

  • Elastic Agent version: 8.5.3

  • ECK version: 2.5.0

  • Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.7", GitCommit:"b56e432f2191419647a6a13b9f5867801850f969", GitTreeState:"clean", BuildDate:"2022-02-16T11:50:27Z", GoVersion:"go1.16.14", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.7", GitCommit:"b56e432f2191419647a6a13b9f5867801850f969", GitTreeState:"clean", BuildDate:"2022-02-17T16:45:54Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}

Agent configuration

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata: 
  name: elastic-agent
  namespace: es-logging
spec:
  version: 8.5.3
  kibanaRef:
    name: app
    namespace: es-logging
  fleetServerRef: 
    name: fleet-server
    namespace: es-logging
  mode: fleet
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: elastic-agent
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
        volumes:
        - name: agent-data
          emptyDir: {}
@botelastic botelastic bot added the triage label Jan 2, 2023
@naemono naemono self-assigned this Mar 2, 2023

naemono commented Mar 2, 2023

@gertjanvg I'm trying to replicate your issue. I'm taking ownership because I suspect it may be related to #6239 and the other issues referenced there. I'll update when I have more information.


naemono commented Mar 3, 2023

@gertjanvg I've been unable to replicate this issue with either ECK 2.5.0 or 2.6.0, using stack version 8.5.3 and a manifest very similar to yours:

https://github.com/elastic/cloud-on-k8s/blob/main/config/recipes/elastic-agent/fleet-kubernetes-integration.yaml

with one distinct change:

kind: Agent
        volumes:
        - name: agent-data
          emptyDir: {}

In both cases, I have had Elastic agents in the daemonset pulling kubernetes metrics for about 24 hours with no errors.

Could you provide more details about your Fleet/Kibana/Elasticsearch configuration, and try again with newer versions (ECK 2.6.1 and Elastic Stack 8.6.2)? Thanks.


naemono commented Mar 20, 2023

Closing this as I've been unable to reproduce this with the newest version of ECK, and further information has not been provided. If new information comes to light, please feel free to re-open this issue. Thanks.

@naemono naemono closed this as completed Mar 20, 2023
@botelastic botelastic bot removed the triage label Apr 6, 2023