Filestream metrics might not work correctly in Kubernetes #37925

Open
belimawr opened this issue Feb 8, 2024 · 2 comments
Labels: bug, Team:Elastic-Agent, Team:Elastic-Agent-Data-Plane

belimawr commented Feb 8, 2024

In some situations the Filestream input has its metrics collection disabled in Kubernetes. This happens when:

  • A filestream input with ID xyz is created
  • The filestream xyz works for a while
  • The filestream xyz is stopped
  • A new filestream using the same xyz ID is created

This is fairly common when running Filebeat with autodiscover on Kubernetes: the autodiscover code might stop and start the same input for the same files. This has no effect on data collection. However, due to a bug (#31767) in how we keep track of IDs and start/stop filestream inputs, we never clean the ID registry, which leads to two issues:

  1. The log message

    filestream input with ID 'xyz' already exists, this will lead to data duplication, please use a different ID. Metrics collection has been disabled on this input.
    is logged even though there is only a single input with ID xyz running

  2. Metrics collection is disabled (a side effect of setting metricsID = "").

While the erroneous log message is annoying, disabling metrics collection is a bigger issue that requires attention.
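
The sequence described above can be modelled with a small toy registry. This is only a sketch under stated assumptions, not the Beats implementation: the class ToyInputManager, the clean_on_stop flag, and the helper restart_same_id are hypothetical names introduced for illustration. With clean_on_stop=False the ID is never removed on stop, so re-creating the input logs the duplicate message and leaves the metrics ID empty.

```python
from dataclasses import dataclass, field

DUPLICATE_MSG = (
    "filestream input with ID '{id}' already exists, this will lead to data "
    "duplication, please use a different ID. Metrics collection has been "
    "disabled on this input."
)


@dataclass
class ToyInputManager:
    """Hypothetical model of an input manager; not the Beats implementation."""

    clean_on_stop: bool
    ids: set = field(default_factory=set)
    logs: list = field(default_factory=list)

    def create(self, input_id: str) -> dict:
        metrics_id = input_id
        if input_id in self.ids:
            # The ID is still registered, so the input is treated as a duplicate.
            self.logs.append(DUPLICATE_MSG.format(id=input_id))
            metrics_id = ""  # an empty metrics ID stands for "no metrics"
        self.ids.add(input_id)
        return {"id": input_id, "metrics_id": metrics_id}

    def stop(self, inp: dict) -> None:
        # clean_on_stop=False models the reported bug: the ID stays registered.
        if self.clean_on_stop:
            self.ids.discard(inp["id"])


def restart_same_id(manager: ToyInputManager, input_id: str = "xyz") -> dict:
    """Create, stop, and re-create an input with the same ID."""
    first = manager.create(input_id)
    manager.stop(first)
    return manager.create(input_id)
```

Calling restart_same_id(ToyInputManager(clean_on_stop=False)) returns an input with an empty metrics_id even though only one input is running, while the same call with clean_on_stop=True keeps the metrics ID.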

How to reproduce

  1. Create a Kubernetes cluster
  2. Deploy Filebeat using the filebeat-deployment.yml below
  3. Check the logs for a log message like
    {"log.level":"error","@timestamp":"2024-02-08T12:09:37.932Z","log.logger":"input","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/filestream/internal/input-logfile.(*InputManager).Create","file.name":"input-logfile/manager.go","file.line":183},"message":"filestream input with ID 'filestream-kubernetes-pod-95a3dadcb6d36ee0542391c016d3e4a3e638b110600078600b55961de5682908' already exists, this will lead to data duplication, please use a different ID. Metrics collection has been disabled on this input.","service.name":"filebeat","ecs.version":"1.6.0"}                                                                                                                                        
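
To find out which inputs are affected, the JSON log lines can be scanned for this message. The following is a minimal sketch, assuming Filebeat logs one JSON object per line with a "message" field as in the example above; the function name duplicate_id_warnings is hypothetical.

```python
import json
import re

# Matches the input ID quoted in the "already exists" message.
DUPLICATE_RE = re.compile(r"filestream input with ID '([^']+)' already exists")


def duplicate_id_warnings(lines):
    """Return the input IDs reported as duplicates in JSON-formatted log lines."""
    ids = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        match = DUPLICATE_RE.search(str(entry.get("message", "")))
        if match:
            ids.append(match.group(1))
    return ids
```

The function accepts any iterable of strings, so it can be fed an open log file or the lines of the Filebeat pod's log output.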
    
filebeat-deployment.yml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: kube-system
  labels:
    k8s-app: filebeat
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
  labels:
    k8s-app: filebeat
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - namespaces
  - pods
  - nodes
  verbs:
  - get
  - watch
  - list
- apiGroups: ["apps"]
  resources:
    - replicasets
  verbs: ["get", "list", "watch"]
- apiGroups: ["batch"]
  resources:
    - jobs
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: filebeat
  # should be the namespace where filebeat is running
  namespace: kube-system
  labels:
    k8s-app: filebeat
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: filebeat-kubeadm-config
  namespace: kube-system
  labels:
    k8s-app: filebeat
rules:
  - apiGroups: [""]
    resources:
      - configmaps
    resourceNames:
      - kubeadm-config
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: filebeat
  namespace: kube-system
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: kube-system
roleRef:
  kind: Role
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: filebeat-kubeadm-config
  namespace: kube-system
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: kube-system
roleRef:
  kind: Role
  name: filebeat-kubeadm-config
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: kube-system
  labels:
    k8s-app: filebeat
data:
  filebeat.yml: |-
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          node: ${NODE_NAME}
          hints.enabled: true
          hints.default_config:
            type: filestream
            prospector.scanner.symlinks: true
            id: filestream-kubernetes-pod-${data.kubernetes.container.id}
            take_over: true
            paths:
              - /var/log/containers/*${data.kubernetes.container.id}.log
            parsers:
            - container: ~ 
    processors:
      - add_host_metadata:

    http:
      enabled: true

    output.elasticsearch:
      hosts: ["https://my-cluster.elastic-cloud.com:443"] # add real credentials
      port: 443
      protocol: "https"
      username: "elastic"
      password: "changeme"
      allow_older_versions: true
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: kube-system
  labels:
    k8s-app: filebeat
spec:
  selector:
    matchLabels:
      k8s-app: filebeat
  template:
    metadata:
      labels:
        k8s-app: filebeat
    spec:
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:8.12.1
        args: [
          "-c", "/etc/filebeat.yml",
          "-e",
        ]
        env:
        - name: ELASTICSEARCH_HOST
          value: elasticsearch
        - name: ELASTICSEARCH_PORT
          value: "9200"
        - name: ELASTICSEARCH_USERNAME
          value: elastic
        - name: ELASTICSEARCH_PASSWORD
          value: changeme
        - name: ELASTIC_CLOUD_ID
          value:
        - name: ELASTIC_CLOUD_AUTH
          value:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          runAsUser: 0
          # If using Red Hat OpenShift uncomment this:
          #privileged: true
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          readOnly: true
          subPath: filebeat.yml
        - name: data
          mountPath: /usr/share/filebeat/data
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: config
        configMap:
          defaultMode: 0640
          name: filebeat-config
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: varlog
        hostPath:
          path: /var/log
      # data folder stores a registry of read status for all files, so we don't send everything again on a Filebeat pod restart
      - name: data
        hostPath:
          # When filebeat runs as non-root user, this directory needs to be writable by group (g+w).
          path: /var/lib/filebeat-data
          type: DirectoryOrCreate
---

I haven't looked into this enough to fully assess whether the metrics are working, but when I SSH into the container and run curl localhost:5066/inputs/ | jq, I get valid output with metrics that change over time.

  {
    "bytes_processed_total": 125774,
    "events_processed_total": 716,
    "files_active": 1,
    "files_closed_total": 0,
    "files_opened_total": 1,
    "id": "filestream-kubernetes-pod-95a3dadcb6d36ee0542391c016d3e4a3e638b110600078600b55961de5682908",
    "input": "filestream",
    "messages_read_total": 716,
    "processing_errors_total": 0,
    "processing_time": {
      "histogram": {
        "count": 716,
        "max": 2003408353,
        "mean": 1174213257.6648045,
        "median": 1001358532.5,
        "min": 1051078,
        "p75": 1733609597,
        "p95": 1916199464.6,
        "p99": 1982980687.14,
        "p999": 2003408353,
        "stddev": 556202480.3060538
      }
    }
  }

However, I haven't managed to assess how correct they are. I'll update this issue once I have more information.
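
One possible way to start that assessment is to take two snapshots of the endpoint and compare them. The sketch below assumes that localhost:5066/inputs/ returns a JSON array of objects shaped like the one above; the function names fetch_inputs, counter_deltas, and missing_metrics are hypothetical. It reports how much each counter grew between snapshots and which expected input IDs have no metrics entry at all.

```python
import json
import urllib.request

COUNTERS = ("bytes_processed_total", "events_processed_total", "messages_read_total")


def fetch_inputs(url="http://localhost:5066/inputs/"):
    """Fetch the per-input metrics; assumes the endpoint returns a JSON array."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def counter_deltas(before, after, counters=COUNTERS):
    """Map each input ID present in both snapshots to the growth of its counters."""
    old = {item["id"]: item for item in before}
    deltas = {}
    for item in after:
        prev = old.get(item["id"])
        if prev is None:
            continue  # input only appears in the second snapshot
        deltas[item["id"]] = {
            name: item.get(name, 0) - prev.get(name, 0) for name in counters
        }
    return deltas


def missing_metrics(expected_ids, snapshot):
    """Return expected input IDs that have no entry in the metrics snapshot."""
    reported = {item.get("id") for item in snapshot}
    return sorted(set(expected_ids) - reported)
```

Calling fetch_inputs() twice, a few seconds apart, and passing both results to counter_deltas shows whether the counters of a restarted input still advance. Passing the IDs taken from the "already exists" log messages to missing_metrics shows whether those inputs are absent from the metrics output.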

@belimawr belimawr added bug Team:Elastic-Agent Label for the Agent team labels Feb 8, 2024
@elasticmachine

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Jun 3, 2024
@elasticmachine

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
