8.7.1: K8s Stand-alone Kubernetes Deployment - Filestream input with ID '' Already Exists #2701

Closed
berglh opened this issue May 17, 2023 · 7 comments · Fixed by #2788

berglh commented May 17, 2023

I've used the example K8s manifest to deploy Elastic Agent on our AWS EKS cluster. I'm running the matching version of kube-state-metrics (v2.3.0) for our AWS EKS instances, and I followed the documentation in the EKS section to comment out modules that are unavailable in AWS EKS, such as the control plane and audit logs.

Recently, I upgraded to 8.7.0 and noticed a huge increase in log volume, a large portion of which was coming from Elastic Agent's own logs. Some of this was resolved in 8.7.1 by fixes to the logging, and since then setting the logging level to warning (see the snippet at the end of this comment) has reduced my events per minute from about 100k to about 2k. Most of the remaining 2k Elastic Agent logs are the following:

filestream input with ID '' already exists, this will lead to data duplication, please use a different ID

I then added manual IDs to every data_stream in my configmap, and discovered it's related to this input:

      - type: filestream
        id: container-log-${kubernetes.pod.name}-${kubernetes.container.id}
        use_output: default
        meta:
          package:
            name: kubernetes
            version: 1.29.2
        data_stream:
          namespace: default
        streams:
          - data_stream:
              dataset: kubernetes.container_logs
              type: logs
            prospector.scanner.symlinks: true
            parsers:
              - container: ~
              # - ndjson:
              #     target: json
              # - multiline:
              #     type: pattern
              #     pattern: '^\['
              #     negate: true
              #     match: after
            paths:
              - /var/log/containers/*${kubernetes.container.id}.log
            processors:
              - add_fields:
                  target: orchestrator.cluster
                  fields:
                    name: "our-cluster-name"
                    url: "https://our-cluser.uri.aws.com"

This seems related to PR #742. However, since 8.7.1 is a much newer version, it seems like this should already be fixed. The standalone config map does have the id added by that PR, so I'm not sure why we're seeing this. In combination with this, I have also been receiving the warning:

DEPRECATED: Log input. Use Filestream input instead.

Is there a working filestream configuration that could be used for a Standalone Deployment that doesn't result in large volumes of the warnings described above? Just trying to clear this up, because space is always at a premium on Elasticsearch.
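
For reference, the logging change mentioned above was made in the agent policy ConfigMap, along these lines (a minimal sketch of what I set; adjust to your own manifest):

agent:
  logging:
    level: warning # default is info; warning drops most of the agent's own log chatter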

@berglh berglh added the bug Something isn't working label May 17, 2023

cmacknz commented May 18, 2023

Hmm, the message filestream input with ID '' already exists, this will lead to data duplication, please use a different ID means there must be a filestream input in your policy without the id: key set, since the ID here is the empty string ''.
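
For illustration, a hypothetical input shape that would produce this warning, since it has no id: key and therefore registers under the empty string (the dataset and paths below are just placeholders):

- type: filestream
  # no id: key here, so this input's ID is the empty string ''
  use_output: default
  streams:
    - data_stream:
        dataset: kubernetes.container_logs
        type: logs
      paths:
        - /var/log/containers/*.log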

Is there another instance of filestream in the agent policy? It may also be the case that the Kubernetes variable substitution is failing, but I don't recall ever seeing that happen.

If you can upload the archive produced by elastic-agent diagnostics, it will contain the information needed to debug this further.

berglh commented May 22, 2023

Hi @cmacknz,

Please find attached the diagnostics bundle, from which I've removed our API keys and Elasticsearch domains. I had to revert to the variable-based filestream id shown in the standalone config, as I had changed it for troubleshooting. I also hit Pod OOMs while running the diagnostics tool; after pushing the limit to 2 GB it all worked fine.

I might also add that I've been having OOMs in regular production usage too, and had to push the memory limit from 700 to 1024 MB to get the agent stable (see the snippet below). Anyway, I hope these logs assist. It does appear that the variables are being resolved in the computed config, but I'm still receiving the errors.
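
The memory bump itself was just the usual resources change on the elastic-agent container in the DaemonSet manifest, something like this (values from our cluster; yours may differ):

containers:
  - name: elastic-agent
    resources:
      limits:
        memory: 1024Mi # raised from 700Mi to stop the OOM kills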

Thanks

Edit: Masking more environment stuff
elastic-agent-diagnostics-2023-05-22T00-34-47Z-00.zip

cmacknz commented May 23, 2023

Thanks, we're aware the memory usage went up in 8.6.x as part of an incremental architecture change, with another change coming that should bring it back down.

You should be able to fix this by duplicating the id into the streams section; each stream requires a unique ID, not just the input. For example:

- data_stream:
    namespace: default
  id: container-log-${kubernetes.pod.name}-${kubernetes.container.id}
  meta:
    package:
      name: kubernetes
      version: 1.29.2
  streams:
  - data_stream:
      dataset: kubernetes.container_logs
      type: logs
    id: container-log-${kubernetes.pod.name}-${kubernetes.container.id} # <-- Added unique stream ID
    parsers:
    - container: null
    paths:
    - /var/log/containers/*${kubernetes.container.id}.log
    processors:
    - add_fields:
        fields:
          name: oureksclustername
          url: https://ourekscluster.yl4.ap-southeast-2.eks.amazonaws.com
        target: orchestrator.cluster
    prospector.scanner.symlinks: true
  type: filestream
  use_output: default

For a full explanation see #2573 (comment). Once we confirm this is the fix, we'll leave this issue open to adjust the reference template and document this explanation.

berglh commented May 25, 2023

Hi @cmacknz,

I swear I had tried this before; I may have moved the id key into the data_stream section instead of duplicating it, as I remember getting a syntax error. Either way, the above appears to have resolved the issue for me.

I'll leave the issue open as requested.

Many thanks,
Berg

cmacknz commented May 29, 2023

Closing; it looks like the id is set to be unique in the latest example YAML:
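
# the lines in question from the example manifest (same as quoted at the top of this issue)
- type: filestream
  id: container-log-${kubernetes.pod.name}-${kubernetes.container.id}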

@cmacknz cmacknz closed this as completed May 29, 2023

berglh commented May 30, 2023

Hi @cmacknz,

I'll just add that the id is not set in the streams array as you asked me to add; in the standalone config it's only set at the top level of the input, next to data_stream, which was the cause of my problem.

Cheers,
Berg

cmacknz commented May 30, 2023

Ah, you are correct, I misread it. I'll fix this, thanks.

@cmacknz cmacknz reopened this May 30, 2023
@cmacknz cmacknz self-assigned this May 30, 2023
@cmacknz cmacknz added the Team:Elastic-Agent Label for the Agent team label May 30, 2023