filebeat Duplicated logs at beginning with kubernetes autodiscovery #24208
Comments
Pinging @elastic/integrations (Team:Integrations)
The issue is still present in Filebeat 7.16.
@marqc do you have a workaround for the behaviour? Or do you just live with the duplicates? :-/
I have been trying to replicate this issue, but I didn't manage to. @marqc, @stephan-erb-by, are you still facing this issue? Have you tried the latest versions?
hey @belimawr! I have played around with this a bit and could still see this (or a related) issue in 7.17.2. I don't have a minimal reproducing example, but could observe the problem with a config such as this one:
We are launching PODs with multiple containers (
I would assume this should not really happen, as we'd expect each container to be discovered only once and to get only one filestream launched. The issue reminds me a bit of #29015, which was also causing pain for us back in the day. Maybe there is a problem in autodiscovery itself. Could it be that autodiscovery discovers a single container multiple times?
@stephan-erb-by thanks a lot for the quick reply! I'm not sure if it's related to #29015, but it could be. That issue seems to have been solved, so it's hard to tell without reproducing this one. At least this confirms that the issue is related to duplicated IDs in filestream; the question now is why there are duplicated IDs if the ID is the container ID (which is unique). Could you send me the hints you're setting on those pods/containers? If you could share the whole manifest, that would be even better; redact any sensitive information. I'm mostly interested in how the annotations/hints are being set and how the multiple containers are configured on your pods.
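For reference, hints are normally set as annotations on the pod; below is a minimal, hypothetical sketch of a multi-container pod (the pod name, container names, and images are placeholders, not taken from this thread):

apiVersion: v1
kind: Pod
metadata:
  name: example-pod                      # placeholder name
  annotations:
    # hint applied to every container in the pod
    co.elastic.logs/enabled: "true"
    # hint scoped to one container by its name
    co.elastic.logs.app/multiline.pattern: '^\['
spec:
  containers:
    - name: app                          # placeholder container
      image: example/app:latest
    - name: sidecar                      # placeholder container
      image: example/sidecar:latest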
Hey @belimawr! I'm hitting the same problem. What can I send you to help debug this?
The whole Kubernetes manifest (don't forget to redact any sensitive information) and debug logs for the following selectors: If you can consistently reproduce it, a simple step-by-step on how to reproduce it would be amazing. Preferably using Kind or Minikube. |
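For completeness, the logging settings that enable those debug selectors look like this (a minimal sketch using the same selectors that appear in the config shared later in this thread):

logging.level: debug
logging.selectors: [ input, input.filestream, harvester, autodiscover ]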
Hey @belimawr, this is my Kubernetes manifest: https://pastebin.com/raw/ZqD2BNJ8 Logs: https://pastebin.com/raw/NXzFYxb8 Thanks for your help.
@mgfnv9 Thanks for the files! I cannot access your logs; I keep getting a 403:
Did you notice any data duplication, or just the error in the logs? The good news is that I managed to reproduce the error log bug! I reproduced it both using Kind and using a VM running Minikube with the
I had to modify the config a bit to get it working with my Kubernetes setup; here is my config:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      kube_config: "/root/.kube/config"
      hints.enabled: true
      # default input launched for containers that don't override it via hints
      hints.default_config:
        type: filestream
        id: ${data.kubernetes.container.id}
        prospector.scanner.symlinks: true
        paths:
          - /var/log/containers/*${data.kubernetes.container.id}.log

processors:
  - add_cloud_metadata:
  - add_host_metadata:
  - add_tags:
      tags: ["something"]

cloud.id: "some cloud ID"
cloud.auth: "elastic:very secret password"

# debug logging limited to the relevant selectors
logging.level: debug
logging.selectors: [ input, input.filestream, harvester, autodiscover ]
@belimawr Sorry, pastebin doesn't show these logs.
@belimawr May I ask, what's the difference between type: filestream and type: container, and which one should I prefer?
They're different inputs in Filebeat. The container input is meant to read log files from containers; it does some transformation/parsing of the data to extract the actual log entry from the lines written by the container runtime. I'm hoping to have time to look into this issue this week.
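To illustrate the difference, here is a rough sketch of the two options (the paths and the id are placeholders; a filestream input needs the container-runtime parsing enabled explicitly via its container parser, while the container input applies it by default):

filebeat.inputs:
  # Option 1: container input, parses the container runtime log format out of the box
  - type: container
    paths:
      - /var/log/containers/*.log

  # Option 2: filestream with an explicit container parser (the id is a placeholder)
  - type: filestream
    id: my-container-logs
    prospector.scanner.symlinks: true
    paths:
      - /var/log/containers/*.log
    parsers:
      - container:
          stream: all
          format: auto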
Hello everyone, same problem with Filebeat 7.17.3 in a Kubernetes 1.23.8 cluster, using Filebeat autodiscover with input type docker. See the following logs (same time -> same directory):

2023-06-16T11:19:16.577Z INFO [input] log/input.go:171 Configured paths: [/var/lib/docker/containers/2ac82fdbf6fb151562c65810425ae6ac68e1b63f3a2bfebe029a81b333d826ff/*.log] {"input_id": "e3001f5b-b0e6-4760-8484-ff6f7967c844"}
2023-06-16T11:19:16.581Z INFO [input] log/input.go:171 Configured paths: [/var/lib/docker/containers/2ac82fdbf6fb151562c65810425ae6ac68e1b63f3a2bfebe029a81b333d826ff/*.log] {"input_id": "8e080260-e047-4ad2-a0ac-70b2be0b03f2"}

We see that input.type is different: the first one is docker and the second one is container. In our Filebeat configuration we are not using the container input type. Any solution?
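For context, an autodiscover setup that launches the docker input per container typically looks something like the sketch below (this is an assumption for illustration; the commenter's actual configuration is not shown in the thread):

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      templates:
        - config:
            # deprecated docker input, scoped to the discovered container
            - type: docker
              containers.ids:
                - "${data.kubernetes.container.id}"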
Aside from getting two log entries for the same container, are you consistently seeing data duplication? Without more logs it's hard to tell exactly what is happening. We do have a code path that instantiates an input with the sole objective of validating its configuration, but that input is never started. That seems to be the case here (if you're not experiencing any other issues).
Hi! We're labeling this issue as |
Sometimes, when Filebeat discovers a new pod/container, it opens the log file multiple times and the collected logs are duplicated.
Version:
7.11.0 and 7.11.1
Operating System:
Filebeat is deployed as a DaemonSet in a Kubernetes cluster.
Host machines run CentOS 7.9.2009.
cpu: AMD EPYC 7402P 24-Core Processor (48 logical cores) or Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz (32 logical cores)
Configuration:
Filebeat logs:
Debug logs are attached as a file; search for container cd1336aa6.
filtered-debug.log.gz
There are also error messages: