Filestream data duplication in filebeat 8.9.1 #36379
Comments
Hi @germain05, I have a few questions to better understand the problem you're facing:
config.yaml:
rbac.yaml:
Hope the additional information can help.
Hi @germain05, Thanks for all the information. If there is duplicated data, you will see the duplicated documents in Kibana. The metrics for the duplicated input will not be collected because we can't have two metrics instances with the same ID.
The ID is not duplicated; if the bug is still there, it is an odd race condition that affects our bookkeeping code when inputs are started/stopped. If I have time this week, I'll try again to reproduce it.
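For context, a minimal sketch of what explicitly unique IDs look like on standalone filestream inputs (the IDs and paths here are illustrative, not taken from this issue):

```yaml
filebeat.inputs:
  # Each filestream input needs its own unique ID; reusing an ID
  # triggers the "already exists" warning and disables metrics
  # collection for the second instance.
  - type: filestream
    id: app-logs          # illustrative name
    paths:
      - /var/log/app/*.log
  - type: filestream
    id: nginx-logs        # illustrative name
    paths:
      - /var/log/nginx/*.log
```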
Hi @belimawr, unfortunately I didn't have time to try to reproduce it in a smaller, controlled experiment. I will wait and see if you can reproduce it, and hopefully get some explanation of why it still happens. Thanks for the efforts.
@belimawr, could you please share the part of the code causing this bug? It would be nice if I could share with my teammates what exactly the root cause of the issue is, so it is clear for everyone.
Hi @germain05, of course I can :D The PR fixing the issue is this one: #35134. Here is the explanation; bear in mind that the code I'm linking already contains the fix.
The duplicated-ID message happens because, during config validation, the input is fully instantiated (beats/filebeat/input/v2/compat/compat.go, lines 73 to 86 in 3161fc0, particularly line 74). When the Filestream input is instantiated, we add its ID to a map in the input manager (beats/filebeat/input/filestream/internal/input-logfile/manager.go, lines 181 to 192 in 3161fc0). This entry was only removed when the input was stopped (beats/filebeat/input/filestream/internal/input-logfile/manager.go, lines 251 to 255 in 3161fc0). An input instantiated just for validation is never started or stopped, so its ID entry stayed in the map, and the real input then looked like a duplicate.
The PR I linked above fixes it. At least in our tests we didn't manage to reproduce the issue after the fix, and we have seen it working in many deployments. Another possibility for this issue was that the autodiscover code was generating too many "duplicated" events, which could lead to inputs being started/stopped multiple times within a few seconds and leave Filebeat in an odd state (mostly related to the Log input). This has also been resolved: #35645.
Hi @belimawr, I want to let you know that we could confirm we actually have duplicate data, and it is shipped to Kibana. Is it possible to investigate this issue further?
Hi, could it be that this issue, #36378, is related to the issue I have here?
I managed to do some testing, and I can confirm I'm able to reproduce the log messages about duplicated IDs with Filebeat. However, I did not manage to reproduce any data duplication. #36378 is an interesting new data point on your case, @germain05. Are you migrating from the log/container input to the filestream input using the take-over mode? The take-over mode is definitely a new variable that is worth investigating. It would be very helpful if you could consistently reproduce the problem and describe how to reproduce it in a controlled environment.
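For readers unfamiliar with the take-over mode, here is a minimal sketch of what it looks like in a standalone config; the ID and paths are illustrative, and the exact syntax may vary across 8.x releases:

```yaml
filebeat.inputs:
  - type: filestream
    id: app-logs-migrated   # illustrative, must be unique
    # Take over the registry state of a previous log/container
    # input that harvested the same paths, so files are not
    # re-read from the beginning (which would duplicate data).
    take_over: true
    paths:
      - /var/log/app/*.log
```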
@belimawr I am migrating from container input to filestream |
That really changes things and gives us a clear path to investigate. If you look at your data, were the duplicated events ingested far apart in time? One of my guesses is that the state migration from the log (container) input to filestream is not working as expected in this case.
@germain05 I talked with the team and there are a few things to notice:
If I have time this week, I'll try to dig a bit more into this issue.
The PR introducing the take_over mode might also be useful context. If you still have some logs from when you first changed to filestream, they will likely help you understand whether the state migration is happening correctly.
No, they are next to each other, even at the same time.
I was already aware of the issue regarding the migration from the log (container) input to filestream. Your willingness to investigate this issue further is much appreciated. We'll be eagerly awaiting any updates or findings you might uncover.
@rdner pointed out what is happening: the take_over mode, as it is implemented, only works while Filebeat is starting, before the inputs are started, because it needs exclusive access to the registry. This means it will not work with any dynamic loading of inputs (as autodiscover does); the take_over simply does not run. That leads to data duplication because the file states are not migrated from the log input to the filestream input.
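To make the failure mode concrete, here is a minimal sketch of the kind of autodiscover config involved (provider settings, the ID template, and paths are illustrative); because these inputs are loaded dynamically, the take_over migration described above would not run for them:

```yaml
filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      templates:
        - config:
            - type: filestream
              # A unique, container-scoped ID avoids the
              # "already exists" warning quoted in this issue.
              id: filestream-${data.kubernetes.pod.name}-${data.kubernetes.container.id}
              paths:
                - /var/log/containers/*-${data.kubernetes.container.id}.log
              parsers:
                - container: ~
              # Container logs are usually symlinks on Kubernetes nodes.
              prospector.scanner.symlinks: true
```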
I was pretty much convinced that the issue was lying somewhere within the take_over. What will be the next step then? Is there any near-term plan for a smooth migration from the container input to filestream? And what about the log messages with duplicated IDs that you could reproduce? Any plan to fix that, even if the data in that case is not actually duplicated?
Yes, we have an open issue for that: #34393. It should be done at some point in the near future, but I can't give any specific deadline or target release.
Yes, I can reproduce it, and I re-opened the original issue, #31767, with a description of how I reproduced it. It should also be fixed in the near future.
Closing in favor of #31767 |
Hello, I read various issues regarding the data duplication error messages in Filebeat logs; however, I haven't really understood what the root cause is. Can someone please explain what the root cause actually is? Below is the error message:
{"log.level":"error","@timestamp":"2023-08-21T10:05:27.167Z","log.logger":"input","log.origin":{"file.name":"input-logfile/manager.go","file.line":183},"message":"filestream input with ID 'filestream-kubernetes-pod-1170a1d331b2efe193b3816759e44d789b1841d8bed791f9859d442098341d9d' already exists, this will lead to data duplication, please use a different ID. Metrics collection has been disabled on this input.","service.name":"filebeat","ecs.version":"1.6.0"}
Below is my daemonset.yaml: