[Filebeat] Duplicated data when using filestream input #31239
Comments
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
@belimawr based on my reading of elastic/kibana#129851 (comment), it appears that … is not a recommended workaround. Is my understanding correct?

Indeed. Thanks for catching this. I'll update the issue.
Hi Team, Cloud AR Operations here. We are working to give credits back to this customer (Org-2878897275, Ubrich) for the increase in usage due to this technical issue. I was provided with a 1:4 to 1:5 ratio of impact.
Also, I am aware that the cluster is still bloated. So far I have calculations based on the 1:4 and 1:5 ratio estimates, as well as an estimate of credits based on their July 2022 usage. Working Calc Doc: Related case: 00929762
@irislanderos it is almost impossible to calculate the right ratio. As soon as you have an input without an ID specified, data will be duplicated. If you have X inputs, data may be duplicated X times, so computing an accurate ratio is clearly impossible. @belimawr thoughts?
I agree with @jlind23. It's not possible to have an exact estimate of the data duplication ratio. One thing that plays a role is the number of times Filebeat is restarted. Most of the steps that lead to this bug happen asynchronously, and the time when they happen affects the amount of data duplicated.
Hi All, duplicate data is being sent from Filebeat. filebeat.yml:

```yaml
# ============================== Filebeat inputs ===============================
filebeat.inputs:
```
The agent will use a second Filebeat process to ingest logs when log monitoring is enabled. Having two Filebeat processes running can be normal and does not necessarily cause data duplication. The processes started by the agent are controlled through the agent policy, not the Filebeat configuration file. Please start a thread on https://discuss.elastic.co/tag/elastic-agent and someone will help you determine whether this is a configuration issue or a bug in the agent. If it is a bug, we will open a new issue to track it here.
When there is more than one filestream input with the same ID (or without an ID), data duplication will occur when Filebeat is restarted or when a new input with a duplicated ID is started or created.
The root cause is related to the cleanup of old entries in the registry.
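For illustration only, a standalone configuration that can trigger this looks like the sketch below (the paths are hypothetical). Both filestream inputs lack an `id`, so their registry entries collide:

```yaml
filebeat.inputs:
  # Neither input sets an `id`, so both share the same empty ID
  # and Filebeat cannot tell their registry entries apart.
  - type: filestream
    paths:
      - /var/log/app/*.log    # hypothetical path
  - type: filestream
    paths:
      - /var/log/nginx/*.log  # hypothetical path
```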
Possible ways this bug manifests
How to detect the issue
The latest versions of Filebeat (v7.17.2 and v8.1.2) will log an error when this situation is detected. For multiple filestream inputs without IDs, the message looks like:
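(Reconstructed from the Filebeat source; the exact wording may vary between versions.)

```
filestream input with ID '' already exists, this will lead to data duplication, please use a different ID
```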
Workaround
Set a unique ID on every filestream input, for both standalone Filebeat and standalone Elastic Agent, as in the sketch below.
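A minimal corrected sketch (the IDs and paths are illustrative, not prescriptive):

```yaml
filebeat.inputs:
  - type: filestream
    id: app-logs            # unique, stable ID for this input
    paths:
      - /var/log/app/*.log
  - type: filestream
    id: nginx-logs          # a different unique ID
    paths:
      - /var/log/nginx/*.log
```

Note that the ID becomes part of the input's registry state, so keep it stable across restarts: changing it later makes Filebeat treat the same files as new and re-ingest them.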
Related issues