Background
In the past, Metricbeat's Elasticsearch module caused performance issues because it consumed APIs that did not scale well with the size of the monitored cluster. Thanks to a lot of effort by the Elasticsearch team, these APIs now perform much better.
Despite these improvements, we still see issues in ESS, but the problem now seems to be that Metricbeat consumes too much CPU when parsing and processing the large responses that Elasticsearch returns. Generating these responses costs Elasticsearch fairly little, so the CPU usage of Elasticsearch itself is low on the master nodes where this happens. The performance issues arise because Metricbeat takes up the CPU while processing the response, leaving little for the master node, which causes general instability.
A larger fix for this is outlined here: make Metricbeat adapt its resource usage to the available CPU so that it does not crowd out the other processes that are running.
We may also want to consider revisiting elastic/kibana#130575 to see if we can get the same data through other APIs that may have smaller responses to process.
Short-term improvement
We have received feedback that the code in the Elasticsearch module could be optimized to reduce CPU and memory usage and to speed up the processing of responses.
The main culprits seem to be excessive usage of mapstr and schema, as well as unmarshalling more of the JSON response than we need to generate the event documents. We should also see if we can reduce the amount of data we send to Elasticsearch, since that also takes longer as the cluster grows.
Development tips
Metricbeat has cpuprofile and memprofile as flags you can use to enable resource profiling.
AC
Usage of mapstr is eliminated
Usage of schema is replaced with a hard-coded Go struct that is used for JSON parsing and covers only the exact data we need
Documents are trimmed to only send fields that are indexed
A noticeable improvement in CPU usage is measured for large clusters
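As a rough illustration of the struct-based parsing described in the criteria above, here is a minimal Go sketch. It assumes a trimmed, hand-written payload. The names nodeStats, parseHeapUsed and samplePayload are hypothetical and not taken from the Metricbeat code base, and the chosen fields are only an example of "the exact data we need".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeStats declares only the fields this sketch needs. encoding/json
// skips every other key in the payload instead of building values for it.
// (Hypothetical type, for illustration only.)
type nodeStats struct {
	Nodes map[string]struct {
		Name string `json:"name"`
		JVM  struct {
			Mem struct {
				HeapUsedInBytes int64 `json:"heap_used_in_bytes"`
			} `json:"mem"`
		} `json:"jvm"`
	} `json:"nodes"`
}

// parseHeapUsed decodes a payload and returns heap usage keyed by node name.
func parseHeapUsed(body []byte) (map[string]int64, error) {
	var stats nodeStats
	if err := json.Unmarshal(body, &stats); err != nil {
		return nil, fmt.Errorf("decoding node stats: %w", err)
	}
	out := make(map[string]int64, len(stats.Nodes))
	for _, n := range stats.Nodes {
		out[n.Name] = n.JVM.Mem.HeapUsedInBytes
	}
	return out, nil
}

// samplePayload is a hand-written example, not a captured response.
// It contains extra keys on purpose to show that they are ignored.
const samplePayload = `{
  "cluster_name": "demo",
  "nodes": {
    "id1": {
      "name": "node-a",
      "jvm": {"mem": {"heap_used_in_bytes": 1024, "heap_max_in_bytes": 4096}},
      "indices": {"docs": {"count": 10}}
    },
    "id2": {
      "name": "node-b",
      "jvm": {"mem": {"heap_used_in_bytes": 2048}}
    }
  }
}`

func main() {
	heap, err := parseHeapUsed([]byte(samplePayload))
	if err != nil {
		panic(err)
	}
	for name, used := range heap {
		fmt.Printf("%s heap_used_in_bytes=%d\n", name, used)
	}
}
```

Compared with decoding into a generic map and then applying a schema, this avoids allocating map entries for fields that never end up in the event document. Whether that is where most of the CPU goes should still be confirmed with a profile.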
If Elasticsearch is sending data which is not important to Metricbeat, it would be worth trying to make use of the ?filter_path query option to drop the unimportant bits before they even leave Elasticsearch, which should save CPU on both sides. This won't solve the O(#shards) scaling issues ofc but it might extend the runway a bit.
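A minimal sketch of what that could look like on the client side, assuming the request URL is built in Go with net/url. The helper name withFilterPath, the localhost endpoint and the selected paths are illustrative assumptions, not what the module does today.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// withFilterPath returns rawURL with a filter_path query parameter set to
// the given comma-separated paths. Existing query parameters are kept.
// (Hypothetical helper, for illustration only.)
func withFilterPath(rawURL string, paths ...string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", fmt.Errorf("parsing URL: %w", err)
	}
	q := u.Query()
	q.Set("filter_path", strings.Join(paths, ","))
	// Encode percent-escapes "," and "*"; the server decodes them again.
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// The endpoint and paths below are assumptions chosen for the example.
	filtered, err := withFilterPath(
		"http://localhost:9200/_nodes/stats",
		"nodes.*.name",
		"nodes.*.jvm.mem.heap_used_in_bytes",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(filtered)
}
```

The filtered response would then contain only the listed paths, so there is less to transfer and less to unmarshal. The list of paths has to be kept in sync with the fields the module actually reads.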
Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!
We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!