-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing logic to keep list of unique cluster UUIDs #22808
Fixing logic to keep list of unique cluster UUIDs #22808
Conversation
Pinging @elastic/stack-monitoring (Stack monitoring) |
Pinging @elastic/integrations-services (Team:Services) |
clusterToPipelinesMap[overrideClusterUUID] = pipelines | ||
return clusterToPipelinesMap | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this section because it's also incorrect. We shouldn't be using the override cluster UUID, if set, for all pipelines so broadly. Instead we should figure out (as we do further below) the cluster UUIDs appropriate for each pipeline and build the cluster UUID => pipelines map that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still a change in semantics and maybe a bc breaking change.
Before this change all pipelines have had been associated with overrideClusterUUID
. With this change pipelines are only associated with overrideClusterUUID
iff they do no clusterUUID set. This way overrideClusterUUID
actually becomes defaultClusterUUID
.
Instead of removing this line, how about printing a deprecation warning if this setting is configured and introduce an a default cluster uuid setting that matches the new semantics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, looking at where the value of overrideClusterUUID
comes from ultimately, it comes from the monitoring.cluster_uuid
setting in logstash.yml
: https://www.elastic.co/guide/en/logstash/current/monitoring-with-metricbeat.html#define-cluster__uuid.
Reading the documentation for that setting:
To bind the metrics of Logstash to a specific cluster, optionally define ...
This makes me think, if this setting is set and therefore overrideClusterUUID
is set, we should make that the cluster UUID for all pipelines, ignoring what Elasticsearch clusters each pipeline might individually be talking to.
So I'm going to revert this change and, in fact, add a similar code block in the makeClusterToPipelinesMap
function in the node
metricset.
/cc @sayden |
for _, pipeline := range pipelines { | ||
var clusterUUIDs []string | ||
clusterUUIDs := make(map[string]struct{}, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is common.StringSet
.
The overall data expension starts by converting the map[string]pipelineState
into a []pipelineState
. Would it make sense to keep the map instead, to reduce the chance of creating duplicate entries in general?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the code to use common.StringSet
in 4cb337f but I didn't follow this:
The overall data expension starts by converting the
map[string]pipelineState
into a[]pipelineState
. Would it make sense to keep the map instead, to reduce the chance of creating duplicate entries in general?
I think it's because I'm not finding the map[string]pipelineState
you mentioned. Where are you seeing that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do the conversion here.
I was wondering if it is necessary, or maybe can even lead to similar problems (in the future) if we are not careful about not producing duplicates.
} | ||
|
||
for _, clusterUUID := range clusterUUIDs { | ||
for clusterUUID, _ := range clusterUUIDs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit clusterUUID := range clusterUUIDS { ... }
clusterToPipelinesMap[overrideClusterUUID] = pipelines | ||
return clusterToPipelinesMap | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still a change in semantics and maybe a bc breaking change.
Before this change all pipelines have had been associated with overrideClusterUUID
. With this change pipelines are only associated with overrideClusterUUID
iff they do no clusterUUID set. This way overrideClusterUUID
actually becomes defaultClusterUUID
.
Instead of removing this line, how about printing a deprecation warning if this setting is configured and introduce an a default cluster uuid setting that matches the new semantics?
}, | ||
}, | ||
}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this particular case able to reproduce the issue we have had? If not, we should add one.
The expannsion into the cluster map is somewhat critical in order to produce correct docs, that show the correct association of configurations to single Elasticsearch clusters. Can we make this test more exhaustive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the case that was problematic. Before this PR this case would've produced 3 PipelineState
associated with the prod_cluster_id
cluster UUID when we should've only got 1 PipelineState
.
I will expand this test to add a few more test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more test cases in 672ee5e.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
@@ -67,20 +67,20 @@ func makeClusterToPipelinesMap(pipelines []logstash.PipelineState, overrideClust | |||
clusterToPipelinesMap = make(map[string][]logstash.PipelineState) | |||
|
|||
for _, pipeline := range pipelines { | |||
var clusterUUIDs []string | |||
clusterUUIDs := make(map[string]struct{}, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
common.StringSet
@urso I've addressed all your feedback. Would you mind re-reviewing please, when you get a chance? Thanks! |
} | ||
|
||
for _, clusterUUID := range clusterUUIDs { | ||
for _, clusterUUID := range clusterUUIDs.ToSlice() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ToSlice
is a little overkill here (it copies and sorts all strings). The StringSet is defined as type StringSet map[string]struct{}
and can be directly use with the range-clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in dcfce4d.
#22808) (#22817) * Fixing logic to keep list of unique cluster UUIDs (#22808) * Fixing logic to keep list of unique cluster UUIDs * Adding CHANGELOG entry * Use common.StringSet * Adding more test cases * Adding back logic to broadly override cluster UUID for all pipelines, if set * Removing ToSlice() * Fixing loop * Fixing up CHANGELOG
* Fixing logic to keep list of unique cluster UUIDs * Adding CHANGELOG entry * Use common.StringSet * Adding more test cases * Adding back logic to broadly override cluster UUID for all pipelines, if set * Removing ToSlice() * Fixing loop
…-issues * upstream/master: (41 commits) Fix version parser regex for packaging (elastic#22581) Fix local_dynamic documentation and add providers inline doc. (elastic#22657) fix: use proper param name for e2e tests (elastic#22836) [Heartbeat] Fix exit on disabled monitor (elastic#22829) Update Golang to 1.14.12 (elastic#22790) docs: fix setup.template.overwrite typos (elastic#22804) Add docs section for ECS EC2 monitoring (elastic#22784) Fixing logic to keep list of unique cluster UUIDs (elastic#22808) Skip somewhat flaky UDP system test on Windows (elastic#22810) Fix polling node when it is not ready and monitor by hostname (elastic#22666) Skip Filebeat test_shutdown on windows 7 (elastic#22797) Make monitoring Namespace thread-safe (elastic#22640) Drop pkt_dstaddr and pkt_srcaddr when equals to "-" (elastic#22721) Add support for reading from UNIX datagram sockets (elastic#22699) Fix export dashboard command from Elastic Cloud (elastic#22746) Skip flaky winlogbeat test on Windows-7 (elastic#22754) Missing `>` (elastic#22763) (elastic#22766) Fix k8s watcher issue when node access to list nodes and ns (elastic#22714) [Metricbeat/Kibana/stats] Enforce `exclude_usage=true` (elastic#22732) Avoid sending non-numeric floats in cloud foundry integrations (elastic#22634) ...
…dows-7 * upstream/master: (41 commits) Fix version parser regex for packaging (elastic#22581) Fix local_dynamic documentation and add providers inline doc. (elastic#22657) fix: use proper param name for e2e tests (elastic#22836) [Heartbeat] Fix exit on disabled monitor (elastic#22829) Update Golang to 1.14.12 (elastic#22790) docs: fix setup.template.overwrite typos (elastic#22804) Add docs section for ECS EC2 monitoring (elastic#22784) Fixing logic to keep list of unique cluster UUIDs (elastic#22808) Skip somewhat flaky UDP system test on Windows (elastic#22810) Fix polling node when it is not ready and monitor by hostname (elastic#22666) Skip Filebeat test_shutdown on windows 7 (elastic#22797) Make monitoring Namespace thread-safe (elastic#22640) Drop pkt_dstaddr and pkt_srcaddr when equals to "-" (elastic#22721) Add support for reading from UNIX datagram sockets (elastic#22699) Fix export dashboard command from Elastic Cloud (elastic#22746) Skip flaky winlogbeat test on Windows-7 (elastic#22754) Missing `>` (elastic#22763) (elastic#22766) Fix k8s watcher issue when node access to list nodes and ns (elastic#22714) [Metricbeat/Kibana/stats] Enforce `exclude_usage=true` (elastic#22732) Avoid sending non-numeric floats in cloud foundry integrations (elastic#22634) ...
What does this PR do?
This PR fixes a pretty bad bug with the
logstash
module whenxpack.enabled: true
is set.The
logstash
module has two metricsets:node
andnode_stats
. Whenxpack.enabled: true
is set, these metricsets emit events for Stack Monitoring withtype: logstash_state
andtype: logstash_stats
, respectively.The
node
metricset is supposed to emit as many events perFetch()
cycle as there are monitored Logstash pipelines x the number of Elasticsearch clusters that each such pipeline talks to. If a Logstash pipeline does not talk to an Elasticsearch cluster, one event for that pipeline must be emitted.Instead, due to terribly faulty logic in the code, the
node
metricset was emitting as many events perFetch()
cycle as there are monitored Logstash pipelines x the number of vertices in each such pipeline!There was similar faulty logic in the
node_stats
metricset.This PR fixes this bug in both metricsets.
Why is it important?
By fixing this bug, we are eliminating a huge source of redundant and unnecessary events being published from the
logstash
module whenxpack.enabled: true
is set. This will not only make the module's behavior correct but also drastically reduce memory consumption in Metricbeat and the number of events sent to Metricbeat's output over the wire.Checklist
I have commented my code, particularly in hard-to-understand areasI have made corresponding changes to the documentationI have made corresponding change to the default configuration filesCHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.