Fix polling node when it is not ready and monitor by hostname #22666

vjsamuel · 2020-11-19T01:02:17Z

Enhancement

What does this PR do?

This PR ensures that, as a last resort, node autodiscover uses the node's hostname to monitor the node. It also ensures that every event autodiscover emits for a node checks the node's ready state.

Why is it important?

The Kubernetes spec leaves it to providers to define InternalIP, ExternalIP, or Hostname on the node object. It is up to us to ensure that we monitor any one of them. Currently we don't use Hostname if the other two are missing.
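A minimal sketch of the fallback described above, not the code from this PR. The types are local stand-ins shaped like the Kubernetes core/v1 `NodeAddress` types so that the example is self-contained, and `pickAddress` is a hypothetical name. The ExternalIP-before-InternalIP order is an assumption; the description only states that the hostname is the last resort.

```go
package main

// NodeAddressType and NodeAddress are local stand-ins shaped like the
// Kubernetes core/v1 types, redeclared only to keep the sketch self-contained.
type NodeAddressType string

const (
	NodeHostName   NodeAddressType = "Hostname"
	NodeExternalIP NodeAddressType = "ExternalIP"
	NodeInternalIP NodeAddressType = "InternalIP"
)

type NodeAddress struct {
	Type    NodeAddressType
	Address string
}

// pickAddress (hypothetical name) returns the first non-empty address of the
// most preferred type. The ExternalIP-before-InternalIP order is an
// assumption; the hostname is tried last, as the PR description states.
func pickAddress(addresses []NodeAddress) string {
	preference := []NodeAddressType{NodeExternalIP, NodeInternalIP, NodeHostName}
	for _, want := range preference {
		for _, a := range addresses {
			if a.Type == want && a.Address != "" {
				return a.Address
			}
		}
	}
	// Nothing usable was reported; the caller would skip the node.
	return ""
}
```

With this shape, a node that reports only a `Hostname` address is still monitored, while a node that reports both an IP and a hostname keeps being monitored by IP.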

Because the ready state is not checked, add events start monitoring NotReady nodes, which is not right.
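The ready-state check can be sketched as follows. This is an illustration under assumptions, not the helper added by this PR: `NodeCondition` is a local stand-in shaped like the Kubernetes core/v1 node condition, and `isReady` is a hypothetical name. A node counts as ready only when its `Ready` condition has status `True`.

```go
package main

// NodeCondition is a local stand-in shaped like the Kubernetes core/v1 node
// condition, reduced to the two fields this sketch needs.
type NodeCondition struct {
	Type   string // for example "Ready"
	Status string // "True", "False" or "Unknown"
}

// isReady (hypothetical helper) reports whether the Ready condition is "True".
// A node that reports no Ready condition at all is treated as not ready.
func isReady(conditions []NodeCondition) bool {
	for _, c := range conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}
```

Treating a missing or `Unknown` condition as not ready is the conservative choice here: no monitor is started until the node explicitly reports that it is ready.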

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
~~[ ] I have made corresponding changes to the documentation~~
~~[ ] I have made corresponding change to the default configuration files~~
~~[ ] I have added tests that prove my fix is effective or that my feature works~~
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

elasticmachine · 2020-11-19T01:11:35Z

💚 Build Succeeded

Build stats

Build Cause: Started by user Chris Mark
Start Time: 2020-12-01T10:13:22.042+0000
Duration: 71 min 33 sec

Test stats 🧪

Test	Results
Failed	0
Passed	16719
Skipped	1372
Total	18091

💚 Flaky test report

Tests succeeded.

elasticmachine · 2020-11-19T06:20:55Z

Pinging @elastic/integrations-platforms (Team:Platforms)

ChrsMark

Thanks @vjsamuel, looks good. I just left some minor comments. Also consider adding a unit test like:

beats/libbeat/autodiscover/providers/kubernetes/node_test.go

Line 134 in e9df1af

Message: "Test node start",

jsoriano

Thanks!

jsoriano · 2020-11-20T09:57:06Z

libbeat/autodiscover/providers/kubernetes/node.go

@@ -168,6 +168,11 @@ func (n *node) emit(node *kubernetes.Node, flag string) {
 		return
 	}

+	// If the node is not in ready state then dont monitor it


Is that also correct for Heartbeat? We might want to keep a monitor running to see when the node becomes ready. A node can become NotReady if it is powered off or has some kind of problem.

Also, I think we should still emit a stop event if a node is in NotReady state before being removed.

vjsamuel

We can either do it here or move this to onAdd. What I saw is that when Beats starts up, it starts to monitor hosts that are in NotReady state, which is probably incorrect.

jsoriano

Ok, let's go with this. I guess that for the Heartbeat case, if a node unexpectedly disappears, some monitor checks will fail before the node reaches the NotReady state.
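The two concerns raised in this thread can be combined in one guard, sketched below under assumptions. `shouldEmit` is a hypothetical helper, the `"start"` and `"stop"` flag values are assumed from the `emit(node, flag)` signature in the diff, and the thread does not show whether the merged code lets stop events through for NotReady nodes.

```go
package main

// shouldEmit (hypothetical helper) decides whether an autodiscover event is
// published for a node. The flag values "start" and "stop" are assumptions
// based on the emit(node, flag) signature shown in the diff.
func shouldEmit(ready bool, address string, flag string) bool {
	if address == "" {
		// The node reported no usable address, so there is nothing to monitor.
		return false
	}
	if !ready && flag != "stop" {
		// Do not start monitoring a node that is not ready.
		return false
	}
	// Ready nodes emit every event; NotReady nodes only emit stop events, so a
	// monitor that is already running can still be torn down.
	return true
}
```

Letting stop events through regardless of readiness follows the suggestion in the review: a node that turns NotReady before it is removed still gets its monitor stopped.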

jsoriano · 2020-11-20T09:57:15Z

libbeat/autodiscover/providers/kubernetes/node.go

+		if address.Type == v1.NodeHostName && address.Address != "" {
+			return address.Address
+		}
+	}


jsoriano · 2020-11-20T10:02:19Z

jenkins run the tests

ChrsMark · 2020-11-26T08:51:32Z

Thanks for addressing the comments, @vjsamuel! It looks good to me. Would it be possible to add a couple of unit tests for this, please?

Also, the current CI failures look related: https://travis-ci.org/github/elastic/beats/jobs/745712376#L701

vjsamuel · 2020-11-30T07:00:16Z

@ChrsMark updated tests and addressed failures.

ChrsMark · 2020-11-30T08:35:29Z

jenkins run the tests

jsoriano

👍

ChrsMark · 2020-11-30T10:00:39Z

jenkins run the tests

ChrsMark · 2020-12-01T08:35:19Z

jenkins run the tests

ChrsMark · 2020-12-01T11:37:58Z

Thanks for adding this @vjsamuel ! Merging.

…-issues * upstream/master: (41 commits) Fix version parser regex for packaging (elastic#22581) Fix local_dynamic documentation and add providers inline doc. (elastic#22657) fix: use proper param name for e2e tests (elastic#22836) [Heartbeat] Fix exit on disabled monitor (elastic#22829) Update Golang to 1.14.12 (elastic#22790) docs: fix setup.template.overwrite typos (elastic#22804) Add docs section for ECS EC2 monitoring (elastic#22784) Fixing logic to keep list of unique cluster UUIDs (elastic#22808) Skip somewhat flaky UDP system test on Windows (elastic#22810) Fix polling node when it is not ready and monitor by hostname (elastic#22666) Skip Filebeat test_shutdown on windows 7 (elastic#22797) Make monitoring Namespace thread-safe (elastic#22640) Drop pkt_dstaddr and pkt_srcaddr when equals to "-" (elastic#22721) Add support for reading from UNIX datagram sockets (elastic#22699) Fix export dashboard command from Elastic Cloud (elastic#22746) Skip flaky winlogbeat test on Windows-7 (elastic#22754) Missing `>` (elastic#22763) (elastic#22766) Fix k8s watcher issue when node access to list nodes and ns (elastic#22714) [Metricbeat/Kibana/stats] Enforce `exclude_usage=true` (elastic#22732) Avoid sending non-numeric floats in cloud foundry integrations (elastic#22634) ...

botelastic bot added the needs_team label Nov 19, 2020

vjsamuel force-pushed the add_node_hostname branch from 9f29b60 to 69e9a3a Compare November 19, 2020 01:05

andresrc added the Team:Platforms label Nov 19, 2020

botelastic bot removed the needs_team label Nov 19, 2020

ChrsMark reviewed Nov 19, 2020

vjsamuel force-pushed the add_node_hostname branch from 69e9a3a to 373f422 Compare November 19, 2020 21:37

jsoriano reviewed Nov 20, 2020

jsoriano assigned ChrsMark Nov 20, 2020

jsoriano added the needs_backport and v7.11.0 labels Nov 20, 2020

ChrsMark reviewed Nov 23, 2020

vjsamuel force-pushed the add_node_hostname branch from 373f422 to 5b36e6f Compare November 24, 2020 19:11

vjsamuel added 3 commits November 29, 2020 22:50

Fix polling node when it is not ready and monitor by hostname

1cdbee5

Incorporate review comments

1effe07

Add test cases

b5aa3a5

vjsamuel force-pushed the add_node_hostname branch from 5b36e6f to b5aa3a5 Compare November 30, 2020 06:56

jsoriano approved these changes Nov 30, 2020

ChrsMark approved these changes Nov 30, 2020

ChrsMark merged commit 09008c8 into elastic:master Dec 1, 2020

ChrsMark pushed a commit to ChrsMark/beats that referenced this pull request Dec 1, 2020

Fix polling node when it is not ready and monitor by hostname (elastic#22666) (cherry picked from commit 09008c8)

4992aec

ChrsMark mentioned this pull request Dec 1, 2020

Cherry-pick #22666 to 7.x: Fix polling node when it is not ready and monitor by hostname #22814

Merged

3 tasks

ChrsMark removed the needs_backport label Dec 1, 2020

ChrsMark added a commit that referenced this pull request Dec 2, 2020

Fix polling node when it is not ready and monitor by hostname (#22666) (#22814) (cherry picked from commit 09008c8) Co-authored-by: Vijay Samuel <vjsamuel@ebay.com>

5030051
