Debounce input reload on autodiscover #35645

belimawr · 2023-06-01T15:56:28Z

What does this PR do?

The Kubernetes autodiscover feature now debounces input reloads. By default, it waits at least 1 second before invoking the Reload method. If Reload returns an error, it waits 10 seconds before retrying.
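
The behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `runDebounced`, `burst`, and the short millisecond periods are hypothetical names and values chosen only so the example runs quickly, and the real worker in libbeat/autodiscover/autodiscover.go may be structured differently. The sketch arms a timer on the first pending event, coalesces further events until the timer fires, and uses a longer delay when the reload fails.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// runDebounced coalesces bursts of change notifications into a single reload.
// The first event of a burst arms a timer; events arriving while the timer is
// armed are absorbed. If reload fails, it is retried after retryDelay.
// The function returns when events is closed (a pending reload is dropped).
func runDebounced(events <-chan struct{}, debounce, retryDelay time.Duration, reload func() error) {
	var timerC <-chan time.Time // nil channel: this select case blocks until armed
	for {
		select {
		case _, ok := <-events:
			if !ok {
				return
			}
			if timerC == nil {
				timerC = time.After(debounce)
			}
		case <-timerC:
			timerC = nil
			if err := reload(); err != nil {
				// Re-arm with the longer retry delay instead of the debounce period.
				timerC = time.After(retryDelay)
			}
		}
	}
}

// burst sends n back-to-back events and returns how many times reload ran.
// The first `failures` reload calls return an error to exercise the retry path.
// The 50ms/150ms periods are illustrative stand-ins for the 1s/10s defaults.
func burst(n, failures int) int {
	calls := 0
	reload := func() error {
		calls++
		if calls <= failures {
			return errors.New("simulated reload failure")
		}
		return nil
	}

	events := make(chan struct{})
	done := make(chan struct{})
	go func() {
		runDebounced(events, 50*time.Millisecond, 150*time.Millisecond, reload)
		close(done)
	}()

	for i := 0; i < n; i++ {
		events <- struct{}{}
	}
	time.Sleep(500 * time.Millisecond) // leave room for the debounce and one retry
	close(events)
	<-done // reload only runs in the worker goroutine, so calls is safe to read now
	return calls
}

func main() {
	fmt.Println("reloads after a burst of 5 events:", burst(5, 0))
	fmt.Println("reloads when the first attempt fails:", burst(5, 1))
}
```

With these assumptions, a burst of events should result in one reload, and a failed first attempt should add exactly one retry. Using a nil timer channel keeps the loop idle when nothing is pending, which avoids waking up on a fixed tick.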

The channel used for test synchronisation has been removed and tests now use (assert/require).Eventually.

When Autodiscover calls cfgfile.NewRunnerList to instantiate a RunnerList, it now specifies a different logger name, enabling more granular log filtering.

Debug logs now provide information about the reasons for invoking Reload.

Certain tests that perform sequential actions now use require instead of assert to maintain a consistent state and avoid cascading failures.

Tests that required updates now leverage require.Eventually instead of wait, providing additional information on failure causes.
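
To show the polling pattern that replaces channel-based synchronisation, here is a simplified stand-in written with the standard library only. In the real tests this role is played by testify's `require.Eventually(t, condition, waitFor, tick, msgAndArgs...)`, which also fails the test immediately when the condition is never met. The `eventually` and `demo` helpers below are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// eventually polls cond every tick until it returns true or waitFor elapses.
// It is a simplified, hypothetical stand-in for testify's require.Eventually:
// it returns a bool instead of failing a *testing.T.
func eventually(cond func() bool, waitFor, tick time.Duration) bool {
	deadline := time.Now().Add(waitFor)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(tick)
	}
}

// demo starts background work that completes after delayMs milliseconds and
// reports whether eventually observed the completion within waitMs milliseconds.
func demo(delayMs, waitMs int) bool {
	var done int32
	go func() {
		time.Sleep(time.Duration(delayMs) * time.Millisecond)
		atomic.StoreInt32(&done, 1)
	}()
	cond := func() bool { return atomic.LoadInt32(&done) == 1 }
	return eventually(cond, time.Duration(waitMs)*time.Millisecond, 5*time.Millisecond)
}

func main() {
	fmt.Println("work finishes within the wait window:", demo(20, 500))
	fmt.Println("work finishes after the wait window:", demo(500, 50))
}
```

Polling a condition, rather than blocking on a test-only channel, keeps synchronisation hooks out of production code, and a failing check can report which condition was not met instead of simply timing out.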

Documentation for cfgfile.RunnerList has been improved to enhance clarity.

Why is it important?

Fixes #34388

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
~~- [ ] I have made corresponding changes to the documentation~~
~~- [ ] I have made corresponding change to the default configuration files~~
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

~~## Author's Checklist~~

How to test this PR locally

Start a Kubernetes cluster (I used Minikube with the none driver on a VM)
Deploy some pods to generate logs constantly

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  labels:
    app: flog-01
  name: flog-01
spec:
  progressDeadlineSeconds: 600
  replicas: 4
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: flog-01
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: flog-01
    spec:
      containers:
      - args:
        - -l
        - -s
        - "1"
        - -d
        - "1"
        image: mingrammer/flog
        imagePullPolicy: Always
        name: flog
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 30

Start Filebeat with the following configuration

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints.enabled: true
      hints.default_config:
        type: container
        id: "runner-${data.kubernetes.container.id}"
        paths:
          - /var/log/containers/*-${data.kubernetes.container.id}.log
        fields:
          role: kubernetes

logging:
  level: debug
  selectors:
    - autodiscover
    - autodiscover.cfgfile
output.elasticsearch:
  hosts:
    - https://foo.bar.cloud.elastic.co:443
  protocol: https
  username: foo
  password: bar

There should be no data loss or constant log messages like:

Error creating runner from config: Can only start an input when all related states are finished

Related issues

~~## Use cases~~
~~## Screenshots~~
~~## Logs~~

mergify · 2023-06-01T15:57:03Z

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @belimawr? 🙏.
For such, you'll need to label your PR with:

The upcoming major version of the Elastic Stack
The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

elasticmachine · 2023-06-01T16:01:29Z

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

elasticmachine · 2023-06-01T17:16:57Z

💚 Build Succeeded

Build stats

Start Time: 2023-06-20T14:45:54.810+0000
Duration: 84 min 19 sec

Test stats 🧪

Test	Results
Failed	0
Passed	27449
Skipped	2013
Total	29462

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate the packages and run the E2E tests.
/beats-tester : Run the installation tests with beats-tester.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

ycombinator

Change LGTM. Just left a couple of minor suggestions.

belimawr · 2023-06-08T12:46:44Z

@ycombinator all changes you requested are on b82eda4.

ycombinator

LGTM.

belimawr · 2023-06-09T16:24:11Z

/test

belimawr · 2023-06-09T16:26:08Z

Rebasing onto main is the easiest way to get the tests to run again 🤷‍♂️

belimawr · 2023-06-15T09:39:46Z

This PR is finally ready for a final review. The tests were failing because of a bug I introduced; it has been fixed in 1f38a4c, and 65208cb adds tests for that case.

@ycombinator could you review it again?

I changed the implementation

belimawr · 2023-06-19T10:08:12Z

Tests were failing due to CI issues (pip was failing). I re-based onto main and force-pushed.

ycombinator

LGTM.

The Kubernetes autodiscover feature now debounces input reloads. By default, it waits at least 1 second before invoking the Reload method. If Reload returns an error, it waits 10 seconds before retrying. The channel used for test synchronisation has been removed and tests now use (assert/require).Eventually. When Autodiscover calls `cfgfile.NewRunnerList` to instantiate a RunnerList, it now specifies a different logger name, enabling more granular log filtering. Debug logs now provide information about the reasons for invoking Reload. Certain tests that perform sequential actions now use `require` instead of `assert` to maintain a consistent state and avoid cascading failures. Tests that required updates now leverage `require.Eventually` instead of `wait`, providing additional information on failure causes. Documentation for `cfgfile.RunnerList` has been improved to enhance clarity. (cherry picked from commit d270536) Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>

belimawr · 2023-09-27T08:40:45Z

This also seems to relate to #34717; I'll edit the PR description to add this information.

belimawr added the bug label Jun 1, 2023

botelastic bot added the needs_team label Jun 1, 2023

mergify bot assigned belimawr Jun 1, 2023

belimawr added the Team:Elastic-Agent label Jun 1, 2023

botelastic bot removed the needs_team label Jun 1, 2023

belimawr added the backport-v8.8.0 label Jun 1, 2023

belimawr marked this pull request as ready for review June 1, 2023 16:01

belimawr requested a review from a team as a code owner June 1, 2023 16:01

belimawr requested review from ycombinator and leehinman June 1, 2023 16:01

belimawr force-pushed the fix-input-reload-autodiscover branch 2 times, most recently from 6dc9b2c to 10eac4f Compare June 5, 2023 14:04

ycombinator reviewed Jun 6, 2023

CHANGELOG.next.asciidoc (outdated)

ycombinator reviewed Jun 6, 2023

libbeat/autodiscover/autodiscover.go (outdated)

ycombinator reviewed Jun 6, 2023

libbeat/autodiscover/autodiscover_test.go (outdated)

ycombinator reviewed Jun 6, 2023

libbeat/autodiscover/autodiscover.go (outdated)

ycombinator approved these changes Jun 6, 2023

ycombinator previously approved these changes Jun 8, 2023

belimawr force-pushed the fix-input-reload-autodiscover branch from b82eda4 to a255240 Compare June 9, 2023 16:25

belimawr requested a review from ycombinator June 14, 2023 16:52

belimawr changed the title ~~Debounce input reload on Kubernetes autodiscover~~ Debounce input reload on autodiscover Jun 15, 2023

ycombinator reviewed Jun 15, 2023

libbeat/autodiscover/autodiscover_test.go

ycombinator reviewed Jun 15, 2023

libbeat/autodiscover/autodiscover.go

belimawr added 14 commits June 19, 2023 12:04

Make linter happy

f1507d3

Changelog and PR improvements

b1a541d

PR improvements

e127631

Increase test timeout

f842cb4

This test seems flaky on CI; increasing the timeout might help.

Increase test timeout, again.

135223e

Debug logs for failing test

b1ff90b

print Beats logs on python test failure

1370448

enable dev build for system test binary

992cc1c

improve test debug

ed78bb7

fix autodiscover

15983ac

pr improvements

dd830a9

add tests to ensure the changes actually work

9b85c3b

Test the case when handle(Start|Stop) is called multiple times and returns false on at least the last call.

PR improvements

e67e829

belimawr force-pushed the fix-input-reload-autodiscover branch from 5bd1bd8 to e67e829 Compare June 19, 2023 10:07

belimawr requested a review from ycombinator June 19, 2023 16:09

speed up tests

87e8eee

ycombinator approved these changes Jun 20, 2023

belimawr merged commit d270536 into elastic:main Jun 20, 2023

belimawr deleted the fix-input-reload-autodiscover branch June 20, 2023 16:43

mergify bot mentioned this pull request Jun 20, 2023

[8.8](backport #35645) Debounce input reload on autodiscover #35836

Merged

reakaleek mentioned this pull request Jul 19, 2023

Fix ironbank validation in 8.8 #36115

Closed

6 tasks

belimawr mentioned this pull request Aug 25, 2023

Filestream data duplication in filebeat 8.9.1 #36379

Closed

belimawr mentioned this pull request Mar 22, 2024

Investigate number of debug messages of add_kubernetes_metadata processor #38529

Closed
