Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Heartbeat] States reloader and eslegclient #33405

Merged
merged 11 commits into from
Nov 2, 2022

Conversation

emilioalvap
Copy link
Collaborator

@emilioalvap emilioalvap commented Oct 19, 2022

What does this PR do?

Requires andrewvc/integrations#1.
Supersedes #33357.
Fixes #33354, #33299 and #33244

This PR:

  • Introduces a DeferredStateLoader reloader for managed output section of heartbeat instances. It holds monitors from running state queries until the agent has had enough time to initialize it async or a default timeout is triggered.
  • Replaces the use of elasticsearch.Client with eslegclient.Connection, which supports all output options by default.
  • Fixes a bug where state would be retrieved even when found.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

To test eslegclient:

  1. Checkout this PR and build HB locally.
  2. Run ES locally with self-signed certificates and use that as HB default output with disabled ssl verification.
  3. Run/ stop HB multiple times. Check in discover that states are reused.

To test output reloader:

  1. Build Heartbeat and Elastic-agent locally.
  2. Enroll a container in a private location policy multiple times and check that states are reused.

Screenshots

image

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Oct 19, 2022
@mergify
Copy link
Contributor

mergify bot commented Oct 19, 2022

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @emilioalvap? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Copy link
Collaborator

elasticmachine commented Oct 19, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-10-20T21:11:47.963+0000

  • Duration: 48 min 7 sec

Test stats 🧪

Test Results
Failed 0
Passed 1889
Skipped 25
Total 1914

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@andrewvc andrewvc changed the title [Draft][Heartbeat] States reloadder and eslegclient [Draft][Heartbeat] States reloader and eslegclient Oct 19, 2022
@emilioalvap emilioalvap added Team:obs-ds-hosted-services Label for the Observability Hosted Services team v8.6.0 bug and removed needs_team Indicates that the issue/PR needs a Team:* label labels Oct 20, 2022
@emilioalvap emilioalvap marked this pull request as ready for review October 20, 2022 18:28
@emilioalvap emilioalvap requested a review from a team as a code owner October 20, 2022 18:29
@elasticmachine
Copy link
Collaborator

Pinging @elastic/uptime (Team:Uptime)

@emilioalvap emilioalvap changed the title [Draft][Heartbeat] States reloader and eslegclient [Heartbeat] States reloader and eslegclient Oct 20, 2022
@emilioalvap
Copy link
Collaborator Author

/test

Copy link
Contributor

@andrewvc andrewvc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left an initial review based on viewing code alone, will do some manual testing next.

@@ -158,6 +158,8 @@
- name: down
type: integer
description: total down checks run
- name: reason
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should nix this unless we actually populate it in this PR

Ends *State `json:"ends"`
Ends *State `json:"ends"`
// Cause of the transition to this state
Reason string `json:"reason"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again, worth nixing unless we use it

@@ -16395,6 +16395,13 @@ type: object

--

*`state.reason`*::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be deleted by a mage update

if esc != nil {
stateLoader = monitorstate.MakeESLoader(esc, "synthetics-*,heartbeat-*", parsedConfig.RunFrom)
stateLoader, replaceStateLoader := monitorstate.AtomicStateLoader(monitorstate.NilStateLoader)
if b.Config.Output.Name() == "elasticsearch" && !b.Manager.Enabled() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also add in the changes from https://github.com/elastic/beats/pull/33375/files#diff-3edc3a1c86f948ac04dc2aeef09be1c549a2e5cbedd80a76e49dde4d95fa6883R82 which adds a heartbeat.states.enabled config flag. The service won't have the manager enabled, so we'll have to enable that to have states work with the service.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually checks if heartbeat is in unmanaged mode (!b.Manager.Enabled()) to initialize a non-deferred states loader, so it should apply for the service as well.

IIRC, states flag was motivated by #33357. IMO, this PR supersedes the original problem since we can now load the config on both modes. Is there an scenario where we would still like to disable states?

logp.L().Warn("Timeout trying to defer state loader")
}

defer cancel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why defer these both if no statements follow them? Were they intended to go at the top of the function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a mix of an idiomatic declaration of intent and a sequential mindset when writing it 😄 I can move them to the top

stateLoader, replaceStateLoader := AtomicStateLoader(inner)

wg := sync.WaitGroup{}
ctx, cancel := context.WithTimeout(context.Background(), timeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very idiomatic go concurrency!

@@ -65,3 +67,27 @@ func TestTrackerRecordFlappingDisabled(t *testing.T) {
require.Equal(t, StatusDown, ms.Status)
requireMSStatusCount(t, ms, StatusDown, 1)
}

func TestAtomicStateLoader(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for the new DeferredStateLoader as well?

Copy link
Contributor

@andrewvc andrewvc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this with heartbeat's local monitor reloading, it worked fine. I think the CFT cluster I used to test this was missing elastic/kibana#143162, but I'm LGTM given what I've seen here.

@cmacknz
Copy link
Member

cmacknz commented Oct 27, 2022

Drive by comment: this should ensure your ES client keeps working when we merge the agent V2 support for beats in the next few days. I have this PR listed as a "blocker" for our merge of the v2 agent changes here for this reason, to ensure your second ES client keeps working after we merge those changes.

@emilioalvap
Copy link
Collaborator Author

@cmacknz My initial impression was that libbeat output reloader is not affected by V2 changes. To make sure, I've gone back and tested it (partially) on #33157 & https://github.com/elastic/beats/tree/feature-arch-v2. The reload callback is still being fired as expected:

{"file.name":"beater/heartbeat.go","file.line":194},"message":"RELOAD OUTPUT: %!w
(config.Namespace={elasticsearch 0xc000628210})","service.name":"heartbeat","ecs.version":"1.6.0"}

IMO, the scope of elastic/elastic-agent#1476 changes with this approach. Do you think it would be possible to have a similar mechanism with the shipper?

@cmacknz
Copy link
Member

cmacknz commented Oct 31, 2022

IMO, the scope of elastic/elastic-agent#1476 changes with this approach. Do you think it would be possible to have a similar mechanism with the shipper?

Yes, ideally this should be the only change you need to make. When the shipper is introduced, we just need to keep providing the Elasticsearch output configuration to Heartbeat and this code should keep working.

@emilioalvap emilioalvap added the backport-v8.5.0 Automated backport with mergify label Nov 2, 2022
@emilioalvap emilioalvap merged commit 5c1d1af into elastic:main Nov 2, 2022
mergify bot pushed a commit that referenced this pull request Nov 2, 2022
* Better loading / reloading of state tracker

* Refactor states loader to use eslegclient and reload config when running in managed mode

* Add deferred state loader for managed instances, fix some logs

* Fix integration tests

* Review feedback, deferred loader unit test and changelog

Co-authored-by: Andrew Cholakian <andrewvc@elastic.co>
(cherry picked from commit 5c1d1af)
emilioalvap added a commit that referenced this pull request Nov 2, 2022
* Better loading / reloading of state tracker

* Refactor states loader to use eslegclient and reload config when running in managed mode

* Add deferred state loader for managed instances, fix some logs

* Fix integration tests

* Review feedback, deferred loader unit test and changelog

Co-authored-by: Andrew Cholakian <andrewvc@elastic.co>
(cherry picked from commit 5c1d1af)

Co-authored-by: Emilio Alvarez Piñeiro <95703246+emilioalvap@users.noreply.github.com>
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
* Better loading / reloading of state tracker

* Refactor states loader to use eslegclient and reload config when running in managed mode

* Add deferred state loader for managed instances, fix some logs

* Fix integration tests

* Review feedback, deferred loader unit test and changelog

Co-authored-by: Andrew Cholakian <andrewvc@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v8.5.0 Automated backport with mergify bug Team:obs-ds-hosted-services Label for the Observability Hosted Services team v8.6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Heartbeat] SSL Verification can't be disabled on 8.5 for states ES client
4 participants