Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrations always running regardless of enabled flag #689

Merged
merged 17 commits into from
Jul 21, 2021
Merged

Conversation

mattdurham
Copy link
Collaborator

@mattdurham mattdurham commented Jun 25, 2021

PR Description

The manager is ALWAYS collecting the integration even if enabled is set to false.

Which issue(s) this PR fixes

Honor the enabled flag, when the manager was changed this behavior looks to have been lost.

Closes #686

Notes to the Reviewer

PR Checklist

  • [ X] CHANGELOG updated

@@ -280,6 +280,11 @@ func (m *Manager) ApplyConfig(cfg ManagerConfig) error {
shouldCollect = *common.ScrapeIntegration
}

// If Enabled is false then we should override the should collect
if p.cfg.CommonConfig().Enabled == false {
Copy link
Member

@rfratto rfratto Jun 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't think this is being done in the right place. The loop on line 210 would still spawn the integration, which it probably shouldn't be doing.

One way to handle this might be to add a function to return the list of only enabled integrations and iterate over that instead when calling ApplyConfig. That way disabling an integration and removing it from the file are treated as the same.

Copy link
Collaborator Author

@mattdurham mattdurham Jun 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought on that was looking at the documentation is that enabled flag enables the scraping but if you include the integration in the configuration with enabled = false it still will get the statistics but won't send them. Should there be a difference in disabling an integration vs removing it, if the answer is no then I will put in my original fix. Specifically with the agent integration having it available but not sending for debugging purposes seemed useful. Can def go either way though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, the intent is that enabled drives whether the integration should be ran (so disabling it is the same as removing it), and scrape_integration determines if that running integration should be scraped by the Agent automatically.

Copy link
Member

@rfratto rfratto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

Comment on lines 50 to 59
// Iterate over the configurations and remove any that Enabled = false, this works by setting i to the last element
// and then returning the array minus the last element. This loses order but is performant.
for i := 0; i < len(cfg.Integrations.Integrations);{
if !cfg.Integrations.Integrations[i].CommonConfig().Enabled {
cfg.Integrations.Integrations[i] = cfg.Integrations.Integrations[len(cfg.Integrations.Integrations)-1]
cfg.Integrations.Integrations = cfg.Integrations.Integrations[:len(cfg.Integrations.Integrations)-1]
} else {
i++
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create a new function and call it from integrations.Manager.ApplyConfig instead? Filtering the actual slice here will omit the config from the results of /-/config which might be a little confusing to users.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This got more complicated. When you remove a configuration it did not stop the scraper, so when a configuration is disabled/removed and reload called Agent would list scrape_error = 404. Since the integration had stopped but the scraper was still running.

@rfratto rfratto changed the title Enabled flag should drive collection Integrations always running regardless of enabled flag Jun 25, 2021
@@ -220,9 +222,14 @@ func (m *Manager) ApplyConfig(cfg ManagerConfig) error {
continue
}
p.stop()
m.im.DeleteConfig(key)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beforehand the scraper was not stopping, but the integration was.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following how this is necessary, the second loop should be deleting the config?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the addition of the enabled check around line 227 the integration process is not started so the integration isn't added to the to m.integrations.

@@ -268,6 +279,7 @@ func (m *Manager) ApplyConfig(cfg ManagerConfig) error {

_ = m.im.DeleteConfig(key)
process.stop()
m.im.DeleteConfig(key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to call this twice?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at all and removed.

Comment on lines 229 to 231
if !ic.CommonConfig().Enabled {
continue
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think a function to return a slice of enabled integrations would be easier to follow than having to maintain this check across the two loops, IMO.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need the integrations in the array to disable/remove them, ie going from enabled -> disabled on reload.

@@ -220,9 +222,14 @@ func (m *Manager) ApplyConfig(cfg ManagerConfig) error {
continue
}
p.stop()
m.im.DeleteConfig(key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following how this is necessary, the second loop should be deleting the config?

@rfratto
Copy link
Member

rfratto commented Jun 25, 2021

Does this fix #686? That reports that set_collectors doesn't appear to be working with node_exporter, alongside this bug.

@mattdurham
Copy link
Collaborator Author

It fixes the enable issue that cropped during #686 , I was unable to reproduce an actual issue with set collectors. I tested several variations and they behaved as expected.

Copy link
Member

@rfratto rfratto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Can you add a regression test for this?

I'd also like to see a test that provides we need m.im.DeleteConfig in there twice, because it definitely seems unnecessary which makes it a candidate for accidental removal. Having a test would help prevent this.

@tbrandsma-rw
Copy link

Any updates on when this fix will be merged in? I am currently dealing with the fallout of this issue because the enabled flag isn't being honored.

@mattdurham
Copy link
Collaborator Author

Updated with unit tests and explanation below. Also had to update the mock integration so marshaling would work in the comparison. No tests were broken.

Situation removal of config on line 225 in manager.go

  1. Config is ENABLED
  2. Integrations and Configs and Processes are all ENABLED
    1. integration process is started
    2. instance process is started
    3. configs added
  3. ApplyConfig is called with integration DISABLED
    1. Enter loop at 210
    2. exist check succeeds
    3. yaml is different
    4. integration process is stopped
    5. integration is removed from manager integrations
    6. config is DISABLED so line 229 moves to next integration
    7. loop over manager integrations is called
    8. since the integration was removed in the above the DeleteConfig is never called
    9. the instance process continues running

If you run TestManager_IntegrationEnabledToDisabledReload with line 225 commented out, then the TestManager_IntegrationEnabledToDisabledReload will fail.

@mattdurham mattdurham requested a review from rfratto July 21, 2021 15:13
@mattdurham
Copy link
Collaborator Author

As far as a release, currently unknown, it is going to be bundled with some other large changes.

Copy link
Member

@rfratto rfratto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after removing the duplicated if statement in the first loop

Comment on lines 229 to 231
if !ic.CommonConfig().Enabled {
continue
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is in here twice :)

@mattdurham mattdurham merged commit 1c28f09 into main Jul 21, 2021
@mattdurham mattdurham deleted the enabled_issue branch July 21, 2021 17:55
@mattdurham mattdurham mentioned this pull request Sep 7, 2021
3 tasks
mattdurham added a commit that referenced this pull request Nov 11, 2021
* Enabled flag should drive collection

* Satisfy linter

* Fix mock integration so it sets enabled = true by default

* Make enabled the equivalent of not being in the config, removing it at the config reloader so that it propagates from the source downstream

* Fix issue where removing the config/disabling did NOT stop the scraper

* Remove whitespace

* remove unneeded comment

* Remove duplicated delete config

* Update comments and add unit tests. Also had to fix mock integration so that it generated yaml.

* Simplify unit test

* change case because of refactor

* filter integrations to enabled set

* Move check up to the first statement

* fix merge

* Fix for double loop

Co-authored-by: Robert Fratto <robertfratto@gmail.com>
@github-actions github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label Apr 11, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 11, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't seem to limit series with integrations
3 participants