-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add plugin status plumbing and update server to use it #2041
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few high level comments. Once the mutex cleanup is done we can merge this.
@@ -173,8 +181,15 @@ func (p *Plugin) Reconfigure(ctx context.Context, config interface{}) { | |||
} | |||
p.downloaders[name] = p.newDownloader(name, source) | |||
p.downloaders[name].Start(ctx) | |||
readyNow = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting change. The most pressing issue is that readiness checks in K8s don't take into account discovery. This PR fixes that however if bundles are added/updated and fail to download within a certain amount of time, the liveness probe will fail and the OPA pod will get whacked. Maybe that's the right behaviour. Do we document this somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, and I think that is probably the right behavior for some use-cases.. At that point in time the OPA is not in a state that it should be answering queries since its missing bundles, maybe some applications it doesn't matter, some it does.
Specific to the OPA HTTP server reporting a 500 on /health we document that any non-activated and configured bundle will cause the health check to fail. Maybe what i'll do is revert the change to the example deployment to omit the ?plugin
option on the liveness probe to avoid any accidental confusion on this. People should probably opt in to using it versus just copying the default and having this more aggressive behavior.
5148810
to
09d42dc
Compare
This defines a new status API on the plugins.Manager for plugins to be able to update their status. Signed-off-by: Patrick East <east.patrick@gmail.com>
The discovery plugin will update its status based on the first time it initializes and activates configuration. If a bundle is configured it will switch to OK the first time it activates the configuration of that bundle. If no dynamic configuration is present it will just go to an "ok" status at Start() time. Signed-off-by: Patrick East <east.patrick@gmail.com>
The bundle plugin will emit status based on it having an activated bundle for each configured bundle. At startup it will remain in a "not ready" state until all bundles are activated. Upon reconfiguration it will drop back into "not ready" if new bundles are added or the config for an existing one was changed, until the bundles have been activated. Signed-off-by: Patrick East <east.patrick@gmail.com>
The log plugin will report its status as "ok" at start time and "not ready" at stop. It currently doesn't change status or report any errors. This can potentially be extended in the future. Signed-off-by: Patrick East <east.patrick@gmail.com>
This adds in the status updates for the status plugin (reported to the plugin manager) as well as hooking the status plugin up as a listener for *all* plugin status updates to include in the status API events. Signed-off-by: Patrick East <east.patrick@gmail.com>
The health check now supports a `?plugin` option which will make the response depend on whether or not all configured plugins are in an OK state. The `bundle` parameter will now use the bundle *and* discovery plugin statuses to determine if the bundles are ready. This corrects an issue where discovery bundles, and bundles defined by the discovery dynamic config, were not included with `/health?bundle=true` checks. The URL parameter parsing has also changed to allow for omitting the value for the `bundle` option. It will default to `true` so that `/health?bundle=true` can be shortened to `/health?bundle`. Fixes: open-policy-agent#2010 Fixes: open-policy-agent#2015 Signed-off-by: Patrick East <east.patrick@gmail.com>
This deprecates `/health?bundle` in favor of (the plural) `/health?bundles`. Both have the same API result, taking into account *all* bundle statuses. This just lessen any confusion with regards to whether `bundle` means a single bundle or all. It fits better with the new `plugins` option. Signed-off-by: Patrick East <east.patrick@gmail.com>
Signed-off-by: Patrick East <east.patrick@gmail.com>
09d42dc
to
a9f41c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR adds in a concept of plugin's being able to report their own status to the plugin manager. There is then plumbing added for status update listeners, which the status plugin does. The OPA server has been updated to use the plugin statuses for the bundle and discovery plugin to more correctly report on
/health?bundle=true
for readiness probes (it used to not include discovery at all). In addition the server has a new option for/health?plugins
that will return a status based on all plugins registered (and reporting status). This gives plugin developers a way to influence the result of the health check if a plugin goes into an error state or needs time to initialize.Its broken up into smaller bite-sized commits, each with a commit message explaining more about what changed.
Fixes: #2010
Fixes: #2015