Add plugin status plumbing and update server to use it #2041

Merged
merged 8 commits into from
Feb 7, 2020
55 changes: 37 additions & 18 deletions docs/content/extensions.md
Expand Up @@ -186,6 +186,15 @@ You can register your factory with OPA by calling
[github.com/open-policy-agent/opa/runtime#RegisterPlugin](https://godoc.org/github.com/open-policy-agent/opa/runtime#RegisterPlugin)
inside your main function.

### Plugin Status
A plugin may optionally report its current status to the plugin Manager via the `plugins.Manager#UpdatePluginStatus`
API.

> If no status is provided the plugin is assumed to be working OK.

Typically the plugin should report `StateNotReady` at creation time and update to `StateOK` (or `StateErr`) when
appropriate.

### Putting It Together

The example below shows how you can implement a custom [Decision Logger](../management/#decision-logs)
Expand All @@ -196,39 +205,45 @@ import (
"github.com/open-policy-agent/opa/plugins/logs"
)

const PluginName = "println_decision_logger"

type Config struct {
Stderr bool `json:"stderr"` // false => stdout, true => stderr
}

type PrintlnLogger struct {
mtx sync.Mutex
config Config
manager *plugins.Manager
mtx sync.Mutex
config Config
}

func (p *PrintlnLogger) Start(ctx context.Context) error {
// No-op.
p.manager.UpdatePluginStatus(PluginName, &plugins.Status{State: plugins.StateOK})
return nil
}

func (p *PrintlnLogger) Stop(ctx context.Context) {
// No-op.
p.manager.UpdatePluginStatus(PluginName, &plugins.Status{State: plugins.StateNotReady})
}

func (p *PrintlnLogger) Reconfigure(ctx context.Context, config interface{}) {
p.mtx.Lock()
defer p.mtx.Unlock()
p.config = config.(Config)
p.mtx.Lock()
defer p.mtx.Unlock()
p.config = config.(Config)
}

func (p *PrintlnLogger) Log(ctx context.Context, event logs.EventV1) error {
p.mtx.Lock()
defer p.mtx.Unlock()
w := os.Stdout
if p.config.Stderr {
w = os.Stderr
}
fmt.Fprintln(w, event) // ignoring errors!
return nil
p.mtx.Lock()
defer p.mtx.Unlock()
w := os.Stdout
if p.config.Stderr {
w = os.Stderr
}
_, err := fmt.Fprintln(w, event)
if err != nil {
p.manager.UpdatePluginStatus(PluginName, &plugins.Status{State: plugins.StateErr})
}
return nil
}
```

Expand All @@ -242,9 +257,13 @@ import (

type Factory struct{}

func (Factory) New(_ *plugins.Manager, config interface{}) plugins.Plugin {
func (Factory) New(m *plugins.Manager, config interface{}) plugins.Plugin {

m.UpdatePluginStatus(PluginName, &plugins.Status{State: plugins.StateNotReady})

return &PrintlnLogger{
config: config.(Config),
manager: m,
config: config.(Config),
}
}

Expand All @@ -264,7 +283,7 @@ import (
)

func main() {
runtime.RegisterPlugin("println_decision_logger", Factory{})
runtime.RegisterPlugin(PluginName, Factory{})

if err := cmd.RootCommand.Execute(); err != nil {
fmt.Println(err)
2 changes: 1 addition & 1 deletion docs/content/kubernetes-tutorial.md
Expand Up @@ -187,7 +187,7 @@ spec:
name: opa-server
readinessProbe:
httpGet:
path: /health
path: /health?plugins&bundle
scheme: HTTPS
port: 443
initialDelaySeconds: 3
52 changes: 34 additions & 18 deletions docs/content/management.md
Expand Up @@ -430,9 +430,12 @@ OPA can periodically report status updates to remote HTTP servers. The
updates contain status information for OPA itself as well as the
[Bundles](#bundles) that have been downloaded and activated.

OPA sends status reports whenever bundles are downloaded and activated. If
the bundle download or activation fails for any reason, the status update
will include error information describing the failure.
OPA sends status reports whenever one of the following happens:

* Bundles are downloaded and activated -- If the bundle download or activation fails for any reason, the status update
will include error information describing the failure. This includes Discovery bundles.
* A plugin state has changed -- The status of every plugin is reported, and a state change in any plugin
triggers a Status API report containing the latest state.

The status updates will include a set of labels that uniquely identify the
OPA instance. OPA automatically includes an `id` value in the label set that
Expand All @@ -457,23 +460,34 @@ on the agent, updates will be sent to `/status`.

```json
{
"labels": {
"app": "my-example-app",
"id": "1780d507-aea2-45cc-ae50-fa153c8e4a5a",
"version": "{{< current_version >}}"
"labels": {
"app": "my-example-app",
"id": "1780d507-aea2-45cc-ae50-fa153c8e4a5a",
"version": "{{< current_version >}}"
},
"bundles": {
"http/example/authz": {
"active_revision": "ABC",
"last_successful_download": "2018-01-01T00:00:00.000Z",
"last_successful_activation": "2018-01-01T00:00:00.000Z",
"metrics": {
"timer_rego_data_parse_ns": 12345,
"timer_rego_module_compile_ns": 12345,
"timer_rego_module_parse_ns": 12345
}
}
},
"plugins": {
"bundle": {
"state": "OK"
},
"bundles": {
"http/example/authz": {
"active_revision": "TODO",
"last_successful_download": "2018-01-01T00:00:00.000Z",
"last_successful_activation": "2018-01-01T00:00:00.000Z",
"metrics": {
"timer_rego_data_parse_ns": 12345,
"timer_rego_module_compile_ns": 12345,
"timer_rego_module_parse_ns": 12345
}
}
"discovery": {
"state": "OK"
},
"status": {
"state": "OK"
}
},
"metrics": {
"prometheus": {
"go_gc_duration_seconds": {
Expand Down Expand Up @@ -609,6 +623,8 @@ Status updates contain the following fields:
| `discovery.active_revision` | `string` | Opaque revision identifier of the last successful discovery activation. |
| `discovery.last_successful_download` | `string` | RFC3339 timestamp of last successful discovery bundle download. |
| `discovery.last_successful_activation` | `string` | RFC3339 timestamp of last successful discovery bundle activation. |
| `plugins` | `object` | A set of objects describing the state of configured plugins in OPA's runtime. |
| `plugins[_].state` | `string` | The state of each plugin. |
| `metrics.prometheus` | `object` | Global performance metrics for the OPA instance. |
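
As a rough illustration of how a service receiving status updates might read the `plugins` field described in the table, the sketch below decodes only that part of the payload and lists every plugin whose state is not `"OK"`. The `statusUpdate` type and the `notOK` helper are hypothetical names introduced here, and the `"NOT_READY"` string in the sample payload is an assumption; only the `"OK"` value appears in the example payload above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// statusUpdate is a hypothetical type that models only the "plugins"
// portion of a status update; all other fields are ignored on decode.
type statusUpdate struct {
	Plugins map[string]struct {
		State string `json:"state"`
	} `json:"plugins"`
}

// notOK returns the sorted names of plugins whose state is not "OK".
func notOK(raw []byte) ([]string, error) {
	var u statusUpdate
	if err := json.Unmarshal(raw, &u); err != nil {
		return nil, err
	}
	var names []string
	for name, p := range u.Plugins {
		if p.State != "OK" {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	// Sample payload; the "NOT_READY" value is assumed for illustration.
	raw := []byte(`{"plugins": {
		"bundle": {"state": "OK"},
		"discovery": {"state": "NOT_READY"},
		"status": {"state": "OK"}
	}}`)

	names, err := notOK(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("plugins not OK:", names)
}
```

Comparing against `"OK"` only, rather than enumerating the failure states, keeps the check valid even if more non-OK states exist than the ones named in this change.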

If the bundle download or activation failed, the status update will contain
5 changes: 5 additions & 0 deletions docs/content/monitoring.md
Expand Up @@ -30,3 +30,8 @@ scrape_configs:

OPA exposes a `/health` API endpoint that can be used to perform health checks.
See [Health API](../rest-api#health-api) for details.

### Status API

OPA provides a plugin which can push status to a remote service.
See [Status API](../management#status) for details.
22 changes: 15 additions & 7 deletions docs/content/rest-api.md
Expand Up @@ -2045,15 +2045,18 @@ that the server is operational. Optionally it can account for bundle activation
(useful for "ready" checks at startup).

#### Query Parameters
`bundle` - Boolean parameter to account for bundle activation status in response.
`bundles` - Boolean parameter to account for bundle activation status in response. This includes
any discovery bundles or bundles defined in the loaded discovery configuration.
`plugins` - Boolean parameter to account for plugin status in response.

#### Status Codes
- **200** - OPA service is healthy. If `bundle=true` then all configured bundles have
been activated.
- **500** - OPA service is not healthy. If `bundle=true` this can mean any of the configured
bundles have not yet been activated.
- **200** - OPA service is healthy. If the `bundles` option is specified then all configured bundles have
been activated. If the `plugins` option is specified then all plugins are in an OK state.
- **500** - OPA service is not healthy. If the `bundles` option is specified this can mean any of the configured
bundles have not yet been activated. If the `plugins` option is specified then at least one
plugin is in a non-OK state.

> *Note*: The bundle activation check is only for initial startup. Subsequent downloads
> *Note*: The bundle activation check is only for initial bundle activation. Subsequent downloads
will not affect the health check. The [Status](../management/#status)
API should be used for more fine-grained bundle status monitoring.

Expand All @@ -2064,7 +2067,12 @@ GET /health HTTP/1.1

#### Example Request (bundle activation)
```http
GET /health?bundle=true HTTP/1.1
GET /health?bundles HTTP/1.1
```

#### Example Request (plugin status)
```http
GET /health?plugins HTTP/1.1
```
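
For callers that are not Kubernetes probes, a minimal Go client for this endpoint could look like the sketch below. It builds the `/health` URL with the optional `bundles` and `plugins` parameters and treats only a 200 response as healthy. The `healthURL` helper and the `http://localhost:8181` address are assumptions made for illustration, and sending the parameters as `=true` (rather than bare, as in the requests above) follows the earlier `bundle=true` form.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// healthURL is a hypothetical helper that builds the /health URL for the
// given base address, adding the optional query parameters when requested.
func healthURL(base string, bundles, plugins bool) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/health"
	q := url.Values{}
	if bundles {
		q.Set("bundles", "true")
	}
	if plugins {
		q.Set("plugins", "true")
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Assumed address; adjust to wherever the OPA server is listening.
	target, err := healthURL("http://localhost:8181", true, true)
	if err != nil {
		fmt.Println("bad base URL:", err)
		return
	}

	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(target)
	if err != nil {
		fmt.Println("health check request failed:", err)
		return
	}
	defer resp.Body.Close()

	// 200 means healthy; 500 means the service, a bundle, or a plugin is not ready.
	fmt.Println("healthy:", resp.StatusCode == http.StatusOK)
}
```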

#### Healthy Response
33 changes: 32 additions & 1 deletion plugins/bundle/plugin.go
Expand Up @@ -11,6 +11,7 @@ import (
"fmt"
"reflect"
"sync"
"time"

"github.com/sirupsen/logrus"

Expand All @@ -36,6 +37,7 @@ type Plugin struct {
mtx sync.Mutex
cfgMtx sync.Mutex
legacyConfig bool
ready bool
}

// New returns a new Plugin with the given config.
Expand All @@ -53,8 +55,10 @@ func New(parsedConfig *Config, manager *plugins.Manager) *Plugin {
status: initialStatus,
downloaders: make(map[string]*download.Downloader),
etags: make(map[string]string),
ready: false,
}
p.initDownloaders()

manager.UpdatePluginStatus(Name, &plugins.Status{State: plugins.StateNotReady})
return p
}

Expand All @@ -75,6 +79,7 @@ func Lookup(manager *plugins.Manager) *Plugin {
func (p *Plugin) Start(ctx context.Context) error {
p.mtx.Lock()
defer p.mtx.Unlock()
p.initDownloaders()
for name, dl := range p.downloaders {
p.logInfo(name, "Starting bundle downloader.")
dl.Start(ctx)
Expand Down Expand Up @@ -160,6 +165,8 @@ func (p *Plugin) Reconfigure(ctx context.Context, config interface{}) {
panic(errors.New("Unable deactivate bundle: " + err.Error()))
}

readyNow := p.ready

for name, source := range p.config.Bundles {
_, updated := updatedBundles[name]
_, isNew := newBundles[name]
Expand All @@ -173,8 +180,15 @@ func (p *Plugin) Reconfigure(ctx context.Context, config interface{}) {
}
p.downloaders[name] = p.newDownloader(name, source)
p.downloaders[name].Start(ctx)
readyNow = false
Member

This is an interesting change. The most pressing issue is that readiness checks in K8s don't take discovery into account. This PR fixes that; however, if bundles are added or updated and fail to download within a certain amount of time, the liveness probe will fail and the OPA pod will get whacked. Maybe that's the right behaviour. Do we document this somewhere?

Contributor Author

Yeah, and I think that is probably the right behavior for some use cases. At that point in time OPA is not in a state where it should be answering queries, since it's missing bundles. For some applications that doesn't matter; for others it does.

Specific to the OPA HTTP server reporting a 500 on /health: we document that any configured bundle that has not been activated will cause the health check to fail. Maybe what I'll do is revert the change to the example deployment and omit the `?plugins` option on the liveness probe to avoid any accidental confusion on this. People should probably opt in to using it rather than copying the default and getting this more aggressive behavior.

}
}

if !readyNow {
p.ready = false
p.manager.UpdatePluginStatus(Name, &plugins.Status{State: plugins.StateNotReady})
}

}

// Register a listener to receive status updates. The name must be comparable.
Expand Down Expand Up @@ -298,6 +312,23 @@ func (p *Plugin) process(ctx context.Context, name string, u download.Update) {
p.logInfo(name, "Bundle downloaded and activated successfully.")
}
p.etags[name] = u.ETag

// If the plugin wasn't ready yet then check if we are now after activating this bundle.
if !p.ready {
readyNow := true // optimistically
for _, status := range p.status {
if len(status.Errors) > 0 || (status.LastSuccessfulActivation == time.Time{}) {
readyNow = false // Not ready yet, check again on next bundle activation.
break
}
}

if readyNow {
p.ready = true
p.manager.UpdatePluginStatus(Name, &plugins.Status{State: plugins.StateOK})
}

}
return
}
