Extending OPA's runtime/server #2723

Closed
gshively11 opened this issue Sep 24, 2020 · 8 comments

@gshively11
Contributor

Hello OPA community! We are starting a large project to build an authorization platform using OPA, similar to the approach Netflix took. We're new to golang and just starting to dig into the OPA code base, so we wanted to get your thoughts and opinions on our initial approach, as we're likely to be doing something unwise. Assuming there is some merit to our initial approach, there are likely to be a number of feature requests that we can create/contribute, for which we'll spin up separate issues.

Using OPA in server mode, we want to accomplish the following custom behaviors:

  • Authenticate JWTs that require a signing key to be retrieved from an external API, based on a key ID in the JWT.
  • Inject data into the request input sent to OPA, before any policy evaluation starts.
    • e.g. the identity from the aforementioned JWT authentication, as well as any other data we deem necessary for policy evaluation, likely to be retrieved from external APIs and cached.
  • Control/modify the structure of all logs generated by OPA to fit our desired schema.
  • Hide/disable many of the OPA server routes/features (require HTTPS, only allow requests to evaluate named policy, etc.)
    • We want to restrict the CLI interface that OPA exposes as much as possible, tailored to our specific use case.
  • Add a route that can do named policy evaluation in bulk
  • Return "obligations" from rego policy evaluation (similar to the xacml concept), so that apps can prompt end users to perform actions to get access (e.g. re-authenticate).

After poring over OPA's runtime/server code the last few days, this is the approach we're taking initially:

  • Create a server package that wraps OPA's server package, wherein we create a mux router with custom middleware and routes, which we pass to OPA's server.
    • One middleware is responsible for extracting the JWT from the headers, verifying the signature, and then creating an identity which we set on the request context.
    • Another middleware is responsible for reading and unmarshalling the request body, injecting the identity information into input, and then re-marshalling it and setting it to request.Body.
  • Create our own runtime package, copying a lot of OPA's runtime package, but using our own server package instead of OPA's.
  • Create our own limited CLI interface that sets up our custom runtime package.
  • Use bundles to load policy, initially from S3 buckets, eventually from a custom API.
  • Push metrics to a custom API. We can't use prometheus and its pull pattern due to our deployment model.
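The two middlewares described above can be sketched with the standard library alone. This is a minimal illustration, not OPA's API: `Identity`, `identityKey`, `withIdentity`, `injectSubject`, and `rewrite` are hypothetical names, the JWT verification step is replaced by a stand-in that sets a fixed identity, and how the resulting handler is attached to OPA's server is left out.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// identityKey is a hypothetical context key shared by the two middlewares.
type identityKey struct{}

// Identity is a hypothetical, simplified view of the verified JWT claims.
type Identity struct {
	ID int `json:"id"`
}

// withIdentity stands in for the JWT-verifying middleware: it skips the
// verification itself and only stores an identity on the request context.
func withIdentity(id Identity, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := context.WithValue(r.Context(), identityKey{}, id)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// injectSubject decodes the JSON body, sets input.subject to the identity
// found on the context, and replaces r.Body with the re-encoded document.
func injectSubject(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id, ok := r.Context().Value(identityKey{}).(Identity)
		if !ok {
			http.Error(w, "unauthenticated", http.StatusUnauthorized)
			return
		}
		dec := json.NewDecoder(r.Body)
		dec.UseNumber() // keep numeric literals intact across the round trip
		var body map[string]interface{}
		if err := dec.Decode(&body); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		input := map[string]interface{}{}
		if raw, present := body["input"]; present {
			obj, isObj := raw.(map[string]interface{})
			if !isObj {
				http.Error(w, "input must be an object", http.StatusBadRequest)
				return
			}
			input = obj
		}
		input["subject"] = id
		body["input"] = input
		buf, err := json.Marshal(body)
		if err != nil {
			http.Error(w, "cannot encode body", http.StatusInternalServerError)
			return
		}
		r.Body = io.NopCloser(bytes.NewReader(buf))
		r.ContentLength = int64(len(buf))
		next.ServeHTTP(w, r)
	})
}

// rewrite runs a body through both middlewares and returns what the
// downstream handler (where OPA's handler would sit) receives.
func rewrite(body string, id int) string {
	var seen string
	downstream := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		b, _ := io.ReadAll(r.Body)
		seen = string(b)
	})
	h := withIdentity(Identity{ID: id}, injectSubject(downstream))
	req := httptest.NewRequest(http.MethodPost, "/v1/data/myorg/myteam/somenamedpolicy", strings.NewReader(body))
	h.ServeHTTP(httptest.NewRecorder(), req)
	return seen
}

func main() {
	fmt.Println(rewrite(`{"input":{"action":"read"}}`, 123))
}
```

The decode and re-encode round trip on every request is the inefficiency raised under "Related thoughts" below; the sketch makes that cost visible rather than avoiding it.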

Related thoughts:

  • We don't like having to copy so much code directly from OPA to do these extensions. Is there a better approach we're not seeing?
  • We considered using io.jwt.decode and http.send in rego to handle our JWT authentication. This doesn't seem feasible though, as our API for key retrieval doesn't implement any of the necessary caching headers.
  • We have concerns about http.send in general, in terms of retry, observability, complex policy, credentials, etc., so we'd prefer to have the option to modify input in golang before it reaches rego.
  • Our approach to modifying the request body input is terribly inefficient, and we'd love an alternative. We don't want to copy/rewrite OPA's entire server package.
  • We don't want to maintain a long-lived fork of OPA, but we are very interested in contributing back to OPA.

Any feedback/advice you can give would be greatly appreciated. Apologies for the wall of text and thank you for your time!

@gshively11
Contributor Author

I'm just now realizing I missed the Extending OPA doc, my bad. After reviewing that, it seems like a few of our needs could be met with custom built-in functions (custom jwt authn, enriching input). Having more control over the cli/runtime/server and avoiding larger policies is still something I think we need, but I'm going to spend a little time experimenting with custom built-ins to see where that gets me.

@tsandall
Member

First of all, thanks for filing a well-written, detailed issue! I'll reply to a few of your comments/questions that I think are important and then leave some thoughts at the end.

Authenticate JWTs that require a signing key to be retrieved from an external API, based on a key ID in the JWT.

You can accomplish this either with http.send() or by compiling your own custom built-in functions into OPA. You mentioned below that http.send() would not work because the service doesn't set caching headers. How will you handle cache invalidation? The approach that OPA implements for http.send() is fairly standard and should work well with endpoints that serve JWKS or the like. If you absolutely cannot modify the service to conform to HTTP caching standards, the custom built-in route would be the way to go.

I don't see why io.jwt.decode and the related verification functions (e.g., io.jwt.verify_hs256) cannot be used if you have a custom built-in function fetching keys. Can you elaborate on why io.jwt.decode (and the related suite of verification functions) is insufficient?

Inject data into the request input sent to OPA, before any policy evaluation starts.

Is there a reason the input data sent to OPA has to be mutated? Normally, the input data sent to OPA would contain a JWT token and the policy would implement rules that verify the token and then expose the claims inside the token for the rest of the policy. This avoids hardcoding logic into the server that can otherwise be specified in the policy itself.

Control/modify the structure of all logs generated by OPA to fit our desired schema.

Can you elaborate a little bit on what you're looking to do here? We've been thinking about hiding the low-level access logs for a little while now. If we made that change, the only logs that would remain would be primarily for integration purposes (e.g., decision logs, status logs, etc.) We could look at providing more fine-grained config to control logging levels (currently it's one-size-fits-all).

Hide/disable many of the OPA server routes/features (require HTTPS, only allow requests to evaluate named policy, etc.)

The listener type and address are configuration given to OPA on startup. Which routes are accessible can be controlled via authorization.

Add a route that can do named policy evaluation in bulk

If you wanted to return multiple policy decisions in a single query you could write a rule that produces those decisions in a single JSON document.

Return "obligations" from rego policy evaluation (similar to the xacml concept), so that apps can prompt end users to perform actions to get access (e.g. re-authenticate).

The answer I'd give here is similar to the last point: rules can generate non-boolean values (e.g., maps, lists, etc.) that represent concepts like obligations. Those can be composed and returned just like any other value generated by your rules.
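From the calling application's side, a structured decision could be consumed roughly as follows. This is a sketch under assumptions: the `allow` and `obligations` field names are hypothetical and depend entirely on how the rule is written, and the code treats a response without a `result` key as an undefined decision, which it maps to deny.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Decision is a hypothetical shape for a rule that returns an object
// instead of a boolean; the policy author chooses the field names.
type Decision struct {
	Allow       bool     `json:"allow"`
	Obligations []string `json:"obligations"`
}

// dataResponse mirrors the {"result": ...} envelope of a Data API reply.
type dataResponse struct {
	Result *Decision `json:"result"`
}

// parseDecision decodes a response body. A missing result is treated as
// "undefined" and returned as the zero Decision, i.e. deny, no obligations.
func parseDecision(raw []byte) (Decision, error) {
	var resp dataResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return Decision{}, err
	}
	if resp.Result == nil {
		return Decision{}, nil
	}
	return *resp.Result, nil
}

func main() {
	body := []byte(`{"result":{"allow":false,"obligations":["reauthenticate"]}}`)
	d, err := parseDecision(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(d.Allow, d.Obligations)
}
```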

After poring over OPA's runtime/server code the last few days, this is the approach we're taking initially: [...]

Given your requirements to customize the CLI, listeners, routes, logging, metrics, etc. I'm wondering what value you're going to receive from deploying the OPA runtime as-is. The runtime exposes the OPA API and has a few opinionated choices re: configuration, metrics, logging, etc. If those do not work in your environment you can build your own runtime around the OPA components.

Here's an example that embeds OPA as a library but goes beyond just using the rego package: https://github.com/open-policy-agent/example-api-authz-go. I think that might be useful if you want to go down the path you outlined.

We don't want to maintain a long-lived fork of OPA, but we are very interested in contributing back to OPA.

We're happy to work with folks that want to contribute back. Ideally, IMO, the different components that make up OPA can be reused outside the OPA runtime for custom use cases like this. I recommend looking at that example above to see if it would suit your needs.

@gshively11
Contributor Author

gshively11 commented Sep 24, 2020

Hey, thanks for the quick reply!

If you absolutely cannot modify the service to conform to HTTP caching standards, the custom built-in route would be the way to go.

Long term we can probably get the service modified, but for now, I agree that custom built-ins seem to be the right way to go.

Can you elaborate on why io.jwt.decode (and the related suite of verification functions) is insufficient?

I probably should have been clearer. io.jwt.decode actually does work well to decode the token, we just couldn't do the proper verification with http.send, so I had ruled out both. With a custom function to verify, we could do io.jwt.decode, although I think we'll probably just do the decoding in the custom function and return a more simplified object to use in policy. I actually just got this working in a sample project (we already had most of the code written to handle it in golang).

Is there a reason the input data sent to OPA has to be mutated?

Not necessarily, although our thought process was that teams should have to write as little in rego policy as possible. So instead of adding something like

subject = myorg.authn(input.token)
allow {
  subject.id == 123
}

teams could expect input.subject to automatically be populated for them if they sent a valid jwt, and they could just write

allow {
  input.subject.id == 123
}

The same thing would apply for additional metadata we needed to enrich from other external sources. Although when I wrote my original message, I didn't realize we could define built-in functions with our own custom go code, so I'm a little less concerned about the rego function approach now. We still end up requiring additional lines in policy to grab additional data, but we can at least have total control over how that data is fetched/cached/etc.

Can you elaborate a little bit on what you're looking to do here? We've been thinking about hiding the low-level access logs for a little while now.

After looking a little closer at the code, it seems like the only place that logrus is used is in the runtime or in some of the plugins. I thought that was also used in the server, but it looks like that's just the decision logger, which we can already customize I think. So this might be a non-issue, let's drop it for now.

If you wanted to return multiple policy decisions in a single query you could write a rule that produces those decisions in a single JSON document.

Part of the problem might be that I don't totally understand what you mean by this 😄 . I guess I need to go review the docs a bit more. But basically, we want to be able to do something like this:

POST /v1/data/myorg/myteam/somenamedpolicy   // evaluate a single named policy

POST /v1/bulkdata/myorg/myteam/*       // evaluate all policies that are owned by myteam and return the result

POST /v1/bulkdata                      // evaluate the policies mentioned in the body                     
{"input": {}, "policies": ["myorg.myteam.policy1", "myorg.myteam.policy2"]}

We may explore additional policy targeting/grouping behaviors, like adding tags to policies or something.

The answer I'd give here is similar to the last point

As mentioned above, I'll go review the docs, because I'm definitely missing a piece of the puzzle

If those do not work in your environment you can build your own runtime around the OPA components.

That's basically the approach we started out with I think, although now that I better understand how built-ins work, I'm going to take a step back and try the built-in approach to keep the opa runtime intact. The more we can use OPA directly, the better!

Thanks for your time, your thorough reply was most appreciated. I think I have enough info to work for a while. Please feel free to close this issue for now.

@gshively11
Contributor Author

Hello again. After a few days of experimentation, I've gained a much better understanding of how rego works. I have a custom built-in function working with my policy and overall I'm really enjoying the opa/rego experience. I am stuck on one thing though. Not sure if you'd prefer this in a separate issue, but I'll start here for now to avoid clutter.

When I attempt to build a bundle with my compiled version of OPA, I get the following error:

error: 1 error occurred: <redacted>:11: rego_type_error: undefined function <my_builtin_func_name>

I don't get this error when I simply use opa run on this same policy; it detects and runs my custom built-in just fine. Originally I thought maybe I needed to use a custom capabilities file, but I used the following snippet in my main func to verify that OPA has loaded my custom built-in when running opa build, so I don't think that's it.

for _, b := range ast.CapabilitiesForThisVersion().Builtins {
	fmt.Println(b.Name)
}

I'm still digging in, but hopefully you can point me in the right direction so I don't have to bang my head against the wall for too much longer.

@patrick-east
Contributor

Looks like a bug. I see the same behavior with the example built-in from https://www.openpolicyagent.org/docs/latest/extensions/#adding-built-in-functions-to-the-opa-runtime

{11:41} /t/myopa ❯ cat ./bundle/main.rego
package foo

main {
    github.repo("patrick-east", "opa")
}
{11:41} /t/myopa ❯ go run . eval -b ./bundle -f pretty 'data.foo.main'
true
{11:41} /t/myopa ❯ go run . build -b ./bundle
error: 1 error occurred: bundle/main.rego:4: rego_type_error: undefined function github.repo
exit status 1

In the meantime, to get unblocked, you can build bundles the old-fashioned way with tar -czf bundle.tar.gz ...

@gshively11
Contributor Author

Ah, nice find. Did you want me to create a separate issue for tracking? I can take a crack at trying to contribute a fix too, although it might take me a little while to hunt it down since I'm still unfamiliar with golang and the opa codebase.

@patrick-east
Contributor

Did you want me to create a separate issue for tracking?

That's probably best; it will let us track the fix in the changelog/release notes more easily too.

@gshively11
Contributor Author

All my questions have been answered at this point, closing this issue now, thanks again for the help! OPA is rad.
