Support OpenID Connect authentication and authorization with OpenPolicyAgent #610

Closed
jlpettersson opened this issue Jun 15, 2020 · 19 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@jlpettersson
Member

Expected Behavior

That I can send an OpenID Connect authenticated request to the Trigger, and that I can check its authorization using OpenPolicyAgent.

Actual Behavior

Only an SSH key pair or a predefined password/credential/Secret can be used for authentication and authorization.

Additional Info

Authentication

Common types of authentication:

  • SSH key pair (typically generated on a developer's computer and then distributed to both the GitRepoProvider and the TektonTrigger)
  • Preconfigured password/credential/secret (typically generated on the GitRepoProvider and then distributed to the TektonTrigger), e.g. a personal token?
  • OpenID Connect / JWT token (no key distribution needed; only authorization has to be configured). This is the typical web-app authentication, and also the format of newer Kubernetes ServiceAccount tokens.

Examples of OpenID Connect authentication are Azure Active Directory, Sign in with Google / Google Accounts, Sign in with Apple, Keycloak, and Red Hat Single Sign-On. Such requests typically come from web apps, e.g. a dashboard.

Kubernetes bound service account tokens (the newer, rotated tokens) also use the OpenID Connect format, and an OIDC discovery endpoint is coming in Kubernetes 1.18.

For example, an upcoming, not-yet-existing GitRepoProvider would probably support modern OIDC service accounts. It could be a provider that runs on Kubernetes (e.g. GitLab) and uses the new Kubernetes ServiceAccount tokens for this.

Use cases

  • the "potentially upcoming GitRepoProvider" above is a use case
  • Having a Build-Pipeline in Cluster X that triggers a Deploy-Pipeline in Cluster Y (e.g. deploying the same image to clusters in multiple regions). This can be done with a curl-Task or a CloudEvent-Task, using a Kubernetes ServiceAccount.
  • Requests from a web app, e.g. a dashboard

Authorization

OpenPolicyAgent is an increasingly popular tool to "externalize authorization", e.g. for everything within an organization (HTTP API authZ, Kafka, and other systems) with a single authorization language.

It can be used for validating JWT tokens (e.g. OpenID Connect) and also for authorization, e.g. deciding whether this "API key" or JWT token has access to this pipeline.

Final note

On a final note, I think the above can be solved with Istio, which can inject an OPA-Istio sidecar. But I would like to be able to use this without Istio too, e.g. for triggering things with CloudEvents that are authenticated with a Kubernetes ServiceAccount.

Example Curl Task

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: curl
spec:
  params:
  - name: url
    type: string
  steps:
  - name: curl
    image: curlimages/curl
    script: |
      cat /var/run/secrets/tokens/token | xargs -I {} curl -X POST -H "Authorization: Bearer {}" "$(params.url)"
    volumeMounts:
    - mountPath: /var/run/secrets/tokens
      name: token-volume

and TaskRun

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  generateName: curl-request-
spec:
  taskRef:
    name: curl
  params:
  - name: url
    value: http://tekton-trigger.my-namespace
  podTemplate:
    securityContext:
      supplementalGroups: [ 65534 ]
      fsGroup: 65534
    volumes:
    - name: token-volume
      projected:
        sources:
        - serviceAccountToken:
            path: token
            expirationSeconds: 7200
            audience: tekton-trigger

Example Deployment OPA sidecar

OPA can be used as a proxy in front of a service, e.g. with Istio, or as a sidecar that the Trigger asks (via an HTTP request) whether the "event" is authorized. In a Deployment:

    spec:
      containers:
      - name: tekton-trigger
        image: tekton-trigger-image
        env:
          - name: OPA_ADDR
            value: http://localhost:8181
          - name: POLICY_PATH
            value: /v1/data/httpapi/authz
      - name: opa
        image: openpolicyagent/opa
        args:
          - run
          - --server
          - --ignore=.*
          - --log-format=json-pretty
          - --set=decision_logs.console=true
          - "/policies"
        volumeMounts:
        - name: policy-volume
          mountPath: /policies
          readOnly: true
      volumes:
      - name: policy-volume
        configMap:
          name: example-policy

This could also be a possible solution for e.g. #572.

/kind feature

@tekton-robot tekton-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 15, 2020
@jlpettersson
Member Author

The new PodTemplate field may be a good place to add an optional sidecar, e.g. for OPA.

The field was added in 52a4cb6

@dibyom
Member

dibyom commented Jun 17, 2020

Thanks for the detailed write-up. The high-level requirement seems to be that Triggers has no notion of authentication/authorization for incoming requests. A lot of (most?) CI/CD use cases are about triggering off webhook messages, where the standard seems to be using a shared secret to do some validation -- though it seems for

  • I don't know what the GitRepoProvider is...do you have a link with more details on it?

  • The main use case for the authn/authz seems to be a human user manually triggering a Trigger (vs an automated webhook from somewhere)

  • The sidecar approach seems interesting. Is the idea that the listener that uses this sidecar only processes these human/manual Triggers? (Otherwise, webhook calls may get blocked since they generally do not contain this kind of auth.)

  • Can OPA be an Interceptor (maybe just a WebHook Interceptor)? Triggers would pass the request on to the OPA service and proceed or not based on what it returns. This is similar to the sidecar idea, except that OPA is not part of the same deployment.

  • For the cases where we are using a k8s service account as the identity, would any of the work around SAR checks that @gabemontero is doing be related to this?

@jlpettersson
Member Author

I don't know what the GitRepoProvider is...do you have a link with more details on it?

Sorry, that was a rather abstract description on my part. By GitRepoProvider I meant a provider of git repositories, e.g. GitHub, GitLab, Bitbucket Server, Bitbucket Cloud, Google Cloud Source Repositories, and similar: a provider of git repositories for customers.

The main use case for the authn/authz seems to be a human user manually triggering a Trigger (vs an automated webhook from somewhere)

The main use case for me currently is to have a BuildPipeline in cluster X that triggers a DeployPipeline in cluster Y, e.g. by sending a CloudEvent (or another HTTP request) from the BuildPipeline to the Trigger in cluster Y, which then starts a run of the DeployPipeline. But I am also interested in requests from a web app, as in the latter half of the Manual approval and async PipelineRun design doc.

The first case is a way to link two pipelines: a CloudEvent is sent to a Trigger to start the second. From the perspective of a Trigger, this could be a single use case: supporting JWT + OPA.

The sidecar approach seems interesting. Is the idea that the listener that uses this sidecar only processes these human/manual Triggers? (Otherwise, webhook calls may get blocked since they generally do not contain this kind of auth.)

Yes, that is my thinking right now, because a pre-shared secret and/or SSH key pair is authorization in itself. But this can be changed/improved later if wanted.
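For contrast, this is roughly how a pre-shared secret acts as both authentication and authorization: the sender signs the request body with the shared secret and the receiver recomputes the HMAC. This Go sketch assumes the GitHub-style `X-Hub-Signature-256: sha256=<hex>` header format; other providers use different headers, and `sign` is a hypothetical helper standing in for the sender.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// validSignature checks a "sha256=<hex HMAC-SHA256 of body>" header value
// against the shared secret.
func validSignature(secret, body []byte, header string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	// hmac.Equal compares in constant time to avoid leaking timing information.
	return hmac.Equal(got, mac.Sum(nil))
}

// sign produces the header value a sender holding the secret would attach.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

func main() {
	secret := []byte("my-shared-secret") // hypothetical value, e.g. from a Kubernetes Secret
	body := []byte(`{"ref":"refs/heads/main"}`)
	header := sign(secret, body)
	fmt.Println(validSignature(secret, body, header))          // true
	fmt.Println(validSignature([]byte("wrong"), body, header)) // false
}
```

Anyone holding the secret is implicitly authorized, which is why this scheme needs no separate policy step, but also why the secret has to be distributed to both sides.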

Can OPA be an Interceptor (maybe just a WebHook Interceptor)? Triggers would pass the request on to the OPA service and proceed or not based on what it returns. This is similar to the sidecar idea, except that OPA is not part of the same deployment.

Sounds reasonable, yes. OPA can be deployed as a DaemonSet as well. I don't know much about the interceptors, but as described here, it sounds reasonable.

For the cases where we are using a k8s service account as the identity, would any of the work around SAR checks that @gabemontero is doing be related to this?

Not so related to this use case. OPA is a generic policy engine; when Gabe reasons about OPA, it is in the role of an admission hook. But what I want here is to use OPA for API authorization.

@jlpettersson
Member Author

OPA uses Rego as the language for writing authorization logic. There is already a proposal for a rego-interceptor: #484

In a way, using OPA as a separate service or as a Rego library is similar, but I can see a point when an organization adopts OPA sidecars to "externalize" authorization, e.g. so that you get audit logs in a separate container. Maybe the choice between a rego-interceptor and an OPA-interceptor should be up to the user.

@gabemontero
Contributor

For the cases where we are using a k8s service account as the identity, would any of the work around SAR checks that @gabemontero is doing be related to this?

Not so related to this use case. OPA is a generic policy engine; when Gabe reasons about OPA, it is in the role of an admission hook. But what I want here is to use OPA for API authorization.

Yeah, I concur with @jlpettersson that my refs to OPA wrt access control and authorization are centered around k8s admission control, where k8s RBAC/SARs are the "baked in" k8s mechanism.

From an OPA perspective, that means https://www.openpolicyagent.org/docs/v0.12.2/kubernetes-admission-control/

Ultimately for the piece I've been advocating for in the API WG and pipelines WG, we need to allow admins to plug in either OPA or k8s RBAC/SAR based implementations as they see fit. And we need to decide if we provide anything out of the box. And we need to decide what if any doc or additional role/rolebinding defs we add to make "who can read what" more explicit.

But overall, OPA can certainly be used in a broader scope than that.

I haven't followed all the links @jlpettersson posted above, but in comparison to the link I just pasted above, I believe he is also minimally talking about things like

OPA's "HTTP API Authorization" i.e. https://www.openpolicyagent.org/docs/v0.12.2/http-api-authorization/

OPA's "SSH and sudo Authorization" i.e. https://www.openpolicyagent.org/docs/v0.12.2/ssh-and-sudo-authorization/

as well as additional integrations.

And I think even beyond what @jlpettersson has noted, I've seen OPA referenced in discussions beyond authentication/authorization into general validation, provisioning, and verdict determination.

A broad foot print for sure.

I'm curious how many existing contributors have dipped their toes into this pool, and have experience wiring all this into k8s based offerings.

@dibyom
Member

dibyom commented Jun 19, 2020

@jlpettersson Yeah, there is indeed an open feature request to add a built in rego interceptor like we currently have for CEL 😄 I was hoping though we could try something out even without that:

  1. Run OPA as a separate Deployment (or DaemonSet). Expose it via a k8s Service.

  2. In your eventListener, use a webhookInterceptor to point to the OPA service.

  3. Based on the response from the OPA service, Trigger continues to process or stops.

This should also accomplish the goal of keeping the audit logs in the OPA container. I can think of two possible issues here:

  1. The current WebhookInterceptor contract requires that the service return the original body. Not sure if OPA can be configured to do this.

  2. We might need Interceptor API: How to represent "OK, but don't continue"? #336

But aside from this, what do you think of this webhook interceptor vs the sidecar approach?

@jlpettersson
Member Author

what do you think of this webhook interceptor vs the sidecar approach?

It sounds doable, yes. An alternative is better than no alternative.

@tekton-robot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close


@tekton-robot tekton-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 14, 2020
@tekton-robot

@tekton-robot: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@afrittoli
Member

/remove-lifecycle rotten

@tekton-robot tekton-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 17, 2020
@afrittoli afrittoli reopened this Aug 17, 2020
@tekton-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale


@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 15, 2020
@gabemontero
Contributor

/remove-lifecycle stale

@tekton-robot tekton-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 15, 2020
@gabemontero
Contributor

@jlpettersson Yeah, there is indeed an open feature request to add a built in rego interceptor like we currently have for CEL 😄 I was hoping though we could try something out even without that:

1. Run OPA as a separate Deployment (or DaemonSet). Expose it via a k8s Service.

2. In your eventListener, use a [webhookInterceptor](https://github.com/tektoncd/triggers/blob/master/docs/eventlisteners.md#webhook-interceptors) to point to the OPA service.

3. Based on the response from the OPA service, Trigger continues to process or stops.

This should also accomplish the goal of keeping the audit logs in the OPA container. I can think of two possible issues here:

1. The current WebhookInterceptor contract requires that the service return the original body. Not sure if OPA can be configured to do this.

2. We might need #336

But aside from this, what do you think of this webhook interceptor vs the sidecar approach?

Hey @dibyom ... as part of iterating on my security/policy/RBAC etc. TEP, I'm revisiting this issue and the various discussions of options.

At the moment, my opinion is still that your proposal from back in June is the best and most "native" Tekton Triggers solution, in that it only requires OPA (vs. say Istio), and the webhook interceptor path is more of a "first class" approach from a Tekton Triggers perspective than the sidecar approach.

With that said, the reason I am reaching out again here is that it occurred to me that your pluggable interceptor TEP https://github.com/tektoncd/community/blob/master/teps/0026-interceptor-plugins.md might help make a solution more seamless.

Any quick thoughts on that?

And I guess if the answer is "no", thoughts on iterating on the notion of facilitating this scenario in the context of the TEP and future work for implementing it?

thanks

@dibyom
Member

dibyom commented Dec 2, 2020

With that said, the reason I am reaching out again here is that it occurred to me that your pluggable interceptor TEP https://github.com/tektoncd/community/blob/master/teps/0026-interceptor-plugins.md might help make a solution more seamless.
Any quick thoughts on that?

Yeah, one issue with the previous webhook interceptors was that it was difficult to distinguish between expected and unexpected errors (i.e. #336). Pluggable interceptors should solve this.

@tekton-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale


@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 2, 2021
@tekton-robot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten


@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 1, 2021
@tekton-robot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close


@tekton-robot

@tekton-robot: Closing this issue.

