
RKE2 support for IAM Roles for Service Accounts #2269

Closed
kdalporto opened this issue Dec 15, 2021 · 13 comments
@kdalporto

What happened:
I'm trying to implement IRSA in a self-hosted RKE2 Kubernetes environment, following the documentation here: SELF_HOSTED_SETUP.

I've configured everything; however, when I deploy a new awscli pod for testing, I see the errors below.

kube-apiserver logs

W1214 23:48:33.374702 1 dispatcher.go:170] Failed calling webhook, failing open pod-identity-webhook.amazonaws.com: failed calling webhook "pod-identity-webhook.amazonaws.com": Post "https://pod-identity-webhook.kube-system.svc:443/mutate?timeout=30s": x509: certificate signed by unknown authority
E1214 23:48:33.375314 1 dispatcher.go:171] failed calling webhook "pod-identity-webhook.amazonaws.com": Post "https://pod-identity-webhook.kube-system.svc:443/mutate?timeout=30s": x509: certificate signed by unknown authority

pod-identity-webhook logs

W1214 23:47:26.502020 1 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I1214 23:47:26.512708 1 store.go:63] Fetched secret: kube-system/pod-identity-webhook
I1214 23:47:26.512961 1 main.go:195] Creating server
I1214 23:47:26.513130 1 main.go:215] Listening on :9999 for metrics and healthz
I1214 23:47:26.513242 1 main.go:209] Listening on :443
2021/12/14 23:48:33 http: TLS handshake error from 10.42.0.0:47782: remote error: tls: bad certificate

I've attached the kube-apiserver.yaml file with specific changes needed for IRSA. Is it possible that RKE2 is configured to not accept certificates created like this?

What you expected to happen:
AWS env variables to be injected into the awscli pod on creation.

How to reproduce it (as minimally and precisely as possible):

Stand up an RKE2 cluster (or use the Terraform method here) and follow the SELF_HOSTED_SETUP.

Anything else we need to know?:
The deployment files here were modified to deploy into the kube-system namespace instead of the default one. Same issue occurs in the default namespace.

Environment:

  • AWS Region: us-gov-west-1
  • EKS Platform version (if using EKS, run aws eks describe-cluster --name <name> --query cluster.platformVersion): N/A
  • Kubernetes version (if using EKS, run aws eks describe-cluster --name <name> --query cluster.version): v1.19.7+rke2r1
  • Webhook Version: v0.2.0
@kdalporto kdalporto changed the title RKE2 support for IAM Roles for Service Accounts? RKE2 support for IAM Roles for Service Accounts Dec 15, 2021
@brandond
Member

E1214 23:48:33.375314 1 dispatcher.go:171] failed calling webhook "pod-identity-webhook.amazonaws.com": Post "https://pod-identity-webhook.kube-system.svc:443/mutate?timeout=30s": x509: certificate signed by unknown authority

The webhook's certificate is not trusted by the apiserver. In the steps at https://github.com/aws/amazon-eks-pod-identity-webhook#in-cluster what process did you use to

  • Approve the CSR that the deployment created for its TLS serving certificate

This should have been the step that ensured that the webhook service has a certificate that's trusted by the rest of the cluster.
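For reference, the approval step in that flow usually comes down to a kubectl certificate approval. A hedged sketch (the CSR and deployment names below are assumptions; list the CSRs first to find the real name):

```shell
# List pending CSRs, then approve the webhook's serving-certificate CSR.
kubectl get csr
kubectl certificate approve pod-identity-webhook.kube-system   # name is a guess

# The webhook should then pick up its signed certificate; restarting the pod
# may be needed if it cached the old one.
kubectl -n kube-system rollout restart deployment/pod-identity-webhook
```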

@brandond
Member

brandond commented Dec 15, 2021

Also, the instructions at https://github.com/aws/amazon-eks-pod-identity-webhook/blob/master/SELF_HOSTED_SETUP.md are out of date and need to be updated to track with recent releases of Kubernetes. The apiserver itself now hosts its own OIDC discovery endpoint, so the steps that see you create your own document and store it in an S3 bucket are not necessary and will in fact not work. The cluster uses the built-in OIDC discovery endpoint: kubernetes/enhancements#1393
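To confirm the built-in discovery endpoint is live on a given cluster, a quick sanity check (these are the standard built-in paths, queried through kubectl's raw API proxy):

```shell
# Fetch the apiserver's own OIDC discovery document and signing key set.
kubectl get --raw /.well-known/openid-configuration
kubectl get --raw /openid/v1/jwks
```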

@kdalporto
Author

kdalporto commented Dec 15, 2021

@brandond Thanks for the information, the CSR gets approved by the Makefile after it runs, but as you indicated since the documentation is out of date, I will look into reconfiguring it to work with the apiserver OIDC.

@kdalporto
Author

Is using the OIDC endpoint on the apiserver common practice for IRSA? I've really only found the S3 OIDC documentation when not hosting on EKS.

@brandond
Member

brandond commented Dec 15, 2021

I'd never seen this referred to as IRSA so I had to go look that up lol. I have no idea if it's common practice or not, but it should work. I know that the current in-cluster OIDC stuff does work, at least for other components.

@stale

stale bot commented Jun 18, 2022

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Jun 18, 2022
@Lillecarl

Still relevant

@stale stale bot removed the status/stale label Jun 20, 2022
@brandond
Member

brandond commented Jun 21, 2022

@Lillecarl can you comment on what you feel is still needed here? @kdalporto didn't come back with any additional questions, so I'm assuming he was able to work around the out-of-date upstream documentation and get it working. I don't think anything needs to be fixed on the RKE2 side, so I would prefer to let stalebot close this out instead of keeping it alive without any actual action needed on our part.

@Lillecarl

@brandond I've tried following the guides and blogs, but when I did, the cluster wouldn't come up again. I'm guessing there's something I did wrong on my end.
I feel like the issue mostly relates to RKE2. The AWS docs, while outdated, are easy to follow. But since RKE2 already configures the service-account and api-audience arguments, can I use the configured certificates somehow?

Sadly I can't expose my k8s on the internet and have to go through the S3 process still. Can I just scrape the openid endpoint from RKE2, push it to S3, and tell IAM/STS to trust the S3 thumbprint? (Might be partially out of scope, but information here is very sparse.)
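The scrape-and-upload idea can work mechanically; a hedged sketch (the bucket name is a placeholder, and the paths mirror the standard discovery layout):

```shell
# Pull the discovery documents out of the cluster and mirror them to S3.
kubectl get --raw /.well-known/openid-configuration > discovery.json
kubectl get --raw /openid/v1/jwks > keys.json
aws s3 cp --content-type application/json discovery.json \
  s3://my-oidc-bucket/.well-known/openid-configuration
aws s3 cp --content-type application/json keys.json \
  s3://my-oidc-bucket/openid/v1/jwks
```

One caveat: the issuer claim in the tokens and the issuer URL in the discovery document must match whatever URL IAM is given, so the apiserver's service-account-issuer would need to point at the bucket for this to line up.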

@bootc

bootc commented Aug 3, 2022

I got this working yesterday. With recent versions of Kubernetes/RKE2 it's not too bad, and the kube-apiserver can indeed now host its own metadata, which simplifies things if you can expose that part of your cluster publicly.

First I had to add this to my /etc/rancher/rke2/config.yaml:

kube-apiserver-arg:
- anonymous-auth=true
- api-audiences=https://foo.example.com,https://kubernetes.default.svc.cluster.local,rke2
- service-account-issuer=https://foo.example.com
- service-account-jwks-uri=https://foo.example.com/openid/v1/jwks

In the above, foo.example.com is actually an Ingress that exposes the relevant URIs from the kube-apiserver (see below). You need to set anonymous-auth=true because those endpoints have to be accessible without auth. I applied the following to my cluster to expose the endpoints:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: service-account-issuer-discovery
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:service-account-issuer-discovery
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:authenticated
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:unauthenticated

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sa-iss-disc
  namespace: my-system
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
spec:
  ingressClassName: nginx
  rules:
  - host: foo.example.com
    http:
      paths:
      - path: /.well-known/openid-configuration
        pathType: Exact
        backend:
          service:
            name: kubernetes-api
            port:
              name: https
      - path: /openid/v1/jwks
        pathType: Exact
        backend:
          service:
            name: kubernetes-api
            port:
              name: https
  tls:
  - hosts:
    - foo.example.com
    secretName: sa-iss-disc-tls

---
kind: Service
apiVersion: v1
metadata:
  name: kubernetes-api
  namespace: my-system
spec:
  ports:
  - name: https
    protocol: TCP
    port: 443
    targetPort: 443
  type: ExternalName
  externalName: kubernetes.default.svc.cluster.local

You can probably skip the Service if you do this in the default namespace and use the pre-existing kubernetes Service to access kube-apiserver. The ClusterRoleBinding is required to make the OIDC discovery endpoints visible.

You then need to create the IAM OIDC Provider and deploy the Pod Identity Webhook as per the usual instructions. With that done it all worked for me first time, though admittedly I already knew what I was doing with IRSA on EKS.
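The IAM side of that is roughly one CLI call; a sketch, where the issuer URL is the example host from above and the thumbprint is a placeholder you should compute yourself:

```shell
# Hypothetical sketch: register the cluster's issuer with IAM so STS will
# trust its service account tokens. URL and thumbprint are placeholders.
aws iam create-open-id-connect-provider \
  --url https://foo.example.com \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list A053375BFE84E8B748782C7CEE15827A6AF5A405
```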

@Lillecarl

@bootc could you expand on "the Pod Identity Webhook as per the usual instructions"? (Even though it's out of scope here.)

Did you use /var/lib/rancher/rke2/server/tls/service.key with cert-manager?

I've previously just enabled "IRSA" in Terraform and the documentation isn't great.

@bootc

bootc commented Aug 10, 2022

@bootc could you expand on "the Pod Identity Webhook as per the usual instructions"? (Even though it's out of scope here.)

Basically deploy https://github.com/aws/amazon-eks-pod-identity-webhook/tree/master/deploy into your cluster. The only change you need to make is to fill in the correct IMAGE in deployment-base.yaml. This is what mounts the special service account token into your pods when you set the annotation on the ServiceAccount.
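Once the webhook is running, wiring a workload up is just an annotation on its ServiceAccount. A minimal sketch, where the name, namespace, and role ARN are placeholders:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
  namespace: default
  annotations:
    # Placeholder ARN; pods using this ServiceAccount get the AWS_* env vars
    # and the projected token injected by the webhook.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-app-role
```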

Did you use /var/lib/rancher/rke2/server/tls/service.key with cert-manager?

No, in my case I get cert-manager to issue a certificate from Let's Encrypt and put the relevant hash into the OIDC Provider. Basically A053375BFE84E8B748782C7CEE15827A6AF5A405 is the thumbprint for the R3 intermediate certificate.
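The thumbprint IAM wants is the hex SHA-1 fingerprint of the CA certificate that signs the issuer's TLS certificate. To see the mechanics without touching a real CA, a sketch that generates a throwaway self-signed cert and fingerprints it the same way you would fingerprint the real intermediate:

```shell
# Generate a throwaway self-signed certificate.
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/k.pem -out /tmp/c.pem \
  -subj "/CN=example" -days 1 2>/dev/null

# Print its SHA-1 thumbprint as 40 hex digits, colons stripped.
openssl x509 -in /tmp/c.pem -fingerprint -sha1 -noout | sed -e 's/.*=//' -e 's/://g'
```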

I've previously just enabled "IRSA" in Terraform and the documentation isn't great.

https://github.com/aws/amazon-eks-pod-identity-webhook/blob/master/SELF_HOSTED_SETUP.md is the documentation I meant, except most of that focuses on setting up the kube-apiserver correctly and generating/publishing the keys which is now really easy.

@stale

stale bot commented Feb 11, 2023

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Feb 11, 2023
@stale stale bot closed this as completed Mar 18, 2023