Can the dynamic secrets auto renew as long as the pod is alive #213

Closed
tjjosep opened this issue Apr 28, 2023 · 8 comments

@tjjosep

tjjosep commented Apr 28, 2023

Thanks for the great work on this.

I see that #90 and #151 are closed. My understanding from #90 is that the Vault token and dynamic secret leases (such as dynamic database secrets with a default-ttl and max-ttl) will auto-renew, in increments of default-ttl up to max-ttl, as long as the pod is alive. This would avoid lease revocation at the default-ttl instead of the max-ttl.

I do not want to sync the CSI-provided secrets to Kubernetes secrets; instead, I am mounting the SecretProviderClass object as a pod volume. I expect the dynamic secrets mounted in the pod volume to be renewed by the csi-provider whenever the default ttl is reached.

I have tested with the latest version of the vault-csi-provider, but the leases for the database secrets are not renewing, and when the pod restarts a new lease is issued.

@tomhjp
Contributor

tomhjp commented Apr 28, 2023

If you're seeing new leases get created when the pod restarts, that's expected behaviour as the Kubernetes token used to authenticate to Vault is bound to both the service account and pod (including the UIDs). Apologies if the changelogs didn't make that clear enough - the docs site could probably do with some updating too to explain the ins and outs in one place.

However, if you're seeing new leases on the same pod before the TTL is up then that sounds like a misconfiguration between the provider and the Vault Agent sidecar. If that's the case, please could you share the deployed pod yaml and the SecretProviderClass spec?

@tjjosep
Author

tjjosep commented May 1, 2023

Thanks for the quick response @tomhjp. We don't see new secret leases unless the pod is restarted.

The issue is that the secret lease is NOT being renewed while the pod is alive. When the secret lease reaches the 1hr (3600s) default TTL, the dynamic (database) secret is revoked by Vault due to non-renewal.

The expected outcome was for the secret lease to be renewed (by the newly added Vault Agent) as long as the pod is alive. The secret lease would then ultimately be revoked in one of the scenarios below.

  • The same pod is still alive and the lease is revoked when it reaches the 10hr (36000s) max TTL.
  • The pod is terminated and the lease is revoked due to non-renewal.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: ms-demo-ephemeral-pg-secret-csi
  name: ms-demo-ephemeral-pg-secret-csi
  namespace: my-kube-namespace
spec:
  selector:
    matchLabels:
      app: ms-demo-ephemeral-pg-secret-csi
  template:
    metadata:
      labels:
        app: ms-demo-ephemeral-pg-secret-csi
    spec:
      containers:
      - env:
        - name: KUBE_VOLUME_MOUNT_PATH_PG_DB_SECRET
          value: /var/run/secrets/my-service-account-csi-store/my-vault-namespace.database.creds.dynamic_write_role
        image: docker-dev.artifactory.com/ms-demo-ephemeral-pg-secret-csi
        name: ms-demo-ephemeral-pg-secret-csi
        volumeMounts:
        - mountPath: /var/run/secrets/my-service-account-csi-store
          name: my-service-account-csi-store
          readOnly: true
      serviceAccountName: my-service-account  
      volumes:
      - csi:
          driver: secrets-store.csi.k8s.io
          readOnly: true
          volumeAttributes:
            secretProviderClass: vault-csi-secretsync
        name: my-service-account-csi-store
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-account  
  namespace: my-kube-namespace

---
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: vault-csi-secretsync
  namespace: my-kube-namespace
spec:
  provider: vault
  parameters:
    vaultAddress: "https://vault-aws.my-company.com/"  
    vaultKubernetesMountPath: "kube/my-cluster"  
    roleName: "my-service-account"  
    objects: |

      # Get raw Vault response
      - objectName: "my-vault-namespace.database.creds.dynamic_write_role"
        secretPath: "my-vault-namespace/database/creds/dynamic_write_role"

Below is the database secret setup:

vault write database/roles/dynamic_write_role \
    db_name=my-postgres-db \
    creation_statements=@pg_dynamic_user_creation.sql \
    revocation_statements=@pg_dynamic_user_revocation.sql \
    rollback_statements=@pg_dynamic_user_rollback.sql \
    renew_statements=@pg_dynamic_user_renew.sql \
    default_ttl=3600s \
    max_ttl=36000s

@tomhjp
Contributor

tomhjp commented May 2, 2023

Thanks for adding those configs. I need to write up some documentation to explain this limitation, but the spec.parameters.vaultAddress value is what's tripping it up here. The helm chart configures the CSI provider to use a unix socket pointed at the Agent as its Vault address, but setting the CSI provider's -vault-addr flag or setting the SPC's vaultAddress parameter will configure it to reach out directly to Vault instead. That means the Agent isn't aware of the dynamic lease that needs renewing, so those Vault address options become a bit of a trip hazard.

I'd like to make it less of a trip hazard, but haven't come up with any plans yet. Let me know if you have thoughts. Perhaps at the very least the CSI provider can log a warning if it detects that an Agent sidecar is being bypassed.
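
As a minimal sketch of the change described above, here is the SecretProviderClass from this thread with the vaultAddress parameter removed. It assumes the CSI provider was installed via the helm chart with its Agent sidecar enabled and that the provider's -vault-addr flag is not set, so the provider falls back to the chart-configured unix socket and requests go through the Agent. All other values are copied from the spec posted earlier.

```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: vault-csi-secretsync
  namespace: my-kube-namespace
spec:
  provider: vault
  parameters:
    # vaultAddress is intentionally omitted: setting it makes the provider
    # talk to Vault directly, bypassing the Agent that renews the lease.
    vaultKubernetesMountPath: "kube/my-cluster"
    roleName: "my-service-account"
    objects: |
      - objectName: "my-vault-namespace.database.creds.dynamic_write_role"
        secretPath: "my-vault-namespace/database/creds/dynamic_write_role"
```

The Deployment and ServiceAccount should not need to change for this; only the address override is dropped.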

@tjjosep
Author

tjjosep commented May 5, 2023

That worked.

We found that if we route the CSI traffic through the Agent, then whenever the default ttl is reached, it renews the database secrets, up to the max-ttl. So the secret retrieved by the pod remains valid as long as a period token is enabled on the Kubernetes service account's Vault auth role.

This shows that we can keep a shorter ttl (e.g. 10 mins) on the dynamic DB secrets, but a period token (which can itself have a short period, such as 10 mins) must be enabled on the auth token. We still need a longer max ttl so that we can schedule a pod recycle before the DB secret's max ttl expires.

If the period token is not enabled on the auth token, the lease is revoked whenever the auth token's ttl is reached.

Type                      Renewal                 Expiry
Database Dynamic Secret   Default TTL: 10 mins    MAX TTL: 8 days (the pod will recycle on the 7th day, so the max is never reached)
Auth Token                Period Token: 10 mins   32 days (is this called the ttl? Is it ignored if a period token is specified? Can it be shorter?)

@tomhjp
Contributor

tomhjp commented May 8, 2023 via email

@msitworld

msitworld commented Aug 22, 2023

Hello,

I tried to route the CSI traffic through the vault-agent-injector, but the CSI provider cannot connect to it due to a TLS error. I searched for a fix and tried adding some annotations, but that did not work. Did you encounter this error? Any ideas?

reconciler.go:223] "failed to reconcile spc for pod" err="failed to rotate objects for pod application/app-655fd88bfd-qc7z2, err:
rpc error: code = Unknown desc = error making mount request: couldn't read secret  \"test\": failed to login: Post
\"https://vault-agent-injector-svc/v1/auth/kubernetes/login\": tls: failed to verify certificate: x509: certificate signed by
 unknown authority" spc="test-spc" pod="app-655fd88bfd-qc7z2" controller="rotation"

As a heads up, I've installed Vault using the official helm chart and enabled the vault-agent-injector in the helm values.yaml.

@tomhjp
Contributor

tomhjp commented Aug 22, 2023

@msitworld that sounds like a missing CA cert somewhere. If you can mount Vault's CA cert into the pod using csi.volumes and csi.volumeMounts, then you can use csi.agent.extraArgs to pass in the path to the CA using -ca-cert=/path/to/ca.pem. It does seem like the helm chart could do with some additional options for setting up TLS for the agent, though, as it doesn't currently look easy to do with environment variables or custom Agent config.
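
A rough sketch of what that could look like in the helm values.yaml, using the csi.volumes, csi.volumeMounts and csi.agent.extraArgs options named above. The Secret name vault-ca, its ca.pem key, and the /vault/tls mount path are all hypothetical; this also assumes the mounted volume is visible to the Agent container, which is worth checking against your chart version.

```yaml
csi:
  enabled: true
  # Hypothetical Secret "vault-ca" that holds Vault's CA cert under the key ca.pem.
  volumes:
    - name: vault-ca
      secret:
        secretName: vault-ca
  # Mount path is an assumption; it only has to match the -ca-cert path below.
  volumeMounts:
    - name: vault-ca
      mountPath: /vault/tls
      readOnly: true
  agent:
    extraArgs:
      # Point the Agent at the mounted CA cert.
      - -ca-cert=/vault/tls/ca.pem
```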

@tjjosep
Author

tjjosep commented Nov 17, 2023

The original issue was resolved once the traffic was routed through the csi-provider Agent sidecar with csi-provider version 1.4.1. Closing the issue.

Appreciate the help.

@tjjosep tjjosep closed this as completed Nov 17, 2023