Since the Leader Pod UID does not change, the VaultDynamicSecret is updated after the VSO restarts. #959

Open
lvpeixin opened this issue Oct 31, 2024 · 0 comments
Labels
bug Something isn't working


lvpeixin commented Oct 31, 2024

Describe the bug
After a power outage and network disruption, the Kubernetes cluster was restarted. Afterwards, the multi-replica Vault Secrets Operator (VSO) deployment did not elect a new leader: the leader pod, and therefore its runtimePodUID, stayed the same, and the lease was still within the renewal window. Even so, the VaultDynamicSecret unexpectedly updated the secret when VSO restarted. For a single-replica VSO, expiration of the renewal window is unavoidable after a restart.

To Reproduce
Steps to reproduce the behavior:

1. Configure a Vault secrets engine and AppRole authentication.
2. Deploy VSO with the Helm chart, using the following values:
```yaml
controller:
  replicas: 1
  kubeRbacProxy:
    image:
      pullPolicy: IfNotPresent
      repository: kubebuilder/kube-rbac-proxy
      tag: v0.15.0
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 250m
        memory: 256Mi
  manager:
    image:
      pullPolicy: IfNotPresent
      repository: vault/vault-secrets-operator
      tag: 0.7.0
    logging:
      level: info
      timeEncoding: rfc3339
      stacktraceLevel: panic
    globalTransformationOptions:
      excludeRaw: false
    backoffOnSecretSourceError:
      initialInterval: "5s"
      maxInterval: "60s"
      maxElapsedTime: "0s"
      randomizationFactor: 0.5
      multiplier: 1.5
    clientCache:
      persistenceModel: "direct-unencrypted"
      cacheSize: 10000
      storageEncryption:
        enabled: false
        namespace: "vault"
        mount: approle
        keyName: vso-client-cache
    maxConcurrentReconciles: 100
    resources:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 10m
        memory: 64Mi
  podSecurityContext:
    runAsNonRoot: true
  securityContext:
    allowPrivilegeEscalation: false
  controllerConfigMapYaml:
    health:
      healthProbeBindAddress: :8081
    leaderElection:
      leaderElect: true
      resourceName: b0d477c0.hashicorp.com
    metrics:
      bindAddress: 127.0.0.1:8080
    webhook:
      port: 9443
  kubernetesClusterDomain: cluster.local
  terminationGracePeriodSeconds: 120
  preDeleteHookTimeoutSeconds: 120
metricsService:
  ports:
  - name: https
    port: 8443
    protocol: TCP
    targetPort: https
  type: ClusterIP
defaultVaultConnection:
  enabled: true
  address: "https://vault.example.net"
  caCertSecret: ""
  tlsServerName: ""
  skipTLSVerify: true
  headers: {}
defaultAuthMethod:
  enabled: true
  namespace: "vault"
  allowedNamespaces: []
  method: appRole
  mount: approle
  appRole:
    roleId: roleId-test
    secretRef: vault-approle-secretid
  kubernetes:
    role: ""
    serviceAccount: default
    tokenAudiences: []
  params: {}
  headers: {}
telemetry:
  serviceMonitor:
    enabled: false
tests:
  enabled: false
```

3. Create a VaultAuth in the my-app application namespace (demo-ns):

```yaml
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
  name: dynamic-auth
  namespace: demo-ns
spec:
  method: appRole
  mount: approle
  appRole:
    roleId: "roleId-test"
    secretRef: vault-approle-secretid
```
4. Create a VaultDynamicSecret in the same namespace:
```yaml
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultDynamicSecret
metadata:
  name: vso-db-secret
  namespace: demo-ns
spec:
  mount: db
  path: creds/create-user-role
  destination:
    create: true
    name: vso-db-secret
  renewalPercent: 65
  vaultAuthRef: dynamic-auth
```
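5. Confirm that the my-app Kubernetes Secret is created and that Vault credential rotation works.
6. Restart Kubernetes. To reproduce without rebooting the cluster, kill the VSO manager process; the container restarts inside the same pod: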
```
~# kubectl -n vault get pods
NAME                                                            READY   STATUS    RESTARTS      AGE
vso-vault-secrets-operator-controller-manager-76cfff799-xdp8j   2/2     Running   2 (15m ago)   17m
~# ps -ef | grep vault
65532     391841  384153  0 16:34 ?        00:00:01 /vault-secrets-operator --health-probe-bind-address=:8081 --metrics-bind-address=127.0.0.1:8080 --leader-elect --client-cache-persistence-model=direct-unencrypted --client-cache-size=10000 --max-concurrent-reconciles=100 --backoff-initial-interval=5s --backoff-max-interval=60s --backoff-max-elapsed-time=0s --backoff-multiplier=1.50 --backoff-randomization-factor=0.50 --zap-log-level=debug --zap-time-encoding=rfc3339 --zap-stacktrace-level=panic
root      437810  437430  0 16:48 pts/2    00:00:00 grep --color=auto vault
~# kill -9 391841
~# kubectl -n vault get pods
NAME                                                            READY   STATUS    RESTARTS      AGE
vso-vault-secrets-operator-controller-manager-76cfff799-xdp8j   2/2     Running   3 (12s ago)   18m
```

Expected behavior
Restarting the VSO controller should not trigger a rollout restart of every application that uses a VaultDynamicSecret.

Environment

  • Kubernetes version: v1.26.14
    • Distribution or cloud vendor (OpenShift, EKS, GKE, AKS, etc.):
    • Other configuration options or runtime services (istio, etc.):
  • vault: 1.16.2
  • vault-secrets-operator version: 0.7.0 (hashicorp/vault-secrets-operator)

Additional context
VaultDynamicSecret events recorded by VSO:

```
Events:
  Type    Reason              Age    From                Message
  ----    ------              ----   ----                -------
  Normal  SecretLeaseRenewal  5m27s  VaultDynamicSecret  Not in renewal window after transitioning to a new leader/pod, lease_id=db/creds/create-user-role/btDm36YIHUIHICvJvOpgkyaq, horizon=1m26.710550449s
  Normal  SecretLeaseRenewal  4m3s   VaultDynamicSecret  Not in renewal window after transitioning to a new leader/pod, lease_id=db/creds/create-user-role/btDm36YIHUIHICvJvOpgkyaq, horizon=3.208340562s
  Normal  SecretLeaseRenewal  4m     VaultDynamicSecret  Lease renewal duration was truncated from 1200s to 241s, requesting new credentials
  Normal  SecretRotated       3m59s  VaultDynamicSecret  Secret synced, lease_id="db/creds/create-user-role/0bsSMbGAKxpOOrZyF991D2nG", horizon=17m50.950484501s, sync_reason="SecretLeaseRenewalError"
  Normal  SecretLeaseRenewal  2m53s  VaultDynamicSecret  Not in renewal window after transitioning to a new leader/pod, lease_id=db/creds/create-user-role/0bsSMbGAKxpOOrZyF991D2nG, horizon=14m41.625011324s
  Normal  SecretLeaseRenewal  2m8s   VaultDynamicSecret  Not in renewal window after transitioning to a new leader/pod, lease_id=db/creds/create-user-role/0bsSMbGAKxpOOrZyF991D2nG, horizon=14m5.399806701s
  Normal  SecretLeaseRenewal  56s    VaultDynamicSecret  Lease renewal duration was truncated from 1200s to 1017s, requesting new credentials
  Normal  SecretRotated       55s    VaultDynamicSecret  Secret synced, lease_id="db/creds/create-user-role/ev4vMeOS42o67fZVqPqumwrz", horizon=17m7.45101962s, sync_reason="SecretLeaseRenewalError"
```

The relevant restart check in the VaultDynamicSecret controller is:

```go
doSync := syncReason != ""
leaseID := o.Status.SecretLease.ID
if !doSync && r.runtimePodUID != "" && r.runtimePodUID != o.Status.LastRuntimePodUID {
	// don't take part in the thundering herd on start up,
	// and the lease is still within the renewal window.
	horizon, inWindow := computeRelativeHorizonWithJitter(o, time.Second*1)
	logger.Info("Restart check",
		"inWindow", inWindow,
		"horizon", horizon,
		"allowStaticCreds", o.Spec.AllowStaticCreds)
	if !o.Spec.AllowStaticCreds {
		if !inWindow {
			// means that we are not in the lease renewal window.
			r.Recorder.Eventf(o, corev1.EventTypeNormal, consts.ReasonSecretLeaseRenewal,
				"Not in renewal window after transitioning to a new leader/pod, lease_id=%s, horizon=%s",
				leaseID, horizon)
			if err := r.updateStatus(ctx, o); err != nil {
				return ctrl.Result{}, err
			}
			return ctrl.Result{RequeueAfter: horizon}, nil
		}
	} else if inWindow {
		// TODO: decouple the static-creds in-window/horizon computation from lease
		// renewal. means that we are in the rotation period.
		r.Recorder.Eventf(o, corev1.EventTypeNormal, consts.ReasonSecretLeaseRenewal,
			"In rotation period after transitioning to a new leader/pod, lease_id=%s, horizon=%s",
			leaseID, horizon)
		if err := r.updateStatus(ctx, o); err != nil {
			return ctrl.Result{}, err
		}
		return ctrl.Result{RequeueAfter: horizon}, nil
	}
}
```
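To make the control flow explicit, here is a stripped-down model of that check with the Kubernetes and Vault calls removed. `restartGuard`, `requeue`, and `proceed` are hypothetical names. Assuming runtimePodUID is the pod's metadata.uid, a container restart inside the same pod (as in the `kill -9` reproduction) leaves the UID unchanged, so the guard is skipped and reconciliation goes straight to the normal renewal/sync path.

```go
package main

import "fmt"

// action is the outcome of the simplified start-up guard.
type action string

const (
	requeue action = "requeue until the renewal window"
	proceed action = "continue to normal renewal/sync"
)

// restartGuard is a hypothetical, stripped-down model of the quoted check;
// it keeps only the inputs that decide whether the guard fires.
func restartGuard(syncReason, runtimePodUID, lastRuntimePodUID string, inWindow, allowStaticCreds bool) action {
	doSync := syncReason != ""
	// The guard only applies when the pod UID differs from the one recorded in status.
	if !doSync && runtimePodUID != "" && runtimePodUID != lastRuntimePodUID {
		if !allowStaticCreds && !inWindow {
			return requeue // dynamic lease: wait until the renewal window opens
		}
		if allowStaticCreds && inWindow {
			return requeue // static creds: still inside the rotation period
		}
	}
	return proceed
}

func main() {
	// New pod (for example, a new leader): the guard defers work.
	fmt.Println(restartGuard("", "pod-uid-new", "pod-uid-old", false, false))
	// Container restart in the same pod: same UID, so the guard is skipped.
	fmt.Println(restartGuard("", "pod-uid-old", "pod-uid-old", false, false))
	// Output:
	// requeue until the renewal window
	// continue to normal renewal/sync
}
```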

@lvpeixin lvpeixin added the bug Something isn't working label Oct 31, 2024
@lvpeixin lvpeixin changed the title Restart of the single replica VSO controller manager causes the cache to become unusable. When VSO is restarted, the leader election of the pod is still the previous pod, and the runtimePodUID has not changed, so VaultDynamicSecret is updated during the restart. Oct 31, 2024
@lvpeixin lvpeixin changed the title When VSO is restarted, the leader election of the pod is still the previous pod, and the runtimePodUID has not changed, so VaultDynamicSecret is updated during the restart. VSO Restart and Unexpected VaultDynamicSecret Updates Due to Unchanged Leader Pod UID has not changed. Oct 31, 2024
@lvpeixin lvpeixin changed the title VSO Restart and Unexpected VaultDynamicSecret Updates Due to Unchanged Leader Pod UID has not changed. Since the Leader Pod UID does not change, the VaultDynamicSecret is updated after the VSO restarts. Oct 31, 2024