Vault broken when all nodes went offline like in case of power failure #480

Closed
3 tasks done
Elyytscha opened this issue May 25, 2024 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


Elyytscha commented May 25, 2024

Preflight Checklist

  • I have searched the issue tracker for an issue that matches the one I want to file, without success.
  • I am not looking for support or already pursued the available support channels without success.
  • I agree to follow the Code of Conduct.

Operator Version

1.22.1

Installation Type

Official Helm chart

Bank-Vaults Version

No response

Kubernetes Version

1.28

Kubernetes Distribution/Provisioner

OKD

Expected Behavior

Vault should come back online successfully.

Actual Behavior

Vault is broken and stays broken

Steps To Reproduce

  1. Provision a Vault with provided config
  2. Turn off all Kubernetes compute nodes (to simulate a power failure)
  3. Turn the nodes back on
  4. Vault is broken and does not recover

Configuration

apiVersion: vault.banzaicloud.com/v1alpha1
kind: Vault
metadata:
  name: vault
  namespace: openshift-bank-vault
  labels:
    app.kubernetes.io/name: vault
    vault_cr: vault
spec:
  existingTlsSecretName: selfsigned-cert-vault-tls
  veleroEnabled: false
  size: 3
  serviceMonitorEnabled: true
  unsealConfig:
    kubernetes:
      secretNamespace: openshift-bank-vault
  externalConfig:
    policies:
      - name: allow_secrets
        rules: path "secret/*" { capabilities = ["create", "read", "update", "delete", "list"] }
          path "auth/token/create" { capabilities = [ "update" ] }
    auth:
      - type: kubernetes
        roles:
          # Allow every pod in the default namespace to use the secret kv store
          - name: default
            bound_service_account_names: 
            - default
            - vault-mutating-webhook-vault-secrets-webhook 
            - vault
            bound_service_account_namespaces: 
              - "*"
            policies: 
              - allow_secrets
            ttl: 1h
          - name: secretsmutation
            bound_service_account_names:
              - vault-mutating-webhook-vault-secrets-webhook
              - default
            bound_service_account_namespaces:
              - openshift-bank-vault
            policies:
              - allow_secrets
            ttl: 1h
    secrets:
      - path: secret
        type: kv
        description: General secrets.
        options:
          version: 2
    # Allows writing some secrets to Vault (useful for development purposes).
    # See https://www.vaultproject.io/docs/secrets/kv/index.html for more information.
    startupSecrets:
      - type: kv
        path: secret/data/example/account
        data:
          data:
            USER: secretId
            PASS: s3cr3t
  ingress:
    annotations:
      #nginx.ingress.kubernetes.io/backend-protocol: HTTPS
      route.openshift.io/termination: reencrypt
      route.openshift.io/destination-ca-certificate-secret: selfsigned-cert-vault-tls
    spec:
      ingressClassName: openshift-default
      rules:
      - host: secrets.example.com
        http:
          paths:
          - backend:
              service:
                name: vault
                port:
                  number: 8200
            path: /
            pathType: Prefix
  # In some cases, you have to set permissions for the raft directory.
  # For example in the case of using a local kind cluster, uncomment the lines below.
  vaultInitContainers:
    - name: raft-permission
      image: busybox
      command:
        - /bin/sh
        - -c
        - |
          chown -R 100:1000 /vault/file
      volumeMounts:
        - name: vault-raft
          mountPath: /vault/file
  caNamespaces:
    - "*"
  image: hashicorp/vault:1.14.8

  # Vault Pods , Services and TLS Secret annotations
  vaultAnnotations:
    type/instance: vault

  # Vault Configurer Pods and Services annotations
  vaultConfigurerAnnotations:
    type/instance: vaultconfigurer

  # Specify the ServiceAccount where the Vault Pod and the Bank-Vaults configurer/unsealer is running
  serviceAccount: vault

  # Specify the Service's type where the Vault Service is exposed
  # Please note that some Ingress controllers like https://github.com/kubernetes/ingress-gce
  # forces you to expose your Service on a NodePort
  serviceType: ClusterIP

  # Use local disk to store Vault raft data, see config section.
  volumeClaimTemplates:
    - metadata:
        name: vault-raft
      spec:
        # https://kubernetes.io/docs/concepts/storage/persistent-volumes/#class-1
        # storageClassName: ""
        accessModes:
          - ReadWriteOnce
        volumeMode: Filesystem
        resources:
          requests:
            storage: 1Gi

  config:
    storage:
      raft:
        path: "/vault/file"
    listener:
      tcp:
        address: "0.0.0.0:8200"
        tls_cert_file: /vault/tls/server.crt
        tls_key_file: /vault/tls/server.key
    api_addr: https://vault.openshift-bank-vault.svc:8200
    cluster_addr: "https://${.Env.POD_NAME}:8201"
    ui: true

  statsdDisabled: true

  serviceRegistrationEnabled: true

  resources:
    # A YAML representation of resource ResourceRequirements for vault container
    # Detail can reference: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container
    vault:
      limits: {}
      requests:
        memory: "256Mi"
        cpu: "100m"

Logs

operator logs:
{"level":"info","ts":"2024-05-24T19:50:36Z","logger":"cmd","msg":"no watched namespace found, watching the entire cluster"}
{"level":"info","ts":"2024-05-24T19:50:36Z","logger":"cmd","msg":"registering manager checks"}
{"level":"info","ts":"2024-05-24T19:50:36Z","logger":"cmd","msg":"bootstrapping manager"}
{"level":"info","ts":"2024-05-24T19:50:36Z","logger":"cmd","msg":"starting manager"}
{"level":"info","ts":"2024-05-24T19:50:36Z","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":"2024-05-24T19:50:36Z","logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8383","secure":false}
{"level":"info","ts":"2024-05-24T19:50:36Z","msg":"starting server","kind":"health probe","addr":"[::]:8080"}
I0524 19:50:36.204688       1 leaderelection.go:250] attempting to acquire leader lease openshift-bank-vault/vault-operator-lock...
I0524 19:50:53.462609       1 leaderelection.go:260] successfully acquired lease openshift-bank-vault/vault-operator-lock
{"level":"info","ts":"2024-05-24T19:50:53Z","msg":"Starting EventSource","controller":"vault-controller","source":"kind source: *v1alpha1.Vault"}
{"level":"info","ts":"2024-05-24T19:50:53Z","msg":"Starting Controller","controller":"vault-controller"}
{"level":"info","ts":"2024-05-24T19:50:53Z","msg":"Starting workers","controller":"vault-controller","worker count":1}
{"level":"info","ts":"2024-05-24T19:50:53Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:51:04Z","logger":"KubeAPIWarningLogger","msg":"would violate PodSecurity \"restricted:latest\": allowPrivilegeEscalation != false (containers \"config-templating\", \"raft-permission\", \"vault\", \"bank-vaults\" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers \"config-templating\", \"raft-permission\", \"vault\", \"bank-vaults\" must set securityContext.capabilities.drop=[\"ALL\"]; container \"vault\" must not include \"IPC_LOCK\", \"SETFCAP\" in securityContext.capabilities.add), runAsNonRoot != true (pod or containers \"config-templating\", \"raft-permission\", \"vault\", \"bank-vaults\" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers \"config-templating\", \"raft-permission\", \"vault\", \"bank-vaults\" must set securityContext.seccompProfile.type to \"RuntimeDefault\" or \"Localhost\")"}
{"level":"info","ts":"2024-05-24T19:51:12Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:51:39Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:51:53Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:52:52Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:53:51Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:54:51Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:55:50Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:56:01Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}
{"level":"info","ts":"2024-05-24T19:56:12Z","logger":"controller_vault","msg":"Reconciling Vault","Request.Namespace":"openshift-bank-vault","Request.Name":"vault"}


Vault logs:
==> Vault server configuration:

Administrative Namespace: 
             Api Address: https://vault.openshift-bank-vault.svc:8200
                     Cgo: disabled
         Cluster Address: https://vault-0:8201
   Environment Variables: GODEBUG, GOTRACEBACK, HOME, HOSTNAME, KUBERNETES_PORT, KUBERNETES_PORT_443_TCP, KUBERNETES_PORT_443_TCP_ADDR, KUBERNETES_PORT_443_TCP_PORT, KUBERNETES_PORT_443_TCP_PROTO, KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT, KUBERNETES_SERVICE_PORT_HTTPS, NAME, NSS_SDB_USE_CACHE, PATH, PWD, SHLVL, TERM, VAULT_0_PORT, VAULT_0_PORT_8200_TCP, VAULT_0_PORT_8200_TCP_ADDR, VAULT_0_PORT_8200_TCP_PORT, VAULT_0_PORT_8200_TCP_PROTO, VAULT_0_PORT_8201_TCP, VAULT_0_PORT_8201_TCP_ADDR, VAULT_0_PORT_8201_TCP_PORT, VAULT_0_PORT_8201_TCP_PROTO, VAULT_0_PORT_9091_TCP, VAULT_0_PORT_9091_TCP_ADDR, VAULT_0_PORT_9091_TCP_PORT, VAULT_0_PORT_9091_TCP_PROTO, VAULT_0_SERVICE_HOST, VAULT_0_SERVICE_PORT, VAULT_0_SERVICE_PORT_API_PORT, VAULT_0_SERVICE_PORT_CLUSTER_PORT, VAULT_0_SERVICE_PORT_METRICS, VAULT_1_PORT, VAULT_1_PORT_8200_TCP, VAULT_1_PORT_8200_TCP_ADDR, VAULT_1_PORT_8200_TCP_PORT, VAULT_1_PORT_8200_TCP_PROTO, VAULT_1_PORT_8201_TCP, VAULT_1_PORT_8201_TCP_ADDR, VAULT_1_PORT_8201_TCP_PORT, VAULT_1_PORT_8201_TCP_PROTO, VAULT_1_PORT_9091_TCP, VAULT_1_PORT_9091_TCP_ADDR, VAULT_1_PORT_9091_TCP_PORT, VAULT_1_PORT_9091_TCP_PROTO, VAULT_1_SERVICE_HOST, VAULT_1_SERVICE_PORT, VAULT_1_SERVICE_PORT_API_PORT, VAULT_1_SERVICE_PORT_CLUSTER_PORT, VAULT_1_SERVICE_PORT_METRICS, VAULT_2_PORT, VAULT_2_PORT_8200_TCP, VAULT_2_PORT_8200_TCP_ADDR, VAULT_2_PORT_8200_TCP_PORT, VAULT_2_PORT_8200_TCP_PROTO, VAULT_2_PORT_8201_TCP, VAULT_2_PORT_8201_TCP_ADDR, VAULT_2_PORT_8201_TCP_PORT, VAULT_2_PORT_8201_TCP_PROTO, VAULT_2_PORT_9091_TCP, VAULT_2_PORT_9091_TCP_ADDR, VAULT_2_PORT_9091_TCP_PORT, VAULT_2_PORT_9091_TCP_PROTO, VAULT_2_SERVICE_HOST, VAULT_2_SERVICE_PORT, VAULT_2_SERVICE_PORT_API_PORT, VAULT_2_SERVICE_PORT_CLUSTER_PORT, VAULT_2_SERVICE_PORT_METRICS, VAULT_CONFIGURER_PORT, VAULT_CONFIGURER_PORT_9091_TCP, VAULT_CONFIGURER_PORT_9091_TCP_ADDR, VAULT_CONFIGURER_PORT_9091_TCP_PORT, VAULT_CONFIGURER_PORT_9091_TCP_PROTO, VAULT_CONFIGURER_SERVICE_HOST, VAULT_CONFIGURER_SERVICE_PORT, 
VAULT_CONFIGURER_SERVICE_PORT_METRICS, VAULT_K8S_POD_NAME, VAULT_MUTATING_WEBHOOK_VAULT_SECRETS_WEBHOOK_PORT, VAULT_MUTATING_WEBHOOK_VAULT_SECRETS_WEBHOOK_PORT_443_TCP, VAULT_MUTATING_WEBHOOK_VAULT_SECRETS_WEBHOOK_PORT_443_TCP_ADDR, VAULT_MUTATING_WEBHOOK_VAULT_SECRETS_WEBHOOK_PORT_443_TCP_PORT, VAULT_MUTATING_WEBHOOK_VAULT_SECRETS_WEBHOOK_PORT_443_TCP_PROTO, VAULT_MUTATING_WEBHOOK_VAULT_SECRETS_WEBHOOK_SERVICE_HOST, VAULT_MUTATING_WEBHOOK_VAULT_SECRETS_WEBHOOK_SERVICE_PORT, VAULT_MUTATING_WEBHOOK_VAULT_SECRETS_WEBHOOK_SERVICE_PORT_VAULT_SECRETS_WEBHOOK, VAULT_OPERATOR_PORT, VAULT_OPERATOR_PORT_80_TCP, VAULT_OPERATOR_PORT_80_TCP_ADDR, VAULT_OPERATOR_PORT_80_TCP_PORT, VAULT_OPERATOR_PORT_80_TCP_PROTO, VAULT_OPERATOR_PORT_8383_TCP, VAULT_OPERATOR_PORT_8383_TCP_ADDR, VAULT_OPERATOR_PORT_8383_TCP_PORT, VAULT_OPERATOR_PORT_8383_TCP_PROTO, VAULT_OPERATOR_SERVICE_HOST, VAULT_OPERATOR_SERVICE_PORT, VAULT_OPERATOR_SERVICE_PORT_HTTP, VAULT_OPERATOR_SERVICE_PORT_HTTP_METRICS, VAULT_PORT, VAULT_PORT_8200_TCP, VAULT_PORT_8200_TCP_ADDR, VAULT_PORT_8200_TCP_PORT, VAULT_PORT_8200_TCP_PROTO, VAULT_PORT_8201_TCP, VAULT_PORT_8201_TCP_ADDR, VAULT_PORT_8201_TCP_PORT, VAULT_PORT_8201_TCP_PROTO, VAULT_PORT_9091_TCP, VAULT_PORT_9091_TCP_ADDR, VAULT_PORT_9091_TCP_PORT, VAULT_PORT_9091_TCP_PROTO, VAULT_PORT_9102_TCP, VAULT_PORT_9102_TCP_ADDR, VAULT_PORT_9102_TCP_PORT, VAULT_PORT_9102_TCP_PROTO, VAULT_SERVICE_HOST, VAULT_SERVICE_PORT, VAULT_SERVICE_PORT_API_PORT, VAULT_SERVICE_PORT_CLUSTER_PORT, VAULT_SERVICE_PORT_METRICS, VAULT_SERVICE_PORT_STATSD, VERSION
              Go Version: go1.20.11
              Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "0.0.0.0:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "enabled")
               Log Level: 
                   Mlock: supported: true, enabled: true
           Recovery Mode: false
                 Storage: raft (HA available)
                 Version: Vault v1.14.8, built 2023-12-04T17:45:23Z
             Version Sha: 446f213c47cabf47d52d065647ef666ce4bf8692

==> Vault server started! Log data will stream in below:

2024-05-24T19:55:13.932Z [INFO]  proxy environment: http_proxy="" https_proxy="" no_proxy=""
2024-05-24T19:55:14.300Z [INFO]  core: Initializing version history cache for core
2024-05-24T19:55:14.868Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:14.870Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:14.870Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:14.870Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:14.873Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:14.873Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:16.220Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:16.220Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:16.220Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:16.222Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:16.222Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:18.580Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:18.580Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:18.580Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:18.581Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:18.581Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:19.279Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:19.280Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:19.280Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:19.280Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:19.282Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:19.282Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:20.451Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:20.451Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:20.451Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:20.453Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:20.453Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:22.590Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:22.590Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:22.590Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:22.592Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:22.592Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:22.627Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:23.069Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:24.073Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:27.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:32.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:32.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:37.631Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:39.146Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:39.369Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:39.370Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:39.370Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:39.370Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:39.374Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:39.374Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:40.137Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:40.603Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:40.603Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:40.603Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:40.605Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:40.605Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:42.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:42.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:43.278Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:43.279Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:43.279Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:55:43.280Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:55:43.280Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:55:44.152Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:47.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:52.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:52.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:57.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:58.006Z [INFO]  core: security barrier not initialized
2024-05-24T19:55:58.145Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:02.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:02.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:07.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:09.308Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:10.145Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:10.319Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:10.320Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:10.320Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:10.320Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:56:10.323Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:56:10.323Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:56:11.226Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:11.457Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:11.457Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:11.457Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:56:11.459Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:56:11.459Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:56:12.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:12.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:14.188Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:14.188Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:14.188Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:56:14.189Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:56:14.189Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:56:14.236Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:17.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:20.113Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:22.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:22.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:25.145Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:27.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:32.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:32.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:37.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:40.144Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:42.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:42.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:47.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:52.627Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:52.628Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:53.145Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:57.406Z [INFO]  core: security barrier not initialized
2024-05-24T19:56:57.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:02.630Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:02.630Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:06.145Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:06.371Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:06.403Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:06.404Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:06.404Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:06.404Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:57:06.406Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:57:06.406Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:57:07.419Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:07.419Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:07.419Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:57:07.420Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:57:07.420Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:57:07.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:09.526Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:09.526Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:09.526Z [INFO]  core: attempting to join possible raft leader node: leader_addr=https://vault:8200
2024-05-24T19:57:09.528Z [ERROR] core: failed to get raft challenge: leader_addr=https://vault:8200 error="error during raft bootstrap init call: Put \"https://vault:8200/v1/sys/storage/raft/bootstrap/challenge\": dial tcp 10.10.163.190:8200: connect: connection refused"
2024-05-24T19:57:09.528Z [ERROR] core: failed to join raft cluster: error="failed to get raft challenge"
2024-05-24T19:57:10.382Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:12.625Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:12.626Z [INFO]  core: security barrier not initialized
2024-05-24T19:57:17.625Z [INFO]  core: security barrier not initialized

Additional Information

No response

@Elyytscha Elyytscha added the kind/bug Categorizes issue or PR as related to a bug. label May 25, 2024
@Elyytscha
Author

As far as I have investigated, this does not seem to be a problem with Bank-Vaults itself.

I think it is the situation described in https://support.hashicorp.com/hc/en-us/articles/360050756393-How-to-recover-from-permanently-lost-quorum-while-using-Raft-integrated-storage-with-Vault

The remaining questions are:

  1. Can we work around or automate this with the Vault operator?
  2. Is Velero the intended way to handle these situations?

@Elyytscha
Author

Elyytscha commented Jun 6, 2024

As I found out, this happens when unsealConfig is set to kubernetes:

  unsealConfig:
    options:
      preFlightChecks: true
      storeRootToken: true
      secretShares: 5
      secretThreshold: 3
    kubernetes:
      secretNamespace: vault

With this config, Vault does not survive an outage: if you kill all Vault pods, Vault doesn't come back up by itself.

With another config, for example the one below, and the same storage backend (raft), Vault does survive an outage of all Vault nodes and comes back online successfully without any interaction:

    google:
      kmsKeyRing: ${kms_keyring}
      kmsCryptoKey: ${kms_crypto_key}
      kmsLocation: ${region}
      kmsProject: ${project}
      storageBucket: ${storage_bucket}
  1. Why does this work for the google unsealConfig but not for the kubernetes one?
  2. How is a Vault managed by the Bank-Vaults operator intended to be run in clusters where no cloud provider is used or available?

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR that has become stale and will be auto-closed. label Aug 11, 2024
@bank-vaults bank-vaults deleted a comment from github-actions bot Aug 11, 2024
@csatib02 csatib02 removed the lifecycle/stale Denotes an issue or PR that has become stale and will be auto-closed. label Aug 11, 2024
@csatib02
Member

Hey @Elyytscha,

Thanks for investigating this.

  1. The Vault Operator can automate the unseal process after a cluster outage by interacting with the Kubernetes API to retrieve the necessary secrets. It can be configured to wait until the Kubernetes control plane and the necessary secrets are available before attempting to unseal Vault.
  2. Velero can back up Kubernetes Secrets, ConfigMaps, and other resources that Vault depends on. If a cluster-wide outage occurs, Velero can restore these resources, helping to ensure that Vault has the necessary keys available when it restarts.
    NOTE: Velero can also be used for full cluster state restoration. When the entire cluster needs to be restored, Velero can bring it back to a known good state, including all the necessary Vault-related resources. After restoration, you may still need to ensure that the Vault Operator (or some other automation) handles the unseal process correctly, because restoring resources does not guarantee that they become available in the correct order or at the right time.
  3. The key difference between the Google KMS and Kubernetes Secret-based configurations is that Google KMS is an external, highly available service that is not affected by the state of your Kubernetes cluster. In contrast, Kubernetes Secrets are tightly coupled to the availability and timing of the Kubernetes control plane, making them less reliable in scenarios where the entire cluster experiences an outage.
  4. There are several things you might try, but I would suggest using Consul to manage backup unseal keys instead of Kubernetes Secrets. Consul can be deployed outside of Kubernetes, or within it, with better recovery mechanisms than etcd. In addition, the Bank-Vaults CLI supports several KMS stores that work together with the Operator, so you can use them as external storage for the unseal keys.
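
As a rough illustration of points 2 and 4, the fragment below sketches a Vault custom resource that turns on the operator's Velero integration and keeps the unseal keys in a second, external Vault instead of a Kubernetes Secret. It is a sketch under assumptions, not a tested configuration. `veleroEnabled` and the `options` keys appear in the configurations posted earlier in this thread. The `unsealConfig.vault` block and its field names (`address`, `unsealKeysPath`, `role`, `authPath`) are assumed from the Bank-Vaults CRD and should be checked against the CRD reference for the operator version in use. The address, path, and role values are hypothetical placeholders.

```yaml
apiVersion: vault.banzaicloud.com/v1alpha1
kind: Vault
metadata:
  name: vault
  namespace: openshift-bank-vault
spec:
  size: 3

  # Point 2: enable the operator's Velero integration.
  # The original report had this set to false; what exactly it adds to the
  # Vault pods should be verified for the operator version in use.
  veleroEnabled: true

  unsealConfig:
    options:
      preFlightChecks: true
      storeRootToken: true
    # Point 4: keep the unseal keys outside the cluster that runs this Vault.
    # Field names are assumed; every value below is a hypothetical placeholder.
    vault:
      address: https://central-vault.example.com:8200
      unsealKeysPath: secret/bank-vaults/openshift-bank-vault
      role: bank-vaults-unsealer
      authPath: kubernetes
```

Note that this only moves the dependency: the external Vault must itself be reachable and unsealed when the cluster restarts, otherwise the unsealer has nothing to read the keys from.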

I'm going to close this because you managed to solve the original issue. If you have any more questions, please open another ticket.
