Talos install error: "failed to verify certificate: x509: certificate signed by unknown authority" #1398

Closed
bojanraic opened this issue Mar 31, 2024 · 12 comments

Comments

@bojanraic

I used the template with k3s and liked the setup for the most part, but I wanted to try Talos for comparison.
I'm trying a Talos 1.6.7 install and can't figure out why I'm getting a cert error. I ran the Talos bootstrap tasks individually, and everything seems fine until the install step, where I get:
error executing bootstrap: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate signed by unknown authority"
Maybe I missed something so any pointers would be very welcome. Thanks!

@onedr0p
Owner

onedr0p commented Mar 31, 2024

Hi @bojanraic, I have seen other people report this issue, but we never found the root cause. It appears to be related to their workstation environment. Are you able to post your config.yaml and the rendered talconfig.yaml (with redactions)? Maybe something in there can shed some light on why this is happening for some people and not others.

@bojanraic
Author

Hey, thanks for the quick reply, appreciate it.
Hmmm, very possible that it is indeed related to local setup. I'll try a few more things, including a clean devcontainer approach and will post back later today or tomorrow.

@DavidIlie

It seems that the "--insecure" flag is not being appended. The guidance in config.yaml for fetching information from individual nodes says you need to append the "--insecure" flag for it to work. I am having the same issue right now.

@DavidIlie

[Image attachment]

config.yaml

---
bootstrap_timezone: "Europe/Bucharest"

bootstrap_distribution: talos

bootstrap_cluster_name: "davidapps"

bootstrap_talos:
  schematic_id: ""
  vlan: ""
  secureboot:
    enabled: false
    encrypt_disk_with_tpm: false
  user_patches: false

bootstrap_node_network: "192.168.100.0/24"

bootstrap_node_default_gateway: "192.168.100.1"

bootstrap_node_inventory:
  - name: "master-01"
    address: ""
    controller: true
    talos_disk: "/dev/sda"
    talos_nic: ""
    ssh_user: "david"
  - name: "master-02"
    address: ""
    controller: true
    talos_disk: "/dev/sda"
    talos_nic: ""
    ssh_user: "david"
  - name: "master-03"
    address: ""
    controller: true
    talos_disk: "/dev/vda"
    talos_nic: ""
    ssh_user: "david"
  - name: "worker-01"
    address: ""
    controller: false
    talos_disk: "/dev/sda"
    talos_nic: ""
    ssh_user: "david"
  - name: "worker-02"
    address: ""
    controller: false
    talos_disk: "/dev/sda"
    talos_nic: ""
    ssh_user: "david"
  - name: "worker-03"
    address: ""
    controller: false
    talos_disk: "/dev/vda"
    talos_nic: ""
    ssh_user: "david"

bootstrap_dns_servers: ["192.168.100.1", "8.8.8.8"]

bootstrap_search_domain: ""

bootstrap_pod_network: "10.69.0.0/16"

bootstrap_service_network: "10.96.0.0/16"

bootstrap_controllers_vip: "192.168.100.169"

bootstrap_tls_sans: []

bootstrap_sops_age_pubkey: ""

bootstrap_github_address: "https://github.com/davidilie/home-cluster"

bootstrap_github_branch: "main"

bootstrap_github_webhook_token: ""

bootstrap_github_private_key: |


bootstrap_cloudflare:
  enabled: true
  domain: "https://davidapps.dev"
  token: ""
  acme:
    email: "david@davidapps.dev"
    production: true
  ingress_vip: "192.168.100.102"
  gateway_vip: "192.168.100.100"
  tunnel:
    id: ""
    account_id: ""
    secret: ""
    ingress_vip: ""

talconfig.yaml

# yaml-language-server: $schema=https://raw.githubusercontent.com/budimanjojo/talhelper/master/pkg/config/schemas/talconfig.json
---
# renovate: datasource=docker depName=ghcr.io/siderolabs/installer
talosVersion: v1.6.7
# renovate: datasource=docker depName=ghcr.io/siderolabs/kubelet
kubernetesVersion: v1.29.3

clusterName: &cluster davidapps
endpoint: https://192.168.100.169:6443
clusterPodNets:
  - "10.69.0.0/16"
clusterSvcNets:
  - "10.96.0.0/16"
additionalApiServerCertSans: &sans
  - "192.168.100.169"
  - 127.0.0.1 # KubePrism
additionalMachineCertSans: *sans
cniConfig:
  name: none

nodes:
  - hostname: "master-01"
    ipAddress: "192.168.100.53"
    installDisk: "/dev/sda"
    talosImageURL: factory.talos.dev/installer/
    controlPlane: true
    networkInterfaces:
      - deviceSelector:
          hardwareAddr: ""
        dhcp: false
        addresses:
          - "192.168.100.53/24"
        mtu: 1500
        routes:
          - network: 0.0.0.0/0
            gateway: "192.168.100.1"
        vip:
          ip: "192.168.100.169"
  - hostname: "master-02"
    ipAddress: "192.168.100.57"
    installDisk: "/dev/sda"
    talosImageURL: factory.talos.dev/installer/
    controlPlane: true
    networkInterfaces:
      - deviceSelector:
          hardwareAddr: ""
        dhcp: false
        addresses:
          - "192.168.100.57/24"
        mtu: 1500
        routes:
          - network: 0.0.0.0/0
            gateway: "192.168.100.1"
        vip:
          ip: "192.168.100.169"
  - hostname: "master-03"
    ipAddress: "192.168.100.54"
    installDisk: "/dev/vda"
    talosImageURL: factory.talos.dev/installer/
    controlPlane: true
    networkInterfaces:
      - deviceSelector:
          hardwareAddr: ""
        dhcp: false
        addresses:
          - "192.168.100.54/24"
        mtu: 1500
        routes:
          - network: 0.0.0.0/0
            gateway: "192.168.100.1"
        vip:
          ip: "192.168.100.169"
  - hostname: "worker-01"
    ipAddress: "192.168.100.58"
    installDisk: "/dev/sda"
    talosImageURL: factory.talos.dev/installer/
    controlPlane: false
    networkInterfaces:
      - deviceSelector:
          hardwareAddr: ""
        dhcp: false
        addresses:
          - "192.168.100.58/24"
        mtu: 1500
        routes:
          - network: 0.0.0.0/0
            gateway: "192.168.100.1"
  - hostname: "worker-02"
    ipAddress: "192.168.100.59"
    installDisk: "/dev/sda"
    talosImageURL: factory.talos.dev/installer/
    controlPlane: false
    networkInterfaces:
      - deviceSelector:
          hardwareAddr: ""
        dhcp: false
        addresses:
          - "192.168.100.59/24"
        mtu: 1500
        routes:
          - network: 0.0.0.0/0
            gateway: "192.168.100.1"
  - hostname: "worker-03"
    ipAddress: "192.168.100.55"
    installDisk: "/dev/vda"
    talosImageURL: factory.talos.dev/installer/
    controlPlane: false
    networkInterfaces:
      - deviceSelector:
          hardwareAddr: ""
        dhcp: false
        addresses:
          - "192.168.100.55/24"
        mtu: 1500
        routes:
          - network: 0.0.0.0/0
            gateway: "192.168.100.1"

patches:
  # Configure containerd
  - |-
    machine:
      files:
        - op: create
          path: /etc/cri/conf.d/20-customization.part
          content: |-
            [plugins."io.containerd.grpc.v1.cri"]
              enable_unprivileged_ports = true
              enable_unprivileged_icmp = true
            [plugins."io.containerd.grpc.v1.cri".containerd]
              discard_unpacked_layers = false
            [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
              discard_unpacked_layers = false

  # Disable search domain everywhere
  - |-
    machine:
      network:
        disableSearchDomain: true

  # Enable cluster discovery
  - |-
    cluster:
      discovery:
        registries:
          kubernetes:
            disabled: false
          service:
            disabled: false

  # Configure kubelet
  - |-
    machine:
      kubelet:
        extraArgs:
          rotate-server-certificates: true
        nodeIP:
          validSubnets: ["192.168.100.0/24"]

  # Force nameserver
  - |-
    machine:
      network:
        nameservers:
          - 192.168.100.1
          - 8.8.8.8

  # Configure NTP
  - |-
    machine:
      time:
        disabled: false
        servers: ["time.cloudflare.com"]

  # Custom sysctl settings
  - |-
    machine:
      sysctls:
        fs.inotify.max_queued_events: 65536
        fs.inotify.max_user_watches: 524288
        fs.inotify.max_user_instances: 8192

  # Mount openebs-hostpath in kubelet
  - |-
    machine:
      kubelet:
        extraMounts:
          - destination: /var/openebs/local
            type: bind
            source: /var/openebs/local
            options: ["bind", "rshared", "rw"]



controlPlane:
  patches:
    # Cluster configuration
    - |-
      cluster:
        allowSchedulingOnControlPlanes: true
        controllerManager:
          extraArgs:
            bind-address: 0.0.0.0
        proxy:
          disabled: true
        scheduler:
          extraArgs:
            bind-address: 0.0.0.0

    # ETCD configuration
    - |-
      cluster:
        etcd:
          extraArgs:
            listen-metrics-urls: http://0.0.0.0:2381
          advertisedSubnets:
            - "192.168.100.0/24"

    # Disable default API server admission plugins.
    - |-
      - op: remove
        path: /cluster/apiServer/admissionControl

    # Enable K8s Talos API Access
    - |-
      machine:
        features:
          kubernetesTalosAPIAccess:
            enabled: true
            allowedRoles: ["os:admin"]
            allowedKubernetesNamespaces: ["system-upgrade"]

@bojanraic
Author

@DavidIlie it's possible that "--insecure" is not being applied properly. I tried the setup from scratch a few times using the devcontainer and I get results similar to yours (and similar to the workstation/non-devcontainer method).
@onedr0p since Talos is proving to be a challenge, I am going back to k3s for the time being.
I will keep an eye on this issue in case @DavidIlie discovers something useful, but in terms of whether to close it or not, it's completely up to you guys.

@onedr0p
Owner

onedr0p commented Apr 1, 2024

@DavidIlie I think the issue might be you didn't fill out a schematic_id?

This doesn't look right in the generated config...

talosImageURL: factory.talos.dev/installer/

# (Required: Talos) If you need any additional System Extensions, and/or add kernel arguments generate a schematic ID.
# Go to https://factory.talos.dev/ and choose the System Extensions, and/or add kernel arguments.
schematic_id: ""
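
A minimal sketch of how this symptom could be caught before running the install step, assuming the rendered talconfig.yaml is available as plain text. The function name `find_incomplete_image_urls`, the sample config, and the placeholder `EXAMPLE_SCHEMATIC_ID` are hypothetical and for illustration only; none of them are part of the template or talhelper. It only flags `talosImageURL` values that end in a bare `/`, which is what an empty schematic_id produced in the config above.

```python
import re

# Matches a `talosImageURL:` line and captures its value (surrounding quotes optional).
IMAGE_URL_RE = re.compile(
    r'^[ \t]*talosImageURL:[ \t]*["\']?([^"\'\s#]+)',
    re.MULTILINE,
)


def find_incomplete_image_urls(talconfig_text):
    """Return talosImageURL values that end in '/', i.e. have no schematic ID."""
    urls = IMAGE_URL_RE.findall(talconfig_text)
    return [url for url in urls if url.endswith("/")]


# Hypothetical sample: the first node has the broken URL seen in this thread,
# the second uses a placeholder where a real schematic ID would go.
sample = """\
nodes:
  - hostname: "master-01"
    talosImageURL: factory.talos.dev/installer/
  - hostname: "master-02"
    talosImageURL: factory.talos.dev/installer/EXAMPLE_SCHEMATIC_ID
"""

print(find_incomplete_image_urls(sample))
# → ['factory.talos.dev/installer/']
```

An empty result means every node has something after `installer/`; it does not verify that the ID is a valid schematic.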

@onedr0p
Owner

onedr0p commented Apr 1, 2024

I have committed a change to the default config to include a default schematic id (the default id defined has no customizations or kernel args). This may or may not be what people want but hopefully the wording makes it so it doesn't catch people off guard.

159c25a

@bojanraic
Author

Thanks for the update, @onedr0p! I probably won't have time to try it out until the weekend, but I'm very interested in knowing if this resolves @DavidIlie's issue. If it does, it would definitely nudge me towards giving Talos another serious try.
Cheers!

@DavidIlie

What you told me to do resolved that problem! But now I have another one: every single node errors out during the installation and nothing happens.

[Image attachment: Screenshot 2024-04-02 012946]

@bojanraic
Author

@DavidIlie I've seen this one on my end as well. Do you have 127.0.0.1 in the list of cert SANs?
@onedr0p if you consider this to be unrelated to the original issue (or maybe even to the template itself, except perhaps for the documentation), I can close this issue and @DavidIlie can open a separate one.

@DavidIlie

I'll create a separate issue, as I keep getting the error. "127.0.0.1" is in the list of cert SANs, but I can see that the VIP is not being created.

@bojanraic
Author

@DavidIlie sounds good! I'm going to close this issue now.
I will keep an eye out on the new one you create as I would also like to try Talos out.
Cheers.
