etcd : Gen_certs | run cert generation script fails on SSL #2343
Comments
Could you try running the command manually?
The command worked fine :). What I found out is that when running the Ansible playbook multiple times, I guess it doesn't create new certificates. I removed the old certificates in ...
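A minimal sketch of that workaround, assuming the default etcd certificate directory /etc/ssl/etcd/ssl that appears later in the trace (back up the directory first, and adjust the playbook invocation to your inventory):

# Back up and clear previously generated etcd certificates on the first etcd
# host, then re-run the playbook so the Gen_certs task recreates them.
sudo cp -a /etc/ssl/etcd/ssl /etc/ssl/etcd/ssl.bak
sudo rm -f /etc/ssl/etcd/ssl/*.pem
ansible-playbook -i inventory/hosts.ini cluster.yml -b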
Ah. So you already had certificates. I see that now from the log output. The command failing is ...
This possibly happens because of ...
Awesome, I guess we can close the issue then :)
Well, the issue is not really fixed. It should not stop the process.
I have the same problem. I removed the SSL keys ... I reset the cluster ... it just fails. Not sure what is going on there.
For some reason the openssl.conf is getting False as an IP address, and this fails to generate the certificate. Now I need to see where that's coming from.
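A quick way to check for that, assuming the config path /etc/ssl/etcd/openssl.conf from the trace below; any entry like "IP.3 = False" (hypothetical) means an unset or boolean variable leaked into the template:

# List the subjectAltName entries the playbook rendered into openssl.conf;
# every DNS.* / IP.* line should be a real hostname or IP address.
grep -E '^(DNS|IP)\.[0-9]+' /etc/ssl/etcd/openssl.conf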
How did you configure ...?
## External LB example config
apiserver_loadbalancer_domain_name: "k8sapi.a2tz.com"

## Internal loadbalancers for apiservers
loadbalancer_apiserver_localhost: false
@woopstar Any idea?
No, it seems right. Are you running the latest Kubespray from the master branch? It seems odd that the IPs are there multiple times, as you only have two hosts in your inventory file. Are you sure you are running the right inventory file? DNS in the openssl.conf says hostname ...
Not sure if this is the cause of your issue, but I ran into the same error as well, and it was due to an invalid entry in the hosts file. I had specified a DNS name for the API load balancer and it added a hosts file entry that looked like this: ...
The documentation should probably be updated to explain how to properly define AWS load balancers, which need to be referenced by name and not IP.
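For illustration only (the original entry was not preserved in this thread), the problematic shape is an /etc/hosts line whose address column holds the load balancer's DNS name instead of an IP; the values below are hypothetical:

# Print the generated hosts entries and check the address column;
# a line like "my-elb.us-east-1.elb.amazonaws.com  k8sapi.example.com"
# (hypothetical) will break templating that expects an IP address there.
cat /etc/hosts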
Same issue, and I have no idea why the MASTERS and NODES variables are both NULL. The stderr output:
"stderr_lines": [
"+ set -o errexit",
"+ set -o pipefail",
"+ (( 4 ))",
"+ case \"$1\" in",
"+ CONFIG=/etc/ssl/etcd/openssl.conf",
"+ shift 2",
"+ (( 2 ))",
"+ case \"$1\"in",
"+ SSLDIR=/etc/ssl/etcd/ssl",
"+ shift 2",
"+ (( 0 ))",
"+ '[' -z /etc/ssl/etcd/openssl.conf ']'",
"+ '[' -z /etc/ssl/etcd/ssl ']'",
"++ mktemp -d /tmp/etcd_cacert.XXXXXX",
"+ tmpdir=/tmp/etcd_cacert.uIu8DZ",
"+ trap 'rm -rf \"${tmpdir}\"' EXIT",
"+ cd /tmp/etcd_cacert.uIu8DZ",
"+ mkdir -p /etc/ssl/etcd/ssl",
"+ '[' -e /etc/ssl/etcd/ssl/ca-key.pem ']'",
"+ cp /etc/ssl/etcd/ssl/ca.pem /etc/ssl/etcd/ssl/ca-key.pem .",
"+ '[' -n ' ' ']'",
"+ '[' -n ' ' ']'",
"+ '[' -e /etc/ssl/etcd/ssl/ca-key.pem ']'",
"+ rm -f ca.pem ca-key.pem",
"+ mv '*.pem' /etc/ssl/etcd/ssl/",
"mv: cannot stat ‘*.pem’: No such file or directory",
"+ rm -rf /tmp/etcd_cacert.uIu8DZ"
],
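Judging from this trace, the script reaches the final mv with no *.pem files left in the temp directory, which is what happens when MASTERS/HOSTS are effectively empty: the existing ca.pem/ca-key.pem are removed from the temp dir and no per-node certificates get generated. A hedged sketch of re-running the script by hand (the script path is whatever {{ etcd_script_dir }} resolves to on your hosts, and the host names are placeholders):

# Re-run the cert script with explicit host lists; with empty MASTERS/HOSTS it
# creates no per-node certificates and the trailing mv '*.pem' fails as above.
export MASTERS="master1"            # placeholder host names
export HOSTS="master1 node1"
bash -x <etcd_script_dir>/make-ssl-etcd.sh \
  -f /etc/ssl/etcd/openssl.conf \
  -d /etc/ssl/etcd/ssl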
Well, found and solved the problem finally. The script ...

Why

Kubespray has set kubeadm as the default deployment mode since v2.8.0, and there is a switch for it ... So I met this error when I ran the playbook ... This made the task run, but ...

How

I think Kubespray did not support scaling master nodes well. When I upgraded my k8s cluster with Kubespray from v2.6.0 to v2.8.3, the old deployment mode and the kubeadm mode were both used, and this messed up my environment (old files existed, some tasks were skipped, and some services loaded old-style config files), which made my scaling fail. If I run the playbook without any extra parameters, I would miss the ... Two ways to solve my problem: ...
Related code:
---
dependencies:
  - role: kubernetes/secrets
    when: not kubeadm_enabled
    tags:
      - k8s-secrets

- name: Gen_certs | run cert generation script
  command: "bash -x {{ etcd_script_dir }}/make-ssl-etcd.sh -f {{ etcd_config_dir }}/openssl.conf -d {{ etcd_cert_dir }}"
  environment:
    - MASTERS: "{% for m in groups['etcd'] %}
                  {% if gen_node_certs[m] %}
                    {{ m }}
                  {% endif %}
                {% endfor %}"
    - HOSTS: "{% for h in (groups['k8s-cluster'] + groups['calico-rr']|default([]))|unique %}
                {% if gen_node_certs[h] %}
                  {{ h }}
                {% endif %}
              {% endfor %}"
  run_once: yes
  delegate_to: "{{groups['etcd'][0]}}"
  when:
    - gen_certs|default(false)
    - inventory_hostname == groups['etcd'][0]
  notify: set etcd_secret_changed

- name: "Check_certs | Set 'gen_node_certs' to true"
  set_fact:
    gen_node_certs: |-
      {
      {% set all_etcd_hosts = groups['k8s-cluster']|union(groups['etcd'])|union(groups['calico-rr']|default([]))|unique|sort -%}
      {% set existing_certs = etcdcert_master.files|map(attribute='path')|list|sort %}
      {% for host in all_etcd_hosts -%}
        {% set host_cert = "%s/node-%s-key.pem"|format(etcd_cert_dir, host) %}
        {% if host_cert in existing_certs -%}
        "{{ host }}": False,
        {% else -%}
        "{{ host }}": True,
        {% endif -%}
      {% endfor %}
      }
  run_once: true
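Given the gen_node_certs logic above, a quick check on the first etcd host (assuming etcd_cert_dir is the /etc/ssl/etcd/ssl path seen in the trace) shows which hosts will be skipped; if every host already has a node-<host>-key.pem, both MASTERS and HOSTS end up empty and the mv '*.pem' step has nothing to move:

# Hosts that already have a node key are marked False in gen_node_certs and are
# therefore excluded from the MASTERS/HOSTS environment of the cert script.
ls -l /etc/ssl/etcd/ssl/node-*-key.pem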
In our case, just executing the workarounds above was not enough. Even after deleting and regenerating the certificates, the error remained. Then we figured out that there is a Docker container which runs etcd, and it does not have its certificates updated when we re-execute the playbook. So if the error remains, delete the etcd container and rerun the playbook.
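A sketch of that workaround, assuming etcd runs as a Docker container on the etcd hosts; the container name depends on the deployment, so list it first rather than guessing:

# Find the etcd container on the affected host, remove it, and let the next
# playbook run recreate it with the regenerated certificates.
docker ps --format '{{.Names}}' | grep etcd
docker rm -f <etcd-container-name>   # placeholder: use the name printed above
ansible-playbook -i inventory/hosts.ini cluster.yml -b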
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
I am also running into this, or something similar: it says the file does not exist. When looking on the node manually, the directory ...
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG
Environment:
Cloud provider or hardware configuration:
hardware
OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
Linux 3.10.0-693.2.2.el7.x86_64 x86_64
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
Version of Ansible (ansible --version): ansible 2.4.2.0
Kubespray version (commit) (git rev-parse --short HEAD): v2.4.0
Network plugin used:
calico
Copy of your inventory file:
nc-kub-m01 ansible_ssh_host=10.0.55.165 ip=10.0.55.165
nc-kub-s01 ansible_ssh_host=10.0.55.163 ip=10.0.55.163
[kube-master]
nc-kub-m01
[etcd]
nc-kub-m01
[kube-node]
nc-kub-s01
[k8s-cluster:children]
kube-node
kube-master
Command used to invoke ansible:
ansible-playbook -i inventory/hosts.ini cluster.yml -b -K -v --user=kubernetes --private-key=~/.ssh/kubernetes.pem --ask-sudo-pass
Output of ansible run:
Anything else we need to know:
Also related to issue #1445.