
Failed to take etcd snapshot exit code [1]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. #875

Closed
zhuzhongshu opened this issue Aug 23, 2018 · 3 comments

@zhuzhongshu

RKE version:
v0.1.9
Docker version: (docker version,docker info preferred)
17.03.1-ce
Operating system and kernel: (cat /etc/os-release, uname -r preferred)
CentOS, kernel 3.10.0-693.el7.x86_64
Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Bare-metal
cluster.yml file:
nodes:
  - address: 192.168.11.179 # hostname or IP to access nodes
    user: docker # root user (usually 'root')
    role: [controlplane,etcd,worker] # K8s roles for node
    ssh_key_path: ~/.ssh/id_rsa # path to PEM file
  - address: 192.168.11.189 # hostname or IP to access nodes
    user: docker # root user (usually 'root')
    role: [controlplane,etcd,worker] # K8s roles for node
    ssh_key_path: ~/.ssh/id_rsa # path to PEM file
  - address: 192.168.11.93 # hostname or IP to access nodes
    user: docker # root user (usually 'root')
    role: [controlplane,etcd,worker] # K8s roles for node
    ssh_key_path: ~/.ssh/id_rsa # path to PEM file
  - address: 192.168.11.94 # hostname or IP to access nodes
    user: docker # root user (usually 'root')
    role: [worker] # K8s roles for node
    ssh_key_path: ~/.ssh/id_rsa # path to PEM file

addons: |-
  ---
  kind: Namespace
  apiVersion: v1
  metadata:
    name: cattle-system
  ---
  kind: ServiceAccount
  apiVersion: v1
  metadata:
    name: cattle-admin
    namespace: cattle-system
  ---
  kind: ClusterRoleBinding
  apiVersion: rbac.authorization.k8s.io/v1
  metadata:
    name: cattle-crb
    namespace: cattle-system
  subjects:
  - kind: ServiceAccount
    name: cattle-admin
    namespace: cattle-system
  roleRef:
    kind: ClusterRole
    name: cluster-admin
    apiGroup: rbac.authorization.k8s.io
  ---
  apiVersion: v1
  kind: Secret
  metadata:
    name: cattle-keys-server
    namespace: cattle-system
  type: Opaque
  data:
    cacerts.pem: MIIDmDCCAoACCQCxtZxFi/KC2TANBgkqhkiG9w0BAQsFADCBjTELMAkGA1UEBhMCQ04xEDAOBgNVBAgMB1NpY2h1YW4xEDAOBgNVBAcMB0NoZW5nZHUxEjAQBgNVBAoMCWluZm8tdGVjaDESMBAGA1UECwwJaW5mby10ZWNoMRcwFQYDVQQDDA5tZS5yYW5jaGVyLmNvbTEZMBcGCSqGSIb3DQEJARYKMjIyQHBwLmNvbTAeFw0xODA4MjAwNDQ4NTJaFw0yODA4MTcwNDQ4NTJaMIGNMQswCQYDVQQGEwJDTjEQMA4GA1UECAwHU2ljaHVhbjEQMA4GA1UEBwwHQ2hlbmdkdTESMBAGA1UECgwJaW5mby10ZWNoMRIwEAYDVQQLDAlpbmZvLXRlY2gxFzAVBgNVBAMMDm1lLnJhbmNoZXIuY29tMRkwFwYJKoZIhvcNAQkBFgoyMjJAcHAuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAyf9R0iE+4M/8d1yPBKQYp5aimcSzzS6g3l+a09PQ/WL4re/ej68MaxEEVjLKvqQOuFCGdM7m+Zvna7jVEMQtzmX3sUz97b4GXsfCwaodEhdgjY/9B/4RUzHR0+acgx3OSh/7LChWoQzICckLcNoyS5hqz7MeUkSZqw3bg5Nky3YQED9kndIwlZKmM8Jo2YbIM6ZU3dVSBfnMQZzAURAnT5JoEyGOnTweiZA73ZMxmistMFoMbwIO8mxgTEvLs2f9k9EVd3B+crvrxXwSvMdo36o/lZMLuh/l1fwNK6NuKnEimWAbjlc3OwU9lhWNrepyp2mrAtkL9fo8LPjTCbajUQIDAQABMA0GCSqGSIb3DQEBCwUAA4IBAQCNUlCnFUEWaCCFAti0hfa7uH2lIK/zr3M60G4mUhWP0vdz24mV0KAbNpdoGeLZOUIUzJtoPXwy3+KwuVqvTPhk7Yy8H+t/TeKTfT2FZgMKWmzWzNXA9ZgYySQdMZBw/NVejhyTqjquIGHOike4j/Pva/8hGpCkSSKZo2OZC4HGzEE46UdnOVfrqHoEWPrQ7lKiceNM4NJ5wY1S0dR0g4LTOESyP3tBftsXp6S9LYtjjefvDWf3oor+UbYwIo2DE+Oi1Rs2iUFunezhQeRpB2Y3lO3c61DLd+2/A7ngujCfyblM/nJ2Dn+ophyAShzMaFTUtOrmMntjwfa2tTDZTXoO
  ---
  apiVersion: v1
  kind: Service
  metadata:
    namespace: cattle-system
    name: cattle-service
    labels:
      app: cattle
  spec:
    ports:
    - port: 80
      targetPort: 80
      protocol: TCP
      name: http
    selector:
      app: cattle
  ---
  apiVersion: extensions/v1beta1
  kind: Ingress
  metadata:
    namespace: cattle-system
    name: cattle-ingress-http
    annotations:
      nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
      nginx.ingress.kubernetes.io/proxy-read-timeout: "1800" # Max time in seconds for ws to remain shell window open
      nginx.ingress.kubernetes.io/proxy-send-timeout: "1800" # Max time in seconds for ws to remain shell window open
      nginx.ingress.kubernetes.io/ssl-redirect: "false" # Disable redirect to ssl
  spec:
    rules:
    - host: me.rancher.com
      http:
        paths:
        - backend:
            serviceName: cattle-service
            servicePort: 80
  ---
  kind: Deployment
  apiVersion: extensions/v1beta1
  metadata:
    namespace: cattle-system
    name: cattle
  spec:
    replicas: 1
    template:
      metadata:
        labels:
          app: cattle
      spec:
        serviceAccountName: cattle-admin
        containers:
        - image: rancher/rancher:latest
          imagePullPolicy: Always
          name: cattle-server
          ports:
          - containerPort: 80
            protocol: TCP
          volumeMounts:
          - mountPath: /etc/rancher/ssl
            name: cattle-keys-volume
            readOnly: true
        volumes:
        - name: cattle-keys-volume
          secret:
            defaultMode: 420
            secretName: cattle-keys-server

services:
  etcd:
    snapshot: true # enables recurring etcd snapshots
    creation: 6h0s # time increment between snapshots
    retention: 24h # time increment before snapshot purge

Steps to Reproduce:

  1. Install Rancher in HA mode.
  2. Run ./rke_linux-amd64 etcd snapshot-save --name backup.db --config rancher-cluster.yml

Results:

FATA[0051] Failed to take etcd snapshot exit code [1]: Error waiting for container [etcd-snapshot-once] on host [192.168.11.179]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

@alena1108 alena1108 added this to the v0.1.11 milestone Sep 21, 2018
@mitchellmaler

I just ran into the same issue, running RKE 0.1.11:

WARN[0000] Name of the snapshot is not specified using [rke_etcd_snapshot_2018-11-06T00:31:25-06:00]
INFO[0000] Starting saving snapshot on etcd hosts
INFO[0000] [dialer] Setup tunnel for host [10.144.10.143]
WARN[0000] Unsupported Docker version found [18.06.1-ce], supported versions are [1.11.x 1.12.x 1.13.x 17.03.x]
INFO[0000] [dialer] Setup tunnel for host [10.144.6.136]
WARN[0000] Unsupported Docker version found [18.06.1-ce], supported versions are [1.11.x 1.12.x 1.13.x 17.03.x]
INFO[0000] [dialer] Setup tunnel for host [10.144.2.138]
WARN[0001] Unsupported Docker version found [18.06.1-ce], supported versions are [1.11.x 1.12.x 1.13.x 17.03.x]
INFO[0001] [etcd] Saving snapshot [rke_etcd_snapshot_2018-11-06T00:31:25-06:00] on host [10.144.10.143]
INFO[0001] [etcd] Successfully started [etcd-snapshot-once] container on host [10.144.10.143]
FATA[0051] Failed to take etcd snapshot exit code [1]: Error waiting for container [etcd-snapshot-once] on host [10.144.10.143]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

The logs from the container:

$ docker logs etcd-snapshot-once
time="2018-11-06T06:31:27Z" level=info msg="Initializing Rolling Backups" creation=5m0s retention=24h0m0s
time="2018-11-06T06:32:17Z" level=info msg="Created backup" name="rke_etcd_snapshot_2018-11-06T00:31:25-06:00" runtime=49.659437471s

Sometimes the run gets past the first node, but the next node then fails with the same error. Could this be a timeout?

@mitchellmaler

I ran it again, and this time it made it to the second node:

WARN[0000] Name of the snapshot is not specified using [rke_etcd_snapshot_2018-11-06T00:35:21-06:00]
INFO[0000] Starting saving snapshot on etcd hosts
INFO[0000] [dialer] Setup tunnel for host [10.144.10.143]
WARN[0000] Unsupported Docker version found [18.06.1-ce], supported versions are [1.11.x 1.12.x 1.13.x 17.03.x]
INFO[0000] [dialer] Setup tunnel for host [10.144.6.136]
WARN[0000] Unsupported Docker version found [18.06.1-ce], supported versions are [1.11.x 1.12.x 1.13.x 17.03.x]
INFO[0000] [dialer] Setup tunnel for host [10.144.2.138]
WARN[0001] Unsupported Docker version found [18.06.1-ce], supported versions are [1.11.x 1.12.x 1.13.x 17.03.x]
INFO[0001] [etcd] Saving snapshot [rke_etcd_snapshot_2018-11-06T00:35:21-06:00] on host [10.144.10.143]
INFO[0001] [etcd] Successfully started [etcd-snapshot-once] container on host [10.144.10.143]
INFO[0050] [etcd] Saving snapshot [rke_etcd_snapshot_2018-11-06T00:35:21-06:00] on host [10.144.6.136]
INFO[0051] [etcd] Successfully started [etcd-snapshot-once] container on host [10.144.6.136]
FATA[0101] Failed to take etcd snapshot exit code [1]: Error waiting for container [etcd-snapshot-once] on host [10.144.6.136]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
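
To put numbers on the timeout question, here is a minimal Python sketch that measures how long RKE waited on each host's [etcd-snapshot-once] container in the log above. It assumes the bracketed number after the level (for example INFO[0051]) is seconds elapsed since rke started; the function name seconds_per_host is hypothetical and not part of RKE.

```python
import re

# Excerpt of the rke output from the second run, copied from the log above.
LOG = """\
INFO[0001] [etcd] Successfully started [etcd-snapshot-once] container on host [10.144.10.143]
INFO[0050] [etcd] Saving snapshot [rke_etcd_snapshot_2018-11-06T00:35:21-06:00] on host [10.144.6.136]
INFO[0051] [etcd] Successfully started [etcd-snapshot-once] container on host [10.144.6.136]
FATA[0101] Failed to take etcd snapshot exit code [1]: Error waiting for container [etcd-snapshot-once] on host [10.144.6.136]: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
"""

# Assumption: LEVEL[NNNN] carries seconds since the command started.
LINE = re.compile(r"^[A-Z]{4}\[(\d+)\]\s+(.*)$")
HOST = re.compile(r"on host \[([^\]]+)\]")
STARTED = "Successfully started [etcd-snapshot-once]"


def seconds_per_host(log):
    """Return {host: seconds between container start and the next log line}."""
    elapsed = {}
    current = None  # (host, start_seconds) of the container being waited on
    for raw in log.splitlines():
        line = LINE.match(raw)
        if not line:
            continue
        secs, msg = int(line.group(1)), line.group(2)
        if current is not None:
            # The first line after a start marks the end of the wait.
            host, start = current
            elapsed[host] = secs - start
            current = None
        host = HOST.search(msg)
        if host and STARTED in msg:
            current = (host.group(1), secs)
    return elapsed


if __name__ == "__main__":
    for host, secs in seconds_per_host(LOG).items():
        print(f"{host}: waited {secs}s")
```

For this run the wait comes out at 49s on the first host, which succeeded, and 50s on the second, which failed. Both are close to the 49.66s runtime reported in the container log of the first run, so the error appears at about the time the backup finishes.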

@alena1108

@mitchellmaler please reopen the issue if you see it again.
