This repository has been archived by the owner on Oct 11, 2023. It is now read-only.

RancherOS on ESXi 6.5: Waiting for SSH #2283

Closed
ivanfilippov opened this issue Mar 12, 2018 · 10 comments

Comments

@ivanfilippov

ivanfilippov commented Mar 12, 2018

RancherOS Version: (ros os version)
v1.2.0

Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.)
VMWare ESXi 6.5

I'm trying to launch RancherOS on ESXi 6.5 via the Rancher Web UI but I'm getting stuck on "Waiting for SSH". I've tried to launch an instance via docker-machine manually and get the same problem.

I believe the issue is that docker-machine is unable to inject its SSH keys into RancherOS, but I'm not sure where to go from there.

Does anyone have any ideas?

docker-machine command:

docker-machine -D create \
	--engine-registry-mirror "http://???" \
	--driver vmwarevsphere \
	--vmwarevsphere-username ??? \
	--vmwarevsphere-password ??? \
	--vmwarevsphere-vcenter ??? \
	--vmwarevsphere-network ??? \
	--vmwarevsphere-datastore ??? \
	--vmwarevsphere-disk-size 10000 \
	--vmwarevsphere-cpu-count 2 \
	--vmwarevsphere-memory-size 2048 \
	--vmwarevsphere-boot2docker-url "http://releases.rancher.com/os/latest/rancheros.iso" \
	--vmwarevsphere-cloudinit "https://???" \
	rancheros-test

docker-machine debug log:

Docker Machine Version:  0.14.0, build 89b8332
Found binary path at /usr/local/bin/docker-machine
Launching plugin server for driver vmwarevsphere
Plugin server listening at address 127.0.0.1:37875
() Calling .GetVersion
Using API Version  1
() Calling .SetConfigRaw
() Calling .GetMachineName
(flag-lookup) Calling .GetMachineName
(flag-lookup) Calling .DriverName
(flag-lookup) Calling .GetCreateFlags
Found binary path at /usr/local/bin/docker-machine
Launching plugin server for driver vmwarevsphere
Plugin server listening at address 127.0.0.1:39809
() Calling .GetVersion
Using API Version  1
() Calling .SetConfigRaw
() Calling .GetMachineName
(rancheros-test) Calling .GetMachineName
(rancheros-test) Calling .DriverName
(rancheros-test) Calling .GetCreateFlags
(rancheros-test) Calling .SetConfigFromFlags
Reading certificate data from /root/.docker/machine/certs/ca.pem
Decoding PEM data...
Parsing certificate...
Reading certificate data from /root/.docker/machine/certs/cert.pem
Decoding PEM data...
Parsing certificate...
Running pre-create checks...
(rancheros-test) Calling .PreCreateCheck
(rancheros-test) DBG | Connecting to vSphere for pre-create checks...
(rancheros-test) Calling .GetConfigRaw
Creating machine...
(rancheros-test) Calling .Create
(rancheros-test) Downloading /root/.docker/machine/cache/boot2docker.iso from http://releases.rancher.com/os/latest/rancheros.iso...
(rancheros-test) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
(rancheros-test) Generating SSH Keypair...
(rancheros-test) Creating VM...
(rancheros-test) Uploading Boot2docker ISO ...
(rancheros-test) adding network: vlan45-docker
(rancheros-test) Reconfiguring VM
(rancheros-test) setting guestinfo.cloud-init.data.url to https://???
(rancheros-test) 
(rancheros-test) Waiting for VMware Tools to come online...
(rancheros-test) Provisioning certs and ssh keys...
(rancheros-test) DBG | Creating Tar key bundle...
(rancheros-test) Calling .GetConfigRaw
(rancheros-test) Calling .DriverName
(rancheros-test) Calling .DriverName
Waiting for machine to be running, this may take a few minutes...
(rancheros-test) Calling .GetState
Detecting operating system of created instance...
Waiting for SSH to be available...
Getting to WaitForSSH function...
(rancheros-test) Calling .GetSSHHostname
(rancheros-test) Calling .GetSSHPort
(rancheros-test) Calling .GetSSHKeyPath
(rancheros-test) Calling .GetSSHKeyPath
(rancheros-test) Calling .GetSSHUsername
Using SSH client type: external
Using SSH private key: /root/.docker/machine/machines/rancheros-test/id_rsa (-rw-------)
&{[-F /dev/null -o PasswordAuthentication=no -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none docker@172.16.???.??? -o IdentitiesOnly=yes -i /root/.docker/machine/machines/rancheros-test/id_rsa -p 22] /usr/bin/ssh <nil>}
About to run SSH command:
exit 0
SSH cmd err, output: exit status 255: 
Error getting ssh command 'exit 0' : ssh command error:
command : exit 0
err     : exit status 255
output  : 
...
this repeats for a few minutes
...
Error creating machine: Error detecting OS: Too many retries waiting for SSH to be available.  Last error: Maximum number of retries (60) exceeded
notifying bugsnag: [Error creating machine: Error detecting OS: Too many retries waiting for SSH to be available.  Last error: Maximum number of retries (60) exceeded]
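For anyone reading the log above: `ssh` returns 255 when the client itself fails (for example a refused connection, a timeout, or a rejected key), and otherwise returns the exit status of the remote command. A minimal sketch for classifying the status of the `exit 0` probe follows; the helper name `explain_ssh_status` is hypothetical and is not part of docker-machine.

```shell
#!/bin/sh
# Hypothetical helper (not part of docker-machine): interpret the exit
# status of an `ssh ... exit 0` probe like the one in the debug log.
explain_ssh_status() {
    case "$1" in
        0)   echo "ok: SSH login succeeded" ;;
        255) echo "ssh error: connection or authentication failed" ;;
        *)   echo "remote command exited with status $1" ;;
    esac
}
```

To see which of the two 255 causes applies, the same probe can be rerun by hand with `-v` in place of `-o LogLevel=quiet`, followed by `explain_ssh_status $?`.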
@niusmallnan
Contributor

niusmallnan commented Apr 27, 2018

From Darren:

We really need ros working with docker-machine with the vsphere driver. ros works with docker-machine on your laptop, and ros works on vsphere, but ros + docker-machine + vsphere does not work. It provisions but fails to set the SSH keys (which is done through the vmtools). I'd like to default the vsphere driver in Rancher to ros. Currently it is boot2docker, which is multiple levels of bad.

Maybe ros should also work with the vmwarefusion driver.

@Jason-ZW Please help to track this.

@omniproc

Same issue here (although on vSphere 6.7). I'll look into this a bit more and let you know if I find anything beyond what's already been posted.

@niusmallnan
Contributor

niusmallnan commented May 3, 2018

For vSphere and vmfusion, we do still need to do some extra work:

  1. We need to build a standalone ISO with open-vm-tools built in.

  2. The vsphere/vmfusion drivers need to copy userdata.tar to the VM and run some scripts to make sure the SSH key takes effect. However, RancherOS runs open-vm-tools in a container, so we should make sure these scripts work there.
    This part may involve modifying docker-machine.

  3. We have some b2d magic logic that needs to be compatible with VMware. The autoformat logic should work fine.
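
Because item 2 depends on open-vm-tools running inside RancherOS, one quick check on an installed host is whether the service is enabled at all. The sketch below filters the output of `ros service list`. It assumes each output line has the form `<enabled|disabled> <service-name>`, and the function name `vmtools_enabled` is made up for illustration.

```shell
#!/bin/sh
# Reads `ros service list`-style output on stdin and succeeds only if
# open-vm-tools is listed as enabled.
# ASSUMPTION: each line looks like "<enabled|disabled> <service-name>".
vmtools_enabled() {
    awk '$1 == "enabled" && $2 == "open-vm-tools" { found = 1 }
         END { exit !found }'
}
```

On a RancherOS host this could be used as `sudo ros service list | vmtools_enabled && echo "open-vm-tools is enabled"`.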

@Jason-ZW

Jason-ZW commented May 3, 2018

#2348
rancher/os-services#157
To solve the docker-machine SSH problem, we also need to modify the docker-machine RancherOS provisioning code.
When docker-machine uses a VMware-related driver, the open-vm-tools service is enabled by default.
To avoid the "Device or resource busy" error that occurs when the RancherOS provisioner restarts the user-docker container, we need to add logic to the docker-machine RancherOS provisioner that stops the open-vm-tools container first.
@JacieChao When you have time, we can refer to the following code to add the stop logic: https://github.com/docker/machine/blob/master/libmachine/provision/utils.go#L117
@niusmallnan We also need to re-publish the open-vm-tools image once these pull requests have been merged.
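
The ordering described above (stop open-vm-tools, then restart user-docker) can be sketched as a dry run. This is only an illustration, not the actual docker-machine provisioning code: the container names `open-vm-tools` and `docker`, and the helper names `run` and `restart_user_docker`, are assumptions.

```shell
#!/bin/sh
# Sketch of the ordering: stop open-vm-tools before restarting the
# user-docker container.
# ASSUMPTIONS: the containers are named "open-vm-tools" and "docker" in
# system-docker; this is not the real docker-machine provisioner.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

restart_user_docker() {
    run sudo system-docker stop open-vm-tools
    run sudo system-docker restart docker
}
```

Calling `restart_user_docker` prints the two commands in order. Setting `DRY_RUN=0` would execute them instead, which only makes sense on a RancherOS host.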

@xayangjing

I hit the SSH timeout problem when testing docker-machine with RancherOS v1.1.2 and v1.3.0.

@xayangjing

vSphere version 6.0u3

@xayangjing

Error creating machine: Error detecting OS: Too many retries waiting for SSH to be available. Last error: Maximum number of retries (60) exceeded

@niusmallnan
Contributor

@xayangjing This issue is in progress; please be patient.

@Jason-ZW

Jason-ZW commented May 18, 2018

These PRs will fix the shared-folder mount problem when the machine driver is vmwarefusion.
rancher/machine#9
docker/machine#4480

@niusmallnan niusmallnan added this to the v1.4.0 milestone May 20, 2018
@kingsd041
Contributor

Fixed in RancherOS v1.4.0-rc2. I have verified it on ESXi 6.7.
