Bosh Director not responding after redeployment #396

Closed
devtecher opened this issue Jul 16, 2020 · 2 comments

@devtecher

I was renewing the Bosh Director certs. After removing all the expired certs (to let bosh regenerate them) and redeploying, the Bosh Director VM gets recreated, but the deployment keeps failing with the error below. It seems that the bosh agent is not responding and the deployment eventually times out.

[CLI] 2020/07/16 23:58:28 ERROR - Deploying: Creating instance 'bosh/0': Waiting until instance is ready: Post https://mbus:<redacted>@10.x.x.x:6868/agent: dial tcp 10.x.x.x:6868: getsockopt: connection refused

Deploying:
  Creating instance 'bosh/0':
    Waiting until instance is ready:
      Post https://mbus:<redacted>@10.x.x.x:6868/agent: dial tcp 10.x.x.x:6868: getsockopt: connection refused
Exit code 1
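
For anyone hitting the same error: "connection refused" generally means the VM answered but nothing was listening on port 6868, which is different from a silent timeout. A minimal sketch for telling the two apart from the machine running the CLI follows. The function name `agent_port_state` is hypothetical and not part of bosh; only the port number comes from the log above.

```python
import socket


def agent_port_state(host: str, port: int = 6868, timeout: float = 3.0) -> str:
    """Classify a plain TCP connection attempt to the agent endpoint.

    Hypothetical helper: it only checks whether something accepts TCP
    connections on the port, it does not speak the agent's HTTPS protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host replied, but no process is listening on this port.
        return "refused"
    except socket.timeout:
        # No reply at all within the timeout (host down or packets dropped).
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host.
        return f"error: {exc}"


if __name__ == "__main__":
    # Replace with the director IP from your own deployment.
    print(agent_port_state("127.0.0.1"))
```

A "refused" result while the VM is up would point at the agent not having started its listener, rather than at the network path.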

I've tried redeploying many times, but it still fails with the same error. My infrastructure runs on vSphere (v6.0) and I'm using the latest bosh-deployment repo (stemcell 621.76, bosh-vsphere-cpi 54.1.0). The documentation indicates it's still compatible with vSphere 6.0.

I tried to connect (ssh) to the Bosh Director VM to check the logs, but it's not accepting the key for some reason. The ssh service is up, but it simply doesn't accept the "jumpbox" user key (I tried setting a password instead, but that didn't work either). I enabled DEBUG_LOG_LEVEL but can't find anything useful there.

@cf-gitbot

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/173837211

The labels on this github issue will be updated when the story is started.

@devtecher

Managed to fix it after I removed all the cert blocks and some creds from the "creds.yml" file (--vars-store) and reran bosh create-env.
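
The manual cleanup described here could be sketched roughly as below. This is an illustration, not a bosh tool: the function names are hypothetical, and the code assumes a vars-store file laid out like the listings in this thread (unindented top-level keys, nested entries indented underneath). It drops every top-level block that contains a `certificate:` entry, plus any extra keys you name explicitly; in this thread `mbus_bootstrap_password` and `nats_password` were removed as well.

```python
def split_top_level_blocks(text):
    """Group lines into blocks, each starting at an unindented line."""
    blocks = []
    for line in text.splitlines():
        starts_block = bool(line) and not line[0].isspace()
        if starts_block or not blocks:
            blocks.append([line])
        else:
            blocks[-1].append(line)
    return blocks


def strip_certificates(text, extra_keys=()):
    """Drop blocks holding a `certificate:` entry, plus keys in extra_keys.

    Assumption: a certificate variable is any top-level key with a nested
    `certificate:` line, as in the creds.yml shown in this thread.
    """
    kept = []
    for block in split_top_level_blocks(text):
        key = block[0].split(":", 1)[0]
        has_cert = any(l.strip().startswith("certificate:") for l in block[1:])
        if has_cert or key in extra_keys:
            continue
        kept.extend(block)
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    # Work on a copy and keep a backup of the original vars store.
    with open("creds.yml") as fh:
        cleaned = strip_certificates(
            fh.read(), extra_keys=("mbus_bootstrap_password", "nats_password")
        )
    with open("creds.cleaned.yml", "w") as fh:
        fh.write(cleaned)
```

Writing to a separate file keeps the original intact until the cleaned version has been reviewed and passed to bosh create-env via --vars-store.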

  1. The following is the initial creds.yml content from when I was trying to let bosh regenerate the expired certs; the deployment kept hitting the error I reported above:
admin_password: <redacted>
blobstore_agent_password: <redacted>
blobstore_ca:
  ca: |
  certificate: |
  private_key: |
blobstore_director_password: <redacted>
blobstore_secret: <redacted>
blobstore_server_tls:
  ca: |
  certificate: |
  private_key: |
credhub_admin_client_secret: <redacted>
credhub_ca:
  ca: |
  certificate: |
  private_key: |
credhub_cli_user_password: <redacted>
credhub_encryption_password: <redacted>
credhub_tls:
  ca: |
  certificate: |
  private_key: |
default_ca:
  ca: |
  certificate: |
  private_key: |
director_ssl:
  ca: |
  certificate: |
  private_key: |
hm_password: <redacted>
jumpbox_ssh:
  private_key: |
    <redacted>
  public_key: |
    <redacted>
mbus_bootstrap_password: <redacted>
mbus_bootstrap_ssl:
  ca: |
  certificate: |
  private_key: |
nats_ca:
  ca: |
  certificate: |
  private_key: |
nats_clients_director_tls:
  ca: |
  certificate: |
  private_key: |
nats_clients_health_monitor_tls:
  ca: |
  certificate: |
  private_key: |
nats_password: <redacted>
nats_server_tls:
  ca: |
  certificate: |
  private_key: |
postgres_password: <redacted>
uaa_admin_client_secret: <redacted>
uaa_clients_director_to_credhub: <redacted>
uaa_encryption_key_1: <redacted>
uaa_jwt_signing_key:
  private_key: |
    <redacted>
  public_key: |
    <redacted>
uaa_login_client_secret: <redacted>
uaa_service_provider_ssl:
  ca: |
  certificate: |
  private_key: |
uaa_ssl:
  ca: |
  certificate: |
  private_key: |
  2. Fixed version: the Bosh Director was successfully recreated and is working (credhub data accessible) after I reran bosh create-env with the updated creds.yml content below. All other creds (in no. 1 above) were removed; only the following content remained in the file (although I doubt there would be any issue if I removed the jumpbox_ssh block too).
admin_password: <redacted>
blobstore_agent_password: <redacted>
blobstore_director_password: <redacted>
blobstore_secret: <redacted>
credhub_admin_client_secret: <redacted>
credhub_cli_user_password: <redacted>
credhub_encryption_password: <redacted>
hm_password: <redacted>
jumpbox_ssh:
  private_key: |
    <redacted>
  public_key: |
    <redacted>
postgres_password: <redacted>
uaa_admin_client_secret: <redacted>
uaa_clients_director_to_credhub: <redacted>
uaa_encryption_key_1: <redacted>
uaa_jwt_signing_key:
  private_key: |
    <redacted>
  public_key: |
    <redacted>
uaa_login_client_secret: <redacted>
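
To confirm exactly which entries differ between a failing and a working vars store, the top-level keys of the two files can be compared. The sketch below is only an illustration with hypothetical function names; it assumes the same flat layout as the two listings above and looks at key names only, never at the secret values.

```python
def top_level_keys(text):
    """Return the unindented `key:` names in the order they appear."""
    keys = []
    for line in text.splitlines():
        is_top_level = bool(line) and not line[0].isspace()
        if is_top_level and ":" in line and not line.startswith("#"):
            keys.append(line.split(":", 1)[0])
    return keys


def removed_keys(before, after):
    """Keys present in `before` but missing from `after`, in original order."""
    remaining = set(top_level_keys(after))
    return [k for k in top_level_keys(before) if k not in remaining]


if __name__ == "__main__":
    # File names are placeholders for the old and new vars-store files.
    with open("creds.old.yml") as old, open("creds.yml") as new:
        for key in removed_keys(old.read(), new.read()):
            print(key)
```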
