forked from pivotal-cf/docs-pcf-install
-
Notifications
You must be signed in to change notification settings - Fork 1
/
trouble-advanced.html.md.erb
375 lines (273 loc) · 12.9 KB
/
trouble-advanced.html.md.erb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
---
title: Advanced Troubleshooting with the BOSH CLI
owner: Ops Manager
---
<strong><%= modified_date %></strong>
_This page assumes you are running cf CLI v6._
You must log into the BOSH Director and run specific commands using the BOSH
Command Line Interface (CLI) to perform advanced troubleshooting.
The BOSH Director runs on the virtual machine (VM) that Ops Manager deploys on the first install of the Ops Manager Director tile.
BOSH Director diagnostic commands have access to information about your entire
[Pivotal Cloud Foundry®](https://network.pivotal.io/products/pivotal-cf)
(PCF) installation.
<p class="note"><strong>Note</strong>: For more troubleshooting information, refer to the <a href="./troubleshooting.html">Troubleshooting Guide</a>.</p>
## <a id='prepare'></a>Prepare to Use the BOSH CLI ##
This section guides you through preparing to use the BOSH CLI.
### <a id='gather'></a>Gather Information ###
Before you begin troubleshooting with the BOSH CLI, collect the information you
need from the Ops Manager interface.
You must log out of the Ops Manager interface to use the BOSH CLI.
1. Open the Ops Manager interface. Ensure that there are no installations or
updates in progress.
1. Click the **Ops Manager Director** tile and select the **Status** tab.
1. Record the IP address for the Ops Manager Director job. This is the IP
address of the VM where the BOSH Director runs.
<%= image_tag("ops-mgr-job-ip.png") %>
1. Select the **Credentials** tab.
1. Record the BOSH Director credentials.
<%= image_tag("bosh-creds.png") %>
1. Return to the **Installation Dashboard**.
1. **(Optional)** To prepare to troubleshoot the job VM for any other product,
click the product tile and repeat the procedure above to record the IP address
and VM credentials for that job VM.
1. Log out of Ops Manager.
### <a id='ssh'></a>SSH into Ops Manager ###
Use SSH to connect to the Ops Manager web application VM.
To SSH into the Ops Manager VM:
**vSphere:**
You will need the credentials used to import the PCF .ova or .ovf file into
your virtualization system.
1. From a command line, run `ssh ubuntu@OPS-MANAGER-IP-ADDRESS`.
1. When prompted, enter the password that you set during the .ova deployment
into vCenter:
<pre class='terminal'>
$ ssh ubuntu@10.0.0.6
Password: ***********
</pre>
**AWS and OpenStack:**
1. Locate the Ops Manager public IP on the AWS EC2 instances page or the
OpenStack Access & Security page.
1. Change the permissions on the `.pem` file to be more restrictive:
<pre class="terminal">
$ chmod 600 ops_mgr.pem
</pre>
1. Run the `ssh` command:
<pre class="terminal">
ssh -i ops_mgr.pem ubuntu@OPS-MGR-IP
</pre>
### <a id='log-in'></a>Log into the BOSH Director ###
1. To use the BOSH CLI without installing the BOSH CLI gem, set the
`BUNDLE_GEMFILE` environment variable to point to the BOSH Gemfile for the Ops
Manager interface. The following command creates an alias that sets
`BUNDLE_GEMFILE` and calls `bundle exec bosh`. This alias allows you to run
`bosh COMMAND` instead of `bundle exec bosh COMMAND`.
<pre class='terminal'>
$ alias bosh="BUNDLE_GEMFILE=/home/tempest-web/tempest/web/bosh.Gemfile bundle exec bosh"
</pre>
<p class="note"><strong>Note</strong>: This alias and the manual change to the <code>BUNDLE_GEMFILE</code> environment variable last only until the end of the current session.</p>
1. Verify that no BOSH processes are running on the Ops Manager VM. You should
not proceed with troubleshooting until all BOSH processes have completed or you
have ended them.
1. Run `bosh target OPS-MANAGER-DIRECTOR-IP-ADDRESS` to target your Ops Manager
VM using the BOSH CLI.
1. Log in using the BOSH Director credentials:
<pre class='terminal'>
$ bosh target 10.0.0.6
Target set to 'Ops Manager'
Your username: director
Enter password: *****
Logged in as 'director'
</pre>
### <a id='product'></a>Select a Product Deployment to Troubleshoot ###
When you import and install a product using Ops Manager, you deploy an instance
of the product described by a YAML file. Examples of available products include
Elastic Runtime, MySQL, or any other service that you imported and installed.
Perform the following steps to select a product deployment to troubleshoot:
1. Identify the YAML file that describes the deployment you want to
troubleshoot.
You identify the YAML file that describes a deployment by its filename.
For example, to identify Elastic Runtime deployments, run the following
command:
`find /var/tempest/workspaces/default/deployments -name cf-*.yml`
The table below shows the naming conventions for deployment files.
<table border="1" class="nice" >
<tr>
<th>Product</th><th>Deployment Filename Convention</th>
</tr>
<tr>
<td>Elastic Runtime</td>
<td>cf-<20-character_random_string>.yml</td>
</tr>
<tr>
<td>MySQL Dev</td>
<td>cf_services-<20-character_random_string>.yml</td>
</tr>
<tr>
<td>Other</td>
<td><20-character_random_string>.yml</td>
</tr>
</table>
<p class="note"><strong>Note</strong>: Where there is more than one installation of the same product, record the release number shown on the product tile in Operations Manager. Then, from the YAML files for that product, find the deployment that specifies the same release version as the product tile.</p>
1. Run `bosh status` and record the UUID value.
1. Open the `DEPLOYMENT-FILENAME.yml` file in a text editor and compare the
`director_uuids` value in this file with the UUID value that you recorded. If
the values do not match, perform the following steps:
1. Replace the `director_uuids` value with the UUID value.
1. Run `bosh deployment DEPLOYMENT-FILENAME.yml` to reset the file for your
deployment.
1. Run `bosh deployment DEPLOYMENT-FILENAME.yml` to instruct the BOSH Director
to apply BOSH CLI commands against the deployment described by the YAML file that you identified:
<pre class='terminal'>
$ bosh deployment /var/tempest/workspaces/default/deployments/cf-cca1234abcd.yml
</pre>
## <a id='cli'></a>Use the BOSH CLI for Troubleshooting##
This section describes three BOSH CLI commands commonly used during
troubleshooting.
* **VMS**: Lists all VMs in a deployment
* **Cloudcheck**: Runs a cloud consistency check and interactive repair
* **SSH**: Starts an interactive session or executes commands with a VM
### <a id='vms'></a>BOSH VMS ###
`bosh vms` provides an overview of the virtual machines that BOSH manages as part of the current deployment.
<pre class='terminal'>
$ bosh vms
Deployment `cf-66987630724a2c421061'
Director task 99
Task 99 done
+----------------------+---------+--------------------+---------------+
| Job/index | State | Resource Pool | IPs |
+----------------------+---------+--------------------+---------------+
| unknown/unknown | running | push-apps-manager | 192.168.86.13 |
| unknown/unknown | running | smoke-tests | 192.168.86.14 |
| cloud_controller/0 | running | cloud_controller | 192.168.86.23 |
| collector/0 | running | collector | 192.168.86.25 |
| consoledb/0 | running | consoledb | 192.168.86.29 |
| dea/0 | running | dea | 192.168.86.47 |
| health_manager/0 | running | health_manager | 192.168.86.20 |
| loggregator/0 | running | loggregator | 192.168.86.31 |
| loggregator_router/0 | running | loggregator_router | 192.168.86.32 |
| nats/0 | running | nats | 192.168.86.19 |
| nfs_server/0 | running | nfs_server | 192.168.86.21 |
| router/0 | running | router | 192.168.86.16 |
| saml_login/0 | running | saml_login | 192.168.86.28 |
| syslog/0 | running | syslog | 192.168.86.24 |
| uaa/0 | running | uaa | 192.168.86.27 |
| uaadb/0 | running | uaadb | 192.168.86.26 |
+----------------------+---------+--------------------+---------------+
VMs total: 15
Deployment `cf_services-2c3c918a135ab5f91ee1'
Director task 100
Task 100 done
+-----------------+---------+---------------+---------------+
| Job/index | State | Resource Pool | IPs |
+-----------------+---------+---------------+---------------+
| mysql_gateway/0 | running | mysql_gateway | 192.168.86.52 |
| mysql_node/0 | running | mysql_node | 192.168.86.53 |
+-----------------+---------+---------------+---------------+
VMs total: 2
</pre>
When troubleshooting an issue with your deployment, `bosh vms` may show a VM in
an **unknown** state.
Run [bosh cloudcheck](#cck) on a VM in an **unknown** state to instruct BOSH to
diagnose problems with the VM.
You can also run `bosh vms` to identify VMs in your deployment, then use the
[bosh ssh](#ssh) command to SSH into an identified VM for further
troubleshooting.
`bosh vms` supports the following arguments:
* `--details`: Report also includes Cloud ID, Agent ID, and whether or not the BOSH Resurrector has been enabled for each VM
* `--vitals`: Report also includes load, CPU, memory usage, swap usage, system disk usage, ephemeral disk usage, and persistent disk usage for each VM
* `--dns`: Report also includes the DNS A record for each VM
<p class="note"><strong>Note</strong>: The <strong>Status</strong> tab of the <strong>Elastic Runtime</strong> product tile displays information similar to the <code>bosh vms</code> output.</p>
### <a id='cck'></a>BOSH Cloudcheck ###
Run the `bosh cloudcheck` command to instruct BOSH to detect differences
between the VM state database maintained by the BOSH Director and the actual
state of the VMs. For each difference detected, `bosh cloudcheck` can offer the
following repair options:
* `Reboot VM`: Instructs BOSH to reboot a VM. Rebooting can resolve many
transient errors.
* `Ignore problem`: Instructs BOSH to do nothing.
You may want to ignore a problem in order to run `bosh ssh` and attempt
troubleshooting directly on the machine.
* `Reassociate VM with corresponding instance`: Updates the BOSH Director state
database.
Use this option if you believe that the BOSH Director state database is in
error and that a VM is correctly associated with a job.
* `Recreate VM using last known apply spec`: Instructs BOSH to destroy the
server and recreate it from the deployment manifest that the installer
provides. Use this option if a VM is corrupted.
* `Delete VM reference`: Instructs BOSH to delete a VM reference in the
Director state database. If a VM reference exists in the state database, BOSH
expects to find an agent running on the VM.
Select this option only if you know that this reference is in error.
Once you delete the VM reference, BOSH can no longer control the VM.
####Example Scenarios ####
**Unresponsive Agent**
<pre class='terminal'>
$ bosh cloudcheck
ccdb/0 (vm-3e37133c-bc33-450e-98b1-f86d5b63502a) is not responding:
- Ignore problem
- Reboot VM
- Recreate VM using last known apply spec
- Delete VM reference (DANGEROUS!)
</pre>
**Missing VM**
<pre class='terminal'>
$ bosh cloudcheck
VM with cloud ID `vm-3e37133c-bc33-450e-98b1-f86d5b63502a' missing:
- Ignore problem
- Recreate VM using last known apply spec
- Delete VM reference (DANGEROUS!)
</pre>
**Unbound Instance VM**
<pre class='terminal'>
$ bosh cloudcheck
VM `vm-3e37133c-bc33-450e-98b1-f86d5b63502a' reports itself as `ccdb/0' but does not have a bound instance:
- Ignore problem
- Delete VM (unless it has persistent disk)
- Reassociate VM with corresponding instance
</pre>
**Out of Sync VM**
<pre class='terminal'>
$ bosh cloudcheck
VM `vm-3e37133c-bc33-450e-98b1-f86d5b63502a' is out of sync:
expected `cf-d7293430724a2c421061: ccdb/0', got `cf-d7293430724a2c421061: nats/0':
- Ignore problem
- Delete VM (unless it has persistent disk)
</pre>
### <a id='bosh-ssh'></a>BOSH SSH ###
Use `bosh ssh` to SSH into the VMs in your deployment.
Follows the steps below to use `bosh ssh`:
1. Run `ssh-keygen -t rsa` to provide BOSH with the correct public key.
1. Accept the defaults.
1. Run `bosh ssh`.
1. Select a VM to access.
1. Create a password for the temporary user that the `bosh ssh` command creates.
Use this password if you need sudo access in this session.
Example:
<pre class='terminal'>
$ bosh ssh
1. ha_proxy/0
2. nats/0
3. etcd_and_metrics/0
4. etcd_and_metrics/1
5. etcd_and_metrics/2
6. health_manager/0
7. nfs_server/0
8. ccdb/0
9. cloud_controller/0
10. clock_global/0
11. cloud_controller_worker/0
12. router/0
13. uaadb/0
14. uaa/0
15. login/0
16. consoledb/0
17. dea/0
18. loggregator/0
19. loggregator_traffic_controller/0
20. push-apps-manager/0
21. smoke-tests/0
Choose an instance: 17
Enter password (use it to sudo on remote host): *******
Target deployment `cf_services-2c3c918a135ab5f91ee1'
Setting up ssh artifacts
</pre>