forked from elastic/elasticsearch
Commit
(Doc+) Capture Elasticsearch diagnostic
1 parent
a561958
commit 3f83dd0
Showing 1 changed file with 115 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
[[diagnostic]] | ||
=== Diagnostic | ||
++++ | ||
<titleabbrev>Capturing Diagnostic</titleabbrev> | ||
++++ | ||
:keywords: Elasticsearch diagnostic, diagnostics | ||
|
||
An https://github.com/elastic/support-diagnostics[{es} diagnostic] captures | |
a point-in-time snapshot of cluster statistics and most settings. | |
It works against all {es} versions and requires JRE or JDK v1.8 or later. It is | |
useful when escalating to https://support.elastic.co[Elastic Support] or | |
https://discuss.elastic.co[Elastic Discuss] because it minimizes turnaround time. | |
Its point-in-time view is also useful when troubleshooting; see | |
https://www.elastic.co/blog/why-does-elastic-support-keep-asking-for-diagnostic-files[this | |
blog post for examples]. | |
|
||
[TIP] | ||
==== | ||
The {es} diagnostic is included as a sub-library within Elastic's platforms: | |
* {ece}, where you can pull it under {ece} > Deployment > Operations > | |
Prepare Bundle > {es}. | |
* {eck}, whose https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-take-eck-dump.html[diagnostic] | |
pulls it by default. | |
==== | ||
|
||
[discrete] | ||
[[diagnostic-capture]] | ||
==== Capture | ||
|
||
To capture an {es} diagnostic: | ||
|
||
. Download the latest `diagnostics-X.X.X-dist.zip` file (_not_ the "source code" | |
archive) from the https://github.com/elastic/support-diagnostics/releases/latest[latest | |
release]. The steps below refer to the unzipped executable as | |
`./diagnostics.sh`, which is for Unix-based systems; on Windows, use | |
`.\diagnostics.bat` instead. | |
|
||
. There are https://github.com/elastic/support-diagnostics#diagnostic-types[three | |
available `type` values] for capturing your {es} diagnostic: | |
|
||
** `local` (default, **recommended**): polls the <<rest-apis,{es} API>>, | |
gathers operating system info, and captures cluster and GC logs. | |
Alternatively, you can use `remote`, which establishes an SSH session | |
to the applicable target server to pull the same info. | |
|
||
** `api` polls the <<rest-apis,{es} API>> but all other data must be | ||
collected manually. | ||
|
||
. Verify that network access and user permissions are sufficient to connect to your {es} | |
cluster by checking its <<cluster-health,Cluster Health>>. For example, | |
for `host:localhost`, `port:9200`, and `username:elastic`, the curl request is: | |
+ | ||
[source,sh] | ||
---- | |
curl -X GET -k -u elastic https://localhost:9200/_cluster/health | |
---- | |
|
||
. Expect an HTTP 200 `OK` response that reports the cluster's | |
`status`. If you can't successfully curl your {es} host, | |
pause and review the resulting error, because the diagnostic is unlikely | |
to produce the expected results. Common errors and their next steps: | |
|
||
** HTTP 401 `UNAUTHORIZED` (unauthenticated): the error usually tells you either | |
that your `username:password` pair is invalid or that your `.security` | |
index is unavailable, in which case you'll need to set up a temporary | |
<<file-realm,file-based realm>> user with `role:superuser` to authenticate. | |
|
||
** HTTP 403 `FORBIDDEN` (unauthorized): your attempted `username` is recognized but | |
has insufficient permissions to run the diagnostic. Either use a different | |
username or elevate this user's privileges. | |
|
||
** HTTP 429 `TOO_MANY_REQUESTS` (for example `circuit_breaking_exception`): | |
your user authenticated and was authorized, but the cluster is under | |
enough strain that it is rejecting some API calls. These | |
responses are usually intermittent, so you can often still | |
proceed with running the diagnostic (which will pull what it can). | |
|
||
** HTTP 502 `BAD_GATEWAY`: your network is having trouble reaching | |
the cluster (for example because of a proxy or firewall). Try | |
connecting from a different location, confirm your port, or target | |
the host's IP address instead of its domain name. | |
|
||
** HTTP 503 `SERVICE_UNAVAILABLE` (for example `master_not_discovered_exception`): | |
your cluster does not currently have an elected master node (which is | |
required for it to be API-responsive). This may be temporary while the master | |
node rotates. Otherwise, do not run Step#5; first investigate | |
and resolve <<cluster-fault-detection,cluster fault detection>> | |
before proceeding. | |
|
||
. Once you have a working curl request, use those same parameters to fill in | |
the https://github.com/elastic/support-diagnostics#standard-options[diagnostic | |
parameters]. For our example, the command typically looks like: | |
+ | ||
[source,sh] | ||
---- | |
sudo ./diagnostics.sh --type local --host localhost --port 9200 -u elastic -p --bypassDiagVerify --ssl --noVerify | |
---- | |
|
||
. Once this script has completed, verify that no errors were written to | |
`diagnostic.log`. Common errors to resolve: | |
|
||
** `Error: Could not find or load main class com.elastic.support.diagnostics.DiagnosticApp` | ||
indicates that you accidentally downloaded the "source code" file | ||
instead of the diagnostic in Step#1 above. | ||
|
||
** `Could not retrieve the {es} version due to a system or network error - unable to continue.` | |
indicates that the diagnostic could not reach the cluster. Either | |
Step#3 failed or the parameters used in Step#3 and Step#5 | |
do not match. | |
|
||
** `security_exception` with `is unauthorized for user` indicates | |
insufficient privileges to run the diagnostic tool. Use another | |
user, or grant the current user `role:superuser` privileges, | |
and then rerun the diagnostic. |
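
The steps above can be roughly sketched as a small set of shell helpers. This is an illustration only, not part of the diagnostic tool: the function names (`health_status`, `next_step`, `diag_command`, `scan_diag_log`) and the `ES_HOST`, `ES_PORT`, and `ES_USER` variables are hypothetical. The status-to-action mapping mirrors the list in this section, and the log patterns assume the log contains the error text quoted above with the product name spelled out as `Elasticsearch`.

```shell
#!/usr/bin/env bash
# Sketch only: all helper and variable names here are hypothetical.

# Connection parameters, defined once so curl and the diagnostic stay in sync.
ES_HOST="${ES_HOST:-localhost}"
ES_PORT="${ES_PORT:-9200}"
ES_USER="${ES_USER:-elastic}"

# Print only the HTTP status code of a cluster health request.
# curl prompts for the password because -u is given a user name without one;
# -k skips TLS certificate verification, as in the curl example above.
health_status() {
  curl -s -k -o /dev/null -w '%{http_code}' -u "$ES_USER" \
    "https://${ES_HOST}:${ES_PORT}/_cluster/health"
}

# Map a status code to the next step suggested in the list above.
next_step() {
  case "$1" in
    200)     echo "proceed: run the diagnostic" ;;
    401)     echo "stop: check credentials or the .security index" ;;
    403)     echo "stop: use a user with more privileges" ;;
    429)     echo "proceed: cluster is strained, diagnostic pulls what it can" ;;
    502|504) echo "stop: check proxy, firewall, port, or target the IP" ;;
    503)     echo "stop: investigate master election first" ;;
    *)       echo "stop: review the error before continuing" ;;
  esac
}

# Print (but do not run) the diagnostic command built from the same parameters.
diag_command() {
  echo "sudo ./diagnostics.sh --type local --host ${ES_HOST} --port ${ES_PORT} -u ${ES_USER} -p --bypassDiagVerify --ssl --noVerify"
}

# List lines in a diagnostic log that match the known error signatures.
# Assumption: the log contains the messages as quoted in this section.
# Returns a non-zero exit status when nothing matches.
scan_diag_log() {
  grep -n -E 'Could not find or load main class|Could not retrieve the Elasticsearch version|is unauthorized for user' "$1"
}
```

A possible way to use these helpers is `next_step "$(health_status)"` to decide whether to continue, then `diag_command` to review the command before running it, and finally `scan_diag_log diagnostic.log` once the run has finished. Keeping the host, port, and user in one place is meant to avoid the parameter mismatch between Step#3 and Step#5 described above.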