docs: health api docs first pass

elastic · Oct 7, 2024 · aa6ede4
1 parent db6017c

Showing 2 changed files with 158 additions and 6 deletions.
diff --git a/docs/static/monitoring/monitoring-apis.asciidoc b/docs/static/monitoring/monitoring-apis.asciidoc
@@ -2,13 +2,13 @@
 [[monitoring]]
 == APIs for monitoring {ls}
 
-{ls} provides monitoring APIs for retrieving runtime metrics
-about {ls}:
+{ls} provides monitoring APIs for retrieving runtime information about {ls}:
 
 * <<node-info-api>>
 * <<plugins-api>>
 * <<node-stats-api>>
 * <<hot-threads-api>>
+* <<logstash-health-report-api>>
 
 
 You can use the root resource to retrieve general information about the Logstash instance, including
@@ -1184,3 +1184,155 @@ Example of a human-readable response:
 	 org.jruby.internal.runtime.NativeThread.join(NativeThread.java:75)
 
 --------------------------------------------------
+
+
+[[logstash-health-report-api]]
+=== Health report API
+
+An API that reports the health status of {ls}.
+
+[source,js]
+--------------------------------------------------
+curl -XGET 'localhost:9600/_health_report?pretty'
+--------------------------------------------------
+
+==== Description
+
+The health report API returns a report with the health status of {ls} and of the pipelines running inside it.
+The report contains a set of indicators, each covering a part of {ls} functionality.
+
+Each indicator has one of the following health statuses: `green`, `unknown`, `yellow`, or `red`.
+Each indicator also provides an explanation and metadata describing the reason for its current health status.
+
+The top-level status is determined by the worst status among all indicators.
+
+If an indicator's status is not `green`, the indicator result may include a list of impacts that detail the functionality negatively affected by the health issue.
+Each impact carries a severity level, the areas of the system that are affected, and a short description of the impact.
+
+Some health indicators can determine the root cause of a health problem and prescribe steps to improve the health of the system.
+The root cause and remediation steps are encapsulated in a `diagnosis`.
+A diagnosis contains a cause describing the root cause analysis, an action briefly describing the steps to fix the problem, and a URL for detailed troubleshooting help.
+
+NOTE: The health indicators perform root cause analysis of non-green health statuses.
+      This can be computationally expensive when the API is called frequently.
+
+==== Response body
+
+`status`::
+(Optional, string) Health status of {ls}, based on the aggregated status of all indicators. Statuses are:
+
+`green`:::
+{ls} is healthy.
+
+`unknown`:::
+The health of {ls} could not be determined.
+
+`yellow`:::
+The functionality of {ls} is in a degraded state and may need remediation to avoid the health becoming `red`.
+
+`red`:::
+{ls} is experiencing an outage or certain features are unavailable for use.
+
+`indicators`::
+(object) Information about the health of the {ls} indicators.
++
+.Properties of `indicators`
+[%collapsible%open]
+====
+`<indicator>`::
+(object) Contains health results for an indicator.
++
+.Properties of `<indicator>`
+[%collapsible%open]
+=======
+`status`::
+(string) Health status of the indicator. Statuses are:
+
+`green`:::
+The indicator is healthy.
+
+`unknown`:::
+The health of the indicator could not be determined.
+
+`yellow`:::
+The functionality of an indicator is in a degraded state and may need remediation to avoid the health becoming `red`.
+
+`red`:::
+The indicator is experiencing an outage or certain features are unavailable for use.
+
+`symptom`::
+(string) A message providing information about the current health status.
+
+`details`::
+(Optional, object) Additional information about the indicator that led to its current health status.
+Each indicator has <<logstash-health-api-response-details, a unique set of details>>.
+
+`impacts`::
+(Optional, array) If the indicator's status is not `green`, the indicator may include a list of impacts that this health status has on {ls}.
++
+.Properties of `impacts`
+[%collapsible%open]
+========
+`severity`::
+(integer) How important this impact is to the functionality of {ls}.
+A value of 1 is the highest severity, with larger values indicating lower severity.
+
+`description`::
+(string) A description of the impact on {ls}.
+
+`impact_areas`::
+(array of strings) The areas of {ls} functionality that this impact affects.
+Possible values are:
++
+--
+* `pipeline_execution`
+--
+
+========
+
+`diagnosis`::
+(Optional, array) If the indicator's status is not `green`, the indicator may include a list of diagnoses that encapsulate the cause of the health issue and an action to take to remediate the problem.
++
+.Properties of `diagnosis`
+[%collapsible%open]
+========
+`cause`::
+(string) A description of a root cause of this health problem.
+
+`action`::
+(string) A brief description of the steps that should be taken to remediate the problem.
+A more detailed step-by-step guide to remediate the problem is provided by the `help_url` field.
+
+`help_url`::
+(string) A link to the troubleshooting guide for this health problem.
+========
+=======
+====
+
+[role="child_attributes"]
+[[logstash-health-api-response-details]]
+==== Indicator details
+
+Each health indicator in the health report API returns a set of details that further explains the state of the system.
+The contents and structure of these details are unique to each indicator.
+
+[[logstash-health-api-response-details-pipeline]]
+===== Pipeline indicator details
+
+`+pipelines/indicators/<pipeline_id>/details+`::
+(object) Information about the specified pipeline.
++
+.Properties of `+pipelines/indicators/<pipeline_id>/details+`
+[%collapsible%open]
+====
+`status`::
+(object) Details related to the pipeline's current status and run-state.
++
+.Properties of `status`
+[%collapsible%open]
+========
+`state`::
+(string) The current state of the pipeline, such as `loading`, `running`, `finished`, or `terminated`.
+========
+====
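The response fields documented above lend themselves to a small client. The following is a minimal Python sketch, not part of Logstash: it assumes the default `localhost:9600/_health_report` endpoint from the curl example, the field names listed under "Response body", and a severity ranking of `green` < `unknown` < `yellow` < `red` (the order in which the statuses are listed; the docs do not state this ranking explicitly). The helper names `fetch_health_report`, `worst_status`, and `summarize` are hypothetical.

```python
import json
import urllib.request

# Assumed ranking from least to most severe. It follows the order in which
# the statuses are listed in the docs; it is not a documented guarantee.
SEVERITY_ORDER = ["green", "unknown", "yellow", "red"]


def fetch_health_report(host="localhost:9600"):
    """Hypothetical helper: GET the health report and decode the JSON body."""
    with urllib.request.urlopen(f"http://{host}/_health_report") as resp:
        return json.load(resp)


def worst_status(statuses):
    """Return the most severe status; the top-level `status` is described as
    the worst indicator status. Raises ValueError on an empty input."""
    return max(statuses, key=SEVERITY_ORDER.index)


def summarize(report):
    """Return one line per indicator, followed by its impacts and diagnoses."""
    lines = []
    for name, indicator in report.get("indicators", {}).items():
        lines.append(f"{name}: {indicator['status']} - {indicator.get('symptom', '')}")
        # Severity 1 is the highest, so ascending order lists the worst impacts first.
        for impact in sorted(indicator.get("impacts", []), key=lambda i: i["severity"]):
            areas = ", ".join(impact.get("impact_areas", []))
            lines.append(f"  impact (severity {impact['severity']}, {areas}): {impact['description']}")
        # Each diagnosis pairs a root cause with an action and a troubleshooting URL.
        for diag in indicator.get("diagnosis", []):
            lines.append(f"  cause: {diag['cause']}")
            lines.append(f"  action: {diag['action']} (see {diag['help_url']})")
    return lines
```

Against a running instance, `print("\n".join(summarize(fetch_health_report())))` would list each indicator with its highest-severity impacts first. If the assumed ranking holds, `worst_status` applied to the indicator statuses should agree with the report's top-level `status`.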
diff --git a/docs/static/troubleshoot/health-pipeline-status.asciidoc b/docs/static/troubleshoot/health-pipeline-status.asciidoc
@@ -4,7 +4,7 @@
 The Pipeline indicator has a `status` probe that is capable of producing one of several diagnoses about the pipeline's lifecycle, indicating whether the pipeline is currently running.
 
 [[health-report-pipeline-status-diagnosis-loading]]
-==== [[loading]] Loading Pipeline
+==== Loading Pipeline
 
 A pipeline that is loading is not yet processing data, and is considered a temporarily-degraded pipeline state.
 Some plugins perform actions or pre-validation that can delay the starting of the pipeline, such as when a plugin pre-establishes a connection to an external service before allowing the pipeline to start.
@@ -13,7 +13,7 @@ When these plugins take significant time to start up, the whole pipeline can rem
 If your pipeline does not come up in a reasonable amount of time, consider checking the Logstash logs to see if the plugin shows evidence of being caught in a retry loop.
 
 [[health-report-pipeline-status-diagnosis-finished]]
-==== [[finished]] Finished Pipeline
+==== Finished Pipeline
 
 A Logstash pipeline whose input plugins have all completed will be shut down once events have finished processing.
 
@@ -22,15 +22,15 @@ A finished pipeline will not produce or process any more events until it is rest
 If you wish to keep your pipeline running, consider configuring its input to run on a schedule or otherwise listen for new events.
 
 [[health-report-pipeline-status-diagnosis-terminated]]
-==== [[terminated]] Terminated Pipeline
+==== Terminated Pipeline
 
 When a Logstash pipeline's filter or output plugins crash, the entire pipeline is terminated and intervention is required.
 
 A terminated pipeline will not produce or process any more events until it is restarted, which will occur if the pipeline's definition is changed and pipeline reloads are enabled.
 Check the logs to determine the cause of the crash, and report the issue to the plugin maintainers.
 
 [[health-report-pipeline-status-diagnosis-unknown]]
-==== [[unknown]] Unknown Pipeline
+==== Unknown Pipeline
 
 When a Logstash pipeline either cannot be created or has recently been deleted, the health report doesn't know enough to produce a meaningful status.
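The diagnoses above correspond to the `state` values that the health report exposes under `pipelines/indicators/<pipeline_id>/details/status/state`. The following minimal Python sketch picks out each pipeline's state and the matching troubleshooting section anchor. It assumes that path nests as plain JSON objects (`indicators` → `pipelines` → `indicators` → `<pipeline_id>` → `details` → `status` → `state`) and treats a missing state as `unknown`, which is also an assumption. The helpers `pipeline_states` and `troubleshooting_anchor` are hypothetical; the anchor IDs are the ones defined on this troubleshooting page.

```python
# Anchor IDs of the diagnosis sections on this troubleshooting page.
# `running` is the healthy state and has no diagnosis section.
DIAGNOSIS_ANCHORS = {
    "loading": "health-report-pipeline-status-diagnosis-loading",
    "finished": "health-report-pipeline-status-diagnosis-finished",
    "terminated": "health-report-pipeline-status-diagnosis-terminated",
    "unknown": "health-report-pipeline-status-diagnosis-unknown",
}


def pipeline_states(report):
    """Map each pipeline id to its reported state.

    Assumes the nesting indicators.pipelines.indicators.<pipeline_id>.details.status.state;
    a missing state is treated as "unknown" (an assumption, not documented behavior).
    """
    pipelines = report.get("indicators", {}).get("pipelines", {}).get("indicators", {})
    states = {}
    for pipeline_id, indicator in pipelines.items():
        status = indicator.get("details", {}).get("status", {})
        states[pipeline_id] = status.get("state", "unknown")
    return states


def troubleshooting_anchor(state):
    """Return the troubleshooting anchor for a state, or None for healthy or unrecognized states."""
    return DIAGNOSIS_ANCHORS.get(state)
```

A `running` pipeline maps to `None` because this page has no diagnosis section for a healthy pipeline.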