From b0a00b298bf5e57b67ea9513bd8fa6c184edf65e Mon Sep 17 00:00:00 2001 From: Ry Biesemeyer Date: Mon, 7 Oct 2024 21:14:27 +0000 Subject: [PATCH 1/3] docs: health report first pass --- .../monitoring/monitoring-apis.asciidoc | 156 +++++++++++++++++- .../health-pipeline-status.asciidoc | 38 +++++ .../troubleshoot/troubleshooting.asciidoc | 1 + 3 files changed, 193 insertions(+), 2 deletions(-) create mode 100644 docs/static/troubleshoot/health-pipeline-status.asciidoc diff --git a/docs/static/monitoring/monitoring-apis.asciidoc b/docs/static/monitoring/monitoring-apis.asciidoc index 897507d1e22..63aff91e1fe 100644 --- a/docs/static/monitoring/monitoring-apis.asciidoc +++ b/docs/static/monitoring/monitoring-apis.asciidoc @@ -2,13 +2,13 @@ [[monitoring]] == APIs for monitoring {ls} -{ls} provides monitoring APIs for retrieving runtime metrics -about {ls}: +{ls} provides monitoring APIs for retrieving runtime information about {ls}: * <> * <> * <> * <> +* <> You can use the root resource to retrieve general information about the Logstash instance, including @@ -1184,3 +1184,155 @@ Example of a human-readable response: org.jruby.internal.runtime.NativeThread.join(NativeThread.java:75) -------------------------------------------------- + + +[[logstash-health-report-api]] +=== Health report API + +An API that reports the health status of Logstash. + +[source,js] +-------------------------------------------------- +curl -XGET 'localhost:9600/_health_report?pretty' +-------------------------------------------------- + +==== Description + +The health API returns a report with the health status of Logstash and the pipelines that are running inside of it. +The report contains a list of indicators that compose Logstash functionality. + +Each indicator has a health status of: `green`, `unknown`, `yellow`, or `red`. +The indicator will provide an explanation and metadata describing the reason for its current health status. 
+ +The top-level status is controlled by the worst indicator status. + +In the event that an indicator's status is non-green, a list of impacts may be present in the indicator result which detail the functionalities that are negatively affected by the health issue. +Each impact carries with it a severity level, an area of the system that is affected, and a simple description of the impact on the system. + +Some health indicators can determine the root cause of a health problem and prescribe a set of steps that can be performed in order to improve the health of the system. +The root cause and remediation steps are encapsulated in a `diagnosis`. +A diagnosis contains a cause detailing a root cause analysis, an action containing a brief description of the steps to take to fix the problem, and the URL for detailed troubleshooting help. + +NOTE: The health indicators perform root cause analysis of non-green health statuses. + This can be computationally expensive when called frequently. + +==== Response body + +`status`:: +(Optional, string) Health status of {ls}, based on the aggregated status of all indicators. Statuses are: + +`green`::: +{ls} is healthy. + +`unknown`::: +The health of {ls} could not be determined. + +`yellow`::: +The functionality of {ls} is in a degraded state and may need remediation to avoid the health becoming `red`. + +`red`::: +{ls} is experiencing an outage or certain features are unavailable for use. + +`indicators`:: +(object) Information about the health of the {ls} indicators. + ++ +.Properties of `indicators` +[%collapsible%open] +==== +``:: +(object) Contains health results for an indicator. ++ +.Properties of `` +[%collapsible%open] +======= +`status`:: +(string) Health status of the indicator. Statuses are: + +`green`::: +The indicator is healthy. + +`unknown`::: +The health of the indicator could not be determined. 
+ +`yellow`::: +The functionality of an indicator is in a degraded state and may need remediation to avoid the health becoming `red`. + +`red`::: +The indicator is experiencing an outage or certain features are unavailable for use. + +`symptom`:: +(string) A message providing information about the current health status. + +`details`:: +(Optional, object) An object that contains additional information about the indicator that has led to the current health status result. +Each indicator has <>. + +`impacts`:: +(Optional, array) If a non-healthy status is returned, indicators may include a list of impacts that this health status will have on {ls}. ++ +.Properties of `impacts` +[%collapsible%open] +======== +`severity`:: +(integer) How important this impact is to the functionality of {ls}. +A value of 1 is the highest severity, with larger values indicating lower severity. + +`description`:: +(string) A description of the impact on {ls}. + +`impact_areas`:: +(array of strings) The areas of {ls} functionality that this impact affects. +Possible values are: ++ +-- +* `pipeline_execution` +-- + +======== + +`diagnosis`:: +(Optional, array) If a non-healthy status is returned, indicators may include a list of diagnoses that encapsulate the cause of the health issue and an action to take in order to remediate the problem. ++ +.Properties of `diagnosis` +[%collapsible%open] +======== +`cause`:: +(string) A description of a root cause of this health problem. + +`action`:: +(string) A brief description of the steps that should be taken to remediate the problem. +A more detailed step-by-step guide to remediate the problem is provided by the `help_url` field. + +`help_url`:: +(string) A link to the troubleshooting guide that'll fix the health problem. +======== +======= +==== + +[role="child_attributes"] +[[logstash-health-api-response-details]] +==== Indicator Details + +Each health indicator in the health API returns a set of details that further explains the state of the system. 
+The details have contents and a structure that is unique to each indicator. + +[[logstash-health-api-response-details-pipeline]] +===== Pipeline Indicator Details + +`+pipelines/indicators//details+`:: +(object) Information about the specified pipeline. ++ +.Properties of `+pipelines/indicators//details+` +[%collapsible%open] +==== +`status`:: +(object) Details related to the pipeline's current status and run-state. ++ +.Properties of `status` +[%collapsible%open] +======== +`state`:: +(string) The current state of the pipeline, including whether it is `loading`, `running`, `finished`, or `terminated`. +======== +==== diff --git a/docs/static/troubleshoot/health-pipeline-status.asciidoc b/docs/static/troubleshoot/health-pipeline-status.asciidoc new file mode 100644 index 00000000000..5c18e365fc5 --- /dev/null +++ b/docs/static/troubleshoot/health-pipeline-status.asciidoc @@ -0,0 +1,38 @@ +[[health-report-pipeline-status]] +=== Health Report Pipeline Status + +The Pipeline indicator has a `status` probe that is capable of producing one of several diagnoses about the pipeline's lifecycle, indicating whether the pipeline is currently running. + +[[health-report-pipeline-status-diagnosis-loading]] +==== Loading Pipeline + +A pipeline that is loading is not yet processing data, and is considered a temporarily-degraded pipeline state. +Some plugins perform actions or pre-validation that can delay the starting of the pipeline, such as when a plugin pre-establishes a connection to an external service before allowing the pipeline to start. +When these plugins take significant time to start up, the whole pipeline can remain in a loading state for an extended time. + +If your pipeline does not come up in a reasonable amount of time, consider checking the Logstash logs to see if the plugin shows evidence of being caught in a retry loop. 
+ +[[health-report-pipeline-status-diagnosis-finished]] +==== Finished Pipeline + +A Logstash pipeline whose input plugins have all completed will be shut down once events have finished processing. + +Many plugins can be configured to run indefinitely, either by listening for new inbound events or by polling for events on a schedule. +A finished pipeline will not produce or process any more events until it is restarted, which will occur if the pipeline's definition is changed and pipeline reloads are enabled. +If you wish to keep your pipeline running, consider configuring its input to run on a schedule or otherwise listen for new events. + +[[health-report-pipeline-status-diagnosis-terminated]] +==== Terminated Pipeline + +When a Logstash pipeline's filter or output plugins crash, the entire pipeline is terminated and intervention is required. + +A terminated pipeline will not produce or process any more events until it is restarted, which will occur if the pipeline's definition is changed and pipeline reloads are enabled. +Check the logs to determine the cause of the crash, and report the issue to the plugin maintainers. + +[[health-report-pipeline-status-diagnosis-unknown]] +==== Unknown Pipeline + +When a Logstash pipeline either cannot be created or has recently been deleted, the health report doesn't know enough to produce a meaningful status. + +Check the logs to determine if the pipeline crashed during creation, and report the issue to the plugin maintainers. 
+ diff --git a/docs/static/troubleshoot/troubleshooting.asciidoc b/docs/static/troubleshoot/troubleshooting.asciidoc index b4c8ee7a0d7..66bb60f45e5 100644 --- a/docs/static/troubleshoot/troubleshooting.asciidoc +++ b/docs/static/troubleshoot/troubleshooting.asciidoc @@ -28,3 +28,4 @@ include::ts-logstash.asciidoc[] include::ts-plugins-general.asciidoc[] include::ts-plugins.asciidoc[] include::ts-other-issues.asciidoc[] +include::health-pipeline-status.asciidoc[] From 65ca1dd3d301f27bd7e4fcbd94c354172eece451 Mon Sep 17 00:00:00 2001 From: Ry Biesemeyer Date: Tue, 8 Oct 2024 16:05:16 +0000 Subject: [PATCH 2/3] docs: add secondary one-word anchors for pipeline status diagnosis --- docs/static/troubleshoot/health-pipeline-status.asciidoc | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/docs/static/troubleshoot/health-pipeline-status.asciidoc b/docs/static/troubleshoot/health-pipeline-status.asciidoc index 5c18e365fc5..095ef85f950 100644 --- a/docs/static/troubleshoot/health-pipeline-status.asciidoc +++ b/docs/static/troubleshoot/health-pipeline-status.asciidoc @@ -4,7 +4,7 @@ The Pipeline indicator has a `status` probe that is capable of producing one of several diagnoses about the pipeline's lifecycle, indicating whether the pipeline is currently running. [[health-report-pipeline-status-diagnosis-loading]] -==== Loading Pipeline +==== [[loading]]Loading Pipeline A pipeline that is loading is not yet processing data, and is considered a temporarily-degraded pipeline state. Some plugins perform actions or pre-validation that can delay the starting of the pipeline, such as when a plugin pre-establishes a connection to an external service before allowing the pipeline to start. @@ -13,7 +13,7 @@ When these plugins take significant time to start up, the whole pipeline can rem If your pipeline does not come up in a reasonable amount of time, consider checking the Logstash logs to see if the plugin shows evidence of being caught in a retry loop. 
[[health-report-pipeline-status-diagnosis-finished]] -==== Finished Pipeline +==== [[finished]]Finished Pipeline A Logstash pipeline whose input plugins have all completed will be shut down once events have finished processing. @@ -22,7 +22,7 @@ A finished pipeline will not produce or process any more events until it is rest If you wish to keep your pipeline running, consider configuring its input to run on a schedule or otherwise listen for new events. [[health-report-pipeline-status-diagnosis-terminated]] -==== Terminated Pipeline +==== [[terminated]]Terminated Pipeline When a Logstash pipeline's filter or output plugins crash, the entire pipeline is terminated and intervention is required. @@ -30,9 +30,8 @@ A terminated pipeline will not produce or process any more events until it is re Check the logs to determine the cause of the crash, and report the issue to the plugin maintainers. [[health-report-pipeline-status-diagnosis-unknown]] -==== Unknown Pipeline +==== [[unknown]]Unknown Pipeline When a Logstash pipeline either cannot be created or has recently been deleted, the health report doesn't know enough to produce a meaningful status. Check the logs to determine if the pipeline crashed during creation, and report the issue to the plugin maintainers. - From c88a2c3fda52959d3c3751dcca8193e2ee0544c5 Mon Sep 17 00:00:00 2001 From: Ry Biesemeyer Date: Tue, 8 Oct 2024 12:28:54 -0700 Subject: [PATCH 3/3] Remove plus-for-passthrough --- docs/static/monitoring/monitoring-apis.asciidoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/static/monitoring/monitoring-apis.asciidoc b/docs/static/monitoring/monitoring-apis.asciidoc index 63aff91e1fe..68b4a0b8378 100644 --- a/docs/static/monitoring/monitoring-apis.asciidoc +++ b/docs/static/monitoring/monitoring-apis.asciidoc @@ -1320,10 +1320,10 @@ The details have contents and a structure that is unique to each indicator. 
[[logstash-health-api-response-details-pipeline]] ===== Pipeline Indicator Details -`+pipelines/indicators//details+`:: +`pipelines/indicators//details`:: (object) Information about the specified pipeline. + -.Properties of `+pipelines/indicators//details+` +.Properties of `pipelines/indicators//details` [%collapsible%open] ==== `status`::