Skip to content

Commit

Permalink
Feature: health report api (#16520)
Browse files Browse the repository at this point in the history
* [health] bootstrap HealthObserver from agent to API (#16141)

* [health] bootstrap HealthObserver from agent to API

* specs: mocked agent needs health observer

* add license headers

* Merge `main` into `feature/health-report-api` (#16397)

* Add GH vault plugin bot to allowed list (#16301)

* regenerate webserver test certificates (#16331)

* correctly handle stack overflow errors during pipeline compilation (#16323)

This commit improves error handling when pipelines that are too big hit the Xss limit and throw a StackOverflowError. Currently the exception is printed outside of the logger, and doesn’t even show if log.format is json, leaving the user to wonder what happened.

A couple of thoughts on the way this is implemented:

* There should be a first barrier to handle pipelines that are too large based on the PipelineIR compilation. The barrier would use the detection of Xss to determine how big a pipeline could be. This however doesn't reduce the need to still handle a StackOverflow if it happens.
* The catching of StackOverflowError could also be done on the WorkerLoop. However I'd suggest that this is unrelated to the Worker initialization itself, it just so happens that compiledPipeline.buildExecution is computed inside the WorkerLoop class for performance reasons. So I'd prefer logging to not come from the existing catch, but from a dedicated catch clause.

Solves #16320

* Doc: Reposition worker-utilization in doc (#16335)

* settings: add support for observing settings after post-process hooks (#16339)

Because logging configuration occurs after loading the `logstash.yml`
settings, deprecation logs from `LogStash::Settings::DeprecatedAlias#set` are
effectively emitted to a null logger and lost.

By re-emitting after the post-process hooks, we can ensure that they make
their way to the deprecation log. This change adds support for any setting
that responds to `Object#observe_post_process` to receive it after all
post-processing hooks have been executed.

Resolves: #16332

* fix line used to determine ES is up (#16349)

* add retries to snyk buildkite job (#16343)

* Fix 8.13.1 release notes (#16363)

make a note of the fix that went to 8.13.1: #16026

Co-authored-by: Karen Metts <35154725+karenzone@users.noreply.github.com>

* Update logstash_releases.json (#16347)

* [Bugfix] Resolve the array and char (single | double quote) escaped values of ${ENV} (#16365)

* Properly resolve the values from ENV vars if literal array string provided with ENV var.

* Docker acceptance test for persisting  keys and use actual values in docker container.

* Review suggestion.

Simplify the code by stripping whitespace before `gsub`, no need to check comma and split.

Co-authored-by: João Duarte <jsvd@users.noreply.github.com>

---------

Co-authored-by: João Duarte <jsvd@users.noreply.github.com>

* Doc: Add SNMP integration to breaking changes (#16374)

* deprecate java less-than 17 (#16370)

* Exclude substitution refinement on pipelines.yml (#16375)

* Exclude substitution refinement on pipelines.yml (applies on ENV vars and logstash.yml where env2yaml saves vars)

* Safety integration test for pipeline config.string contains ENV .

* Doc: Forwardport 8.15.0 release notes to main (#16388)

* Removing 8.14 from ci/branches.json as we have 8.15. (#16390)

---------

Co-authored-by: ev1yehor <146825775+ev1yehor@users.noreply.github.com>
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
Co-authored-by: Karen Metts <35154725+karenzone@users.noreply.github.com>
Co-authored-by: Andrea Selva <selva.andre@gmail.com>
Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>

* Squashed merge from 8.x

* Failure injector plugin implementation. (#16466)

* Test purpose only failure injector integration (filter and output) plugins implementation. Add unit tests and include license notes.

* Fix the degrate method name typo.

Co-authored-by: Andrea Selva <selva.andre@gmail.com>

* Add explanation to the config params and rebuild plugin gem.

---------

Co-authored-by: Andrea Selva <selva.andre@gmail.com>

* Health report integration tests bootstrapper and initial tests implementation (#16467)

* Health Report integration tests bootstrapper and initial slow start scenario implementation.

* Apply suggestions from code review

Renaming expectation check method name.

Co-authored-by: kaisecheng <69120390+kaisecheng@users.noreply.github.com>

* Changed to branch concept, YAML structure simplified as changed to Dict.

* Apply suggestions from code review

Reflect `help_url` to the integration test.

---------

Co-authored-by: kaisecheng <69120390+kaisecheng@users.noreply.github.com>

* health api: expose `GET /_health_report` with pipelines/*/status probe (#16398)

Adds a `GET /_health_report` endpoint with per-pipeline status probes, and wires the
resulting report status into the other API responses, replacing their hard-coded `green`
with a meaningful status indication.

---------

Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>

* docs: health report API, and diagnosis links (feature-targeted) (#16518)

* docs: health report API, and diagnosis links

* Remove plus-for-passthrough markers

Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>

---------

Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>

* merge 8.x into feature branch... (#16519)

* Add GH vault plugin bot to allowed list (#16301)

* regenerate webserver test certificates (#16331)

* correctly handle stack overflow errors during pipeline compilation (#16323)

This commit improves error handling when pipelines that are too big hit the Xss limit and throw a StackOverflowError. Currently the exception is printed outside of the logger, and doesn’t even show if log.format is json, leaving the user to wonder what happened.

A couple of thoughts on the way this is implemented:

* There should be a first barrier to handle pipelines that are too large based on the PipelineIR compilation. The barrier would use the detection of Xss to determine how big a pipeline could be. This however doesn't reduce the need to still handle a StackOverflow if it happens.
* The catching of StackOverflowError could also be done on the WorkerLoop. However I'd suggest that this is unrelated to the Worker initialization itself, it just so happens that compiledPipeline.buildExecution is computed inside the WorkerLoop class for performance reasons. So I'd prefer logging to not come from the existing catch, but from a dedicated catch clause.

Solves #16320

* Doc: Reposition worker-utilization in doc (#16335)

* settings: add support for observing settings after post-process hooks (#16339)

Because logging configuration occurs after loading the `logstash.yml`
settings, deprecation logs from `LogStash::Settings::DeprecatedAlias#set` are
effectively emitted to a null logger and lost.

By re-emitting after the post-process hooks, we can ensure that they make
their way to the deprecation log. This change adds support for any setting
that responds to `Object#observe_post_process` to receive it after all
post-processing hooks have been executed.

Resolves: #16332

* fix line used to determine ES is up (#16349)

* add retries to snyk buildkite job (#16343)

* Fix 8.13.1 release notes (#16363)

make a note of the fix that went to 8.13.1: #16026

Co-authored-by: Karen Metts <35154725+karenzone@users.noreply.github.com>

* Update logstash_releases.json (#16347)

* [Bugfix] Resolve the array and char (single | double quote) escaped values of ${ENV} (#16365)

* Properly resolve the values from ENV vars if literal array string provided with ENV var.

* Docker acceptance test for persisting  keys and use actual values in docker container.

* Review suggestion.

Simplify the code by stripping whitespace before `gsub`, no need to check comma and split.

Co-authored-by: João Duarte <jsvd@users.noreply.github.com>

---------

Co-authored-by: João Duarte <jsvd@users.noreply.github.com>

* Doc: Add SNMP integration to breaking changes (#16374)

* deprecate java less-than 17 (#16370)

* Exclude substitution refinement on pipelines.yml (#16375)

* Exclude substitution refinement on pipelines.yml (applies on ENV vars and logstash.yml where env2yaml saves vars)

* Safety integration test for pipeline config.string contains ENV .

* Doc: Forwardport 8.15.0 release notes to main (#16388)

* Removing 8.14 from ci/branches.json as we have 8.15. (#16390)

* Increase Jruby -Xmx to avoid OOM during zip task in DRA (#16408)

Fix: #16406

* Generate Dataset code with meaningful fields names (#16386)

This PR is intended to help Logstash developers or users that want to better understand the code that's autogenerated to model a pipeline, assigning more meaningful names to the Datasets subclasses' fields.

Updates `FieldDefinition` to receive the name of the field from construction methods, so that it can be used during the code generation phase, instead of the existing incremental `field%n`.
Updates `ClassFields` to propagate the explicit field name down to the `FieldDefinitions`.
Update the `DatasetCompiler` that add fields to `ClassFields` to assign a proper name to generated Dataset's fields.

* Implements safe evaluation of conditional expressions, logging the error without killing the pipeline (#16322)

This PR protects the if statements against expression evaluation errors, cancel the event under processing and log it.
This avoids to crash the pipeline which encounter a runtime error during event condition evaluation, permitting to debug the root cause reporting the offending event and removing from the current processing batch.

Translates the `org.jruby.exceptions.TypeError`, `IllegalArgumentException`, `org.jruby.exceptions.ArgumentError` that could happen during `EventCodition` evaluation into a custom `ConditionalEvaluationError` which bubbles up on AST tree nodes. It's catched in the `SplitDataset` node.
Updates the generation of the `SplitDataset `so that the execution of `filterEvents` method inside the compute body is try-catch guarded and defer the execution to an instance of `AbstractPipelineExt.ConditionalEvaluationListener` to handle such error. In this particular case the error management consist in just logging the offending Event.

---------

Co-authored-by: Karen Metts <35154725+karenzone@users.noreply.github.com>

* Update logstash_releases.json (#16426)

* Release notes for 8.15.1 (#16405) (#16427)

* Update release notes for 8.15.1

* update release note

---------

Co-authored-by: logstashmachine <43502315+logstashmachine@users.noreply.github.com>
Co-authored-by: Kaise Cheng <kaise.cheng@elastic.co>
(cherry picked from commit 2fca7e3)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fix ConditionalEvaluationError to do not include the event that errored in its serialiaxed form, because it's not expected that this class is ever serialized. (#16429) (#16430)

Make inner field of ConditionalEvaluationError transient to be avoided during serialization.

(cherry picked from commit bb7ecc2)

Co-authored-by: Andrea Selva <selva.andre@gmail.com>

* use gnu tar compatible minitar to generate tar artifact (#16432) (#16434)

Using VERSION_QUALIFIER when building the tarball distribution will fail since Ruby's TarWriter implements the older POSIX88 version of tar and paths will be longer than 100 characters.

For the long paths being used in Logstash's plugins, mainly due to nested folders from jar-dependencies, we need the tarball to follow either the 2001 ustar format or gnu tar, which is implemented by the minitar gem.

(cherry picked from commit 69f0fa5)

Co-authored-by: João Duarte <jsvd@users.noreply.github.com>

* account for the 8.x in DRA publishing task (#16436) (#16440)

the current DRA publishing task computes the branch from the version
contained in the version.yml

This is done by taking the major.minor and confirming that a branch
exists with that name.

However this pattern won't be applicable for 8.x, as that branch
currently points to 8.16.0 and there is no 8.16 branch.

This commit falls back to reading the buildkite injected
BUILDKITE_BRANCH variable.

(cherry picked from commit 17dba9f)

Co-authored-by: João Duarte <jsvd@users.noreply.github.com>

* Fixes the issue where LS wipes out all quotes from docker env variables. (#16456) (#16459)

* Fixes the issue where LS wipes out all quotes from docker env variables. This is an issue when running LS on docker with CONFIG_STRING, needs to keep quotes with env variable.

* Add a docker acceptance integration test.

(cherry picked from commit 7c64c73)

Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>

* Known issue for 8.15.1 related to env vars references (#16455) (#16469)

(cherry picked from commit b54caf3)

Co-authored-by: Luca Belluccini <luca.belluccini@elastic.co>

* bump .ruby_version to jruby-9.4.8.0 (#16477) (#16480)

(cherry picked from commit 51cca73)

Co-authored-by: João Duarte <jsvd@users.noreply.github.com>

* Release notes for 8.15.2 (#16471) (#16478)

Co-authored-by: andsel <selva.andre@gmail.com>
Co-authored-by: Karen Metts <35154725+karenzone@users.noreply.github.com>
(cherry picked from commit 01dc76f)

* Change LogStash::Util::SubstitutionVariables#replace_placeholders refine argument to optional (#16485) (#16488)

(cherry picked from commit 8368c00)

Co-authored-by: Edmo Vamerlatti Costa <11836452+edmocosta@users.noreply.github.com>

* Use jruby-9.4.8.0 in exhaustive CIs. (#16489) (#16491)

(cherry picked from commit fd1de39)

Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>

* Don't use an older JRuby with oraclelinux-7 (#16499) (#16501)

A recent PR (elastic/ci-agent-images/pull/932) modernized the VM images
and removed JRuby 9.4.5.0 and some older versions.

This ended up breaking exhaustive test on Oracle Linux 7 that hard coded
JRuby 9.4.5.0.

PR #16489 worked around the
problem by pinning to the new JRuby, but actually we don't
need the conditional anymore since the original issue
jruby/jruby#7579 (comment) has
been resolved and none of our releasable branches (apart from 7.17 which
uses `9.2.20.1`) specify `9.3.x.y` in `/.ruby-version`.

Therefore, this commit removes conditional setting of JRuby for
OracleLinux 7 agents in exhaustive tests (and relies on whatever
`/.ruby-version` defines).

(cherry picked from commit 07c01f8)

Co-authored-by: Dimitrios Liappis <dimitrios.liappis@gmail.com>

* Improve pipeline bootstrap error logs (#16495) (#16504)

This PR adds the cause errors details on the pipeline converge state error logs

(cherry picked from commit e84fb45)

Co-authored-by: Edmo Vamerlatti Costa <11836452+edmocosta@users.noreply.github.com>

* Logstash Health Report Tests Buildkite pipeline setup. (#16416) (#16511)

(cherry picked from commit 5195332)

Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>

* Make health report test runner script executable. (#16446) (#16512)

(cherry picked from commit 2ebf265)

Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>

* Backport PR #16423 to 8.x: DLQ-ing events that trigger an conditional evaluation error. (#16493)

* DLQ-ing events that trigger an conditional evaluation error. (#16423)

When a conditional evaluation encounter an error in the expression the event that triggered the issue is sent to pipeline's DLQ, if enabled for the executing pipeline.

This PR engage with the work done in #16322, the `ConditionalEvaluationListener` that is receives notifications about if-statements evaluation failure, is improved to also send the event to DLQ (if enabled in the pipeline) and not just logging it.

(cherry picked from commit b69d993)

* Fixed warning about non serializable field DeadLetterQueueWriter in serializable AbstractPipelineExt

---------

Co-authored-by: Andrea Selva <selva.andre@gmail.com>

* add deprecation log for `--event_api.tags.illegal` (#16507) (#16515)

- move `--event_api.tags.illegal` from option to deprecated_option
- add deprecation log when the flag is explicitly used
relates: #16356

Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>
(cherry picked from commit a4eddb8)

Co-authored-by: kaisecheng <69120390+kaisecheng@users.noreply.github.com>

---------

Co-authored-by: ev1yehor <146825775+ev1yehor@users.noreply.github.com>
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
Co-authored-by: Karen Metts <35154725+karenzone@users.noreply.github.com>
Co-authored-by: Andrea Selva <selva.andre@gmail.com>
Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>
Co-authored-by: kaisecheng <69120390+kaisecheng@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Luca Belluccini <luca.belluccini@elastic.co>
Co-authored-by: Edmo Vamerlatti Costa <11836452+edmocosta@users.noreply.github.com>
Co-authored-by: Dimitrios Liappis <dimitrios.liappis@gmail.com>

---------

Co-authored-by: ev1yehor <146825775+ev1yehor@users.noreply.github.com>
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
Co-authored-by: Karen Metts <35154725+karenzone@users.noreply.github.com>
Co-authored-by: Andrea Selva <selva.andre@gmail.com>
Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>
Co-authored-by: kaisecheng <69120390+kaisecheng@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Luca Belluccini <luca.belluccini@elastic.co>
Co-authored-by: Edmo Vamerlatti Costa <11836452+edmocosta@users.noreply.github.com>
Co-authored-by: Dimitrios Liappis <dimitrios.liappis@gmail.com>
(cherry picked from commit 7eb5185)
  • Loading branch information
yaauie authored and logstashmachine committed Oct 9, 2024
1 parent a4eddb8 commit de4c875
Show file tree
Hide file tree
Showing 59 changed files with 2,955 additions and 19 deletions.
18 changes: 18 additions & 0 deletions .buildkite/scripts/health-report-tests/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
## Description
This package for integration tests of the Health Report API.
Export `LS_BRANCH` to run on a specific branch. By default, it uses the main branch.

## How to run the Health Report Integration test?
### Prerequisites
Make sure you have python installed. Install the integration test dependencies with the following command:
```shell
python3 -mpip install -r .buildkite/scripts/health-report-tests/requirements.txt
```

### Run the integration tests
```shell
python3 .buildkite/scripts/health-report-tests/main.py
```

### Troubleshooting
- If you get `WARNING: pip is configured with locations that require TLS/SSL,...` warning message, make sure you have python >=3.12.4 installed.
Empty file.
101 changes: 101 additions & 0 deletions .buildkite/scripts/health-report-tests/bootstrap.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,101 @@
"""
Health Report Integration test bootstrapper with Python script
- A script to resolve Logstash version if not provided
- Download LS docker image and spin up
- When tests finished, teardown the Logstash
"""
import os
import subprocess
import util
import yaml


class Bootstrap:
ELASTIC_STACK_VERSIONS_URL = "https://artifacts-api.elastic.co/v1/versions"

def __init__(self) -> None:
f"""
A constructor of the {Bootstrap}.
Returns:
Resolves Logstash branch considering provided LS_BRANCH
Checks out git branch
"""
logstash_branch = os.environ.get("LS_BRANCH")
if logstash_branch is None:
# version is not specified, use the main branch, no need to git checkout
print(f"LS_BRANCH is not specified, using main branch.")
else:
# LS_BRANCH accepts major latest as a major.x or specific branch as X.Y
if logstash_branch.find(".x") == -1:
print(f"Using specified branch: {logstash_branch}")
util.git_check_out_branch(logstash_branch)
else:
major_version = logstash_branch.split(".")[0]
if major_version and major_version.isnumeric():
resolved_version = self.__resolve_latest_stack_version_for(major_version)
minor_version = resolved_version.split(".")[1]
branch = major_version + "." + minor_version
print(f"Using resolved branch: {branch}")
util.git_check_out_branch(branch)
else:
raise ValueError(f"Invalid value set to LS_BRANCH. Please set it properly (ex: 8.x or 9.0) and "
f"rerun again")

def __resolve_latest_stack_version_for(self, major_version: str) -> str:
resolved_version = ""
response = util.call_url_with_retry(self.ELASTIC_STACK_VERSIONS_URL)
release_versions = response.json()["versions"]
for release_version in reversed(release_versions):
if release_version.find("SNAPSHOT") > 0:
continue
if release_version.split(".")[0] == major_version:
print(f"Resolved latest version for {major_version} is {release_version}.")
resolved_version = release_version
break

if resolved_version == "":
raise ValueError(f"Cannot resolve latest version for {major_version} major")
return resolved_version

def install_plugin(self, plugin_path: str) -> None:
util.run_or_raise_error(
["bin/logstash-plugin", "install", plugin_path],
f"Failed to install {plugin_path}")

def build_logstash(self):
print(f"Building Logstash.")
util.run_or_raise_error(
["./gradlew", "clean", "bootstrap", "assemble", "installDefaultGems"],
"Failed to build Logstash")
print(f"Logstash has successfully built.")

def apply_config(self, config: dict) -> None:
with open(os.getcwd() + "/.buildkite/scripts/health-report-tests/config/pipelines.yml", 'w') as pipelines_file:
yaml.dump(config, pipelines_file)

def run_logstash(self, full_start_required: bool) -> subprocess.Popen:
# --config.reload.automatic is to make instance active
# it is helpful when testing crash pipeline cases
config_path = os.getcwd() + "/.buildkite/scripts/health-report-tests/config"
process = subprocess.Popen(["bin/logstash", "--config.reload.automatic", "--path.settings", config_path,
"-w 1"], stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, shell=False)
if process.poll() is not None:
print(f"Logstash failed to run, check the the config and logs, then rerun.")
return None

# Read stdout and stderr in real-time
logs = []
for stdout_line in iter(process.stdout.readline, ""):
logs.append(stdout_line.strip())
# we don't wait for Logstash fully start as we also test slow pipeline start scenarios
if full_start_required is False and "Starting pipeline" in stdout_line:
break
if full_start_required is True and "Pipeline started" in stdout_line:
break
if "Logstash shut down" in stdout_line or "Logstash stopped" in stdout_line:
print(f"Logstash couldn't spin up.")
print(logs)
return None

print(f"Logstash is running with PID: {process.pid}.")
return process
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
# Intentionally left blank
69 changes: 69 additions & 0 deletions .buildkite/scripts/health-report-tests/config_validator.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
import yaml
from typing import Any, List, Dict


class ConfigValidator:
REQUIRED_KEYS = {
"root": ["name", "config", "conditions", "expectation"],
"config": ["pipeline.id", "config.string"],
"conditions": ["full_start_required"],
"expectation": ["status", "symptom", "indicators"],
"indicators": ["pipelines"],
"pipelines": ["status", "symptom", "indicators"],
"DYNAMIC": ["status", "symptom", "diagnosis", "impacts", "details"],
"details": ["status"],
"status": ["state"]
}

def __init__(self):
self.yaml_content = None

def __has_valid_keys(self, data: any, key_path: str, repeated: bool) -> bool:
if isinstance(data, str) or isinstance(data, bool): # we reached values
return True

# we have two indicators section and for the next repeated ones, we go deeper
first_key = next(iter(data))
data = data[first_key] if repeated and key_path == "indicators" else data

if isinstance(data, dict):
# pipeline-id is a DYNAMIC
required = self.REQUIRED_KEYS.get("DYNAMIC" if repeated and key_path == "indicators" else key_path, [])
repeated = not repeated if key_path == "indicators" else repeated
for key in required:
if key not in data:
print(f"Missing key '{key}' in '{key_path}'")
return False
else:
dic_keys_result = self.__has_valid_keys(data[key], key, repeated)
if dic_keys_result is False:
return False
elif isinstance(data, list):
for item in data:
list_keys_result = self.__has_valid_keys(item, key_path, repeated)
if list_keys_result is False:
return False
return True

def load(self, file_path: str) -> None:
"""Load the YAML file content into self.yaml_content."""
self.yaml_content: [Dict[str, Any]] = None
try:
with open(file_path, 'r') as file:
self.yaml_content = yaml.safe_load(file)
except yaml.YAMLError as exc:
print(f"Error in YAML file: {exc}")
self.yaml_content = None

def is_valid(self) -> bool:
"""Validate the entire YAML structure."""
if self.yaml_content is None:
print(f"YAML content is empty.")
return False

if not isinstance(self.yaml_content, dict):
print(f"YAML structure is not as expected, it should start with a Dict.")
return False

result = self.__has_valid_keys(self.yaml_content, "root", False)
return True if result is True else False
16 changes: 16 additions & 0 deletions .buildkite/scripts/health-report-tests/logstash_health_report.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
"""
A class to provide information about Logstash node stats.
"""

import util


class LogstashHealthReport:
LOGSTASH_HEALTH_REPORT_URL = "http://localhost:9600/_health_report"

def __init__(self):
pass

def get(self):
response = util.call_url_with_retry(self.LOGSTASH_HEALTH_REPORT_URL)
return response.json()
87 changes: 87 additions & 0 deletions .buildkite/scripts/health-report-tests/main.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
"""
Main entry point of the LS health report API integration test suites
"""
import glob
import os
import time
import traceback
import yaml
from bootstrap import Bootstrap
from scenario_executor import ScenarioExecutor
from config_validator import ConfigValidator


class BootstrapContextManager:

def __init__(self):
pass

def __enter__(self):
print(f"Starting Logstash Health Report Integration test.")
self.bootstrap = Bootstrap()
self.bootstrap.build_logstash()

plugin_path = os.getcwd() + "/qa/support/logstash-integration-failure_injector/logstash-integration" \
"-failure_injector-*.gem"
matching_files = glob.glob(plugin_path)
if len(matching_files) == 0:
raise ValueError(f"Could not find logstash-integration-failure_injector plugin.")

self.bootstrap.install_plugin(matching_files[0])
print(f"logstash-integration-failure_injector successfully installed.")
return self.bootstrap

def __exit__(self, exc_type, exc_value, exc_traceback):
if exc_type is not None:
print(traceback.format_exception(exc_type, exc_value, exc_traceback))


def main():
with BootstrapContextManager() as bootstrap:
scenario_executor = ScenarioExecutor()
config_validator = ConfigValidator()

working_dir = os.getcwd()
scenario_files_path = working_dir + "/.buildkite/scripts/health-report-tests/tests/*.yaml"
scenario_files = glob.glob(scenario_files_path)

for scenario_file in scenario_files:
print(f"Validating {scenario_file} scenario file.")
config_validator.load(scenario_file)
if config_validator.is_valid() is False:
print(f"{scenario_file} scenario file is not valid.")
return
else:
print(f"Validation succeeded.")

has_failed_scenario = False
for scenario_file in scenario_files:
with open(scenario_file, 'r') as file:
# scenario_content: Dict[str, Any] = None
scenario_content = yaml.safe_load(file)
print(f"Testing `{scenario_content.get('name')}` scenario.")
scenario_name = scenario_content['name']

is_full_start_required = next(sub.get('full_start_required') for sub in
scenario_content.get('conditions') if 'full_start_required' in sub)
config = scenario_content['config']
if config is not None:
bootstrap.apply_config(config)
expectations = scenario_content.get("expectation")
process = bootstrap.run_logstash(is_full_start_required)
if process is not None:
try:
scenario_executor.on(scenario_name, expectations)
except Exception as e:
print(e)
has_failed_scenario = True
process.terminate()
time.sleep(5) # leave some window to terminate the process

if has_failed_scenario:
# intentionally fail due to visibility
raise Exception("Some of scenarios failed, check the log for details.")


if __name__ == "__main__":
main()
8 changes: 2 additions & 6 deletions .buildkite/scripts/health-report-tests/main.sh
Original file line number Diff line number Diff line change
@@ -1,9 +1,5 @@
#!/usr/bin/env bash
set -eo pipefail

# TODO:
# if branch is specified with X.Y, pull branches from ACTIVE_BRANCHES_URL="https://raw.githubusercontent.com/elastic/logstash/main/ci/branches.json", parse and use
# build Logstash from specificed (ex: 8.x -> translates to 8.latest, 8.16) branch, defaults to main
# install requirements of the python package and run main.py


python3 -mpip install -r .buildkite/scripts/health-report-tests/requirements.txt
python3 .buildkite/scripts/health-report-tests/main.py
2 changes: 2 additions & 0 deletions .buildkite/scripts/health-report-tests/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
requests==2.32.3
pyyaml==6.0.2
65 changes: 65 additions & 0 deletions .buildkite/scripts/health-report-tests/scenario_executor.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
"""
A class to execute the given scenario for Logstash Health Report integration test
"""
import time
from logstash_health_report import LogstashHealthReport


class ScenarioExecutor:
logstash_health_report_api = LogstashHealthReport()

def __init__(self):
pass

def __has_intersection(self, expects, results):
# we expect expects to be existing in results
for expect in expects:
for result in results:
if result.get('help_url') and "health-report-pipeline-status.html#" not in result.get('help_url'):
return False
if not all(key in result and result[key] == value for key, value in expect.items()):
return False
return True

def __get_difference(self, differences: list, expectations: dict, reports: dict) -> dict:
for key in expectations.keys():

if type(expectations.get(key)) != type(reports.get(key)):
differences.append(f"Scenario expectation and Health API report structure differs for {key}.")
return differences

if isinstance(expectations.get(key), str):
if expectations.get(key) != reports.get(key):
differences.append({key: {"expected": expectations.get(key), "got": reports.get(key)}})
continue
elif isinstance(expectations.get(key), dict):
self.__get_difference(differences, expectations.get(key), reports.get(key))
elif isinstance(expectations.get(key), list):
if not self.__has_intersection(expectations.get(key), reports.get(key)):
differences.append({key: {"expected": expectations.get(key), "got": reports.get(key)}})
return differences

def __is_expected(self, expectations: dict) -> None:
reports = self.logstash_health_report_api.get()
differences = self.__get_difference([], expectations, reports)
if differences:
print("Differences found in 'expectation' section between YAML content and stats:")
for diff in differences:
print(f"Difference: {diff}")
return False
else:
return True

def on(self, scenario_name: str, expectations: dict) -> None:
# retriable check the expectations
attempts = 5
while self.__is_expected(expectations) is False:
attempts = attempts - 1
if attempts == 0:
break
time.sleep(1)

if attempts == 0:
raise Exception(f"{scenario_name} failed.")
else:
print(f"Scenario `{scenario_name}` expectaion meets the health report stats.")
Loading

0 comments on commit de4c875

Please sign in to comment.