Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core Refactoring, Krkn Scenario Plugin API #694

Merged
merged 33 commits into from
Oct 3, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
f3ca907
relocated shared libraries from `kraken` to `krkn` folder
tsebastiani Sep 25, 2024
db705c8
AbstractScenarioPlugin and ScenarioPluginFactory
tsebastiani Sep 25, 2024
56fdf0b
application_outage porting
tsebastiani Sep 25, 2024
d321f46
arcaflow_scenarios porting
tsebastiani Sep 25, 2024
5cf7af8
managedcluster_scenarios porting
tsebastiani Sep 25, 2024
b550691
network_chaos porting
tsebastiani Sep 25, 2024
95eb355
node_actions porting
tsebastiani Sep 25, 2024
08bd260
plugin_scenarios porting
tsebastiani Sep 25, 2024
fcdc4cf
pvc_scenarios porting
tsebastiani Sep 25, 2024
fa45198
service_disruption porting
tsebastiani Sep 25, 2024
7d06766
service_hijacking porting
tsebastiani Sep 25, 2024
c14408e
cluster_shut_down_scenarios porting
tsebastiani Sep 25, 2024
4b540b1
syn_flood porting
tsebastiani Sep 25, 2024
680536c
time_scenarios porting
tsebastiani Sep 25, 2024
0059c25
zone_outages porting
tsebastiani Sep 25, 2024
f1d2e5e
ScenarioPluginFactory tests
tsebastiani Sep 25, 2024
d7ba0ee
unit tests update
tsebastiani Sep 25, 2024
e1b4c8e
pod_scenarios and post actions deprecated
tsebastiani Sep 25, 2024
fd633f4
funtests and config update
tsebastiani Sep 25, 2024
d3154cb
run_krkn.py update
tsebastiani Sep 25, 2024
168691f
utils porting
tsebastiani Sep 25, 2024
434908c
API Documentation
tsebastiani Sep 25, 2024
c27b644
container_scenarios porting
tsebastiani Sep 26, 2024
8d42392
funtest fix
tsebastiani Sep 26, 2024
4f714fb
document gif update
tsebastiani Sep 26, 2024
fbf2d8e
Documentation + tests update
tsebastiani Sep 26, 2024
ea08b9b
removed example plugin
tsebastiani Oct 1, 2024
967bfe1
global renaming
tsebastiani Oct 2, 2024
42eefdf
config.yaml typos
tsebastiani Oct 2, 2024
8b36550
removed `plugin_scenarios` from NativScenarioPlugin class
tsebastiani Oct 2, 2024
d786abc
pod_network_scenarios type added
tsebastiani Oct 3, 2024
cccd7b6
documentation update
tsebastiani Oct 3, 2024
3827b1f
krkn-lib update
tsebastiani Oct 3, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion CI/tests/test_app_outages.sh
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ function functional_test_app_outage {
yq -i '.application_outage.duration=10' scenarios/openshift/app_outage.yaml
yq -i '.application_outage.pod_selector={"scenario":"outage"}' scenarios/openshift/app_outage.yaml
yq -i '.application_outage.namespace="default"' scenarios/openshift/app_outage.yaml
export scenario_type="application_outages"
export scenario_type="application_outages_scenarios"
export scenario_file="scenarios/openshift/app_outage.yaml"
export post_config=""
envsubst < CI/config/common_test_config.yaml > CI/config/app_outage.yaml
Expand Down
6 changes: 3 additions & 3 deletions CI/tests/test_arca_cpu_hog.sh
Original file line number Diff line number Diff line change
Expand Up @@ -7,9 +7,9 @@ trap finish EXIT


function functional_test_arca_cpu_hog {
yq -i '.input_list[0].node_selector={"kubernetes.io/hostname":"kind-worker2"}' scenarios/arcaflow/cpu-hog/input.yaml
export scenario_type="arcaflow_scenarios"
export scenario_file="scenarios/arcaflow/cpu-hog/input.yaml"
yq -i '.input_list[0].node_selector={"kubernetes.io/hostname":"kind-worker2"}' scenarios/kube/cpu-hog/input.yaml
export scenario_type="hog_scenarios"
export scenario_file="scenarios/kube/cpu-hog/input.yaml"
export post_config=""
envsubst < CI/config/common_test_config.yaml > CI/config/arca_cpu_hog.yaml
python3 -m coverage run -a run_kraken.py -c CI/config/arca_cpu_hog.yaml
Expand Down
6 changes: 3 additions & 3 deletions CI/tests/test_arca_io_hog.sh
Original file line number Diff line number Diff line change
Expand Up @@ -7,9 +7,9 @@ trap finish EXIT


function functional_test_arca_io_hog {
yq -i '.input_list[0].node_selector={"kubernetes.io/hostname":"kind-worker2"}' scenarios/arcaflow/io-hog/input.yaml
export scenario_type="arcaflow_scenarios"
export scenario_file="scenarios/arcaflow/io-hog/input.yaml"
yq -i '.input_list[0].node_selector={"kubernetes.io/hostname":"kind-worker2"}' scenarios/kube/io-hog/input.yaml
export scenario_type="hog_scenarios"
export scenario_file="scenarios/kube/io-hog/input.yaml"
export post_config=""
envsubst < CI/config/common_test_config.yaml > CI/config/arca_io_hog.yaml
python3 -m coverage run -a run_kraken.py -c CI/config/arca_io_hog.yaml
Expand Down
6 changes: 3 additions & 3 deletions CI/tests/test_arca_memory_hog.sh
Original file line number Diff line number Diff line change
Expand Up @@ -7,9 +7,9 @@ trap finish EXIT


function functional_test_arca_memory_hog {
yq -i '.input_list[0].node_selector={"kubernetes.io/hostname":"kind-worker2"}' scenarios/arcaflow/memory-hog/input.yaml
export scenario_type="arcaflow_scenarios"
export scenario_file="scenarios/arcaflow/memory-hog/input.yaml"
yq -i '.input_list[0].node_selector={"kubernetes.io/hostname":"kind-worker2"}' scenarios/kube/memory-hog/input.yaml
export scenario_type="hog_scenarios"
export scenario_file="scenarios/kube/memory-hog/input.yaml"
export post_config=""
envsubst < CI/config/common_test_config.yaml > CI/config/arca_memory_hog.yaml
python3 -m coverage run -a run_kraken.py -c CI/config/arca_memory_hog.yaml
Expand Down
2 changes: 1 addition & 1 deletion CI/tests/test_container.sh
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ function functional_test_container_crash {
yq -i '.scenarios[0].label_selector="scenario=container"' scenarios/openshift/container_etcd.yml
yq -i '.scenarios[0].container_name="fedtools"' scenarios/openshift/container_etcd.yml
export scenario_type="container_scenarios"
export scenario_file="- scenarios/openshift/container_etcd.yml"
export scenario_file="scenarios/openshift/container_etcd.yml"
export post_config=""
envsubst < CI/config/common_test_config.yaml > CI/config/container_config.yaml

Expand Down
4 changes: 2 additions & 2 deletions CI/tests/test_namespace.sh
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,8 @@ trap error ERR
trap finish EXIT

function funtional_test_namespace_deletion {
export scenario_type="namespace_scenarios"
export scenario_file="- scenarios/openshift/ingress_namespace.yaml"
export scenario_type="service_disruption_scenarios"
export scenario_file="scenarios/openshift/ingress_namespace.yaml"
export post_config=""
yq '.scenarios[0].namespace="^namespace-scenario$"' -i scenarios/openshift/ingress_namespace.yaml
yq '.scenarios[0].wait_time=30' -i scenarios/openshift/ingress_namespace.yaml
Expand Down
2 changes: 1 addition & 1 deletion CI/tests/test_net_chaos.sh
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ function functional_test_network_chaos {
yq -i 'del(.network_chaos.egress.latency)' scenarios/openshift/network_chaos.yaml
yq -i 'del(.network_chaos.egress.loss)' scenarios/openshift/network_chaos.yaml

export scenario_type="network_chaos"
export scenario_type="network_chaos_scenarios"
export scenario_file="scenarios/openshift/network_chaos.yaml"
export post_config=""
envsubst < CI/config/common_test_config.yaml > CI/config/network_chaos.yaml
Expand Down
2 changes: 1 addition & 1 deletion CI/tests/test_service_hijacking.sh
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ TEXT_MIME="text/plain; charset=utf-8"

function functional_test_service_hijacking {

export scenario_type="service_hijacking"
export scenario_type="service_hijacking_scenarios"
export scenario_file="scenarios/kube/service_hijacking.yaml"
export post_config=""
envsubst < CI/config/common_test_config.yaml > CI/config/service_hijacking.yaml
Expand Down
4 changes: 2 additions & 2 deletions CI/tests/test_telemetry.sh
Original file line number Diff line number Diff line change
Expand Up @@ -18,8 +18,8 @@ function functional_test_telemetry {
yq -i '.performance_monitoring.prometheus_url="http://localhost:9090"' CI/config/common_test_config.yaml
yq -i '.telemetry.run_tag=env(RUN_TAG)' CI/config/common_test_config.yaml

export scenario_type="arcaflow_scenarios"
export scenario_file="scenarios/arcaflow/cpu-hog/input.yaml"
export scenario_type="hog_scenarios"
export scenario_file="scenarios/kube/cpu-hog/input.yaml"
export post_config=""
envsubst < CI/config/common_test_config.yaml > CI/config/telemetry.yaml
retval=$(python3 -m coverage run -a run_kraken.py -c CI/config/telemetry.yaml)
Expand Down
6 changes: 6 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,12 @@ If adding a new scenario or tweaking the main config, be sure to add in updates
Please read [this file]((CI/README.md#adding-a-test-case)) for more information on updates.


### Scenario Plugin Development

If you're gearing up to develop new scenarios, take a moment to review our
[Scenario Plugin API Documentation](docs/scenario_plugin_api.md).
It’s the perfect starting point to tap into your chaotic creativity!

### Community
Key Members(slack_usernames/full name): paigerube14/Paige Rubendall, mffiedler/Mike Fiedler, tsebasti/Tullio Sebastiani, yogi/Yogananth Subramanian, sahil/Sahil Shah, pradeep/Pradeep Surisetty and ravielluri/Naga Ravi Chaitanya Elluri.
* [**#krkn on Kubernetes Slack**](https://kubernetes.slack.com/messages/C05SFMHRWK1)
Expand Down
47 changes: 25 additions & 22 deletions config/config.yaml
Original file line number Diff line number Diff line change
@@ -1,50 +1,53 @@
kraken:
distribution: kubernetes # Distribution can be kubernetes or openshift
kubeconfig_path: ~/.kube/config # Path to kubeconfig
kubeconfig_path: ~/.kube/config # Path to kubeconfig
exit_on_failure: False # Exit when a post action scenario fails
publish_kraken_status: True # Can be accessed at http://0.0.0.0:8081
signal_state: RUN # Will wait for the RUN signal when set to PAUSE before running the scenarios, refer docs/signal.md for more details
signal_address: 0.0.0.0 # Signal listening address
port: 8081 # Signal port
chaos_scenarios:
# List of policies/chaos scenarios to load
- arcaflow_scenarios:
- scenarios/arcaflow/cpu-hog/input.yaml
- scenarios/arcaflow/memory-hog/input.yaml
- scenarios/arcaflow/io-hog/input.yaml
- application_outages:
- hog_scenarios:
- scenarios/kube/cpu-hog/input.yaml
- scenarios/kube/memory-hog/input.yaml
- scenarios/kube/io-hog/input.yaml
- scenarios/kube/io-hog/input.yaml
- application_outages_scenarios:
- scenarios/openshift/app_outage.yaml
- container_scenarios: # List of chaos pod scenarios to load
- - scenarios/openshift/container_etcd.yml
- plugin_scenarios:
- scenarios/openshift/container_etcd.yml
- pod_network_scenarios:
- scenarios/openshift/network_chaos_ingress.yml
- scenarios/openshift/pod_network_outage.yml
- pod_disruption_scenarios:
- scenarios/openshift/etcd.yml
- scenarios/openshift/regex_openshift_pod_kill.yml
- scenarios/openshift/vmware_node_scenarios.yml
- scenarios/openshift/network_chaos_ingress.yml
- scenarios/openshift/prom_kill.yml
- node_scenarios: # List of chaos node scenarios to load
- scenarios/openshift/aws_node_scenarios.yml
- plugin_scenarios:
- scenarios/openshift/openshift-apiserver.yml
- scenarios/openshift/openshift-kube-apiserver.yml
- vmware_node_scenarios:
- scenarios/openshift/vmware_node_scenarios.yml
- ibmcloud_node_scenarios:
- scenarios/openshift/ibmcloud_node_scenarios.yml
- node_scenarios: # List of chaos node scenarios to load
- scenarios/openshift/aws_node_scenarios.yml
- time_scenarios: # List of chaos time scenarios to load
- scenarios/openshift/time_scenarios_example.yml
- cluster_shut_down_scenarios:
- - scenarios/openshift/cluster_shut_down_scenario.yml
- scenarios/openshift/post_action_shut_down.py
- scenarios/openshift/cluster_shut_down_scenario.yml
- service_disruption_scenarios:
- - scenarios/openshift/regex_namespace.yaml
- - scenarios/openshift/ingress_namespace.yaml
- scenarios/openshift/post_action_namespace.py
- zone_outages:
- scenarios/openshift/regex_namespace.yaml
- scenarios/openshift/ingress_namespace.yaml
- zone_outages_scenarios:
- scenarios/openshift/zone_outage.yaml
- pvc_scenarios:
- scenarios/openshift/pvc_scenario.yaml
- network_chaos:
- network_chaos_scenarios:
- scenarios/openshift/network_chaos.yaml
- service_hijacking:
- service_hijacking_scenarios:
- scenarios/kube/service_hijacking.yaml
- syn_flood:
- syn_flood_scenarios:
- scenarios/kube/syn_flood.yaml

cerberus:
Expand Down
2 changes: 1 addition & 1 deletion config/config_kind.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ kraken:
publish_kraken_status: True # Can be accessed at http://0.0.0.0:8081
signal_state: RUN # Will wait for the RUN signal when set to PAUSE before running the scenarios, refer docs/signal.md for more details
signal_address: 0.0.0.0 # Signal listening address
chaos_scenarios: # List of policies/chaos scenarios to load
chaos_scenarios: # List of policies/chaos scenarios to load
- plugin_scenarios:
- scenarios/kind/scheduler.yml
- node_scenarios:
Expand Down
2 changes: 1 addition & 1 deletion config/config_kubernetes.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ kraken:
signal_state: RUN # Will wait for the RUN signal when set to PAUSE before running the scenarios, refer docs/signal.md for more details
chaos_scenarios: # List of policies/chaos scenarios to load
- container_scenarios: # List of chaos pod scenarios to load
- - scenarios/kube/container_dns.yml
- scenarios/kube/container_dns.yml
- plugin_scenarios:
- scenarios/kube/scheduler.yml

Expand Down
5 changes: 2 additions & 3 deletions config/config_performance.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,15 +12,14 @@ kraken:
- scenarios/openshift/regex_openshift_pod_kill.yml
- scenarios/openshift/prom_kill.yml
- node_scenarios: # List of chaos node scenarios to load
- scenarios/openshift/node_scenarios_example.yml
- scenarios/openshift/node_scenarios_example.yml
- plugin_scenarios:
- scenarios/openshift/openshift-apiserver.yml
- scenarios/openshift/openshift-kube-apiserver.yml
- time_scenarios: # List of chaos time scenarios to load
- scenarios/openshift/time_scenarios_example.yml
- cluster_shut_down_scenarios:
- - scenarios/openshift/cluster_shut_down_scenario.yml
- scenarios/openshift/post_action_shut_down.py
- scenarios/openshift/cluster_shut_down_scenario.yml
- service_disruption_scenarios:
- scenarios/openshift/regex_namespace.yaml
- scenarios/openshift/ingress_namespace.yaml
Expand Down
136 changes: 136 additions & 0 deletions docs/scenario_plugin_api.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,136 @@
# Scenario Plugin API:

This API enables seamless integration of Scenario Plugins for Krkn. Plugins are automatically
detected and loaded by the plugin loader, provided they extend the `AbstractPluginScenario`
abstract class, implement the required methods, and adhere to the specified [naming conventions](#naming-conventions).

## Plugin folder:

The plugin loader automatically loads plugins found in the `krkn/scenario_plugins` directory,
relative to the Krkn root folder. Each plugin must reside in its own directory and can consist
of one or more Python files. The entry point for each plugin is a Python class that extends the
[AbstractPluginScenario](../krkn/scenario_plugins/abstract_scenario_plugin.py) abstract class and implements its required methods.

## `AbstractPluginScenario` abstract class:

This [abstract class](../krkn/scenario_plugins/abstract_scenario_plugin.py) defines the contract between the plugin and krkn.
It consists of two methods:
- `run(...)`
- `get_scenario_type()`

Most IDEs can automatically suggest and implement the abstract methods defined in `AbstractPluginScenario`:
![pycharm](scenario_plugin_pycharm.gif)
_(IntelliJ PyCharm)_

### `run(...)`

```python
def run(
self,
run_uuid: str,
scenario: str,
krkn_config: dict[str, any],
lib_telemetry: KrknTelemetryOpenshift,
scenario_telemetry: ScenarioTelemetry,
) -> int:

```

This method represents the entry point of the plugin and the first method
that will be executed.
#### Parameters:

- `run_uuid`:
- the uuid of the chaos run generated by krkn for every single run.
- `scenario`:
- the config file of the scenario that is currently executed
- `krkn_config`:
- the full dictionary representation of the `config.yaml`
- `lib_telemetry`
- it is a composite object of all the [krkn-lib](https://krkn-chaos.github.io/krkn-lib-docs/modules.html) objects and methods needed by a krkn plugin to run.
- `scenario_telemetry`
- the `ScenarioTelemetry` object of the scenario that is currently executed

### Return value:
Returns 0 if the scenario succeeds and 1 if it fails.
> [!WARNING]
> All the exception must be handled __inside__ the run method and not propagated.

### `get_scenario_types()`:

```python def get_scenario_types(self) -> list[str]:```

Indicates the scenario types specified in the `config.yaml`. For the plugin to be properly
loaded, recognized and executed, it must be implemented and must return one or more
strings matching `scenario_type` strings set in the config.
> [!WARNING]
> Multiple strings can map to a *single* `ScenarioPlugin` but the same string cannot map
> to different plugins, an exception will be thrown for scenario_type redefinition.

> [!Note]
> The `scenario_type` strings must be unique across all plugins; otherwise, an exception will be thrown.

## Naming conventions:
A key requirement for developing a plugin that will be properly loaded
by the plugin loader is following the established naming conventions.
These conventions are enforced to maintain a uniform and readable codebase,
making it easier to onboard new developers from the community.

### plugin folder:
- the plugin folder must be placed in the `krkn/scenario_plugin` folder starting from the krkn root folder
- the plugin folder __cannot__ contain the words
- `plugin`
- `scenario`
### plugin file name and class name:
- the plugin file containing the main plugin class must be named in _snake case_ and must have the suffix `_scenario_plugin`:
- `example_scenario_plugin.py`
- the main plugin class must named in _capital camel case_ and must have the suffix `ScenarioPlugin` :
- `ExampleScenarioPlugin`
- the file name must match the class name in the respective syntax:
- `example_scenario_plugin.py` -> `ExampleScenarioPlugin`

### scenario type:
- the scenario type __must__ be unique between all the scenarios.

### logging:
If your new scenario does not adhere to the naming conventions, an error log will be generated in the Krkn standard output,
providing details about the issue:

```commandline
2024-10-03 18:06:31,136 [INFO] 📣 `ScenarioPluginFactory`: types from config.yaml mapped to respective classes for execution:
2024-10-03 18:06:31,136 [INFO] ✅ type: application_outages_scenarios ➡️ `ApplicationOutageScenarioPlugin`
2024-10-03 18:06:31,136 [INFO] ✅ types: [hog_scenarios, arcaflow_scenario] ➡️ `ArcaflowScenarioPlugin`
2024-10-03 18:06:31,136 [INFO] ✅ type: container_scenarios ➡️ `ContainerScenarioPlugin`
2024-10-03 18:06:31,136 [INFO] ✅ type: managedcluster_scenarios ➡️ `ManagedClusterScenarioPlugin`
2024-10-03 18:06:31,137 [INFO] ✅ types: [pod_disruption_scenarios, pod_network_scenario, vmware_node_scenarios, ibmcloud_node_scenarios] ➡️ `NativeScenarioPlugin`
2024-10-03 18:06:31,137 [INFO] ✅ type: network_chaos_scenarios ➡️ `NetworkChaosScenarioPlugin`
2024-10-03 18:06:31,137 [INFO] ✅ type: node_scenarios ➡️ `NodeActionsScenarioPlugin`
2024-10-03 18:06:31,137 [INFO] ✅ type: pvc_scenarios ➡️ `PvcScenarioPlugin`
2024-10-03 18:06:31,137 [INFO] ✅ type: service_disruption_scenarios ➡️ `ServiceDisruptionScenarioPlugin`
2024-10-03 18:06:31,137 [INFO] ✅ type: service_hijacking_scenarios ➡️ `ServiceHijackingScenarioPlugin`
2024-10-03 18:06:31,137 [INFO] ✅ type: cluster_shut_down_scenarios ➡️ `ShutDownScenarioPlugin`
2024-10-03 18:06:31,137 [INFO] ✅ type: syn_flood_scenarios ➡️ `SynFloodScenarioPlugin`
2024-10-03 18:06:31,137 [INFO] ✅ type: time_scenarios ➡️ `TimeActionsScenarioPlugin`
2024-10-03 18:06:31,137 [INFO] ✅ type: zone_outages_scenarios ➡️ `ZoneOutageScenarioPlugin`

2024-09-18 14:48:41,735 [INFO] Failed to load Scenario Plugins:

2024-09-18 14:48:41,735 [ERROR] ⛔ Class: ExamplePluginScenario Module: krkn.scenario_plugins.example.example_scenario_plugin
2024-09-18 14:48:41,735 [ERROR] ⚠️ scenario plugin class name must start with a capital letter, end with `ScenarioPlugin`, and cannot be just `ScenarioPlugin`.
```

>[!NOTE]
>If you're trying to understand how the scenario types in the config.yaml are mapped to
> their corresponding plugins, this log will guide you!
> Each scenario plugin class mentioned can be found in the `krkn/scenario_plugin` folder
> simply convert the camel case notation and remove the ScenarioPlugin suffix from the class name
> e.g `ShutDownScenarioPlugin` class can be found in the `krkn/scenario_plugin/shut_down` folder.

## ExampleScenarioPlugin
The [ExampleScenarioPlugin](../krkn/tests/test_classes/example_scenario_plugin.py) class included in the tests folder can be used as a scaffolding for new plugins and it is considered
part of the documentation.

For any questions or further guidance, feel free to reach out to us on the
[Kubernetes workspace](https://kubernetes.slack.com/) in the `#krkn` channel.
We’re happy to assist. Now, __release the Krkn!__

Binary file added docs/scenario_plugin_pycharm.gif
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Loading