How To Write An Attack Extension

This how-to article will teach you how to write an extension using ActionKit that adds new attack capabilities. We will look closely at existing extensions to learn about semantic conventions, best practices, expected behavior and necessary boilerplate.

The article assumes that you have read the overview documentation for the Action API and possibly skimmed the expected API endpoints. The examples use the Go programming language, but you can use any other language as long as you adhere to the expected API.

Necessary Boilerplate

Every extension needs boilerplate code to start an HTTP server, initialize logging, and register HTTP handlers that comply with the expected API. The following excerpt shows how the go-kubectl example extension does this.

```go
func main() {
	extlogging.InitZeroLog()
	exthttp.RegisterHttpHandler("/actions", exthttp.GetterAsHandler(getActionList))
	exthttp.RegisterHttpHandler("/actions/rollout-restart", exthttp.GetterAsHandler(getRolloutRestartDescription))
	exthttp.RegisterHttpHandler("/actions/rollout-restart/prepare", prepareRolloutRestart)
	exthttp.RegisterHttpHandler("/actions/rollout-restart/start", startRolloutRestart)
	exthttp.RegisterHttpHandler("/actions/rollout-restart/status", rolloutRestartStatus)
	exthttp.RegisterHttpHandler("/actions/rollout-restart/stop", stopRolloutRestart)
	port := 8083
	log.Info().Msgf("Starting go-kubectl server on port %d. Get started via /actions", port)
	err := http.ListenAndServe(fmt.Sprintf(":%d", port), nil)
	if err != nil {
		log.Err(err).Msg("Failed to start server")
	}
}
```

The excerpt above shows an extension leveraging our ExtensionKit, e.g., to register HTTP handlers or initialize the logging system. ExtensionKit makes authoring Steadybit extensions easier through utilities that help you comply with the expected behavior of extensions.

Note the HTTP endpoint paths: you can choose them freely. The Steadybit agent only needs to know the extension's entry point, which is {{origin}}/actions in this case.

Action List

Let us start with the first API implementation: the list of supported actions. This endpoint provides a list of all actions that the extension supports. Note that an attack is a special kind of action.

[Figure: UML class diagram showing that an Attack is also an Action (Attack inherits from Action).]

The action list endpoint's response body must contain a JSON-encoded list of HTTP endpoints that the Steadybit agent can call to learn more about each action.

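```go
func getActionList() action_kit_api.ActionList {
	return action_kit_api.ActionList{
		Actions: []action_kit_api.DescribingEndpointReference{
			{
				// Keyed fields are clearer than positional ones and
				// survive field reordering in the API package.
				Method: "GET",
				Path:   "/actions/rollout-restart",
			},
		},
	}
}
```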

All paths are resolved relative to the URL used to register the extension at the agent. For example, if the extension was registered with https://extension/some-path and this endpoint returns /actions/rollout, the agent sends the request to https://extension/some-path/actions/rollout. This allows extensions to run behind reverse proxies that rewrite the path.
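The resolution rule above appends the returned path to the registration URL's path, which is not the same as standard RFC 3986 reference resolution. The following sketch illustrates the described behavior (it is not the agent's actual code; `resolveEndpoint` is a hypothetical helper) and contrasts it with `url.ResolveReference`.

```go
// Sketch of the path resolution described above: the extension-provided path
// is appended to the registration URL's path.
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// resolveEndpoint is a hypothetical helper that joins the registration URL
// and an endpoint path, avoiding duplicate or missing slashes.
func resolveEndpoint(registrationURL, path string) (string, error) {
	base, err := url.Parse(registrationURL)
	if err != nil {
		return "", err
	}
	base.Path = strings.TrimSuffix(base.Path, "/") + "/" + strings.TrimPrefix(path, "/")
	return base.String(), nil
}

func main() {
	u, _ := resolveEndpoint("https://extension/some-path", "/actions/rollout")
	fmt.Println(u) // https://extension/some-path/actions/rollout

	// For contrast: RFC 3986 resolution treats "/actions/rollout" as an
	// absolute path and drops "/some-path".
	base, _ := url.Parse("https://extension/some-path")
	ref, _ := url.Parse("/actions/rollout")
	fmt.Println(base.ResolveReference(ref)) // https://extension/actions/rollout
}
```

The contrast explains why extension paths behind a path-rewriting reverse proxy still work: the registration prefix is preserved rather than replaced.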

Action Description

This is where the fun begins! The action description endpoint exposes how the attack is presented in the UI, how end users can configure it, and which endpoints to call to prepare, start, check and stop the attack.

```go
func getRolloutRestartDescription() action_kit_api.ActionDescription {
	return action_kit_api.ActionDescription{
		Id: "com.steadybit.example.attacks.kubernetes.rollout-restart",
		Label: "Rollout Restart Deployment",
		Description: "Execute a rollout restart for a Kubernetes deployment",
		Icon: extutil.Ptr("data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%20width%3D%2224%22%20height%3D%2224%22%20viewBox%3D%220%200%2024%2024%22%3E%3Cpath%20d%3D%22M13.95%2013.5h-.23c-.18.11-.26.32-.18.5l.86%202.11c.83-.53%201.46-1.32%201.79-2.25l-2.23-.36h-.01m-3.45.29a.415.415%200%2000-.38-.29h-.08l-2.22.37c.33.92.96%201.7%201.79%202.23l.85-2.07V14c.04-.05.04-.14.04-.21m1.83.81a.378.378%200%2000-.51-.15c-.07.05-.12.08-.15.15h-.01l-1.09%201.97c.78.26%201.62.31%202.43.12.14-.03.29-.07.43-.12l-1.09-1.97h-.01m3.45-4.57L14.1%2011.5l.01.03a.37.37%200%2000-.04.53c.05.06.11.1.18.12l.01.01%202.17.62c.07-.97-.14-1.95-.65-2.78m-3.11.16c.01.21.18.37.39.36.08%200%20.15-.02.21-.05h.01l1.83-1.31a4.45%204.45%200%2000-2.57-1.24l.13%202.24m-1.94.31c.17.11.4.08.52-.09.05-.06.07-.13.08-.21h.01l.12-2.25c-.15.02-.3.05-.46.08-.8.18-1.54.58-2.12%201.16l1.84%201.31h.01m-.99%201.69c.2-.05.32-.26.26-.46%200-.08-.05-.14-.11-.19v-.01L8.21%2010c-.52.86-.74%201.84-.63%202.82l2.16-.62v-.01m1.64.66l.62.3.62-.3.15-.67-.43-.53h-.69l-.43.53.16.67m10.89%201.32L20.5%206.5c-.09-.42-.37-.76-.74-.94l-7.17-3.43c-.37-.17-.81-.17-1.19%200L4.24%205.56c-.37.18-.65.52-.74.94l-1.77%207.67c-.05.2-.05.4%200%20.59.01.06.03.12.05.18.03.09.08.19.13.27.03.04.05.08.09.11l4.95%206.18c.02%200%20.05.04.05.06.1.09.19.16.28.22.12.08.26.14.4.17.11.05.23.05.32.05h8.12c.07%200%20.14-.03.2-.05.05-.01.1-.03.14-.04.04-.02.07-.03.11-.05.05-.02.1-.05.15-.08.12-.08.23-.18.33-.28l.15-.2%204.8-5.98c.1-.12.17-.25.22-.38.02-.06.04-.12.05-.18.05-.19.05-.4%200-.59m-7.43%202.99c.02.06.04.12.07.17-.04.08-.06.17-.03.26.12.24.23.46.38.68.08.11.16.23.24.34%200%20.03.03.08.04.12.12.2.06.46-.15.59s-.47.05-.59-.15c-.01-.03-.02-.05-.03-.08-.02-.03-.04-.09-.06-.09-.05-.15-.09-.28-.12-.41-.09-.25-.17-.49-.3-.72a.375.375%200%2000-.21-.14l-.08-.16c-1.29.48-2.7.48-3.97-.01l-.1.18c-.07.01-.14.04-.19.09-.14.24-.24.49-.33.77-.03.13-.07.26-.12.4-.02%200-.04.07-.06.1a.43.43%200%2001-.81-.29c.01-.03.03-.05.04-.08.04-.03.04-.08.04-.11.09-.12.16-.23.24-.35.16-.21.29-.45.39-.69a.54.54%200%2000-.03-.25l.07-.18a5.611%205.611%200%2001-2.47-3.09l-.2.03a.388.388%200%2000-.23-.09c-.27.05-.51.13-.77.22-.11.06-.24.11-.37.15-.03.01-.07.02-.13.03a.438.438%200%2001-.54-.27c-.07-.23.04-.47.28-.55.02%200%20.05-.01.08-.01v-.01h.01l.11-.02c.14-.04.28-.04.41-.04.26%200%20.52-.06.77-.12.08-.05.14-.11.19-.19l.19-.05c-.21-1.36.1-2.73.86-3.87l-.14-.12c0-.09-.03-.18-.08-.25-.2-.17-.41-.32-.64-.45-.12-.06-.24-.13-.36-.21-.02-.02-.06-.05-.08-.07l-.01-.01c-.2-.16-.25-.42-.11-.63.09-.1.21-.15.35-.15.11.01.21.05.3.12l.09.07c.1.09.19.2.28.3.18.19.37.37.58.52.08.04.17.05.26.03l.15.11c.75-.8%201.73-1.36%202.8-1.6.25-.06.52-.1.78-.12l.01-.18a.45.45%200%2000.14-.23c.01-.26-.01-.52-.05-.77-.03-.13-.05-.27-.06-.41V5.1c-.02-.24.15-.45.39-.48s.44.15.47.38v.22c-.01.14-.03.28-.06.41-.04.25-.06.51-.05.77.02.1.07.17.14.22l.01.19c1.36.12%202.62.73%203.56%201.72l.16-.12c.09.02.18.01.26-.03.21-.15.41-.33.58-.52.09-.1.18-.2.28-.3.03-.02.07-.06.1-.06.17-.18.44-.18.59%200%20.19.16.18.43%200%20.6%200%20.02-.03.04-.06.06a2.495%202.495%200%2001-.44.28c-.23.13-.45.28-.64.45-.06.07-.09.15-.08.24l-.16.14a5.44%205.44%200%2001.88%203.86l.19.05c.04.08.11.14.19.18.25.07.51.11.77.14h.41c.03.03.08.04.12.05.24.03.4.25.37.49-.05.23-.24.4-.48.37-.03-.01-.07-.01-.07-.02v-.01c-.06%200-.1-.01-.14-.02-.13-.04-.25-.09-.36-.15-.26-.1-.5-.17-.77-.21-.09%200-.17%200-.23.08-.07-.01-.13-.02-.19-.03-.41%201.31-1.31%202.41-2.47%203.11z%22%20fill%3D%22currentcolor%22%2F%3E%3C%2Fsvg%3E"),
		Version: "1.5.0",
		Kind: action_kit_api.Attack,
		Category: extutil.Ptr("state"),
		TargetType: extutil.Ptr("kubernetes-deployment"),
		TimeControl: action_kit_api.Internal,
```

The excerpt above shows the fundamental options of every action; the action API documentation page describes them in more detail. This document focuses on best practices specific to attacks.

  • kind: Must be set to attack. This option controls the visual appearance, grouping and labeling within the Steadybit user interface.
  • targetType: Attacks typically operate on a target, so you almost certainly want to specify a target type here. You can look up the available target types in the Steadybit user interface via Settings -> Extensions -> Target Types.

```go
Parameters: []action_kit_api.ActionParameter{
	{
		Label:        "Wait for rollout completion",
		Name:         "wait",
		Type:         action_kit_api.Boolean,
		Advanced:     extutil.Ptr(true),
		DefaultValue: extutil.Ptr("false"),
	},
},
```

Attacks have no special parameter contracts, so you define parameters just as you would for any other action. Refer to our parameter types documentation to learn more about the supported parameter types.

```go
Prepare: action_kit_api.MutatingEndpointReference{
	Method: "POST",
	Path:   "/actions/rollout-restart/prepare",
},
Start: action_kit_api.MutatingEndpointReference{
	Method: "POST",
	Path:   "/actions/rollout-restart/start",
},
Status: extutil.Ptr(action_kit_api.MutatingEndpointReferenceWithCallInterval{
	Method: "POST",
	Path:   "/actions/rollout-restart/status",
}),
Stop: extutil.Ptr(action_kit_api.MutatingEndpointReference{
	Method: "POST",
	Path:   "/actions/rollout-restart/stop",
}),
```

The last part of the action description is the list of endpoints to call when preparing, starting, checking and stopping the attack. The following sections will explain each endpoint's responsibility in more detail. For now, understand that you can define arbitrary HTTP endpoint paths.

Action Execution

We assume you have read the more general action API documentation on the action execution phases. If you haven't done so, now would be a good time to read these sections, as we won't repeat this content.

Actions only need to define the prepare and start endpoints; the status and stop endpoints are optional. Let's look at each of these endpoints in detail for attack use cases.

Note that all endpoints are expected to respond within 15 seconds. You can initiate long-running processes within the endpoints, but you should not wait synchronously for them to complete. For example, an attack should not trigger a redeployment and then synchronously wait for it to come back up. If you need to watch the progress, use the status endpoint to implement a polling approach.

Prepare

In addition to what the action API docs mention, attacks typically prepare their execution even further, e.g., by generating IDs or creating entities in target systems. That was pretty abstract, so let us look at some examples!

```go
	// Source: https://github.com/steadybit/extension-aws/blob/c3b268b28291024a8e4bed67fe765533367118d5/extec2/instance_attack_state.go#L94-L107
	instanceId := request.Target.Attributes["aws-ec2.instance.id"]
	if instanceId == nil || len(instanceId) == 0 {
		return nil, extension_kit.ToError("Target is missing the 'aws-ec2.instance.id' tag.", nil)
	}

	action := request.Config["action"]
	if action == nil {
		return nil, extension_kit.ToError("Missing attack action parameter.", nil)
	}

	return extutil.Ptr(InstanceStateChangeState{
		InstanceId: instanceId[0],
		Action:     action.(string),
	}), nil
```

The most fundamental preparation activity is the extraction of attack parameters and target attributes into the action state. This extraction is necessary because the start, status and stop endpoints only receive the action state. It also keeps the other endpoints' implementations more straightforward. Within the excerpt above from the AWS EC2 instance state change attack, we extract the aws-ec2.instance.id target attribute and the action parameter for later use.
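The extraction step can be sketched without any Steadybit libraries. This sketch assumes target attributes arrive as `map[string][]string` and parameters as `map[string]any`, as the AWS excerpt suggests; `prepareRequest`, `instanceState` and `prepare` are hypothetical names. Unlike the excerpt, it uses a checked type assertion, so a non-string `action` value yields an error instead of a panic.

```go
// Sketch: copy a target attribute and a parameter into a state struct.
package main

import (
	"errors"
	"fmt"
)

// prepareRequest (hypothetical) holds the inputs the prepare step reads.
type prepareRequest struct {
	TargetAttributes map[string][]string
	Config           map[string]any
}

// instanceState (hypothetical) is what later steps will receive.
type instanceState struct {
	InstanceId string
	Action     string
}

func prepare(req prepareRequest) (*instanceState, error) {
	ids := req.TargetAttributes["aws-ec2.instance.id"]
	if len(ids) == 0 { // len of a nil slice is 0, so this also covers a missing attribute
		return nil, errors.New("target is missing the 'aws-ec2.instance.id' attribute")
	}
	action, ok := req.Config["action"].(string) // checked assertion: no panic on wrong type
	if !ok || action == "" {
		return nil, errors.New("missing or invalid 'action' parameter")
	}
	return &instanceState{InstanceId: ids[0], Action: action}, nil
}

func main() {
	// The instance ID and action value below are made-up examples.
	state, err := prepare(prepareRequest{
		TargetAttributes: map[string][]string{"aws-ec2.instance.id": {"i-0123456789abcdef0"}},
		Config:           map[string]any{"action": "reboot"},
	})
	fmt.Println(state, err)
}
```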

```go
	// Source: https://github.com/steadybit/extension-kong/blob/2c2dfbbd98b69c12e033356ae10c95fc38c573e4/services/request_termination_attack.go#L172-L181
	plugin, err := instance.CreatePlugin(&kong.Plugin{
		Name:    utils.String("request-termination"),
		Enabled: utils.Bool(false),
		Tags: utils.Strings([]string{
			"created-by=steadybit",
		}),
		Service:  service,
		Consumer: consumer,
		Config:   config,
	})
```

Some attacks go even further, as the excerpt above shows. The Kong request termination attack already inserts a piece of configuration into the attacked system. Note, however, that the configuration is marked as disabled; the attack only switches it from disabled to enabled within the start endpoint. Apply this pattern where possible: preparing as much as possible upfront also validates that the system can be modified, i.e., that the attack extension is allowed to change the system state, before the attack actually starts.
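The lifecycle behind this pattern (create disabled in prepare, enable in start, delete in stop) can be sketched against an in-memory stand-in. `fakeGateway` and its methods are hypothetical placeholders for a real API client such as Kong's; only the sequencing is the point.

```go
// Sketch of "prepare disabled, enable on start, delete on stop".
package main

import (
	"fmt"
	"sync"
)

type plugin struct {
	ID      string
	Enabled bool
}

// fakeGateway is a hypothetical stand-in for a gateway API client.
type fakeGateway struct {
	mu      sync.Mutex
	nextID  int
	plugins map[string]*plugin
}

func newFakeGateway() *fakeGateway { return &fakeGateway{plugins: map[string]*plugin{}} }

// prepare creates the configuration in a disabled state. With a real client,
// a permission problem would surface here, before anything is broken.
func (g *fakeGateway) prepare() string {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.nextID++
	id := fmt.Sprintf("plugin-%d", g.nextID)
	g.plugins[id] = &plugin{ID: id, Enabled: false}
	return id
}

// start flips the prepared configuration to enabled.
func (g *fakeGateway) start(id string) error {
	g.mu.Lock()
	defer g.mu.Unlock()
	p, ok := g.plugins[id]
	if !ok {
		return fmt.Errorf("plugin %q not found", id)
	}
	p.Enabled = true
	return nil
}

// stop removes the configuration; deleting a missing key is a no-op,
// so calling stop twice is harmless.
func (g *fakeGateway) stop(id string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.plugins, id)
}

func main() {
	gw := newFakeGateway()
	id := gw.prepare() // the ID would be stored in the action state
	fmt.Println("prepared:", id, "enabled:", gw.plugins[id].Enabled)
	if err := gw.start(id); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("started, enabled:", gw.plugins[id].Enabled)
	gw.stop(id)
	fmt.Println("remaining plugins:", len(gw.plugins))
}
```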

Start

This is where the magic happens, although quite often this magic is just glue code. Within the start endpoint, you can finally start to break things: execute shell scripts, trigger downstream HTTP calls, use API clients and whatever else you need to realize the attack. Again, let us look at some open-source attack implementations to see some patterns.

```go
cmd := exec.Command("kubectl",
	"rollout",
	"restart",
	"--namespace",
	startAttackRequest.State["Namespace"].(string),
	fmt.Sprintf("deployment/%s", startAttackRequest.State["Deployment"].(string)))
cmdOut, cmdErr := cmd.CombinedOutput()
if cmdErr != nil {
	log.Err(cmdErr).Msgf("Failed to execute rollout restart: %s", cmdOut)
	exthttp.WriteError(w, extension_kit.ToError(fmt.Sprintf("Failed to execute rollout restart: %s", cmdOut), cmdErr))
	return
}
```

The above is an excerpt from the Go kubectl example attack. It leverages the kubectl CLI to implement an attack. Using existing CLI tools is a fairly common pattern that makes it easy to realize an attack quickly.

Next, we have an excerpt from the AWS EC2 instance state change attack. This attack uses the AWS SDK to trigger system state changes.

```go
	// Source: https://github.com/steadybit/extension-aws/blob/c3b268b28291024a8e4bed67fe765533367118d5/extec2/instance_attack_state.go#L159-L162
	in := ec2.TerminateInstancesInput{
		InstanceIds: instanceIds,
	}
	_, err = client.TerminateInstances(ctx, &in)
```

Similarly, the Kong request termination attack uses the Kong API client to enable the plugin that the prepare endpoint created.

```go
	// Source: https://github.com/steadybit/extension-kong/blob/2c2dfbbd98b69c12e033356ae10c95fc38c573e4/services/request_termination_attack.go#L230-L233
	_, err := instance.UpdatePlugin(&kong.Plugin{
		ID:      &pluginId,
		Enabled: utils.Bool(true),
	})
```

Status

The status endpoint is typically used by attacks with internal time control to implement a polling "are you done yet?" check. Remember how each endpoint needs to respond within 15 seconds? Operations that take longer than that benefit from the status endpoint. Like before, let us look at an example!

```go
if !attackStatusRequest.State["Wait"].(bool) {
	exthttp.WriteBody(w, action_kit_api.StatusResult{
		Completed: true,
	})
	return
}
cmd := exec.Command("kubectl",
	"rollout",
	"status",
	"--watch=false",
	"--namespace",
	attackStatusRequest.State["Namespace"].(string),
	fmt.Sprintf("deployment/%s", attackStatusRequest.State["Deployment"].(string)))
cmdOut, cmdErr := cmd.CombinedOutput()
if cmdErr != nil {
	exthttp.WriteError(w, extension_kit.ToError(fmt.Sprintf("Failed to check rollout status: %s", cmdOut), cmdErr))
	return
}
cmdOutStr := string(cmdOut)
completed := !strings.Contains(strings.ToLower(cmdOutStr), "waiting")
exthttp.WriteBody(w, action_kit_api.StatusResult{
	Completed: completed,
})
```

The excerpt above shows code from the Go kubectl example attack. This attack supports the execution of kubectl rollout restart..., i.e., a simulated rollout of a Kubernetes deployment. The attack supports two modes:

  1. Just triggering the simulated rollout.
  2. Triggering and waiting for completion of the simulated rollout.

The first mode does not need the status endpoint, but the second mode does: rollouts routinely take longer than 15 seconds, so the waiting must happen outside the start endpoint.

Notice how the attack first checks whether its parameters instructed it to wait for rollout completion. If waiting isn't configured, it will immediately respond with completed: true in the JSON response body. When configured to wait, it will check the rollout status via the kubectl CLI and then respond according to the CLI's output.
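The completion decision from the excerpt can be isolated into a pure function so it is testable without a cluster. `rolloutCompleted` is a hypothetical helper; the "waiting" heuristic is taken from the excerpt and relies on kubectl's human-readable output, and the sample output strings below are illustrative.

```go
// Sketch: the status decision from the excerpt as a pure function.
package main

import (
	"fmt"
	"strings"
)

// rolloutCompleted returns true immediately when waiting is disabled (mode 1);
// otherwise it treats any mention of "waiting" as an unfinished rollout (mode 2).
func rolloutCompleted(wait bool, kubectlOutput string) bool {
	if !wait {
		return true
	}
	return !strings.Contains(strings.ToLower(kubectlOutput), "waiting")
}

func main() {
	fmt.Println(rolloutCompleted(false, "")) // true
	fmt.Println(rolloutCompleted(true, `Waiting for deployment "example" rollout to finish: 1 of 3 updated replicas are available...`)) // false
	fmt.Println(rolloutCompleted(true, `deployment "example" successfully rolled out`)) // true
}
```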

The status endpoint is called until it responds with completed: true or until the experiment is canceled.

Stop

Stop is the final endpoint, and it is optional. When it is defined, the implementation is expected to revert all system modifications so that no evidence of the attack's execution remains. Not every attack needs a stop implementation. For example, one cannot undo the reboot of an AWS EC2 instance.

```go
	// Source: https://github.com/steadybit/extension-kong/blob/2c2dfbbd98b69c12e033356ae10c95fc38c573e4/services/request_termination_attack.go#L267-L270
	err := instance.DeletePlugin(&pluginId)
	if err != nil {
		return attack_kit_api.Ptr(utils.ToError(fmt.Sprintf("Failed to delete plugin within Kong for plugin ID '%s'", pluginId), err))
	}
```

The excerpt above shows how the Kong request termination attack reverts its system modifications. The attack created a configuration within the Kong API gateway in the prepare endpoint; the stop endpoint deletes that configuration again.

Extension Registration

Congratulations, the extension is now complete! Only one last step remains: announcing the extension to the Steadybit agents. Read more on this topic within our separate action registration document.