docs: native scripting engine #112

Open
wants to merge 5 commits into
base: main
5 changes: 3 additions & 2 deletions adr/0001-runner-migration.md
@@ -1,8 +1,9 @@
# 2. Runner Migration
# 1. Runner Migration

Date: 1 March 2024
Date: 2024-03-01

## Status

Accepted

## Context
170 changes: 170 additions & 0 deletions adr/0002-scripting-engine.md
@@ -0,0 +1,170 @@
# 2. Scripting Engine

Date: 2024-06-03

## Status

Draft

## Context

Presently, the only action types provided by `maru`[^1] are:

- `cmd:` (i.e. [`BaseAction`](https://github.com/defenseunicorns/maru-runner/blob/main/src/types/actions.go#L23)) - basic shell command execution
- `wait:` (i.e. [`ActionWait`](https://github.com/defenseunicorns/maru-runner/blob/main/src/types/actions.go#L37)), which supports two types of "status checks":
  - `cluster:` (i.e. [`ActionWaitCluster`](https://github.com/defenseunicorns/maru-runner/blob/main/src/types/actions.go#L43)) -
    perform status checks against K8s cluster resources
  - `network:` (i.e. [`ActionWaitNetwork`](https://github.com/defenseunicorns/maru-runner/blob/main/src/types/actions.go#L51)) -
    poll arbitrary HTTP/TCP endpoints for a given status code

The `wait` action is really the only abstraction that is provided around
shell scripting. As we seek to enhance or expand built-in capabilities,
we have three high-level options for doing so:

1. expand YAML-based DSL with more configuration options
2. vendor additional "tools" and encourage `./zarf <tool> [...]` pattern
3. **provide a cross-platform scripting engine (with builtins
for common tasks)**

Below is an example of a common use case for HTTP status checks that
is not readily solved by the existing `wait.network:` probe:

```yaml
tasks:
  - description: SonarQube UI Status Check
    maxRetries: 30
    cmd: |
      STATUS=$(curl -s 'https://sonarqube.uds.dev/api/system/status' | ./uds zarf tools yq '.status')
      echo "SonarQube system status: ${STATUS}"
      # Quote the variable so an empty response does not break the test
      if [ "$STATUS" != "UP" ]; then
        sleep 10
        exit 1
      fi
```
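For comparison, here is a minimal sketch of the same check using only the Go standard library, which is roughly the work a native builtin would have to do in place of `curl` and `yq`. The names `statusIsUp` and `checkURL` are hypothetical and not part of `maru`; the local stub server stands in for SonarQube so the sketch runs offline.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// systemStatus mirrors the single field the shell example reads with `yq '.status'`.
type systemStatus struct {
	Status string `json:"status"`
}

// statusIsUp (hypothetical helper) decodes a JSON body and reports whether
// its "status" field equals "UP".
func statusIsUp(body []byte) (bool, error) {
	var s systemStatus
	if err := json.Unmarshal(body, &s); err != nil {
		return false, fmt.Errorf("decoding response body: %w", err)
	}
	return s.Status == "UP", nil
}

// checkURL (hypothetical helper) fetches url and applies statusIsUp to the body.
// Like the shell version, it does not inspect the HTTP status code.
func checkURL(client *http.Client, url string) (bool, error) {
	resp, err := client.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	return statusIsUp(body)
}

func main() {
	// Stub server standing in for the SonarQube status endpoint.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, `{"status":"UP"}`)
	}))
	defer srv.Close()

	up, err := checkURL(srv.Client(), srv.URL)
	fmt.Println("up:", up, "err:", err)
}
```

Nothing here depends on a host shell or on external binaries, which is the portability property the scripting-engine option is after.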

### Native Scripting Engine

For the purposes of this proposal, we will scope the evaluation to libraries
that can be embedded natively.

Tools like [`zx`](https://google.github.io/zx/getting-started)
(which provides a JS API on top of shell scripting) are interesting, but
anything that is not written in Go would be difficult to integrate in a way
that actually improves portability of user-defined scripts.
Contributor

Suggested change
that actually improves portability of user-defined scripts.
that actually improves portability of user-defined scripts without delegating to more complex paradigms like WASM.

(as Jeff Goldblum says, "life finds a way", and zx could be embedded, but it would be quite annoying to do so)

Author

> For the purposes of this proposal, we will scope the evaluation to libraries that can be embedded natively.

> anything that is not written in Go would be difficult to integrate

I think the point stands as is. zx could not be integrated using WASM either, because it relies on the Node.js runtime. Your statement presents WASM as "complex" but similar to a native runtime, which it is not. It is just portable bytecode.

My point here was to scope the eval to Go-native runtimes. Otherwise, as you say, "life finds a way" and literally anything could be considered.


[github.com/avelino/awesome-go](https://github.com/avelino/awesome-go#embeddable-scripting-languages)
provides a fairly comprehensive list of embeddable scripting languages.
There are some good options here, including:

- [`starlark-go`](https://github.com/google/starlark-go), Go implementation
of Starlark: Python-like language with deterministic evaluation and hermetic
execution
- [`starlet`](https://github.com/1set/starlet), which
enhances the `starlark` runtime with useful extensions like `http`
- expression languages like [`expr`](https://github.com/expr-lang/expr),
[`cel`](https://github.com/google/cel-go) (which is used by Kubernetes[^4]),
and [`cue`](https://github.com/cue-lang/cue)
- [`otto`](https://github.com/robertkrimen/otto), a JS parser/interpreter
written in Go

These are all great, portable and secure options. However, with the exception
of `starlet`, they are just language runtimes and lack rich APIs that we could
expose directly to users. We would have to build these APIs ourselves from
scratch.
Comment on lines +59 to +73
Contributor

I think one analysis missing here is "health of ecosystem". After a little digging, I would summarize it as:

- starlark-go and cel are both Google-backed and have a good history of releases
- starlet has one release and is backed by just one contributor
- otto is backed by a single developer (San Francisco, CA) but does have a decently sized community / release history
- risor is also backed by a single developer (Sterling, VA) and does have a decently sized community / release history

cel is also not Turing-complete, which could be a big downside for more complicated things one might want to do.

Probably the only real choices based on that are starlark-go, otto, and risor. I do agree that raw starlark-go is limiting, and while we could create a standard library with otto, there would be a lot more wiring (we'd likely need an embed FS with node modules in it).

Author

@marshall007 Jun 12, 2024

Suggested change
- [`starlark-go`](https://github.com/google/starlark-go), Go implementation
of Starlark: Python-like language with deterministic evaluation and hermetic
execution
- [`starlet`](https://github.com/1set/starlet), which
enhances the `starlark` runtime with useful extensions like `http`
- expression languages like [`expr`](https://github.com/expr-lang/expr),
[`cel`](https://github.com/google/cel-go) (which is used by Kubernetes[^4]),
and [`cue`](https://github.com/cue-lang/cue)
- [`otto`](https://github.com/robertkrimen/otto), a JS parser/interpreter
written in Go
These are all great, portable and secure options. However, with the exception
of `starlet`, they are just language runtimes and lack rich APIs that we could
expose directly to users. We would have to build these APIs ourselves from
scratch.
`starlark-go` and `cel` are both Google-backed and have a good history of
releases, but `cel` is an expression language and not Turing-complete.
`starlet` has one release and is backed by just one contributor. `otto` is
backed by a single developer (San Francisco, CA) but does have a
decently sized community / release history.
For our purposes, the only reasonable choices listed above would be
`starlark-go` and `otto`.

Author

@Racer159 does this capture it?


Not mentioned in the list above is [Risor](https://risor.io/), which aims to
be _"fast and flexible scripting for Go developers and DevOps"_, making it an
ideal candidate for scripting within `maru`.

Here is the example above rewritten using [Risor syntax](https://risor.io/docs/syntax):

```yaml
- description: SonarQube UI Status Check
  maxRetries: 30
  script: |
    r := fetch('https://sonarqube.uds.dev/api/system/status').json()
    return r['status'] == 'UP'
```
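How `maxRetries` would interact with a script's return value is not spelled out above. One possible reading, sketched here in Go with hypothetical names, is that the runner re-evaluates the script until it returns true or the attempts run out. The `check` type stands in for one evaluation of the user script; the "one initial attempt plus up to `maxRetries` retries" semantics and the fixed delay between attempts are assumptions, not documented `maru` behavior.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// check stands in for one evaluation of a user script that returns a boolean.
type check func() (bool, error)

// retryUntilTrue (hypothetical) runs c once, then retries up to maxRetries
// more times, sleeping for delay between attempts. It returns the number of
// attempts made and a nil error as soon as c reports true.
func retryUntilTrue(c check, maxRetries int, delay time.Duration) (int, error) {
	if maxRetries < 0 {
		maxRetries = 0
	}
	var lastErr error
	for attempt := 1; attempt <= maxRetries+1; attempt++ {
		ok, err := c()
		if err == nil && ok {
			return attempt, nil
		}
		lastErr = err
		// Only sleep if another attempt will follow.
		if attempt <= maxRetries {
			time.Sleep(delay)
		}
	}
	if lastErr != nil {
		return maxRetries + 1, fmt.Errorf("check failed after %d attempts: %w", maxRetries+1, lastErr)
	}
	return maxRetries + 1, errors.New("check never returned true")
}

func main() {
	// Simulated script that only succeeds on its third evaluation.
	calls := 0
	script := func() (bool, error) {
		calls++
		return calls == 3, nil
	}

	attempts, err := retryUntilTrue(script, 30, 10*time.Millisecond)
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Under this reading the script author no longer writes `sleep` and `exit 1` by hand; the retry policy lives in the runner.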

Risor ships with a number of built-in modules for DevOps use cases and is
pluggable, so we can add our own. These modules in particular come
to mind as potentially useful for delivery:

- [`aws`](https://risor.io/docs/modules/aws)
- [`vault`](https://risor.io/docs/modules/vault)
- [`kubernetes`](https://risor.io/docs/modules/kubernetes)
- [`pgx`](https://risor.io/docs/modules/pgx)

**Note:** `starlet` provides [a number of builtins](https://github.com/1set/starlet/tree/master/lib)
like `http` and `csv`, which are comparable to those provided by `risor`. The
big difference is that `starlet` does not aim to support DevOps use cases
directly, so it would likely never include things like `vault` or `aws` integrations.

Risor can be embedded into `maru` as a library, which means that all
script execution would happen natively in the Go runtime. This has
huge advantages for both portability and security. As with the
[vendoring tools](#vendor-tools) approach, we can continue to ship a
single binary with minimal-to-zero external dependencies, but with
the additional advantage of not having to rely on a host-specific shell.

### Alternatives

#### YAML-based DSL

For the specific case of network status checks, we could add support
for `jsonpath`-based `condition` checks (and align with existing
`wait.cluster.condition`).

```yaml
- description: SonarQube UI Status Check
  maxRetries: 30
  wait:
    network:
      protocol: https
      address: sonarqube.uds.dev/api/system/status
      code: 200
      # Wrapped in double quotes so the whole expression is one valid YAML string
      condition: "'{.body.status}'='UP'"
```

Getting the API for `wait.network.condition` right would be challenging,
though. Do we assume the response body is always JSON? If not, what
sort of expressions would be available for HTML or text responses?
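To make the JSON question concrete, here is a rough Go sketch of what evaluating such a condition could involve. It is not a `jsonpath` implementation: `lookup` and `bodyMatches` are hypothetical helpers that only handle a dotted path through JSON objects, relative to the response body, and compare the value as a string. Note how a non-JSON body can only be reported as an error.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lookup (hypothetical) walks a dotted path such as "status" or "a.b"
// through nested JSON objects. It reports false if any segment is missing
// or if it hits a value that is not an object.
func lookup(doc any, path string) (any, bool) {
	cur := doc
	for _, key := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		cur, ok = obj[key]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// bodyMatches (hypothetical) reports whether the JSON body holds a value at
// path whose string form equals want. A body that is not JSON is an error.
func bodyMatches(body []byte, path, want string) (bool, error) {
	var doc any
	if err := json.Unmarshal(body, &doc); err != nil {
		return false, fmt.Errorf("response body is not JSON: %w", err)
	}
	got, ok := lookup(doc, path)
	if !ok {
		return false, nil
	}
	return fmt.Sprint(got) == want, nil
}

func main() {
	ok, err := bodyMatches([]byte(`{"status":"UP"}`), "status", "UP")
	fmt.Println(ok, err)

	// An HTML response cannot be evaluated by a JSON-only condition.
	_, err = bodyMatches([]byte("<html>starting</html>"), "status", "UP")
	fmt.Println(err)
}
```

Supporting HTML or plain-text responses would need a second expression syntax on top of this, which is the API design cost this alternative carries.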

#### Vendor Tools

Note that the original example depended on `curl`. As a result, it
is not necessarily portable. It may work on WSL, but probably not in
vanilla PowerShell. Though `curl` is widely available, not every Linux
distribution ships with it by default either.

```yaml
tasks:
  - description: SonarQube UI Status Check
    maxRetries: 30
    cmd: |
      STATUS=$(./uds zarf tools curl -s 'https://sonarqube.uds.dev/api/system/status' | ./uds zarf tools yq '.status')
      echo "SonarQube system status: ${STATUS}"
      # Quote the variable so an empty response does not break the test
      if [ "$STATUS" != "UP" ]; then
        sleep 10
        exit 1
      fi
```

Speaking of PowerShell, we would need to maintain lots of Unix command
rewrites for anything to be reliable. In the above example:

- `sleep 10` -> `Start-Sleep -Seconds 10`[^2]
- `curl ...` -> probably ok for the most part, but is actually aliased
  to `Invoke-WebRequest`[^3]
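As an illustration of that maintenance burden, a rewrite layer would need one branch like the following for every command we wanted to support. This is only a sketch: `sleepCommand` is a hypothetical helper, and it assumes that tasks on Windows run under PowerShell and everything else under a POSIX shell.

```go
package main

import (
	"fmt"
	"runtime"
)

// sleepCommand (hypothetical) returns the shell snippet that pauses for the
// given number of seconds on the target OS. It assumes PowerShell on
// Windows and a POSIX shell everywhere else.
func sleepCommand(goos string, seconds int) string {
	if goos == "windows" {
		return fmt.Sprintf("Start-Sleep -Seconds %d", seconds)
	}
	return fmt.Sprintf("sleep %d", seconds)
}

func main() {
	// Print the variant for the platform this program was built for.
	fmt.Println(sleepCommand(runtime.GOOS, 10))
}
```

A natively executed script avoids this class of rewrite entirely, since a pause becomes a call into the Go runtime rather than a shell command.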

## Decision



## Consequences

Contributor

Assuming we go with Risor, I think it would be good to note that this doesn't actually limit the ability to use the other solutions available today, and as such it is backwards compatible. It does, however, give us and our users much more flexibility in defining complex logic in tasks and in creating better, more reusable workflows that are more portable and don't rely on system-specific tooling.


[^1]: `wait` actions are [implemented in `zarf`](https://github.com/defenseunicorns/zarf/blob/main/src/pkg/utils/wait.go#L32) currently
[^2]: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/start-sleep
[^3]: https://stackoverflow.com/a/73956607
[^4]: ["The Common Expression Language (CEL) is used in the Kubernetes API to declare validation rules, policy rules, and other constraints or conditions."](https://kubernetes.io/docs/reference/using-api/cel/)