Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues within initalizer error handling if script is incorrect #435

Open
Phil1602 opened this issue Jul 30, 2024 · 5 comments · May be fixed by #453 or #450
Open

Issues within initalizer error handling if script is incorrect #435

Phil1602 opened this issue Jul 30, 2024 · 5 comments · May be fixed by #453 or #450
Labels
bug Something isn't working

Comments

@Phil1602
Copy link

Phil1602 commented Jul 30, 2024

Brief summary

We realized, that our TestRuns get stuck without any information/logs printed out if the script itself is incorrect.

Cause

We already had a deeper look and realized that this is likely related to the Log message parsing of the k6 inspect executed within the initalizer here:

"mkdir -p $(dirname %s) && k6 archive %s -O %s %s 2> /tmp/k6logs && k6 inspect --execution-requirements %s 2> /tmp/k6logs ; ! cat /tmp/k6logs | grep 'level=error'",

When we execute k6 inspect manually inside the container, we get the following error:

/ # k6 inspect --execution-requirements  /test/<REDACTED>.js
ERRO[0009] could not initialize '/test/<REDACTED>.js': could not load JS test 'file:///test/<REDACTED>.js': unknown executor type 'burst' 

The log parsing mentioned above, basically does a | grep 'level=error, which does not work for the error message we are facing since the log format seems to be different.

Might be related to: grafana/k6-docs#877

k6-operator version or image

ghcr.io/grafana/k6-operator:controller-v0.0.14

Helm chart version (if applicable)

No response

TestRun / PrivateLoadZone YAML

Since we built a custom k6 image to include the script to be used as localfile, it would need some additional effort to make this available.

However, IMO this is not really related to a specific TestRun.

Other environment details (if applicable)

k6 version: k6 v0.51.0 (go1.22.4, linux/amd64)

Steps to reproduce the problem

  • Deploy any TestRun resource with a broken JS script

Expected behaviour

  • Initalizer Pods fails or at least logs the issues which are encountered during k6 inspect
  • TestRun resource is in Failed state

Actual behaviour

  • Initalizer Pod runs but does not print out anything
  • The k6-operator-controller-manager logs errors (see below) about unparsable JSON without any information
  • TestRun resource stuck in initilization phase
Logs of k6-operator
2024-07-30T09:11:00Z    ERROR   controllers.TestRun     unable to marshal: ``   {"namespace": "loadtesting", "name": "<REDACTED>", "reconcileID": "<REDACTED>", "error": "unexpected end of JSON input"}
github.com/grafana/k6-operator/controllers.inspectTestRun
        /workspace/controllers/common.go:105
github.com/grafana/k6-operator/controllers.RunValidations
        /workspace/controllers/k6_initialize.go:55
github.com/grafana/k6-operator/controllers.(*TestRunReconciler).reconcile
        /workspace/controllers/testrun_controller.go:137
github.com/grafana/k6-operator/controllers.(*TestRunReconciler).Reconcile
        /workspace/controllers/testrun_controller.go:80
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:227
2024-07-30T09:11:00Z    ERROR   Reconciler error        {"controller": "testrun", "controllerGroup": "k6.io", "controllerKind": "TestRun", "TestRun": {"name":"<REDACTED>","namespace":"loadtesting"}, "namespace": "loadtesting", "name": "<REDACTED>", "reconcileID": "<REDACTED>", "error": "unexpected end of JSON input"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:227
@Phil1602 Phil1602 added the bug Something isn't working label Jul 30, 2024
@Phil1602 Phil1602 changed the title Issues within rrror handling if script is incorrect Issues within initalizer error handling if script is incorrect Jul 30, 2024
@frittentheke
Copy link
Contributor

This loosely relates to #401 which is about treating initializer errors as error state of the whole TestRun CR.

@yorugac
Copy link
Collaborator

yorugac commented Aug 20, 2024

Hi @Phil1602, as mentioned by @frittentheke, this indeed has been raised and fixed: could you please update k6-operator to the latest version and try again? Thanks!

Also, in general, it is recommended to debug k6 scripts locally before deploying the TestRun 🙂

@Phil1602
Copy link
Author

Hi @yorugac,

We are using k6 verification within our pipeline as a step before creating the TestRun in the meantime. Anyways, IMO it would have been still an issue, if a wrong TestRun is not reported as such.

I will try out the latest release v0.0.16 and verify your assumptions! Thanks for the hints!

frittentheke added a commit to frittentheke/k6-operator that referenced this issue Aug 22, 2024
…alization success

1) Introduce an EmptyDir volume for temporary data
This removes the need for any mkdir and avoids writing to the container filesystem,
allowing for e.g. `readOnlyRootFilesystem` security contexts ([1]).

2) Remove all output redirection and grepping for error indicating strings in favor of
using the exit codes from the commands to indicate success or failure.
This approach is much cleaner in regards to any changes to the output of K6 and the vast
space of error there is.

It also reestablishes the clear contract of the `inspect` command to judge the provided
config.

`k6 inspect --execution-requirements` ([2,3]) while also printing warnings or the JSON
should clearly indicate via `err` if there are issues with the provided tests, which
are then converted to a non-zero RC ([4]).

[1] https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
[2] https://github.com/grafana/k6/blob/c212bd277aeedbf8395b917df11bd9be4da490a4/cmd/inspect.go#L34
[3] https://github.com/grafana/k6/blob/c212bd277aeedbf8395b917df11bd9be4da490a4/cmd/inspect.go#L64
[4] https://github.com/grafana/k6/blob/master/cmd/root.go#L90-L118

Fixes: grafana#435
frittentheke added a commit to frittentheke/k6-operator that referenced this issue Aug 22, 2024
…alization success

1) Introduce an EmptyDir volume for temporary data
This removes the need for any mkdir and avoids writing to the container filesystem,
allowing for e.g. `readOnlyRootFilesystem` security contexts ([1]).

2) Remove all output redirection and grepping for error indicating strings in favor of
using the exit codes from the commands to indicate success or failure.
This approach is much cleaner in regards to any changes to the output of K6 and the vast
space of error there is.

It also reestablishes the clear contract of the `inspect` command to judge the provided
config.

`k6 inspect --execution-requirements` ([2,3]) while also printing warnings or the JSON
should clearly indicate via `err` if there are issues with the provided tests, which
are then converted to a non-zero RC ([4]).

[1] https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
[2] https://github.com/grafana/k6/blob/c212bd277aeedbf8395b917df11bd9be4da490a4/cmd/inspect.go#L34
[3] https://github.com/grafana/k6/blob/c212bd277aeedbf8395b917df11bd9be4da490a4/cmd/inspect.go#L64
[4] https://github.com/grafana/k6/blob/master/cmd/root.go#L90-L118

Fixes: grafana#435
@frittentheke
Copy link
Contributor

I will try out the latest release v0.0.16 and verify your assumptions! Thanks for the hints!

@yorugac while #401 does indeed treat an error of the Initializer Pod as error of the TestRun CR (

returnErr = errors.New("initalizer job has failed")
).

The cause of the issue @Phil1602 reported here is with the exit code (leading to the Pod actually failing) though. If you look at

"mkdir -p $(dirname %s) && k6 archive %s -O %s %s 2> /tmp/k6logs && k6 inspect --execution-requirements %s 2> /tmp/k6logs ; ! cat /tmp/k6logs | grep 'level=error'",
you'll notice that here are multiple commands chained and piped together. While && causes the first command with non-zero exit code to fail (and that code be returned) the second part applying the grep will then actually mask the k6 inspect (the most important bit of this command) - https://github.com/grafana/k6-operator/blob/d9490ded7c3e0cf615e2e9d41e82a842fdae7ac8/pkg/resources/jobs/initializer.go#L79C124-L79C167

I went through the initializer logic some more and just pushed PR #450.
I know this changes a little more than just fixing this issue here. But I strongly believe reducing the interface width (exit code + termination message) allows the Initalizer to really strive and be much more flexible than it is how.

I as a user can then run any image and any (list of) command and the only thing I have to ensure is that a non-zero exit code is used if there is an issue with the test.

frittentheke added a commit to frittentheke/k6-operator that referenced this issue Aug 23, 2024
By running two commands instead of one (the second being the cat | grep), any
failures (non-zero exit code) of the first part (containing `k6 inspect`) will
be lost and masked away.

By chaining them all with `&&` the first non-zero RC will fail the whole command
and return.

Fixes: grafana#435
frittentheke added a commit to frittentheke/k6-operator that referenced this issue Aug 23, 2024
By running two commands instead of one (the second being the cat | grep), any
failures (non-zero exit code) of the first part (containing `k6 inspect`) will
be lost and masked away.

By chaining them all with `&&` the first non-zero RC will fail the whole command
and return.

Fixes: grafana#435
frittentheke added a commit to frittentheke/k6-operator that referenced this issue Aug 23, 2024
By running two commands instead of one (the second being the cat | grep), any
failures (non-zero exit code) of the first part (containing `k6 inspect`) will
be lost and masked away.

By chaining them all with `&&` the first non-zero RC will fail the whole command
and return.

Fixes: grafana#435
@frittentheke
Copy link
Contributor

frittentheke commented Aug 23, 2024

I pushed a bugfix PR in #453, just fixing the issue reported by @Phil1602

^^ @yorugac

frittentheke added a commit to frittentheke/k6-operator that referenced this issue Oct 1, 2024
By running two commands instead of one (the second being the cat | grep), any
failures (non-zero exit code) of the first part (containing `k6 inspect`) will
be lost and masked away.

By chaining them all with `&&` the first non-zero RC will fail the whole command
and return.

Fixes: grafana#435
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
3 participants