Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine all initializer commands with && to catch any failing commands #453

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

frittentheke
Copy link
Contributor

By running two commands instead of one (the second being the cat | grep), any failures (non-zero exit code) of the first part (containing k6 inspect) will be lost and masked away.

By chaining them all with && the first non-zero RC will fail the whole command and return.

Fixes: #435

@yorugac
Copy link
Collaborator

yorugac commented Aug 23, 2024

@frittentheke, thank you for the PR! It seems like you have opened 2 PRs for the same issue? Could you please leave only one?

Quickly skimming the code, the change seems reasonable on the first glance, but I need to look up the details there again. Additionally, this change requires testing. I'll be able to fully review this in a couple of weeks approx. Hope that's alright!

@frittentheke
Copy link
Contributor Author

frittentheke commented Aug 23, 2024

@frittentheke, thank you for the PR! It seems like you have opened 2 PRs for the same issue? Could you please leave only one?

I changed one to draft and shall remove the "fixes" reference to this issue. It is simply not "just" a bugfix.

Thanks for looking into and testing #453

@yorugac
Copy link
Collaborator

yorugac commented Sep 23, 2024

@frittentheke, sorry for the delay! I've finally managed to test this PR a bit. So there are 2 parts:

  1. correct error return in common.go: that's a great catch; thank you!
  2. the k6 inspect command: I'm afraid this change makes debugging more complicated than before. The current flow, with ;, allows one to see error in logs of initializer. Example user flow: user sees an initializer job has failed in k6-operator's logs and then can go and see the exact error in initializer logs. But with &&, initializer fails and leaves no logs afterwards. AFAIS, it becomes harder to debug the problem. If you could describe a more specific case of initializer failure that you were looking into, please do! Otherwise, I'd request to omit this part as I don't see how it helps ATM...

@yorugac
Copy link
Collaborator

yorugac commented Sep 30, 2024

Hi @frittentheke, would you be able to modify the PR to leave only the first change, in common.go? We're going to have a release this week, and it'd be preferable to merge that fix before that.
We can continue discussion about the second change at a later point / in another PR, of course.

@frittentheke
Copy link
Contributor Author

@frittentheke, sorry for the delay! I've finally managed to test this PR a bit. So there are 2 parts:

1. correct error return in `common.go`: that's a great catch; thank you!

2. the `k6 inspect` command: I'm afraid this change makes debugging more complicated than before. The current flow, with `;`, allows one to see error in logs of initializer. Example user flow: user sees an `initializer job has failed` in k6-operator's logs and then can go and see the exact error in initializer logs. But with `&&`, initializer fails and leaves no logs afterwards. AFAIS, it becomes harder to debug the problem. If you could describe a more specific case of initializer failure that you were looking into, please do! Otherwise, I'd request to omit this part as I don't see how it helps ATM...

@yorugac sorry for the delay ...

Regarding 1) I created a new PR with just the var typo fixed: #465

As for 2) :

I'd like to argue that the current flow actually only handles certain types of errors - those that go into that very output file /tmp/k6logs and that are then caught by the grep filter level=error- but then misses others: Any issues with the initializer (causing non-zero exit codes) or cause none of those error lines to be emitted are then lost and the initializer actually falsely succeeds as the original return code of the inspect is masked or lost and only the one from the grep command counts for the pod termination. In my case, which made me write up the PR, the inspect command was not able to start (due to some imports being wrong) and the initializer still succeeded as there was not output with level=error and the k6-operator got stuck in the next phase (no JSON output, no jobs, ...).

To me any unexpected non-zero return code should be handled as an error (I know about grafana/k6#2804, #75, ...). A while back I started a bigger PR (#450) completely reworking the way the initializer works through the various tasks and then sends off it's verdict via termination log, the kubernetes-native way of handling this phase (https://kubernetes.io/docs/tasks/debug/debug-application/determine-reason-pod-failure/). But that PR / discussion is for another day I suppose. If you even like the idea to rework the whole thing to make it a lot more robust.

By running two commands instead of one (the second being the cat | grep), any
failures (non-zero exit code) of the first part (containing `k6 inspect`) will
be lost and masked away.

By chaining them all with `&&` the first non-zero RC will fail the whole command
and return.

Fixes: grafana#435
@frittentheke frittentheke force-pushed the failInitOnNonZero_435 branch from a054bf7 to 8008a90 Compare October 1, 2024 12:02
@frittentheke
Copy link
Contributor Author

@yorugac with 0.17.0 released would you consider discussing 2) some more? If you agree that my chain of thought is not totally off and would consider a rework the likes of #450 I'd gladly take in some feedback there to finish the PR.

@yorugac
Copy link
Collaborator

yorugac commented Nov 5, 2024

Hi @frittentheke, apologies for this delay! I've got swamped with internal work. And thank you for your patience! 😂

You described this case:

In my case, which made me write up the PR, the inspect command was not able to start (due to some imports being wrong) and the initializer still succeeded as there was not output with level=error and the k6-operator got stuck in the next phase (no JSON output, no jobs, ...).

AFAIR, when imports are off, there should be logs left in initializer which then should cause an error in k6-operator on JSON unmarshal. It sounds like it wasn't so for you, so no logs despite the import error? If so, could you please share a script how to repeat that situation?

@frittentheke
Copy link
Contributor Author

Sorry @yorugac for again getting back to you late on this one ... But I just remembered when we ran into yet another issue of this kind ;-)

AFAIR, when imports are off, there should be logs left in initializer which then should cause an error in k6-operator on JSON unmarshal.

Yes, that is indeed the case:

2024-12-10T12:02:19Z    ERROR    controllers.TestRun    unable to marshal: ``    
{"namespace": "loadtesting", "name": "sometestname", "reconcileID": "d6353a2a-8b15-4319-85fa-96a63c7be8eb", "error": "unexpected end of JSON input"}
github.com/grafana/k6-operator/controllers.inspectTestRun/workspace/controllers/common.go:112
github.com/grafana/k6-operator/controllers.RunValidations/workspace/controllers/k6_initialize.go:55
github.com/grafana/k6-operator/controllers.(*TestRunReconciler).reconcile/workspace/controllers/testrun_controller.go:13
github.com/grafana/k6-operator/controllers.(*TestRunReconciler).Reconcile/workspace/controllers/testrun_controller.go:80
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:116
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:303
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:263
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:224

but this requires someone to look at the logs and does not contain any real clues to what the issue (of the initializer run) is.
At the very least I'd like to have the initializer job fail (with non-zero return code), thus this very PR combining all commands to have any non-zero return codes make the job fail (read: "initialization failed").

I'd then really love to discuss a redesign the likes of #450 to give the user back the log output (STDOUT + STDERR) for the initializer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issues within initalizer error handling if script is incorrect
2 participants