
Upgrading to v0.0.7 causes load tests to stop after 2 minutes #112

Closed
pears-one opened this issue May 24, 2022 · 3 comments · Fixed by #117
Labels
bug Something isn't working

Comments

@pears-one

Reproduce

  • Install k6 operator v0.0.7
  • Create a new k6 custom resource (see the example manifest below)
    • without the argument --out cloud
    • with the option cleanup: post
  • The runner will get cancelled after 2 minutes
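
For reference, a minimal manifest along these lines reproduces the issue. The apiVersion, kind, and field layout follow the k6-operator examples from around this release; the metadata name, ConfigMap name, and script file are placeholders:

```yaml
apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: k6-sample            # placeholder name
spec:
  parallelism: 2
  cleanup: post              # deletes all test resources once the stage is finished
  # note: no `arguments: --out cloud` here, which is what exposes the bug
  script:
    configMap:
      name: k6-test          # placeholder ConfigMap containing the script
      file: test.js
```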

Investigation

Initialization

In v0.0.7 of the operator, there is an initialization stage of the load test. This creates a job that runs k6 inspect against the script in the k6 spec. The operator then waits for this job to complete and retrieves the test's duration and maximum number of VUs, which are unmarshalled into a struct named inspectOutput.
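
A rough sketch of what that step amounts to, assuming k6 inspect --execution-requirements emits JSON containing totalDuration and maxVUs. The function and field names here are illustrative, and the operator actually runs the command inside a Kubernetes job rather than a local exec:

```go
package operator

import (
	"encoding/json"
	"fmt"
	"os/exec"
	"time"
)

// inspectOutput mirrors the two fields the operator needs.
type inspectOutput struct {
	TotalDuration string `json:"totalDuration"` // e.g. "1m35s"
	MaxVUs        uint64 `json:"maxVUs"`
}

// inspectScript runs `k6 inspect` against the test script and returns the
// test's total duration and maximum number of VUs.
func inspectScript(script string) (time.Duration, uint64, error) {
	out, err := exec.Command("k6", "inspect", "--execution-requirements", script).Output()
	if err != nil {
		return 0, 0, fmt.Errorf("k6 inspect failed: %w", err)
	}
	var res inspectOutput
	if err := json.Unmarshal(out, &res); err != nil {
		return 0, 0, err
	}
	dur, err := time.ParseDuration(res.TotalDuration)
	if err != nil {
		return 0, 0, err
	}
	return dur, res.MaxVUs, nil
}
```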

Finishing

At the end of the load test, the k6 operator changes the stage of the load test to finished. It does this when all of the jobs (i.e., runners) have completed running their tests. To ascertain the status of the runners, the operator polls them every 5 seconds until either all of the jobs are complete or the polling function times out.

The timeout is set to testDuration+time.Minute*2. If the polling function hits this timeout, it returns an error and the stage of the k6 custom resource is set to finished.

testDuration comes from the aforementioned inspectOutput object.
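
A minimal sketch of that finishing logic; wait.PollImmediate is the standard k8s.io/apimachinery helper, while waitForRunners and allJobsDone are illustrative names rather than the operator's own:

```go
package operator

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForRunners blocks until all runner jobs report completion or the
// timeout elapses; on timeout it returns wait.ErrWaitTimeout.
func waitForRunners(testDuration time.Duration, allJobsDone wait.ConditionFunc) error {
	// The extra two minutes are the grace period described above.
	timeout := testDuration + time.Minute*2
	// Poll every 5 seconds; either outcome moves the stage to finished.
	return wait.PollImmediate(time.Second*5, timeout, allJobsDone)
}
```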

Cleanup

In the k6 spec, the cleanup option is set to post. This triggers the operator to delete the k6 custom resource, and hence the associated jobs and pods, once the test reaches the finished stage.

The Problem

Many of the steps in the initialization stage are only executed if the argument --out cloud is present in the k6 spec. This includes running the k6 inspect command against the script and gathering information such as totalDuration. Hence, in the finishing stage, the operator will only wait 0m+2m=2m before changing the stage of the k6 custom resource to finished.
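
Illustratively, the effect boils down to something like the following, where hasCloudOutput stands in for however the operator actually detects --out cloud in the spec:

```go
package operator

import (
	"strings"
	"time"
)

// hasCloudOutput is a stand-in for the operator's check of the spec's
// arguments for the cloud output flag.
func hasCloudOutput(args string) bool {
	return strings.Contains(args, "--out cloud")
}

// pollTimeout shows why non-cloud tests get only two minutes: testDuration
// keeps its zero value because the inspect step never ran.
func pollTimeout(args string, inspectedDuration time.Duration) time.Duration {
	var testDuration time.Duration // zero unless inspect ran
	if hasCloudOutput(args) {
		testDuration = inspectedDuration
	}
	return testDuration + time.Minute*2 // 0m + 2m = 2m without --out cloud
}
```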

When the cleanup option is set to post, all resources related to the test are therefore deleted after 2 minutes, ending the test.

Options to mitigate

  • Roll back to version 0.0.7rc4.
  • Unset the cleanup option.
    • Note that the stage will still be set to finished after 2 minutes.
@yorugac added the bug (Something isn't working) label May 24, 2022
yorugac (Collaborator) commented May 24, 2022

Hi @evanfpearson, thanks for spotting this!

The first solution here is to apply the duration-based timeout only to cloud output tests. But the timeout was added for all kinds of tests as a default, to bring the operator closer to k6's behavior: k6 on its own does not run indefinitely, after all, but Kubernetes adds another layer of complexity that can result in hard-to-predict scenarios. Nevertheless, the operator should try to mimic the expectations set by k6 when possible.

So the second solution is to leave the timeout as is but always execute the k6 inspect step. It is a very quick command, but it does change the flow of deployments.

Right now, I'm inclined towards adding the second solution as a fix. Additional opinions would be welcome 🙂

@pears-one (Author)

Hi @yorugac, I think always running the k6 inspect step is the best way forward. What was the reasoning behind only running it for --out cloud jobs? Thanks for getting back so quickly.

yorugac (Collaborator) commented May 25, 2022

What was the reasoning behind only running it for --out cloud jobs?

Mostly not to overcomplicate the flow. At the time it seemed that other types of test runs wouldn't need it, so why add an additional job? And the timeout logic appeared much later, because of a different set of issues.
