
[Feature request] Allow automatic resolution of completed tests #50

Open
jachinte opened this issue Nov 2, 2022 · 6 comments
Labels: Enhancement (New feature or request), Feature request (Something should be improved)

@jachinte
Contributor

jachinte commented Nov 2, 2022

Is your feature request related to a problem? Please describe.
Since I'm launching tests programmatically, one thing I'm missing from the operator is how to determine whether a test was successful. I created a k8s admission webhook that is notified whenever a master pod is updated. However, since the locust-exporter container never stops running, the tests can never be marked as successful. This is because the master pod doesn't transition to a Complete state when the test ends.

I noticed that the latest version of locust-exporter now has an endpoint to terminate the container (/quitquitquit). Indeed, calling this endpoint makes the test successful because the pod is marked as Complete. However, one would have to call this endpoint manually every time a test ends.
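For reference, the manual call is just an HTTP POST to the exporter (a rough sketch; the exporter's default port 9646 and how you reach the pod, e.g. via a port-forward, are assumptions):

# Ask the locust-exporter to terminate once the test has ended
curl -X POST http://<master-pod-address>:9646/quitquitquit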

Describe the solution you'd like
An ideal solution would be terminating the locust-exporter once the test is done, so that the master pod is marked as Complete.

Describe alternatives you've considered
Since metrics are scraped, one would have to ensure that Prometheus has pulled the metrics before terminating the exporter container.

  • One alternative is to set a time-to-live (TTL) period after the test finishes (e.g., equal to Prometheus's scraping interval), so that the metrics are scraped at least once.
  • Another alternative would be to push metrics to Prometheus using the Pushgateway, which was designed for ephemeral batch jobs.

The second alternative seems more appropriate; however, it requires a lot more work. The first alternative would require either modifying locust-exporter so that it terminates itself after the TTL period (e.g., once locust_up = 0), or adding a sidecar container that polls the metrics endpoint for locust_up = 0 and then calls /quitquitquit (a sketch of that loop follows below).
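To make the sidecar idea concrete, a minimal sketch of such a loop could look like the following, assuming the exporter serves /metrics and /quitquitquit on localhost:9646 and that the locust_up gauge is unlabeled (both assumptions may need adjusting):

#!/bin/sh
# Hypothetical sidecar loop: poll the exporter until locust_up reports 0,
# wait one scrape interval so Prometheus can pull the final metrics,
# then ask the exporter to shut itself down.
SCRAPE_INTERVAL=30   # assumed Prometheus scrape interval, in seconds

while true; do
  if curl -fs http://localhost:9646/metrics | grep -Eq '^locust_up(\{[^}]*\})? 0$'; then
    sleep "$SCRAPE_INTERVAL"
    curl -fs -X POST http://localhost:9646/quitquitquit
    exit 0
  fi
  sleep 5
done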

Additional context

What do you think of these alternatives? I'm willing to implement one of these.

@AbdelrhmanHamouda
Owner

Hello @jachinte,
You are spot on about the problem of detecting whether a test has completed. In fact, the /quitquitquit endpoint was added in response to an issue I reported against the locust-exporter project.

I can tell you have put a good amount of thought into these proposals, and I would like to give you a fulfilling answer for each.

The sidecar container: this will definitely work and get the desired result; however, it does not (at least to my eyes) look like a "clean" solution. One of the things I am very keen on maintaining with the Operator is keeping it clean and free of hacks/workarounds as much as possible. That said, if you want to go down this route, I have a proposal for you at the end that can help you solve this problem quickly until it is supported out of the box.

TTL: I have my own thoughts on TTL that I will share on the other issue, since I see you dedicated an issue to that topic. That being said, do you have an idea of the mechanism for setting the k8s Job TTL after its creation?

Pushgateway: I know about this tool but haven't dug into it. My assumption is that it is something set up on the cluster and that its existence is transparent to the metric source (i.e., the pods); correct me if I am wrong. However, performance tests are usually long enough to produce meaningful metrics, and the Prometheus server is usually configured with a high enough scrape frequency to guarantee that. Taking all this into account, a test so fast that it never gets scraped sounds like an invalid or misconfigured test. What are your thoughts?

My current plan to address this is to use the recently added endpoint by having the controller register the created pod/container as a secondary resource and subscribe to events related to that specific pod/container. When the locust-master container terminates, the controller should be notified and will call the quit endpoint or execute a graceful shutdown through the k8s API. Now, in this plan I have made a few assumptions that I need to validate; for example, the controller's ability to receive events for a specific container of a pod belonging to a Job that the controller started, and to register it as a secondary resource to manage.
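For anyone who wants to approximate this plan by hand in the meantime, the flow would look roughly like the sketch below (the label selector, the "locust-master" container name, and the exporter port are assumptions, not the operator's actual behaviour):

# Find the master pod of a given test (label selector is a placeholder).
POD=$(kubectl get pods -l <your-test-label> -o jsonpath='{.items[0].metadata.name}')

# Poll until the locust-master container reports a terminated state.
until [ -n "$(kubectl get "pod/$POD" \
  -o jsonpath='{.status.containerStatuses[?(@.name=="locust-master")].state.terminated}')" ]; do
  sleep 10
done

# Reach the exporter from outside the cluster and ask it to exit,
# so the pod (and therefore the Job) can complete.
kubectl port-forward "pod/$POD" 9646:9646 &
PF_PID=$!
sleep 2
curl -fs -X POST http://localhost:9646/quitquitquit
kill "$PF_PID"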

The workaround:
Since you can easily generate a Locust-based Docker image, I would propose having a custom Locust image that does something like this:

# Dockerfile
FROM locustio/locust
# ... copy your test files, etc.
COPY start.sh .
ENTRYPOINT ["./start.sh"]

# start.sh
# ...
locust <invocation arguments>
# ...
curl -X POST http://localhost:9646/quitquitquit

@AbdelrhmanHamouda added the Enhancement (New feature or request) and Feature request (Something should be improved) labels on Nov 6, 2022
@jachinte
Contributor Author

Hello @AbdelrhmanHamouda,

Thank you for taking the time to reply with such a thorough answer.

I totally agree with you on the sidecar solution. It would also require allocating additional resources (i.e., RAM and CPU).

I am not aware of any mechanism to set the TTL parameter on a Job after its creation. However, I think I did not explain this part correctly: the idea is to set a TTL parameter that only takes effect after the locust pod has terminated; this is different from Issue #52.

I think you are right about the push/pull monitoring model for locust tests. In my case, tests can last a few minutes (e.g., 5min), but I think Prometheus can definitely scrape the metrics several times during that period.

The solution you propose makes a lot of sense. I would not know how to do that as of now, but I am guessing that the operator SDK provides some facilities to do it. If you can provide a little guidance on this topic, I could start working on the implementation.

Thank you for the workaround! I'll use that solution in the meantime.

@prasanna12510

If I use the above workaround as a Dockerfile, how can I pass the pod/Job args to the custom script?

In my case it defaults to the default locustfile.
(two screenshots attached)

@AbdelrhmanHamouda
Owner

Hello @prasanna12510,

Here is a rough version of the files that you may need. Keep in mind that you may need to refine them to match your needs.

An example Dockerfile for the above workaround would look like this:

FROM locustio/locust:2.13.0

# set the working directory in the container
WORKDIR /prasanna

# Set python path
ENV  PYTHONPATH=/prasanna

# Install needed utility
USER root
RUN  : \
     && apt-get update \
     && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        curl

# Copy and configure start.sh with permissions
COPY start.sh .
RUN chmod 755 start.sh

# Set start.sh as entry point
ENTRYPOINT ["bash", "./start.sh"]

Example start.sh:

#!/bin/bash

function check_exit_code() {
  # Record a failure if the previous command exited with a non-zero status
  if [ $? -ne 0 ]; then
    exit_code=1
  fi
}


function run_test() {

  # Forward all arguments given to this script to the locust invocation
  locust_command=$(echo $@)

  echo "locust $locust_command"
  locust $locust_command
  check_exit_code
}

#####################
##### Main logic ####
#####################
# Set default exit code
exit_code=0

run_test $@

# Shutdown metrics exporter container
curl -fsI -XPOST http://localhost:9646/quitquitquit

# Exit with custom exit code
exit $exit_code
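As a usage note (the image name, registry, and locustfile below are placeholders, and how exactly the operator injects the arguments may differ): anything appended after the ENTRYPOINT ends up in start.sh as "$@" and is forwarded to locust.

# Build the custom image (name/tag are placeholders)
docker build -t <registry>/locust-custom:latest .

# Arguments passed after the image name are forwarded to locust, i.e. this runs
#   locust -f my_locustfile.py --headless --users 10 --run-time 1m
# (assuming my_locustfile.py was copied into the image; outside the pod the
# final curl to the exporter simply fails and its result is not checked):
docker run --rm <registry>/locust-custom:latest \
  -f my_locustfile.py --headless --users 10 --run-time 1m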

@prasanna12510

Yeah, I tried a similar approach, but it seems the arguments we pass through masterCommandSeed don't get appended as locust parameters. I think the operator spec could also add a provision for environment variables, similar to what is available in the locust Helm chart: https://github.com/deliveryhero/helm-charts/blob/master/stable/locust/values.yaml#L17

@AbdelrhmanHamouda
Copy link
Owner

That would be unexpected behavior!

The operator appends to the masterCommandSeed and workerCommandSeed but doesn't replace or alter them. The above example is actively being used, and I know it does manage to pass parameters.

If everything looks correct on your side, could you kindly report a bug and I will look into it?
