
Distributed Vegeta #336

Open
ivanilves opened this issue Sep 17, 2018 · 25 comments

Comments

@ivanilves

Hi guys,

First, great piece of software, thank you <3

Second, could you please tell me: is there any recommended way to run distributed Vegeta on a multi-node cluster (EC2 ASG, K8s, Mesos, or whatever) to stress test at scale, and “queue” attack targets somehow? Is it DIY only for now? ;)

Thank you!

@tsenart
Owner

tsenart commented Sep 17, 2018

It's DIY for now indeed, although others have solved the same problem before.

@tsenart
Owner

tsenart commented Sep 17, 2018

I have ideas on how to make this easier. But no plans to implement anything for now (not enough time currently).

  • Vegeta K8S Operator
  • Vegeta Serverless (Schedule concurrent attacks on AWS Lambdas, Google Cloud Functions, etc).

@ivanilves
Author

Great. Thank you for these tips! 👍🏻👏

@nitishm

nitishm commented Oct 25, 2018

How do you envision using operators?
What would the CRD specify?

It would be nice to start with a simple master/worker model, deploying workers as a DaemonSet across nodes. A distributed attack could be driven with pdsh, as you mentioned in your README.md, or with a synchronizing mechanism like 0mq, Redis pub/sub, etc.
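Whichever coordination mechanism is chosen, the per-worker piece stays small if each worker embeds Vegeta as a Go library, mirroring the usage example in the README; a minimal sketch (the target URL, rate, and duration below are placeholders a master would inject):

```go
package main

import (
	"fmt"
	"time"

	vegeta "github.com/tsenart/vegeta/lib"
)

func main() {
	// Placeholder attack parameters; a master/worker setup would inject these
	// (e.g. via flags or env vars) instead of hard-coding them.
	rate := vegeta.Rate{Freq: 1000, Per: time.Second}
	duration := 60 * time.Second
	targeter := vegeta.NewStaticTargeter(vegeta.Target{
		Method: "GET",
		URL:    "http://target.example/",
	})

	attacker := vegeta.NewAttacker()

	var metrics vegeta.Metrics
	for res := range attacker.Attack(targeter, rate, duration, "worker-attack") {
		metrics.Add(res)
	}
	metrics.Close()

	// Each worker would ship its raw results (or this summary) back to the master.
	fmt.Printf("99th percentile latency: %s\n", metrics.Latencies.P99)
}
```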

@tsenart
Owner

tsenart commented Oct 26, 2018

What would the CRD specify?

I don't know. There are many possibilities, but it'd be nice to have a history of load tests available as well as the results of each.

@MalloZup

MalloZup commented Dec 6, 2018

Hi all, I'm starting the vegeta-operator for k8s project here: https://github.com/MalloZup/vegeta-operator.

Feel free to join the effort and help there. 🎅 🤶 🌻

Any contribution/help is welcome. 🌻 I'm open to any ideas, interactions, and suggestions from the community.

At the moment I plan to learn the k8s operator API as a first step 😄 🤖 🚀 so let's see :)

Roadmap proposal: feel free to write there and help Vegeta become a Super Saiyan :) (MalloZup/vegeta-operator#2)

@nitishm

nitishm commented Dec 29, 2018

My thought is to use a Kubernetes operator (acting as the master, driven by CRDs) and a worker ReplicaSet for horizontal scalability (via multiple replica pods).

The operator will be responsible for executing tests by sending requests (gRPC/HTTP) to the workers and retrieving the results. It will also be responsible for storing the aggregated results, command history, etc., in a key-value DB (pick any solution).

@tsenart WDYT? Is that what you envisioned as well?

@ivanilves
Author

@nitishm (just my 2 cents, nothing more):

Maybe using the Kubernetes Job resource would lead us to an easier design?
https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#running-an-example-job

We can scale jobs as well with parallelism: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#controlling-parallelism

To me, Vegeta tests are naturally more of a "job" kind of workload than a "worker/replicaset" one.

But, like you, I would like to know what @tsenart thinks about it 😉

@tsenart
Owner

tsenart commented Dec 30, 2018

To me, Vegeta tests are naturally more of a "job" kind of workload than a "worker/replicaset" one.

Agreed.

@nitishm

nitishm commented Dec 30, 2018

I agree with jobs as well. My thought process was driven by this article on using Locust on GKE: https://cloud.google.com/solutions/distributed-load-testing-using-kubernetes.

@MalloZup

MalloZup commented Jan 4, 2019

Hi all, since I did some experiments with the vegeta-controller/operator, here are my 2 cents:

  • I agree with @ivanilves on using k8s Jobs. Indeed, I planned to use them for scheduling.

I think as a 0.1 or minimal version this could be the first thing.

This was the API I was thinking of: https://github.com/MalloZup/vegeta-operator/blob/master/config/samples/vegeta_v1beta1_vegeta.yaml. (feel free to make suggestions :octocat:)

Concerning logs, for the 0.1 version I think we can be experimental and rely on k8s logs, which are not permanent, but it is a 0.1, right? I think we should focus more on the API/CRD than on the tooling around it.

A user could also add the needed logging via k8s with other operators etc. as a workaround.

IMHO later on we could think about storing logs, etc.

About communication, I think we don't need to invent much: we can rely on the Kubernetes Go client and schedule jobs with it. We can also then rely on the k8s load balancer, etc.

I think this is more a vegeta-controller than an operator, meaning there is an open question: which resource should we watch and react to with callbacks? With this design we don't have any resource on which we would call reconcile, or am I missing something? 🤔 😄

@tsenart
Owner

tsenart commented Jan 6, 2019

I think this is more a vegeta-controller than an operator, meaning there is an open question: which resource should we watch and react to with callbacks? With this design we don't have any resource on which we would call reconcile, or am I missing something? 🤔 😄

Wouldn't we create an attack object with certain parameters that the operator would have to satisfy?

@MalloZup

@tsenart From your POV and experience, which parameters would those be? TIA

@tsenart
Owner

tsenart commented Jan 28, 2019

@tsenart From your POV and experience, which parameters would those be? TIA

All the flags that you can define in vegeta attack, essentially.
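To make that concrete, here is a hypothetical sketch of what such a CRD's Go types could look like, mirroring a subset of the vegeta attack flags. All type and field names are illustrative only; they are not taken from any existing operator:

```go
// Hypothetical CRD types mirroring a subset of the `vegeta attack` flags.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// AttackSpec captures the attack parameters.
type AttackSpec struct {
	Targets     []string `json:"targets"`               // "METHOD URL" lines, as in a targets file
	Rate        string   `json:"rate,omitempty"`        // e.g. "1000/s", as accepted by -rate
	Duration    string   `json:"duration,omitempty"`    // e.g. "60s", as accepted by -duration
	Connections int      `json:"connections,omitempty"` // -connections
	Timeout     string   `json:"timeout,omitempty"`     // -timeout
	Replicas    int32    `json:"replicas,omitempty"`    // how many pods to fan the attack out to
}

// AttackStatus keeps the outcome, so a history of load tests can be retained.
type AttackStatus struct {
	Phase  string `json:"phase,omitempty"`  // Pending | Running | Completed | Failed
	Report string `json:"report,omitempty"` // where the aggregated report was stored
}

// Attack is the top-level custom resource the operator would reconcile.
type Attack struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   AttackSpec   `json:"spec,omitempty"`
	Status AttackStatus `json:"status,omitempty"`
}
```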

@domleb

domleb commented Jul 15, 2019

FWIW I use k8s jobs to run distributed Vegeta tests (we achieved 1M TPS using NLBs). I would say the responsibility of creating/deleting jobs etc. shouldn't be with Vegeta; whatever tooling is used to run any other job can be used. To make this really useful, it would be nice to use a Prometheus client to expose metrics. This is a great way to aggregate metrics across all instances in a standard and scalable way. Plus, alerts can be used to automate testing for CI. Maybe there's already a way to hook in a Prometheus client, but after a quick check I can't see an obvious way.
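A rough sketch of what such a hook could look like if it wrapped the Go library with client_golang (the metric name, port, and attack parameters below are placeholders, not an existing Vegeta feature):

```go
package main

import (
	"net/http"
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	vegeta "github.com/tsenart/vegeta/lib"
)

// latencies is a placeholder metric: request latency, labelled by status code.
var latencies = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name: "vegeta_request_duration_seconds",
		Help: "Latency of requests issued by this Vegeta instance.",
	},
	[]string{"code"},
)

func main() {
	prometheus.MustRegister(latencies)

	// Expose /metrics so Prometheus can scrape every attacker instance.
	mux := http.NewServeMux()
	mux.Handle("/metrics", promhttp.Handler())
	go http.ListenAndServe(":8080", mux)

	// Placeholder attack; parameters would come from the job definition.
	targeter := vegeta.NewStaticTargeter(vegeta.Target{Method: "GET", URL: "http://target.example/"})
	attacker := vegeta.NewAttacker()

	for res := range attacker.Attack(targeter, vegeta.Rate{Freq: 500, Per: time.Second}, 5*time.Minute, "prom-attack") {
		latencies.WithLabelValues(strconv.Itoa(int(res.Code))).Observe(res.Latency.Seconds())
	}
}
```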

@ghost

ghost commented Nov 5, 2019

At my company, we use a Kubernetes custom controller that simply executes Vegeta as a Kubernetes Job.
https://github.com/kaidotdev/vegeta-controller

I've developed a project similar to https://github.com/MalloZup/vegeta-operator without any plan; I'm looking forward to your feedback.

@nitishm

nitishm commented Nov 5, 2019

@kaidotdev How do you scale the attacks? Or is a single attack carried out by a single job pod?

@ghost

ghost commented Nov 5, 2019

Yes, each attack runs as a single job pod.
It can be scaled via spec.parallelism, which is passed through to the Job's spec.parallelism as is.
cf. https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#jobspec-v1-batch
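For anyone creating such a Job programmatically rather than through a controller, a rough equivalent with the Kubernetes Go client looks like the sketch below (recent client-go API; the job name, namespace, image, and attack command are placeholders):

```go
package main

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	parallelism := int32(3) // fan the same attack out across 3 pods

	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "vegeta-attack"},
		Spec: batchv1.JobSpec{
			Parallelism: &parallelism,
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:  "vegeta",
						Image: "vegeta:latest", // placeholder image
						Command: []string{"sh", "-c",
							"echo 'GET http://target.example/' | vegeta attack -rate=1000 -duration=60s | vegeta report"},
					}},
				},
			},
		},
	}

	// Create(ctx, obj, opts) is the signature in recent client-go versions.
	if _, err := client.BatchV1().Jobs("default").Create(context.TODO(), job, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```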

@nitishm

nitishm commented Nov 5, 2019

Parallelism means scaling out the work. But how do you synchronize the attacks? Having 3 jobs start up at different times would lead to skew and inaccurate results if the attack was meant to run for a fixed interval. How do you address that?

@ghost

ghost commented Nov 5, 2019

Unfortunately, it is not guaranteed so far.
In the approach using Kubernetes Jobs, it will be difficult to guarantee due to the reconcile loop of Kubernetes.

@nitishm

nitishm commented Nov 5, 2019

Yeah, that's where I got stuck and gave up on the task too. Maybe there is a possibility to sync jobs using specialized workloads deployed as part of the jobs, but it's up for investigation.
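One hypothetical way to bound that skew without abandoning Jobs: have the controller stamp the same start time on every pod and have each worker sleep until that instant before attacking. The ATTACK_START variable below is an invented name; the approach assumes the timestamp is set far enough ahead for all pods to be scheduled, and it is still limited by clock drift between nodes:

```go
package main

import (
	"os"
	"time"
)

// waitForStart blocks until the shared start time has been reached.
// ATTACK_START is a hypothetical env var the controller would set identically
// (as an RFC3339 timestamp) on every pod of the same attack.
func waitForStart() {
	start, err := time.Parse(time.RFC3339, os.Getenv("ATTACK_START"))
	if err != nil {
		return // no shared start time configured; begin immediately
	}
	time.Sleep(time.Until(start))
}

func main() {
	waitForStart()
	// ... run the vegeta attack here, via the library or by exec'ing the CLI ...
}
```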

@ghost

ghost commented Nov 5, 2019

Hmm...
Certainly, if we try to synchronize the attacks, it seems that we need to fundamentally change the approach.
And then we would probably need to combine their results.
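Combining results is the more tractable half: each pod can write its binary results to shared storage, and recent versions of vegeta report accept multiple result files. A rough Go sketch of the same merge using the library's decoder (assuming results in Vegeta's default binary encoding, with file paths passed as arguments):

```go
package main

import (
	"fmt"
	"os"

	vegeta "github.com/tsenart/vegeta/lib"
)

func main() {
	var metrics vegeta.Metrics

	// Each argument is a result file produced by one attack pod.
	for _, path := range os.Args[1:] {
		f, err := os.Open(path)
		if err != nil {
			panic(err)
		}
		decode := vegeta.NewDecoder(f)
		for {
			var r vegeta.Result
			if err := decode(&r); err != nil {
				break // io.EOF ends this file (decode errors are ignored in this sketch)
			}
			metrics.Add(&r)
		}
		f.Close()
	}
	metrics.Close()

	fmt.Printf("requests=%d p99=%s success=%.2f%%\n",
		metrics.Requests, metrics.Latencies.P99, metrics.Success*100)
}
```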

@nitishm

nitishm commented Nov 5, 2019

@kaidotdev I created https://github.com/nitishm/vegeta-server as an attempt to address this problem. I'm just waiting for some time to free up to either use it directly as a pod on k8s or re-instrument some of the code to make it cloud-native. If you are interested in pursuing this further, I would love to collaborate!

@dastergon

My take on the vegeta-operator: https://github.com/dastergon/vegeta-operator. It supports most of Vegeta's features and has the ability to store the reports in AWS S3 (for now) via rclone. I would love to hear your feedback. Also, pull requests are always welcome! :)

@fgiloux

fgiloux commented Apr 5, 2021

I have created another operator for Vegeta: https://github.com/fgiloux/vegeta-operator
It also leverages the operator-sdk, similar to what @dastergon did, just on a newer version. I did look at what @dastergon wrote but wanted to do a few things differently:

  • it directly uses pods (no jobs) as sub-resources, as I did not see the value of a mechanism for dealing with pod failure. IMHO the complete test needs to be relaunched if one of the vegeta attacks fails. This may need upper-level coordination (CI).
  • it supports object storage and persistent volumes for storing results/reports.
  • it is possible to launch distributed attacks (multiple pods). The report then gets generated in a separate step once all attack pods have completed.
  • there is an image bundle for simple installation with OLM

A few things are not implemented:

  • Integration with Prometheus. There is a separate issue for that, but it seems to be stuck.
  • Different output formats for the reports (this could however be added easily if there is a need)

Feedback welcome!
