How can the operator be scaled? #2034

Open
bosunski opened this issue Aug 12, 2024 · 1 comment
Comments

@bosunski

Hello there 👋🏽,
Thanks for making this awesome tool.

I would like to get some perspective on how the operator scales or how it can be scaled to handle 1k+ HTTP checks (at a 30-second interval or 1k+ rps). I've looked around the docs but didn't see anything about this topic. I also tried deploying multiple replicas but soon realized that this was prevented.

Is there anything extra that needs to be done to run HTTP checks at this scale? Or is it something we shouldn't need to worry about?

Thank you!

@moshloop
Member

Hi @bosunski

Currently canary-checker does not load balance checks, but provided you give it enough memory (2-8 GB) and CPU (1-2 cores), and something similar for the DB, it should scale vertically well.

If you run into any specific issues at that scale, we can certainly try to reproduce and improve them (I also created #2040 for this).

In terms of horizontal scalability, there are 3 ways to achieve this:

  1. Deploy multiple instances, each watching a subset of canaries via include-namespace or Canary selection improvements #2041.
  2. And/or run multiple instances, with only a single instance in operator-only mode (--executor=false) and multiple runner instances (canary-checker serve). This would require a shared DB and Leader Election - Stop/Start background job based on leader state #2042 (or disabling, via properties, all the background jobs that should be singletons).
  3. "Manual" balancing across multiple clusters or vclusters, then pushing to a centralized instance. This model is supported in our commercial Mission Control stack, where you run an agent that replicates its results to a SaaS or self-hosted instance.
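
For option 1, the namespaces have to be split across instances somehow. Below is a minimal sketch of one way to do that with a stable hash, so each namespace always lands on the same instance. The function names (`shard_for`, `partition`) and the example namespaces are hypothetical, not part of canary-checker, and how the resulting lists are passed to include-namespace is not shown here because the exact syntax is not covered in this thread.

```python
import hashlib


def shard_for(namespace: str, shard_count: int) -> int:
    """Map a namespace to a shard index in [0, shard_count) using a stable hash."""
    # sha256 is used instead of the built-in hash(), which is randomized per process.
    digest = hashlib.sha256(namespace.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def partition(namespaces, shard_count: int):
    """Return one list of namespaces per instance; every namespace appears exactly once."""
    shards = [[] for _ in range(shard_count)]
    for ns in sorted(set(namespaces)):
        shards[shard_for(ns, shard_count)].append(ns)
    return shards


if __name__ == "__main__":
    # Hypothetical namespaces, for illustration only.
    example = ["payments", "search", "checkout", "auth", "reporting"]
    for index, assigned in enumerate(partition(example, 3)):
        print(f"instance {index}: {assigned}")
```

A plain hash-modulo split like this does not balance by check count, so a namespace with many canaries can still overload one instance. It also reshuffles most namespaces when the number of instances changes.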
