Hello there 👋🏽,
Thanks for making this awesome tool.
I would like to get some perspective on how the operator scales or how it can be scaled to handle 1k+ HTTP checks (at a 30-second interval or 1k+ rps). I've looked around the docs but didn't see anything about this topic. I also tried deploying multiple replicas but soon realized that this was prevented.
Is there anything extra we need to do to run HTTP checks at this scale, or shouldn't we worry about it?
Thank you!
Currently, canary-checker does not load-balance checks across instances. However, if you give it enough memory (2-8 GB) and CPU (1-2 cores), and similar resources for the database, it should scale well vertically.
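To put the numbers from the question into perspective, here is a rough sizing sketch based on Little's law. It is not taken from the canary-checker docs, and the 500 ms average response time is an assumption; substitute your endpoints' real latency.

```python
def checks_per_second(num_checks: int, interval_s: float) -> float:
    """Average check executions per second when every check runs once per interval."""
    return num_checks / interval_s


def concurrent_requests(rate_per_s: float, avg_latency_s: float) -> float:
    """Little's law: average requests in flight = arrival rate x average time per request."""
    return rate_per_s * avg_latency_s


ASSUMED_LATENCY_S = 0.5  # assumption: 500 ms average HTTP response time

rate = checks_per_second(1000, 30)  # 1k checks on a 30-second interval
print(f"{rate:.1f} checks/s")  # 33.3 checks/s
print(f"{concurrent_requests(rate, ASSUMED_LATENCY_S):.1f} in flight")  # 16.7 in flight
print(f"{concurrent_requests(1000, ASSUMED_LATENCY_S):.1f} in flight at 1k rps")  # 500.0 in flight at 1k rps
```

So 1k checks every 30 seconds is about 33 requests per second, far below the 1k rps case. At the same latency, the 1k rps case needs roughly 30 times as many requests in flight.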
If you run into any specific issues at that scale, we can certainly try to reproduce them and make improvements (I also created #2040 for this).
In terms of horizontal scalability, there are two ways to achieve this, and they can be combined:

1. Run multiple instances: a single instance in operator-only mode (`--executor=false`) and multiple runner instances (`canary-checker serve`). This requires a shared database and leader election (see #2042, "Leader Election - Stop/Start background job based on leader state"), or alternatively disabling, via properties, all the background jobs that should run as singletons.
2. "Manual" balancing across multiple clusters or vclusters, with the results pushed to a centralized instance. This model is supported in our commercial Mission Control stack, where you run an agent that replicates its results to a SaaS or self-hosted instance.
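The leader-election requirement for multiple instances boils down to running singleton background jobs only on the current leader and stopping them when leadership is lost. Below is a minimal sketch of that gating pattern, assuming some election mechanism (for example a Kubernetes Lease; Kubernetes' client-go exposes comparable `OnStartedLeading`/`OnStoppedLeading` callbacks) calls the two hooks. The class and job names are hypothetical and are not canary-checker's API.

```python
import threading
from typing import Callable, Dict, List

# A job receives a stop event and must return promptly once it is set.
Job = Callable[[threading.Event], None]


class LeaderGatedJobs:
    """Start singleton jobs when this replica becomes leader, stop them when it loses leadership.

    Hypothetical sketch: winning the election itself is out of scope here.
    """

    def __init__(self, jobs: Dict[str, Job]) -> None:
        self._jobs = jobs
        self._stop = threading.Event()
        self._threads: List[threading.Thread] = []
        self.is_leader = False

    def on_started_leading(self) -> None:
        if self.is_leader:
            return  # ignore duplicate notifications
        self.is_leader = True
        self._stop = threading.Event()  # fresh event for this leadership term
        for name, job in self._jobs.items():
            t = threading.Thread(target=job, args=(self._stop,), name=name, daemon=True)
            t.start()
            self._threads.append(t)

    def on_stopped_leading(self) -> None:
        if not self.is_leader:
            return
        self.is_leader = False
        self._stop.set()  # ask every job loop to exit
        for t in self._threads:
            t.join(timeout=5)
        self._threads = []

    def running(self) -> List[str]:
        return [t.name for t in self._threads if t.is_alive()]


def prune_old_results(stop: threading.Event) -> None:
    # Hypothetical singleton job: would prune old check results from the shared DB.
    while not stop.wait(1.0):
        pass  # real work would go here


gate = LeaderGatedJobs({"prune-old-results": prune_old_results})
gate.on_started_leading()
print(gate.running())  # ['prune-old-results']
gate.on_stopped_leading()
print(gate.running())  # []
```

Runner instances that never win the election simply never start these jobs, which is what keeps them singletons while check execution scales out.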
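For the "manual" balancing approach, one simple way to decide which cluster runs which check is a deterministic hash of the check name. This is a hypothetical sketch; the check and cluster names are placeholders, and canary-checker does not do this for you.

```python
import hashlib
from typing import Dict, List


def shard_for(check_name: str, num_shards: int) -> int:
    """Deterministically map a check to a shard index in [0, num_shards)."""
    digest = hashlib.sha256(check_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def assign(checks: List[str], clusters: List[str]) -> Dict[str, List[str]]:
    """Group checks by the cluster (or vcluster) that should run them."""
    plan: Dict[str, List[str]] = {c: [] for c in clusters}
    for name in checks:
        plan[clusters[shard_for(name, len(clusters))]].append(name)
    return plan


checks = [f"http-check-{i}" for i in range(1000)]
plan = assign(checks, ["cluster-a", "cluster-b", "cluster-c"])
for cluster, names in plan.items():
    print(cluster, len(names))
```

A stable hash (rather than Python's per-process randomized `hash()`) keeps the assignment identical across runs. Plain modulo hashing reshuffles most checks when the number of clusters changes; consistent or rendezvous hashing would limit that movement.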