Chart & Operator support for replica configuration #47
This ticket tracks tech debt pointed out in a review of an old PR that had already been merged.
The consideration here seems to be a failover/redundancy mechanism for BRO. If the goal is redundancy, adding leader election to the BRO operator is a three-line code change, plus a trivial CRD change to set the number of replicas. However, I don't see this as useful. Although a nice improvement, we haven't seen any of these issues in production.

Redundancy helps as a failure mechanism in two scenarios: the pod fails critically due to a sporadic software bug (in which case the bug should be fixed, and redundancy only speeds up recovery), or resource saturation is reached. When resources are saturated, either the cluster itself is having problems, which is outside what BRO can fix, or the resource limits are misconfigured, which is a user configuration issue. One genuinely useful case of redundancy is node failure, where BRO could be deployed as a DaemonSet; but a singleton will still be rescheduled onto healthy nodes, so the only gain there is speeding up failure recovery.

If the idea is instead to scale up (optimize) the BRO operator, redundancy isn't useful unless we shard the backup and restore workloads, and that isn't incredibly useful either, as the workloads are IO (and thus CPU) bound. The only feasible speedup I see from sharding the operator workloads is to shard across node resources, but that requires more sophisticated techniques than Kubernetes leases, controller-runtime leader election, or any kind of network load balancing. I believe sharding will be the last resort for optimizing these types of workloads anyway, so even in this case I don't see it being useful.
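For concreteness, the operator-side change described above (leader election plus a configurable replica count) could look roughly like the following Deployment fragment. This is a sketch only: the container name, flag name, and replica value are assumptions modeled on common kubebuilder scaffolding, not BRO's actual manifests.

```yaml
# Hypothetical fragment of the operator Deployment. With leader election
# enabled, only the replica holding the lease reconciles; the other pods
# idle as warm standbys and take over if the lease is lost.
spec:
  replicas: 2                      # today BRO is effectively pinned to 1
  template:
    spec:
      containers:
        - name: bro-operator       # assumed container name
          args:
            - --leader-elect=true  # flag name as commonly scaffolded by kubebuilder
```

As discussed above, this only buys faster takeover after a pod failure; it does not parallelize backup/restore work.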
Context: Today BRO chart and operator logic isn't designed or intended to have multiple replicas. This is a known aspect of the BRO design and for now it's intended to be a "singleton" pod.
#41 (comment)
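As a sketch of what "chart support for replica configuration" might mean, the replica count could be surfaced through chart values. The key name and template path below are assumptions for illustration, not the actual BRO chart layout:

```yaml
# values.yaml (hypothetical key)
replicaCount: 1   # keep the singleton default until leader election lands

# templates/deployment.yaml (hypothetical excerpt)
spec:
  replicas: {{ .Values.replicaCount }}
```

Defaulting to 1 preserves today's singleton behavior while letting users opt into multiple replicas once the operator is replica-safe.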
To consider: