Is your feature request related to a problem? Please describe.
Using the pick_first load balancer defaults to sending all data to the same backend, which can cause problems with resource allocation: a single connection to one IP address or backend can be throttled. For scalable services, if that one backend is overloaded it cannot accept any more data, and data may be dropped instead of being sent to another backend, especially when all backends are running at their limits. Also, pick_first does no actual load balancing (link); it simply tries each address returned by the name resolver and connects to the first one that works.
Describe the solution you'd like
Updating the load balancer to use the round_robin policy allows data to be sent to different backends and therefore allocates resources more evenly. Round robin picks only ready connections, which suits a typical cloud-compute setup where clients send data into a scalable service with multiple workers: requests rotate across backends, so when one backend is busy, traffic moves to another that is available. With round_robin, users who want to send data to only one address can also create multiple connections to that address (link), so a single connection does not become a throttling bottleneck. A configuration sketch follows below.
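For reference, here is a minimal sketch of what selecting round_robin looks like in a Go gRPC client, assuming the client dials a headless service through the dns resolver; the target address is a placeholder, not our actual endpoint:

```go
package main

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Select the round_robin policy through the default service config.
	// The dns:/// scheme makes the resolver return every pod IP behind the
	// (placeholder) headless service name, giving round_robin multiple
	// backends to rotate across.
	conn, err := grpc.Dial(
		"dns:///spaningest.example.svc.cluster.local:4317", // placeholder target
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	_ = conn // pass conn to the generated client stubs
}
```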
In that single-address case, pick_first resolves to the one address and can only accept data when its single connection is available, which causes throttling. With more than one connection to that address, as round_robin makes possible, throttling is less of an issue because data can be spread across several connections that all terminate at the same address (see the sketch below).
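The linked workaround can be illustrated with gRPC's manual resolver: hand the channel the same address several times so that round_robin opens a separate connection for each entry. This is only a sketch under assumptions (a Go client, a made-up address); the distinct attributes are added because identical addresses may otherwise be collapsed into a single entry, depending on the gRPC version.

```go
package main

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/attributes"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/resolver"
	"google.golang.org/grpc/resolver/manual"
)

func main() {
	// Report the same backend address three times through a manual resolver;
	// round_robin then rotates RPCs across one connection per entry.
	r := manual.NewBuilderWithScheme("example")
	const addr = "10.0.0.1:4317" // placeholder backend address
	var addrs []resolver.Address
	for i := 0; i < 3; i++ {
		addrs = append(addrs, resolver.Address{
			Addr:       addr,
			Attributes: attributes.New("copy", i), // keep identical addresses distinct
		})
	}
	r.InitialState(resolver.State{Addresses: addrs})

	conn, err := grpc.Dial(
		r.Scheme()+":///ignored",
		grpc.WithResolvers(r),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	_ = conn // use with generated client stubs
}
```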
Describe alternatives you've considered
The alternative is to leave pick_first as the default and recommend that users make a choice. This is not an adequate solution because users expect reliable delivery by default, and round_robin is substantially more reliable for minimal additional cost. We have restricted the choice to round_robin and pick_first because these two are registered by default; any other load balancer would have to be registered by the user as a custom implementation.
Additional context
I tested the two load balancers in the Lightstep/SNCO dashboard, with envoy-edge, a proxy that load balances traffic, sending data to our service spaningest, which, as its name implies, ingests OTLP spans using the Arrow format.
In the first image, the data sent from envoy to spaningest (Arrow) initially looks fairly even. This is with round_robin load balancing: traffic is distributed in rotation across the different k8s pods, so resource allocation is even.
After the change to pick_first is made, the amount of data (spans) each pod receives changes drastically, with some pods getting far more spans than others.
Comparing the two, the difference between the round_robin and pick_first load balancers is apparent: since pick_first sends to the first pod that is available, all of the data goes to that pod and the other pods are left using no resources.
These are the two options supported by default by gRPC. Other load balancers do exist, but the exact set varies by language, and anything else would have to be registered by the user as a custom load balancer, which involves implementing a load balancer interface (link). A sketch of what that registration looks like is included below.
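For illustration only, this is roughly what registering a custom policy involves in gRPC-Go: the balancer/base package handles subchannel management and the user supplies a picker. The policy below picks a ready backend at random, and its name example_random is made up for this sketch.

```go
package lbexample

import (
	"math/rand"

	"google.golang.org/grpc/balancer"
	"google.golang.org/grpc/balancer/base"
)

// randomPickerBuilder builds a picker from the set of ready subchannels.
type randomPickerBuilder struct{}

func (randomPickerBuilder) Build(info base.PickerBuildInfo) balancer.Picker {
	if len(info.ReadySCs) == 0 {
		return base.NewErrPicker(balancer.ErrNoSubConnAvailable)
	}
	scs := make([]balancer.SubConn, 0, len(info.ReadySCs))
	for sc := range info.ReadySCs {
		scs = append(scs, sc)
	}
	return &randomPicker{subConns: scs}
}

type randomPicker struct {
	subConns []balancer.SubConn
}

// Pick chooses one ready subchannel at random for each RPC.
func (p *randomPicker) Pick(balancer.PickInfo) (balancer.PickResult, error) {
	return balancer.PickResult{SubConn: p.subConns[rand.Intn(len(p.subConns))]}, nil
}

func init() {
	// Register the policy so it can be selected by name in a service config,
	// e.g. {"loadBalancingConfig": [{"example_random":{}}]}, just like round_robin.
	balancer.Register(base.NewBalancerBuilder("example_random", randomPickerBuilder{}, base.Config{HealthCheck: true}))
}
```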