Add annotations to configure retries, max connections to a service
These annotations are needed to scale a service up and to cope with a
pod for a Kubernetes service disappearing unexpectedly. By default Envoy
allows at most 1024 simultaneous connections to a cluster and doesn't do
any retries. A single HTTP server backend of a service can handle tens
of thousands of requests per second, however, and in environments where
that happens Envoy's low limit causes it to circuit break those services
rather than "flood" an unsuspecting backend with requests.

By default, Envoy doesn't retry requests, which means that if a request
is made to a pod which is in the process of exiting or has just stopped
responding to connections (say it segfaulted), the request is dropped
and the client sees an error rather than a retry. The retry annotations
allow changing this policy as appropriate for the environment.

Signed-off-by: Cody Maloney <cody@emeraldcloudlab.com>
Cody Maloney committed Feb 20, 2018
1 parent 69dbe02 commit c1d1f3f
Showing 4 changed files with 124 additions and 19 deletions.
12 changes: 12 additions & 0 deletions docs/annotations.md
@@ -13,3 +13,15 @@ Contour supports a couple of standard kubernetes ingress annotations, as well as
## Contour Specific Ingress Annotations

- `contour.heptio.com/request-timeout`: Set the [envoy HTTP route timeout](https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/route/route.proto.html#envoy-api-field-route-routeaction-timeout) to the given value, specified as a [golang duration](https://golang.org/pkg/time/#ParseDuration). By default envoy has a 15 second timeout for a backend service to respond. Set this to `infinity` to specify envoy should never timeout the connection to the backend. Note the value `0s` / zero has special semantics to envoy.
- `contour.heptio.com/retry-on`: Specify under which conditions Envoy should retry a request. See [Envoy retry_on](https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/route/route.proto#envoy-api-field-route-routeaction-retrypolicy-retry-on) for a basic description, as well as [possible values and their meanings](https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/router_filter.html#config-http-filters-router-x-envoy-retry-on).
- `contour.heptio.com/num-retries`: Specify the [maximum number of retries](https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/router_filter.html#config-http-filters-router-x-envoy-max-retries) Envoy should make before abandoning and returning an error to the client. Only applies if `contour.heptio.com/retry-on` is specified.
- `contour.heptio.com/per-try-timeout`: Specify the [timeout per retry attempt](https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/route/route.proto#envoy-api-field-route-routeaction-retrypolicy-per-try-timeout), if there should be one. Only applies if `contour.heptio.com/retry-on` is specified.

## Contour Specific Service Annotations

A [Kubernetes Service](https://kubernetes.io/docs/concepts/services-networking/service/) maps to an [Envoy Cluster](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/terminology). Envoy clusters have many settings to control specific behaviors. These annotations allow access to some of those settings.

- `contour.heptio.com/max-connections`: [The maximum number of connections](https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/cluster/circuit_breaker.proto#envoy-api-field-cluster-circuitbreakers-thresholds-max-connections) that a single Envoy instance will allow to the Kubernetes service; defaults to 1024.
- `contour.heptio.com/max-pending-requests`: [The maximum number of pending requests](https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/cluster/circuit_breaker.proto#envoy-api-field-cluster-circuitbreakers-thresholds-max-pending-requests) that a single Envoy instance will allow to the Kubernetes service; defaults to 1024.
- `contour.heptio.com/max-requests`: [The maximum parallel requests](https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/cluster/circuit_breaker.proto#envoy-api-field-cluster-circuitbreakers-thresholds-max-requests) a single Envoy instance will allow to the Kubernetes service; defaults to 1024.
- `contour.heptio.com/max-retries`: [The maximum number of parallel retries](https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/cluster/circuit_breaker.proto#envoy-api-field-cluster-circuitbreakers-thresholds-max-retries) a single Envoy instance will allow to the Kubernetes service; defaults to 3. This is independent of the per-Ingress annotations `contour.heptio.com/num-retries` and `contour.heptio.com/retry-on`, which control whether retries are attempted at all and how many times a single request can be retried.
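
As a rough sketch of how these Service annotations could translate into circuit breaker thresholds, the Go program below parses each value as an unsigned 32-bit integer and leaves a field unset when the annotation is missing or malformed, so that Envoy's default applies. The `thresholds` struct, `parseUint32`, and `thresholdsFromAnnotations` are hypothetical names used for illustration only; the real code would populate the Envoy API types instead.

```go
package main

import (
	"fmt"
	"strconv"
)

// thresholds is a hypothetical stand-in for Envoy's circuit breaker thresholds.
// A nil field means "not set, use Envoy's default".
type thresholds struct {
	MaxConnections     *uint32
	MaxPendingRequests *uint32
	MaxRequests        *uint32
	MaxRetries         *uint32
}

// parseUint32 returns nil when the annotation is absent, empty, or not a
// base-10 integer that fits in 32 bits.
func parseUint32(a map[string]string, key string) *uint32 {
	s, ok := a[key]
	if !ok || s == "" {
		return nil
	}
	n, err := strconv.ParseUint(s, 10, 32)
	if err != nil {
		return nil
	}
	v := uint32(n)
	return &v
}

// thresholdsFromAnnotations returns nil when no threshold annotation is
// usable, so the caller can skip configuring circuit breakers entirely.
func thresholdsFromAnnotations(a map[string]string) *thresholds {
	th := &thresholds{
		MaxConnections:     parseUint32(a, "contour.heptio.com/max-connections"),
		MaxPendingRequests: parseUint32(a, "contour.heptio.com/max-pending-requests"),
		MaxRequests:        parseUint32(a, "contour.heptio.com/max-requests"),
		MaxRetries:         parseUint32(a, "contour.heptio.com/max-retries"),
	}
	if th.MaxConnections == nil && th.MaxPendingRequests == nil &&
		th.MaxRequests == nil && th.MaxRetries == nil {
		return nil
	}
	return th
}

func main() {
	th := thresholdsFromAnnotations(map[string]string{
		"contour.heptio.com/max-connections": "4096",
		"contour.heptio.com/max-requests":    "8192",
	})
	fmt.Println(*th.MaxConnections, *th.MaxRequests, th.MaxRetries == nil)
}
```

Returning `nil` for an unparseable value means a typo in an annotation silently falls back to the Envoy default rather than rejecting the Service, which is a trade-off worth keeping in mind when debugging limits that do not seem to apply.
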
31 changes: 25 additions & 6 deletions internal/contour/cluster.go
@@ -18,9 +18,17 @@ import (
"time"

"github.com/envoyproxy/go-control-plane/envoy/api/v2"
v2cluster "github.com/envoyproxy/go-control-plane/envoy/api/v2/cluster"
"k8s.io/api/core/v1"
)

const (
annotationMaxConnections = "contour.heptio.com/max-connections"
annotationMaxPendingRequests = "contour.heptio.com/max-pending-requests"
annotationMaxRequests = "contour.heptio.com/max-requests"
annotationMaxRetries = "contour.heptio.com/max-retries"
)

// ClusterCache manage the contents of the gRPC SDS cache.
type ClusterCache struct {
clusterCache
@@ -79,19 +87,18 @@ func (cc *ClusterCache) recomputeService(oldsvc, newsvc *v1.Service) {
// p.Name will be blank on the condition that there is a single serviceport
// entry in this service spec.
config := edsconfig("contour", servicename(newsvc.ObjectMeta, p.Name))

if p.Name != "" {
// service port is named, so we must generate both a cluster for the port name
// and a cluster for the port number.
c := edscluster(hashname(60, newsvc.ObjectMeta.Namespace, newsvc.ObjectMeta.Name, p.Name), config)
c := edscluster(newsvc, p.Name, config)
cc.Add(c)
// it is safe to use p.Name as the key because the API server enforces
// the invariant that Name will only be blank if there is a single port
// in the service spec. Thus there will only be one entry in the map,
// { "": p }
named[p.Name] = p
}
c := edscluster(hashname(60, newsvc.ObjectMeta.Namespace, newsvc.ObjectMeta.Name, strconv.Itoa(int(p.Port))), config)
c := edscluster(newsvc, strconv.Itoa(int(p.Port)), config)
cc.Add(c)
unnamed[p.Port] = p
default:
@@ -116,14 +123,26 @@ func (cc *ClusterCache) recomputeService(oldsvc, newsvc *v1.Service) {
}
}

func edscluster(name string, config *v2.Cluster_EdsClusterConfig) *v2.Cluster {
return &v2.Cluster{
Name: name,
func edscluster(svc *v1.Service, portString string, config *v2.Cluster_EdsClusterConfig) *v2.Cluster {
cluster := &v2.Cluster{
Name: hashname(60, svc.ObjectMeta.Namespace, svc.ObjectMeta.Name, portString),
Type: v2.Cluster_EDS,
EdsClusterConfig: config,
ConnectTimeout: 250 * time.Millisecond,
LbPolicy: v2.Cluster_ROUND_ROBIN,
}
thresholds := &v2cluster.CircuitBreakers_Thresholds{
MaxConnections: parseAnnotationUInt32(svc.Annotations, annotationMaxConnections),
MaxPendingRequests: parseAnnotationUInt32(svc.Annotations, annotationMaxPendingRequests),
MaxRequests: parseAnnotationUInt32(svc.Annotations, annotationMaxRequests),
MaxRetries: parseAnnotationUInt32(svc.Annotations, annotationMaxRetries),
}
if thresholds.MaxConnections != nil || thresholds.MaxPendingRequests != nil ||
thresholds.MaxRequests != nil || thresholds.MaxRetries != nil {
cluster.CircuitBreakers = &v2cluster.CircuitBreakers{Thresholds: []*v2cluster.CircuitBreakers_Thresholds{thresholds}}
}

return cluster
}

func edsconfig(source, name string) *v2.Cluster_EdsClusterConfig {
40 changes: 34 additions & 6 deletions internal/contour/virtualhost.go
@@ -15,10 +15,12 @@ package contour

import (
"sort"
"strconv"
"strings"
"time"

"github.com/envoyproxy/go-control-plane/envoy/api/v2/route"
"github.com/gogo/protobuf/types"
"k8s.io/api/extensions/v1beta1"
)

@@ -30,7 +32,10 @@ type VirtualHostCache struct {
}

const (
requestTimeout = "contour.heptio.com/request-timeout"
annotationRequestTimeout = "contour.heptio.com/request-timeout"
annotationRetryOn = "contour.heptio.com/retry-on"
annotationNumRetries = "contour.heptio.com/num-retries"
annotationPerTryTimeout = "contour.heptio.com/per-try-timeout"

// By default envoy applies a 15 second timeout to all backend requests.
// The explicit value 0 turns off the timeout, implying "never time out"
@@ -42,13 +47,12 @@ const (
// value. If the value is not present, false is returned and the timeout value should be
// ignored. If the value is present, but malformed, the timeout value is valid, and represents
// infinite timeout.
func getRequestTimeout(annotations map[string]string) (time.Duration, bool) {
timeoutStr, ok := annotations[requestTimeout]
func parseAnnotationTimeout(annotations map[string]string, annotation string) (time.Duration, bool) {
timeoutStr, ok := annotations[annotation]
// Error or unspecified is interpreted as no timeout specified, use envoy defaults
if !ok || timeoutStr == "" {
return 0, false
}

// Interpret "infinity" explicitly as an infinite timeout, which envoy config
// expects as a timeout of 0. This could be specified with the duration string
// "0s" but want to give an explicit out for operators.
@@ -66,6 +70,19 @@ func getRequestTimeout(annotations map[string]string) (time.Duration, bool) {
return timeoutParsed, true
}

func parseAnnotationUInt32(annotations map[string]string, annotation string) *types.UInt32Value {
uint32Str, ok := annotations[annotation]
// Error or unspecified is interpreted as use envoy defaults
if !ok || uint32Str == "" {
return nil
}
uint32value, err := strconv.ParseUint(uint32Str, 10, 32)
if err != nil {
return nil
}
return &types.UInt32Value{Value: uint32(uint32value)}
}

// recomputevhost recomputes the ingress_http (HTTP) and ingress_https (HTTPS) record
// from the vhost from list of ingresses supplied.
func (v *VirtualHostCache) recomputevhost(vhost string, ingresses map[metadata]*v1beta1.Ingress) {
@@ -141,7 +158,7 @@ func (v *VirtualHostCache) recomputevhost(vhost string, ingresses map[metadata]*
}
}

// action computes the cluster route action, a *v2.Route_route for the
// action computes the cluster route action, a *route.Route_Route for the
// supplied ingress and backend.
func action(i *v1beta1.Ingress, be *v1beta1.IngressBackend) *route.Route_Route {
name := ingressBackendToClusterName(i, be)
@@ -152,9 +169,20 @@ func action(i *v1beta1.Ingress, be *v1beta1.IngressBackend) *route.Route_Route {
},
},
}
if timeout, ok := getRequestTimeout(i.Annotations); ok {
if timeout, ok := parseAnnotationTimeout(i.Annotations, annotationRequestTimeout); ok {
ca.Route.Timeout = &timeout
}

if retryOn, ok := i.Annotations[annotationRetryOn]; ok {
ca.Route.RetryPolicy = &route.RouteAction_RetryPolicy{
RetryOn: retryOn,
NumRetries: parseAnnotationUInt32(i.Annotations, annotationNumRetries),
}
if perTryTimeout, ok := parseAnnotationTimeout(i.Annotations, annotationPerTryTimeout); ok {
ca.Route.RetryPolicy.PerTryTimeout = &perTryTimeout
}
}

return &ca
}

60 changes: 53 additions & 7 deletions internal/contour/virtualhost_test.go
@@ -14,11 +14,13 @@
package contour

import (
"math"
"reflect"
"testing"
"time"

"github.com/envoyproxy/go-control-plane/envoy/api/v2/route"
"github.com/gogo/protobuf/types"
"github.com/sirupsen/logrus"
"k8s.io/api/extensions/v1beta1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
@@ -622,7 +624,7 @@ func TestValidTLSSpecforVhost(t *testing.T) {
}
}

func TestGetRequestTimeout(t *testing.T) {
func TestParseAnnotationTimeout(t *testing.T) {
tests := map[string]struct {
a map[string]string
want time.Duration
@@ -634,32 +636,76 @@ func TestGetRequestTimeout(t *testing.T) {
ok: false,
},
"empty": {
a: map[string]string{requestTimeout: ""}, // not even sure this is possible via the API
a: map[string]string{annotationRequestTimeout: ""}, // not even sure this is possible via the API
want: 0,
ok: false,
},
"infinity": {
a: map[string]string{requestTimeout: "infinity"},
a: map[string]string{annotationRequestTimeout: "infinity"},
want: 0,
ok: true,
},
"10 seconds": {
a: map[string]string{requestTimeout: "10s"},
a: map[string]string{annotationRequestTimeout: "10s"},
want: 10 * time.Second,
ok: true,
},
"invalid": {
a: map[string]string{requestTimeout: "10"}, // 10 what?
a: map[string]string{annotationRequestTimeout: "10"}, // 10 what?
want: 0,
ok: true,
},
}

for name, tc := range tests {
t.Run(name, func(t *testing.T) {
got, ok := getRequestTimeout(tc.a)
got, ok := parseAnnotationTimeout(tc.a, annotationRequestTimeout)
if got != tc.want || ok != tc.ok {
t.Fatalf("getRequestTimeout(%q): want: %v, %v, got: %v, %v", tc.a, tc.want, tc.ok, got, ok)
t.Fatalf("parseAnnotationTimeout(%q): want: %v, %v, got: %v, %v", tc.a, tc.want, tc.ok, got, ok)
}
})
}
}

func TestParseAnnotationUInt32(t *testing.T) {
tests := map[string]struct {
a map[string]string
want uint32
isNil bool
}{
"nada": {
a: nil,
isNil: true,
},
"empty": {
a: map[string]string{annotationRequestTimeout: ""}, // not even sure this is possible via the API
isNil: true,
},
"smallest": {
a: map[string]string{annotationRequestTimeout: "0"},
want: 0,
},
"middle value": {
a: map[string]string{annotationRequestTimeout: "20"},
want: 20,
},
"biggest": {
a: map[string]string{annotationRequestTimeout: "4294967295"},
want: math.MaxUint32,
},
"invalid": {
a: map[string]string{annotationRequestTimeout: "10seconds"}, // not an unsigned integer
isNil: true,
},
}

for name, tc := range tests {
t.Run(name, func(t *testing.T) {
got := parseAnnotationUInt32(tc.a, annotationRequestTimeout)
full := types.UInt32Value{Value: tc.want}

if ((got == nil) != tc.isNil) || (got != nil && *got != full) {
t.Fatalf("parseAnnotationUInt32(%q): want: %v, isNil: %v, got: %v", tc.a, tc.want, tc.isNil, got)
}
})
}