Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce HAProxy reloads - adds support to use the haproxy dynamic config api #19073

Merged
merged 9 commits into from
Jul 16, 2018

Conversation

ramr
Copy link
Contributor

@ramr ramr commented Mar 23, 2018

Add support to the template router to dynamically configure haproxy. This reduces the reloads
needed when the router configuration changes due to routes/endpoints changing.

Note that HAProxy doesn't have support to dynamically change certificates, so we need to pre-allocate pools of "blueprint" routes to dynamically bring new routes online.

  • We now add/remove servers from a backend on endpoint changes (for scale up/down, service addition/removal/update, node drain/evictions etc).
    Each haproxy backend (for a route) has a set of dynamic servers added named
    _dynamic-pod-[1-5] - which are used when endpoints change (example for scale up).
    Scaling down endpoints just sets the existing servers in a haproxy backend to "maint" or
    non-serving state.

    The infra router options - command line flags or environment variables named
    ROUTER_DYNAMIC_SERVER_PREFIX and ROUTER_MAX_DYNAMIC_SERVERS (default: 5)
    controls these options.

  • We dynamically add and remove routes if a route matches a pre-allocated/configured blueprint route.
    The default config has blueprint routes for insecure/http, edge terminated and passthrough routes.
    Re-encrypt routes are an issue as they could have custom certificates so is not currently in that
    default blueprint set (commented out).
    Route removal just removes the map entries, marks the servers in maint state but leaves the
    backend config in haproxy intact. The backend is not referenced from anywhere, so is unused.

    Route addition checks if it can bring the route online without a router reload - it searches for a
    matching blueprint and then a free slot from the pre-configured pool of routes for that blueprint.
    If the new route can't be dynamically configured, then the behavior is as today and a reload occurs.

    A route might not match a blueprint if it has information that causes a custom haproxy config.
    Examples of that could be a route that has custom certificate information or custom CA information
    to validate re-encrypted routes or it could even be for custom config that is generated for the
    route annotations. Those would fail to find a matching blueprint (and hence cause a reload).

    However that said, the template router does also provide the ability to add custom blueprints.
    These are basically just routes in a namespace which serve as templates or blueprints - similar
    to the default blueprints (for insecure/http, edge terminated and passthrough routes).
    So one can easily add a re-encrypt route blueprint if needed.
    Also, the add route code would search the custom blueprints (in the same fashion that it does
    the default blueprints - custom ones are just added to the list).

    The infra router options
    ROUTER_BLUEPRINT_ROUTE_NAMESPACE and
    ROUTER_BLUEPRINT_ROUTE_POOL_SIZE (default: 10)
    provide some knobs for these custom blueprints.

    Tests will automatically use this new functionality (extended ones needed a config change) but the
    others use the router image, so automatically check with the haproxy router with the dynamic config
    manager enabled.

    Again, a point to note here is that if we exhaust the pool of dynamic servers or blueprint routes or
    don't find a matching blueprint, the template router will need/do a reload.

Associated trello card: https://trello.com/c/W21FD0v0/585-13-implement-dynamic-changes-to-the-router-routerscaledemo

/cc @knobunc @zhaozhanqi fyi

@openshift-ci-robot
Copy link

@ramr: GitHub didn't allow me to request PR reviews from the following users: fyi.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Add support to the template router to dynamically configure haproxy. This reduces the reloads
needed when the router configuration changes due to routes/endpoints changing.

Note that HAProxy doesn't have support to dynamically change certificates, so we need to pre-allocate pools of "blueprint" routes to dynamically bring new routes online.

  • We now add/remove servers from a backend on endpoint changes (for scale up/down, service addition/removal/update, node drain/evictions etc).
    Each haproxy backend (for a route) has a set of dynamic servers added named
    _dynamic-pod-[1-5] - which are used when endpoints change (example for scale up).
    Scaling down endpoints just sets the existing servers in a haproxy backend to "maint" or
    non-serving state.

    The infra router options - command line flags or environment variables named
    ROUTER_DYNAMIC_SERVER_PREFIX and ROUTER_MAX_DYNAMIC_SERVERS (default: 5)
    controls these options.

  • We dynamically add and remove routes if a route matches a pre-allocated/configured blueprint route.
    The default config has blueprint routes for insecure/http, edge terminated and passthrough routes.
    Re-encrypt routes are an issue as they could have custom certificates so is not currently in that
    default blueprint set (commented out).
    Route removal just removes the map entries, marks the servers in maint state but leaves the
    backend config in haproxy intact. The backend is not referenced from anywhere, so is unused.

    Route addition checks if it can bring the route online without a router reload - it searches for a
    matching blueprint and then a free slot from the pre-configured pool of routes for that blueprint.
    If the new route can't be dynamically configured, then the behavior is as today and a reload occurs.

    A route might not match a blueprint if it has information that causes a custom haproxy config.
    Examples of that could be a route that has custom certificate information or custom CA information
    to validate re-encrypted routes or it could even be for custom config that is generated for the
    route annotations. Those would fail to find a matching blueprint (and hence cause a reload).

    However that said, the template router does also provide the ability to add custom blueprints.
    These are basically just routes in a namespace which serve as templates or blueprints - similar
    to the default blueprints (for insecure/http, edge terminated and passthrough routes).
    So one can easily add a re-encrypt route blueprint if needed.
    Also, the add route code would search the custom blueprints (in the same fashion that it does
    the default blueprints - custom ones are just added to the list).

    The infra router options
    ROUTER_BLUEPRINT_ROUTE_NAMESPACE and
    ROUTER_BLUEPRINT_ROUTE_POOL_SIZE (default: 10)
    provide some knobs for these custom blueprints.

Tests will automatically use this new functionality (extended ones needed a config change) but the
others use the router image, so automatically check with the haproxy router with the dynamic config
manager enabled.

Again, a point to note here is that if we exhaust the pool of dynamic servers or blueprint routes or
don't find a matching blueprint, the template router will need/do a reload.

Associated trello card: https://trello.com/c/W21FD0v0/585-13-implement-dynamic-changes-to-the-router-routerscaledemo

/cc @knobunc @zhaozhanqi fyi

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 23, 2018
@openshift-merge-robot openshift-merge-robot added the vendor-update Touching vendor dir or related files label Mar 23, 2018
glide.yaml Outdated
version: release-1.9.1
repo: git@github.com:openshift/kubernetes-apiserver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please undo this reformatting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's all done by glide. I couldn't run glide on my local repo - it doesn't run for me, times out all the time, so @rajatchopra ran it for me and he didn't do anything other than glide get && ./hack/update-deps.sh. Is something that's glide version specific causing this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve never had glide time out - are you on an old version? Bad internet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

glide version is v0.13.1 and its a fast[ish] pipe - 200mb+ consistently (and also tried it on a T3 connection) - so i suspect its not the speed. Plus did try it on 3 different vms as well (all fedora 27 though), so something else. Anyway, will look at it at some other time.

@@ -194,9 +194,6 @@ func (b *configMapVolumeMounter) SetUpAt(dir string, fsGroup *int64) error {
if err := wrapped.SetUpAt(dir, fsGroup); err != nil {
return err
}
if err := volumeutil.MakeNestedMountpoints(b.volName, dir, b.pod); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix this, you reverted changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe because we did the glide get and update-deps.sh a few weeks back - this might be outdated. Will do.

@@ -1324,6 +1324,10 @@ func createAndStartRouterContainerExtended(dockerCli *dockerClient.Client, maste
fmt.Sprintf("ROUTER_BIND_PORTS_AFTER_SYNC=%s", strconv.FormatBool(bindPortsAfterSync)),
fmt.Sprintf("ROUTER_ENABLE_INGRESS=%s", strconv.FormatBool(enableIngress)),
fmt.Sprintf("NAMESPACE_LABELS=%s", namespaceLabels),
fmt.Sprintf("ROUTER_CONFIG_MANAGER=haproxy-manager"),
fmt.Sprintf("ROUTER_DYNAMIC_SERVER_PREFIX=_test-dynamic"),
fmt.Sprintf("ROUTER_MAX_DYNAMIC_SERVERS=3"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should have reasonable defaults. I'd prefer if your testing required only ROUTER_CONFIG_MANAGER unless there's a good reason not to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok - i'll set it to only the defaults.

Initialize(router RouterInterface, certPath string)

// Register registers an id to be associated with a route.
Register(id string, route *routeapi.Route, wildcard bool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you passing wildcard here when it should be on the route?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already had boolean deciphered based on the policy so I just passed the boolean but yeah we could recompute it in register/add.


// ConfigManager is used by the router to make configuration changes using
// the template router's dynamic configuration API (if any).
type ConfigManager interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please specify the threading guarantees this code requires with the router controller here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@@ -0,0 +1,209 @@
package haproxy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give this its own unit tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a map_test.go in there. Or did you mean something else?

}

// NewCSVConverter returns a new CSVConverter.
func NewCSVConverter(headers string, out interface{}, fn ByteConverterFunc) CSVConverter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a pointer unless you have a good reason not to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really - ok will change.


// Convert runs a haproxy dynamic config API command.
func (c CSVConverter) Convert(data []byte) ([]byte, error) {
glog.V(4).Infof("CSV converter input data bytes: %s", string(data))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much data is this? If it's going to regularly be more than a few hundred bytes it should be at v5.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it might be more than a few hundred - will set to v5.

if c.converterFunc != nil {
convertedBytes, err := c.converterFunc(data)
if err != nil {
glog.Errorf("CSV converter error: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't log errors like this, wrap or log when you eat it with utilruntime.HandleError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok will fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was a remnant from the logs I was generating during dev - will remove this as its not needed - error are anyways propagated up and handled by the caller.

// fixupHeaders fixes up haproxy API responses that don't contain any CSV
// header information. This allows us to easily parse the data and marshal
// into an array of native golang structs.
func (c CSVConverter) fixupHeaders(data []byte) ([]byte, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't make this a method, just a function that takes headers and data.

Give this file unit tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

}

// isRetryable checks if a haproxy command can be retried.
func isRetryable(err error, cmd string) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isRetriable

// Backends returns the list of configured haproxy backends.
func (c *Client) Backends() ([]*Backend, error) {
if len(c.backends) == 0 {
if backends, err := GetHAProxyBackends(c); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these methods don't need to be public (GetHAProxyBackends) they shouldn't be public.

// if the error for the command is a retryable error.
func (c *Client) runCommandWithRetries(cmd string, limit int) (*bytes.Buffer, error) {
retryAttempt := 0
for {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use util.ExponentialBackoff or util.PollImmediate instead of inventing your own code for it.

env["ROUTER_METRICS_TLS_CERT_FILE"] = "/etc/pki/tls/metrics/tls.crt"
env["ROUTER_METRICS_TLS_KEY_FILE"] = "/etc/pki/tls/metrics/tls.key"
if cfg.Type == "haproxy-router" {
env["ROUTER_CONFIG_MANAGER"] = "haproxy-manager"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If everyone is going to get these just make these the new default on the command. Don't force people to change their config and make a pointless choice. Otherwise you'll have to add this to the upgrade scripts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok will do.

Copy link
Contributor Author

@ramr ramr Mar 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, so because the template router is also used for nginx, this needs to be either set here since its haproxy specific or we need to pass down the type of router to the infra command (that's not done today). I can either pass down this new setting or then the type and set the defaults based on that in the infra command ... but in either case it still means upgrading existing haproxy-router deployments. So which do you prefer?
Edited

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if the config manager works, there’s no reason we wouldn’t have it on by default for haproxy. However, we will likely default it off at first until we’re comfortable that we’ve caught all the edge cases and then default it on. So ideally we just have a Boolean that has an internal default and if the user specifies it we obey their rule. That avoids end users ending up with the flag in their config.

// blueprintRoutes returns all the routes in the blueprint namespace.
func (o *TemplateRouterOptions) blueprintRoutes(routeclient *routeinternalclientset.Clientset) ([]*routeapi.Route, error) {
blueprints := make([]*routeapi.Route, 0)
if len(o.BlueprintRouteNamespace) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like an entire namespace of route blueprints. Require them to be identified with a label or associated metadata within a specific namespace, or require them to be specified by name directly.

Whole namespaces devoted to this is pretty ugly, although the idea of pulling it from the route is good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok will do.


var cfgManager templateplugin.ConfigManager
if o.ConfigManagerName == "haproxy-manager" {
blueprintRoutes, err := o.blueprintRoutes(routeclient)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're going to have to refresh these, they shouldn't be static or require restart. Make that dynamic (add a controller plugin wrapper that triggers reload of the blueprints).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's going to be some issues with blueprint refresh that I need to think about - basically vis-a-vis replacement and deletion and how the allocations/config changes, so can we do this in a follow-up PR?

flag.DurationVar(&o.CommitInterval, "commit-interval", getIntervalFromEnv("COMMIT_INTERVAL", defaultCommitInterval), "Controls how often to commit (to the actual config) all the changes made using the router specific dynamic configuration manager.")
flag.StringVar(&o.BlueprintRouteNamespace, "blueprint-route-namespace", util.Env("ROUTER_BLUEPRINT_ROUTE_NAMESPACE", ""), "Specifies the namespace which contains the routes that serve as blueprints for the dynamic configuration manager.")
flag.IntVar(&o.BlueprintRoutePoolSize, "blueprint-route-pool-size", int(util.EnvInt("ROUTER_BLUEPRINT_ROUTE_POOL_SIZE", 10, 1)), "Specifies the size of the pre-allocated pool for each route blueprint managed by the router specific dynamic configuration manager. This can be overriden by an annotation router.openshift.io/pool-size on an individual route.")
flag.StringVar(&o.DynamicServerPrefix, "dynamic-server-prefix", util.Env("ROUTER_DYNAMIC_SERVER_PREFIX", ""), "Specifies the prefix for dynamic servers added to router backends. These dynamic servers are handled by the router specific dynamic configuration manager.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this is configurable. Seems pointless?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this.

flag.BoolVar(&o.BindPortsAfterSync, "bind-ports-after-sync", util.Env("ROUTER_BIND_PORTS_AFTER_SYNC", "") == "true", "Bind ports only after route state has been synchronized")
flag.StringVar(&o.MaxConnections, "max-connections", util.Env("ROUTER_MAX_CONNECTIONS", ""), "Specifies the maximum number of concurrent connections.")
flag.StringVar(&o.Ciphers, "ciphers", util.Env("ROUTER_CIPHERS", ""), "Specifies the cipher suites to use. You can choose a predefined cipher set ('modern', 'intermediate', or 'old') or specify exact cipher suites by passing a : separated list.")
flag.BoolVar(&o.StrictSNI, "strict-sni", isTrue(util.Env("ROUTER_STRICT_SNI", "")), "Use strict-sni bind processing (do not use default cert).")
flag.StringVar(&o.MetricsType, "metrics-type", util.Env("ROUTER_METRICS_TYPE", ""), "Specifies the type of metrics to gather. Supports 'haproxy'.")
flag.StringVar(&o.ConfigManagerName, "config-manager", util.Env("ROUTER_CONFIG_MANAGER", ""), "Specifies the manager to use for dynamically configuring changes with the underlying router. Supports 'haproxy-manager'.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reason not to default this on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we don't pass the router type to the infra command, so this needs to be set from above as of now. We can't set this for nginx template routers. We can either pass the type of router down (ala haproxy etc) and then do the required defaults here or then (as was done) pass this env variable down.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the defaulting later when you know the type. I doubt we’ll ever have multiple, so this is really just a bool.

flag.BoolVar(&o.BindPortsAfterSync, "bind-ports-after-sync", util.Env("ROUTER_BIND_PORTS_AFTER_SYNC", "") == "true", "Bind ports only after route state has been synchronized")
flag.StringVar(&o.MaxConnections, "max-connections", util.Env("ROUTER_MAX_CONNECTIONS", ""), "Specifies the maximum number of concurrent connections.")
flag.StringVar(&o.Ciphers, "ciphers", util.Env("ROUTER_CIPHERS", ""), "Specifies the cipher suites to use. You can choose a predefined cipher set ('modern', 'intermediate', or 'old') or specify exact cipher suites by passing a : separated list.")
flag.BoolVar(&o.StrictSNI, "strict-sni", isTrue(util.Env("ROUTER_STRICT_SNI", "")), "Use strict-sni bind processing (do not use default cert).")
flag.StringVar(&o.MetricsType, "metrics-type", util.Env("ROUTER_METRICS_TYPE", ""), "Specifies the type of metrics to gather. Supports 'haproxy'.")
flag.StringVar(&o.ConfigManagerName, "config-manager", util.Env("ROUTER_CONFIG_MANAGER", ""), "Specifies the manager to use for dynamically configuring changes with the underlying router. Supports 'haproxy-manager'.")
flag.StringVar(&o.ConfigManagerConnectionInfo, "config-manager-connection-info", "", "Specifies connection information for the dynamic configuration manager.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems overkill to configure this. We don't allow configuring the stats socket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the stats socket is sorta tied into how the template is configured (using af unix sockets).
It probably is overkill but if the haproxy template is potentially changed by the user, this is just a knob to use tcp if so needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove this - we can add it later if needed.

flag.StringVar(&o.ConfigManagerConnectionInfo, "config-manager-connection-info", "", "Specifies connection information for the dynamic configuration manager.")
flag.DurationVar(&o.CommitInterval, "commit-interval", getIntervalFromEnv("COMMIT_INTERVAL", defaultCommitInterval), "Controls how often to commit (to the actual config) all the changes made using the router specific dynamic configuration manager.")
flag.StringVar(&o.BlueprintRouteNamespace, "blueprint-route-namespace", util.Env("ROUTER_BLUEPRINT_ROUTE_NAMESPACE", ""), "Specifies the namespace which contains the routes that serve as blueprints for the dynamic configuration manager.")
flag.IntVar(&o.BlueprintRoutePoolSize, "blueprint-route-pool-size", int(util.EnvInt("ROUTER_BLUEPRINT_ROUTE_POOL_SIZE", 10, 1)), "Specifies the size of the pre-allocated pool for each route blueprint managed by the router specific dynamic configuration manager. This can be overriden by an annotation router.openshift.io/pool-size on an individual route.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on an individual blueprint route

You need to explain blueprint routes in the command help for this.

flag.BoolVar(&o.BindPortsAfterSync, "bind-ports-after-sync", util.Env("ROUTER_BIND_PORTS_AFTER_SYNC", "") == "true", "Bind ports only after route state has been synchronized")
flag.StringVar(&o.MaxConnections, "max-connections", util.Env("ROUTER_MAX_CONNECTIONS", ""), "Specifies the maximum number of concurrent connections.")
flag.StringVar(&o.Ciphers, "ciphers", util.Env("ROUTER_CIPHERS", ""), "Specifies the cipher suites to use. You can choose a predefined cipher set ('modern', 'intermediate', or 'old') or specify exact cipher suites by passing a : separated list.")
flag.BoolVar(&o.StrictSNI, "strict-sni", isTrue(util.Env("ROUTER_STRICT_SNI", "")), "Use strict-sni bind processing (do not use default cert).")
flag.StringVar(&o.MetricsType, "metrics-type", util.Env("ROUTER_METRICS_TYPE", ""), "Specifies the type of metrics to gather. Supports 'haproxy'.")
flag.StringVar(&o.ConfigManagerName, "config-manager", util.Env("ROUTER_CONFIG_MANAGER", ""), "Specifies the manager to use for dynamically configuring changes with the underlying router. Supports 'haproxy-manager'.")
flag.StringVar(&o.ConfigManagerConnectionInfo, "config-manager-connection-info", "", "Specifies connection information for the dynamic configuration manager.")
flag.DurationVar(&o.CommitInterval, "commit-interval", getIntervalFromEnv("COMMIT_INTERVAL", defaultCommitInterval), "Controls how often to commit (to the actual config) all the changes made using the router specific dynamic configuration manager.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this configurable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would tie this to the router reload interval unless you have some sort of real reason to make this configurable.

Copy link
Contributor Author

@ramr ramr Mar 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is a separate config option is because we don't want to change the reload interval to something higher. As that means if the dynamic config manager can't bring routes online (because of certs/custom annotations etc), the reloads to bring those route would be staggered by the reload interval - not good if say that's 1 hour ... which the default for the commit interval.

@@ -113,12 +132,19 @@ func (o *TemplateRouter) Bind(flag *pflag.FlagSet) {
flag.StringVar(&o.DefaultDestinationCAPath, "default-destination-ca-path", util.Env("DEFAULT_DESTINATION_CA_PATH", "/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"), "A path to a PEM file containing the default CA bundle to use with re-encrypt routes. This CA should sign for certificates in the Kubernetes DNS space (service.namespace.svc).")
flag.StringVar(&o.TemplateFile, "template", util.Env("TEMPLATE_FILE", ""), "The path to the template file to use")
flag.StringVar(&o.ReloadScript, "reload", util.Env("RELOAD_SCRIPT", ""), "The path to the reload script to use")
flag.DurationVar(&o.ReloadInterval, "interval", reloadInterval(), "Controls how often router reloads are invoked. Mutiple router reload requests are coalesced for the duration of this interval since the last reload time.")
flag.DurationVar(&o.ReloadInterval, "interval", getIntervalFromEnv("RELOAD_INTERVAL", defaultReloadInterval), "Controls how often router reloads are invoked. Mutiple router reload requests are coalesced for the duration of this interval since the last reload time.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just took the reloadInterval code and made it into a reusable function so that we can call it to get the interval value for both reload and commit intervals (same code passes a different parameters).

@@ -179,8 +195,9 @@ var helperFunctions = template.FuncMap{
"matchValues": matchValues, //compares a given string to a list of allowed strings

"genSubdomainWildcardRegexp": genSubdomainWildcardRegexp, //generates a regular expression matching the subdomain for hosts (and paths) with a wildcard policy
"generateRouteRegexp": generateRouteRegexp, //generates a regular expression matching the route hosts (and paths)
"generateRouteRegexp": GenerateRouteRegexp, //generates a regular expression matching the route hosts (and paths)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this exported?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are trying to reuse this, put it in its own package (templatehelpers maybe) and include it here. That gets more crap out of this package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok will do.

glog.V(4).Infof("Dynamically replacing endpoints for associated backend %s", backendKey)
if err := r.dynamicConfigManager.ReplaceRouteEndpoints(backendKey, oldEndpoints, newEndpoints, weight); err != nil {
// Error dynamically modifying the config, so return false to cause a reload to happen.
glog.Warningf("Router will reload as dynamic endpoint replacement for service id %s (backend=%s, weight=%v) failed: %v", id, backendKey, weight, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could easily spew errors here, i'm a little concerned with this in the long term.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change to glog.V(4).Infof, so that its available if someone turns up the debugging level and for all the other "Router will reload" messages.


// If no initial sync was done, don't try to dynamically add the
// route as we will need a reload anyway.
if !r.synced {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this after registration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we don't add the routes if the initial sync has not been done. There's nothing in the haproxy config at that point in time (backends don't exist for blueprints as yet, etc). Plus since there's going to be a reload and the config is going to change anyway, the configmanager is not really going to do anything here (other than registration).
So, the registration is done always - irrespective of whether the router sync state. And that ensures that the mappings in the configmanager are correct.

@knobunc
Copy link
Contributor

knobunc commented Jul 12, 2018

/retest

@ramr
Copy link
Contributor Author

ramr commented Jul 12, 2018

/test integration
/test extended_image_ecosystem

@knobunc
Copy link
Contributor

knobunc commented Jul 12, 2018

/retest

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jul 12, 2018
@ramr
Copy link
Contributor Author

ramr commented Jul 12, 2018

sigh, one more rebase and updated dependencies
@deads2k there was a change to glide.lock (bump commit sha: 96611c2)

Edited bump commit sha

ramr added 9 commits July 12, 2018 22:38
…eded

when configuration changes. This commit includes:
  o Support for a dynamic configuration manager in the template router.
  o HAProxy configuration manager implementation.
  o Scale up support with a pool of dynamic servers for each backend.
  o Dynamic route addition support with a pre-allocated pool of blueprint routes.
  o HAProxy dynamic configuration retry support.
  o Passthrough routes weights are relative, so as to allow them to be scaled.
    Fixes a bug with the haproxy dynamic config api hanging.
  o Support for a blueprint route namespace, so that we can load custom
    blueprints at startup.
  o Refactor, log errors, cleanup and bug fixes.
  o Remove unused/extraneous map entries for blueprint routes.
  o Self-review changes.
  o Add new tests and modify existing integration+e2e tests.
… the

manager appropriately. And some more changes as per review comments.
@ramr
Copy link
Contributor Author

ramr commented Jul 14, 2018

@deads2k can you please reconfirm the bump(*) commits and approve. Thx

The bot saying a test failed is an old test - that actually threw me off a few times (I completely missed your comment about that ) before I realized a couple of days back that the build date was from April.
D'oh!

fyi @knobunc

@knobunc
Copy link
Contributor

knobunc commented Jul 16, 2018

/retest
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 16, 2018
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, knobunc, ramr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Contributor

@knobunc knobunc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looks good

@knobunc
Copy link
Contributor

knobunc commented Jul 16, 2018

/retest

@openshift-ci-robot
Copy link

openshift-ci-robot commented Jul 16, 2018

@ramr: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/openshift-jenkins/unit 327ba9b link /test unit

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. component/routing lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. vendor-update Touching vendor dir or related files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants