Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Healthchecks #298

Merged
merged 20 commits into from
Feb 7, 2018
Merged

Healthchecks #298

merged 20 commits into from
Feb 7, 2018

Conversation

roffe
Copy link
Collaborator

@roffe roffe commented Feb 6, 2018

Running this PR in my test env over day. Please review but do not merge until I have test results.

Fixes: #80 and #171

  • Added roling update strategy to all daemonsets. Max 2 unavail
  • Added liveness check in daemonsets to use default health port of 20244 on /healthz
  • Added health controller

Description:

Added health controller that uses syncinterval of controllers + 5 seconds. If a controller isn't reporting in with a heartbeat within the interval it kube-router will be flagged as unhealthy on the /healthz endpoint

Docs are in health.md

@roffe
Copy link
Collaborator Author

roffe commented Feb 6, 2018

5 hours in the test cluster so far so good

@murali-reddy
Copy link
Member

murali-reddy commented Feb 6, 2018

@roffe Really very clean design. Thanks for your efforts fixing important issues with respect to ops point of view. I have tested the changes. Once you are happy with your test results feel free to merge.

Any one wish to try out please try the image cloudnativelabs/kube-router-git:healthcheck.

PS: Not sure you observed, when you update/upgrade the kube-router pods they with draw the routes from the kernel routing table. @andrewsykim added support for BGP graceful restart. In my testing I observed that if I restart a pod by pod with BGP graceful restart enabled then data-plane traffic is never impacted and routes are not withdrawn. Do you see any such possiblity of such update strategy?

@@ -118,5 +119,6 @@ func (s *KubeRouterConfig) AddFlags(fs *pflag.FlagSet) {
// fs.StringVar(&s.FullMeshPassword, "nodes-full-mesh-password", s.FullMeshPassword,
// "Password that cluster-node BGP servers will use to authenticate one another when \"--nodes-full-mesh\" is set.")
fs.StringVarP(&s.VLevel, "v", "v", "0", "log level for V logs")
fs.Uint16Var(&s.HealthPort, "health-port", 20244, "Health check port, 0 = Disabled")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add the default value in the help message too please?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewsykim pflag prints it out as a parenthesis on the help text.

Usage of kube-router:
      --advertise-cluster-ip             Add Cluster IP of the service to the RIB so that it gets advertises to the BGP peers.
      --advertise-external-ip            Add External IP of service to the RIB so that it gets advertised to the BGP peers.
      --bgp-graceful-restart             Enables the BGP Graceful Restart capability so that routes are preserved on unexpected restarts
      --cleanup-config                   Cleanup iptables rules, ipvs, ipset configuration and exit.
      --cluster-asn uint                 ASN number under which cluster nodes will run iBGP.
      --cluster-cidr string              CIDR range of pods in the cluster. It is used to identify traffic originating from and destinated to pods.
      --config-sync-period duration      The delay between apiserver configuration synchronizations (e.g. '5s', '1m').  Must be greater than 0. (default 1m0s)
      --enable-ibgp                      Enables peering with nodes with the same ASN, if disabled will only peer with external BGP peers (default true)
      --enable-overlay                   When enable-overlay set to true, IP-in-IP tunneling is used for pod-to-pod networking across nodes in different subnets. When set to false no tunneling is used and routing infrastrcture is expected to route traffic for pod-to-pod networking across nodes in different subnets (default true)
      --enable-pod-egress                SNAT traffic from Pods to destinations outside the cluster. (default true)
      --enable-pprof                     Enables pprof for debugging performance and memory leak issues.
      --hairpin-mode                     Add iptable rules for every Service Endpoint to support hairpin traffic.
      --health-port uint16               Health check port, 0 = Disabled (default 20244)
  -h, --help                             Print usage information.
      --hostname-override string         Overrides the NodeName of the node. Set this if kube-router is unable to determine your NodeName automatically.
      --iptables-sync-period duration    The delay between iptables rule synchronizations (e.g. '5s', '1m'). Must be greater than 0. (default 1m0s)
      --ipvs-sync-period duration        The delay between ipvs config synchronizations (e.g. '5s', '1m', '2h22m'). Must be greater than 0. (default 1m0s)
      --kubeconfig string                Path to kubeconfig file with authorization information (the master location is set by the master flag).
      --masquerade-all                   SNAT all traffic to cluster IP/node port.
      --master string                    The address of the Kubernetes API server (overrides any value in kubeconfig).
      --metrics-path string              Prometheus metrics path (default "/metrics")
      --metrics-port uint16              Prometheus metrics port, 0 = Disabled
      --nodeport-bindon-all-ip           For service of NodePort type create IPVS service that listens on all IP's of the node.
      --nodes-full-mesh                  Each node in the cluster will setup BGP peering with rest of the nodes. (default true)
      --peer-router-asns uintSlice       ASN numbers of the BGP peer to which cluster nodes will advertise cluster ip and node's pod cidr. (default [])
      --peer-router-ips ipSlice          The ip address of the external router to which all nodes will peer and advertise the cluster ip and pod cidr's. (default [])
      --peer-router-multihop-ttl uint8   Enable eBGP multihop supports -- sets multihop-ttl. (Relevant only if ttl >= 2)
      --peer-router-passwords strings    Password for authenticating against the BGP peer defined with "--peer-router-ips".
      --routes-sync-period duration      The delay between route updates and advertisements (e.g. '5s', '1m', '2h22m'). Must be greater than 0. (default 1m0s)
      --run-firewall                     Enables Network Policy -- sets up iptables to provide ingress firewall for pods. (default true)
      --run-router                       Enables Pod Networking -- Advertises and learns the routes to Pods via iBGP. (default true)
      --run-service-proxy                Enables Service Proxy -- sets up IPVS for Kubernetes Services. (default true)
  -v, --v string                         log level for V logs (default "0")

I however saw that when the default value is 0, for some reason it doesn't so i added it for metrics port.

Thanks for the feedback! :)

@roffe roffe merged commit a480a51 into cloudnativelabs:master Feb 7, 2018
@roffe roffe deleted the healthcheck branch February 7, 2018 11:43
@andrewsykim
Copy link
Collaborator

@roffe sorry this comment is late, I have some concerns around setting the updateStrategy to RollingUpdate. I think this strategy works fine only if graceful restart is enabled, but when it is not it can lead to unexpected behaviours, possibly outages for some users 🤔

@SEJeff
Copy link
Contributor

SEJeff commented Feb 11, 2018

@andrewsykim what do you suggest otherwise? Having an updateStrategy of Recreate? Also, why would you disable graceful bgp restart?

@andrewsykim
Copy link
Collaborator

andrewsykim commented Feb 12, 2018

@roffe added a patch documenting that you can set the rolling update strategy yourself 3369890. The idea there is that updateStrategy: OnDelete is a much safer default since kube-router restarting without graceful restart would take down networking for the nodes being replaced.

Also, why would you disable graceful bgp restart?

Ideally, you would always enable graceful restart. But do consider that enabling graceful restarting is not just a matter of setting --bgp-graceful-restart on the kube-router spec, the global peers also have to have the capability enabled, which is not always up to the cluster admin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants