This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

v0.9.9 cluster autoscaling with two nodepools #1072

Closed
jcrugzz opened this issue Dec 13, 2017 · 9 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. triage/support Indicates an issue that is a support question.

Comments

@jcrugzz

jcrugzz commented Dec 13, 2017

This is partially a question of how this is supposed to work when creating a fresh cluster. I created a cluster with two node pools in private subnets with autoscaling enabled. When the cluster successfully came up, there was only a cluster autoscaler on one of my controller nodes. Is this the expected behavior? Or should there be two autoscalers deployed with one in each node pool for the separate subnets/availability zones? Example partial config below, let me know if you need more.

  nodePools:
    - # Name of this node pool. Must be unique among all the node pools in this cluster
      name: nodepool1
      # Subnet(s) to which worker nodes in this node pool are deployed.
      # References subnets defined under the top-level `subnets` key by their names.
      # If omitted, public subnets are created by kube-aws and used for worker nodes.
      subnets:
      - name: ManagedPrivateSubnet1

      # Instance type for worker nodes.
      # CAUTION: Don't use t2.micro or the cluster won't work. See https://github.com/kubernetes/kubernetes/issues/16122
      instanceType: m4.xlarge

      # EC2 instance tags for worker nodes
      instanceTags:
        instanceRole: worker

      # Auto Scaling Group definition for workers. If only `workerCount` is specified,
      # min and max will be set to that value and `rollingUpdateMinInstancesInService` will be one less.
      autoScalingGroup:
        minSize: 4
        maxSize: 100
        rollingUpdateMinInstancesInService: 2

      # Autoscaling by adding/removing nodes according to resource usage
      autoscaling:
        # Make this node pool an autoscaling target of the k8s cluster-autoscaler.
        # Beware that making this an autoscaling target doesn't automatically deploy
        # cluster-autoscaler itself - turn on `addons.clusterAutoscaler.enabled` to
        # deploy it on controller nodes.
        clusterAutoscaler:
          enabled: true

      # Used to provide `/etc/environment` env vars with values from arbitrary CloudFormation refs
      # awsEnvironment:
      #   enabled: true
      #   environment:
      #     CFNSTACK: '{ "Ref" : "AWS::StackId" }'

      # Add a predefined set of labels to the nodes.
      # The set includes names of launch configurations and autoscaling groups.
      awsNodeLabels:
        enabled: true

      # Will provision worker nodes with IAM permissions to run cluster-autoscaler and
      # add node labels so that cluster-autoscaler pods are scheduled to nodes in this node pool.
      clusterAutoscalerSupport:
        enabled: true

      # Kubernetes node labels to be added to worker nodes
      nodeLabels:
        kube-aws.coreos.com/role: worker

      # Kubernetes node taints to be added to worker nodes
      # taints:
      #   - key: dedicated
      #     value: search
      #     effect: NoSchedule

    - # Name of this node pool. Must be unique among all the node pools in this cluster
      name: nodepool2
      # Subnet(s) to which worker nodes in this node pool are deployed.
      # References subnets defined under the top-level `subnets` key by their names.
      # If omitted, public subnets are created by kube-aws and used for worker nodes.
      subnets:
      - name: ManagedPrivateSubnet2

      # Instance type for worker nodes.
      # CAUTION: Don't use t2.micro or the cluster won't work. See https://github.com/kubernetes/kubernetes/issues/16122
      instanceType: m4.xlarge

      # EC2 instance tags for worker nodes
      instanceTags:
        instanceRole: worker

      # Auto Scaling Group definition for workers. If only `workerCount` is specified,
      # min and max will be set to that value and `rollingUpdateMinInstancesInService` will be one less.
      autoScalingGroup:
        minSize: 4
        maxSize: 100
        rollingUpdateMinInstancesInService: 2

      # Autoscaling by adding/removing nodes according to resource usage
      autoscaling:
        # Make this node pool an autoscaling target of the k8s cluster-autoscaler.
        # Beware that making this an autoscaling target doesn't automatically deploy
        # cluster-autoscaler itself - turn on `addons.clusterAutoscaler.enabled` to
        # deploy it on controller nodes.
        clusterAutoscaler:
          enabled: true

      # Used to provide `/etc/environment` env vars with values from arbitrary CloudFormation refs
      # awsEnvironment:
      #   enabled: true
      #   environment:
      #     CFNSTACK: '{ "Ref" : "AWS::StackId" }'

      # Add a predefined set of labels to the nodes.
      # The set includes names of launch configurations and autoscaling groups.
      awsNodeLabels:
        enabled: true

      # Will provision worker nodes with IAM permissions to run cluster-autoscaler and
      # add node labels so that cluster-autoscaler pods are scheduled to nodes in this node pool.
      clusterAutoscalerSupport:
        enabled: true

      # Kubernetes node labels to be added to worker nodes
      nodeLabels:
        kube-aws.coreos.com/role: worker
@jcrugzz jcrugzz changed the title cluster autoscaling with two nodepools v0.9.9 cluster autoscaling with two nodepools Dec 13, 2017
@mumoshu
Contributor

mumoshu commented Dec 18, 2017

@jcrugzz Hi, thanks for trying kube-aws!
Could you clarify a bit more about your expectations?

I may be missing something but let me answer each question anyway:

When the cluster successfully came up, there was only a cluster autoscaler on one of my controller nodes.

Yes.

Or should there be two autoscalers deployed with one in each node pool for the separate subnets/availability zones?

No. CA works cluster-wide and therefore there should be no need to have a separate CA per subnet/az in a normal use-case in my mind. What is your concrete use-case of CA?
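For reference, the split described here maps onto two places in cluster.yaml: the top-level `addons.clusterAutoscaler.enabled` flag that deploys the single CA onto the controller nodes, and the per-pool `autoscaling.clusterAutoscaler.enabled` flag that marks a pool's ASG as a scaling target. A minimal sketch, assuming kube-aws v0.9.x syntax and reusing the pool names from the config above:

  # Deploys ONE cluster-autoscaler onto the controller nodes; it then watches
  # every node pool that is marked as a target below.
  addons:
    clusterAutoscaler:
      enabled: true

  nodePools:
    - name: nodepool1
      # ...rest of the pool definition as in the config above...
      autoscaling:
        clusterAutoscaler:
          enabled: true   # make this pool's ASG a target of the single CA
    - name: nodepool2
      # ...rest of the pool definition as in the config above...
      autoscaling:
        clusterAutoscaler:
          enabled: true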

@mumoshu mumoshu added the triage/support Indicates an issue that is a support question. label Dec 18, 2017
@jcrugzz
Author

jcrugzz commented Dec 19, 2017

@mumoshu gotcha. Reading the config comments, I was under the impression that enabling cluster-autoscaler for node pools in different availability zones meant there had to be a separate autoscaler to manage each availability zone. If that's not the case then everything is good and my question is answered :)

@mumoshu
Contributor

mumoshu commented Dec 19, 2017

@jcrugzz Thanks for the confirmation 👍
Your understanding seems correct to me.

Just to make sure you won't get into trouble - let me also add that you should have a separate node pool per AZ when you're going Multi-AZ while enabling CA. In other words, a single node-pool spanning multiple AZs does break CA a bit.

See the note starting "Cluster autoscaler is not zone aware" in https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#common-notes-and-gotchas for more info.
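Concretely, the recommended layout is one single-AZ subnet per pool, which is what the issue's config already does on the node-pool side. A sketch of the matching top-level `subnets` section (AZ names and CIDRs are illustrative, and the key names follow kube-aws v0.9.x conventions as I understand them):

  # Top-level subnets: one private subnet per availability zone
  subnets:
    - name: ManagedPrivateSubnet1
      availabilityZone: us-west-2a   # illustrative
      instanceCIDR: 10.0.1.0/24      # illustrative
      private: true
    - name: ManagedPrivateSubnet2
      availabilityZone: us-west-2b   # illustrative
      instanceCIDR: 10.0.2.0/24      # illustrative
      private: true

  # Each node pool references exactly one subnet, so every ASG that CA
  # scales lives in a single AZ.
  nodePools:
    - name: nodepool1
      subnets:
      - name: ManagedPrivateSubnet1
    - name: nodepool2
      subnets:
      - name: ManagedPrivateSubnet2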

@whereisaaron
Contributor

whereisaaron commented Dec 28, 2017

@mumoshu I've always been confused about the multi-AZ with cluster-autoscaler. Even though CA is not zone aware, AWS is, and when CA scales up/down a multi-AZ pool, AWS will tend towards even distribution.

With three single-AZ CA node pools, and CA zone unaware, you could easily get an uneven AZ distribution, unless CA strongly tends to keep all pools the same size? It seems like a single multi-AZ pool is more likely to distribute evenly?

I thought the issue was when you did need to be zone aware, e.g. you need more nodes in AZ 'x' because the required EBS volumes are in AZ 'x', and so CA can work out that scaling a single-zone AZ 'x' pool will get more pods scheduled (than scaling up an AZ 'y' pool). So it seemed like single-AZ CA pools only made sense if CA were in fact zone aware, and thus able to choose which single-AZ pool to scale?
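To make the "zone aware" case concrete: an EBS volume exists in exactly one AZ, so any pod mounting it can only run in that AZ, and only extra capacity in that AZ helps it schedule. A hypothetical illustration (the workload name, image, and claim name are made up):

  # This pod's PVC is bound to an EBS volume that lives in us-west-2a, so the
  # pod can only ever be scheduled onto nodes in us-west-2a. Scaling up an ASG
  # in us-west-2b adds capacity the pod cannot use.
  apiVersion: v1
  kind: Pod
  metadata:
    name: search-indexer              # hypothetical
  spec:
    containers:
      - name: indexer
        image: example/indexer:1.0    # hypothetical
        volumeMounts:
          - name: data
            mountPath: /var/lib/indexer
    volumes:
      - name: data
        persistentVolumeClaim:
          claimName: indexer-data     # bound to an EBS volume in us-west-2a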

@whereisaaron
Contributor

whereisaaron commented Jan 28, 2018

I found the key discussion here. In it @mumoshu makes the same argument that I made above, that auto-scaled multi-AZ ASGs should work fine, if not better than single-AZ ASGs.

kubernetes-retired/contrib#1552 (diff)

The issue is the auto-scaler behavior where it samples one node from a pool and simulates scheduling on that node to decide whether or not to scale up that pool. It assumes all other nodes in the pool are identical in both spec and AZ. If the scheduled Pod has some zone-aware component (like an EBS volume or node label) then the auto-scaler may make a poor decision.

If you are sure none of your scheduling is AZ aware/specific you are probably fine with multi-AZ node pools. In fact you may get better balancing this way. But if anything could be AZ-specific, you need to have only one AZ per node pool. There is a workaround to get better balancing with single-zone ASGs, but it relies on you not using any custom node tags.

Would be handy to reference this discussion in the documentation.
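For anyone hitting this later: the balancing workaround referenced above is, I believe, cluster-autoscaler's `--balance-similar-node-groups` flag, which only treats node groups as interchangeable when their node labels match - hence the caveat about custom labels/tags. A generic (not kube-aws-specific) sketch of running CA against per-AZ ASGs, with the ASG names, sizes, and image tag being illustrative:

  # Excerpt of a cluster-autoscaler Deployment's container spec.
  # One --nodes=min:max:asg-name entry per single-AZ ASG; CA decides which
  # group to grow by simulating scheduling on a node sampled from each group.
  containers:
    - name: cluster-autoscaler
      image: k8s.gcr.io/cluster-autoscaler:v1.1.0    # illustrative tag
      command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --nodes=4:100:nodepool1-us-west-2a         # hypothetical ASG name
        - --nodes=4:100:nodepool2-us-west-2b         # hypothetical ASG name
        - --balance-similar-node-groups=true         # keep similar single-AZ groups evenly sized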

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 22, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 22, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
