This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

v0.9.9 cluster autoscaling with two nodepools #1072

Closed
jcrugzz opened this issue Dec 13, 2017 · 9 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. triage/support Indicates an issue that is a support question.

Comments

@jcrugzz

jcrugzz commented Dec 13, 2017

This is partially a question of how this is supposed to work when creating a fresh cluster. I created a cluster with two node pools in private subnets with autoscaling enabled. When the cluster successfully came up, there was only a cluster autoscaler on one of my controller nodes. Is this the expected behavior? Or should there be two autoscalers deployed with one in each node pool for the separate subnets/availability zones? Example partial config below, let me know if you need more.

  nodePools:
    - # Name of this node pool. Must be unique among all the node pools in this cluster
      name: nodepool1
      # Subnet(s) to which worker nodes in this node pool are deployed.
      # References subnets defined under the top-level `subnets` key by their names.
      # If omitted, public subnets are created by kube-aws and used for worker nodes.
      subnets:
      - name: ManagedPrivateSubnet1

      # Instance type for worker nodes.
      # CAUTION: Don't use t2.micro or the cluster won't work. See https://github.com/kubernetes/kubernetes/issues/16122
      instanceType: m4.xlarge

      # EC2 instance tags for worker nodes
      instanceTags:
        instanceRole: worker

      # Auto Scaling Group definition for workers. If only `workerCount` is specified,
      # min and max will be set to that value and `rollingUpdateMinInstancesInService` will be one less.
      autoScalingGroup:
        minSize: 4
        maxSize: 100
        rollingUpdateMinInstancesInService: 2

      # Autoscaling by adding/removing nodes according to resource usage
      autoscaling:
        # Make this node pool an autoscaling target of the k8s cluster-autoscaler.
        # Beware that making this an autoscaling target doesn't automatically deploy
        # cluster-autoscaler itself - turn on `addons.clusterAutoscaler.enabled` to
        # deploy it on controller nodes.
        clusterAutoscaler:
          enabled: true

      # Used to provide `/etc/environment` env vars with values from arbitrary CloudFormation refs
      # awsEnvironment:
      #   enabled: true
      #   environment:
      #     CFNSTACK: '{ "Ref" : "AWS::StackId" }'

      # Add a predefined set of labels to the nodes.
      # The set includes names of launch configurations and autoscaling groups.
      awsNodeLabels:
        enabled: true

      # Will provision worker nodes with IAM permissions to run cluster-autoscaler and
      # add node labels so that cluster-autoscaler pods are scheduled to nodes in this node pool.
      clusterAutoscalerSupport:
        enabled: true

      # Kubernetes node labels to be added to worker nodes
      nodeLabels:
        kube-aws.coreos.com/role: worker

      # Kubernetes node taints to be added to worker nodes
      # taints:
      #   - key: dedicated
      #     value: search
      #     effect: NoSchedule

    - # Name of this node pool. Must be unique among all the node pools in this cluster
      name: nodepool2
      # Subnet(s) to which worker nodes in this node pool are deployed.
      # References subnets defined under the top-level `subnets` key by their names.
      # If omitted, public subnets are created by kube-aws and used for worker nodes.
      subnets:
      - name: ManagedPrivateSubnet2

      # Instance type for worker nodes.
      # CAUTION: Don't use t2.micro or the cluster won't work. See https://github.com/kubernetes/kubernetes/issues/16122
      instanceType: m4.xlarge

      # EC2 instance tags for worker nodes
      instanceTags:
        instanceRole: worker

      # Auto Scaling Group definition for workers. If only `workerCount` is specified,
      # min and max will be set to that value and `rollingUpdateMinInstancesInService` will be one less.
      autoScalingGroup:
        minSize: 4
        maxSize: 100
        rollingUpdateMinInstancesInService: 2

      # Autoscaling by adding/removing nodes according to resource usage
      autoscaling:
        # Make this node pool an autoscaling target of the k8s cluster-autoscaler.
        # Beware that making this an autoscaling target doesn't automatically deploy
        # cluster-autoscaler itself - turn on `addons.clusterAutoscaler.enabled` to
        # deploy it on controller nodes.
        clusterAutoscaler:
          enabled: true

      # Used to provide `/etc/environment` env vars with values from arbitrary CloudFormation refs
      # awsEnvironment:
      #   enabled: true
      #   environment:
      #     CFNSTACK: '{ "Ref" : "AWS::StackId" }'

      # Add a predefined set of labels to the nodes.
      # The set includes names of launch configurations and autoscaling groups.
      awsNodeLabels:
        enabled: true

      # Will provision worker nodes with IAM permissions to run cluster-autoscaler and
      # add node labels so that cluster-autoscaler pods are scheduled to nodes in this node pool.
      clusterAutoscalerSupport:
        enabled: true

      # Kubernetes node labels to be added to worker nodes
      nodeLabels:
        kube-aws.coreos.com/role: worker
@jcrugzz jcrugzz changed the title cluster autoscaling with two nodepools v0.9.9 cluster autoscaling with two nodepools Dec 13, 2017
@mumoshu
Contributor

mumoshu commented Dec 18, 2017

@jcrugzz Hi, thanks for trying kube-aws!
Could you clarify a bit more about your expectations?

I may be missing something but let me answer each question anyway:

When the cluster successfully came up, there was only a cluster autoscaler on one of my controller nodes.

Yes.

Or should there be two autoscalers deployed with one in each node pool for the separate subnets/availability zones?

No. CA works cluster-wide and therefore there should be no need to have a separate CA per subnet/az in a normal use-case in my mind. What is your concrete use-case of CA?
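For reference, the split described here maps onto two places in cluster.yaml: the top-level `addons.clusterAutoscaler.enabled` flag that deploys the single CA onto the controller nodes, and the per-pool `autoscaling.clusterAutoscaler.enabled` flag that marks a pool's ASG as a scaling target. A minimal sketch, assuming kube-aws v0.9.x syntax and reusing the pool names from the config above:

  # Deploys ONE cluster-autoscaler onto the controller nodes; it then watches
  # every node pool that is marked as a target below.
  addons:
    clusterAutoscaler:
      enabled: true

  nodePools:
    - name: nodepool1
      # ...rest of the pool definition as in the config above...
      autoscaling:
        clusterAutoscaler:
          enabled: true   # make this pool's ASG a target of the single CA
    - name: nodepool2
      # ...rest of the pool definition as in the config above...
      autoscaling:
        clusterAutoscaler:
          enabled: true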

@mumoshu mumoshu added the triage/support Indicates an issue that is a support question. label Dec 18, 2017
@jcrugzz
Author

jcrugzz commented Dec 19, 2017

@mumoshu gotcha. Reading the config comments, I was under the impression that enabling cluster-autoscaler for node pools in different availability zones meant there had to be a separate autoscaler to manage each availability zone. If that's not the case then everything is good and my question is answered :)

@mumoshu
Contributor

mumoshu commented Dec 19, 2017

@jcrugzz Thanks for the confirmation 👍
Your understanding seems correct to me.

Just to make sure you won't get into trouble - let me also add that you should have a separate node pool per AZ when you're going Multi-AZ while enabling CA. In other words, a single node-pool spanning multiple AZs does break CA a bit.

See the note starting "Cluster autoscaler is not zone aware" in https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#common-notes-and-gotchas for more info.
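Concretely, the recommended layout is one single-AZ subnet per pool, which is what the issue's config already does on the node-pool side. A sketch of the matching top-level `subnets` section (AZ names and CIDRs are illustrative, and the key names follow kube-aws v0.9.x conventions as I understand them):

  # Top-level subnets: one private subnet per availability zone
  subnets:
    - name: ManagedPrivateSubnet1
      availabilityZone: us-west-2a   # illustrative
      instanceCIDR: 10.0.1.0/24      # illustrative
      private: true
    - name: ManagedPrivateSubnet2
      availabilityZone: us-west-2b   # illustrative
      instanceCIDR: 10.0.2.0/24      # illustrative
      private: true

  # Each node pool references exactly one subnet, so every ASG that CA
  # scales lives in a single AZ.
  nodePools:
    - name: nodepool1
      subnets:
      - name: ManagedPrivateSubnet1
    - name: nodepool2
      subnets:
      - name: ManagedPrivateSubnet2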

@whereisaaron
Contributor

whereisaaron commented Dec 28, 2017

@mumoshu I've always been confused about the multi-AZ with cluster-autoscaler. Even though CA is not zone aware, AWS is, and when CA scales up/down a multi-AZ pool, AWS will tend towards even distribution.

With three single-AZ CA node pools, and CA zone unaware, you could easily get an uneven AZ distribution, unless CA strongly tends to keep all pools the same size? It seems like a single multi-AZ pool is more likely to distribute evenly?

I thought the issue was when you did need to be zone aware, e.g. you need more nodes in AZ 'x' because the required EBS volumes are in AZ 'x', and so CA can work out that scaling a single-zone AZ 'x' pool will get more pods scheduled (than scaling up an AZ 'y' pool). So it seemed like single-AZ CA pools only made sense if CA were in fact zone aware, and thus able to choose which single-AZ pool to scale?
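To make the "zone aware" case concrete: an EBS volume exists in exactly one AZ, so any pod mounting it can only run in that AZ, and only extra capacity in that AZ helps it schedule. A hypothetical illustration (the workload name, image, and claim name are made up):

  # This pod's PVC is bound to an EBS volume that lives in us-west-2a, so the
  # pod can only ever be scheduled onto nodes in us-west-2a. Scaling up an ASG
  # in us-west-2b adds capacity the pod cannot use.
  apiVersion: v1
  kind: Pod
  metadata:
    name: search-indexer              # hypothetical
  spec:
    containers:
      - name: indexer
        image: example/indexer:1.0    # hypothetical
        volumeMounts:
          - name: data
            mountPath: /var/lib/indexer
    volumes:
      - name: data
        persistentVolumeClaim:
          claimName: indexer-data     # bound to an EBS volume in us-west-2a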

@whereisaaron
Contributor

whereisaaron commented Jan 28, 2018

I found the key discussion here. In it @mumoshu makes the same argument that I made above, that auto-scaled multi-AZ ASGs should work fine, if not better than single-AZ ASGs.

kubernetes-retired/contrib#1552 (diff)

The issue is the auto-scaler behavior where it samples one node from a pool and simulates scheduling on that node to decide whether or not to scale up that pool. It assumes all other nodes in the pool are identical in both spec and AZ. If the scheduled Pod has some zone-aware component (like an EBS volume or node label) then the auto-scaler may make a poor decision.

If you are sure none of your scheduling is AZ aware/specific you are probably fine with multi-AZ node pools. In fact you may get better balancing this way. But if anything could be AZ-specific, you need to have only one AZ per node pool. There is a workaround to get better balancing with single-zone ASGs, but it relies on you not using any custom node tags.

Would be handy to reference this discussion in the documentation.
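For anyone hitting this later: the balancing workaround referenced above is, I believe, cluster-autoscaler's `--balance-similar-node-groups` flag, which only treats node groups as interchangeable when their node labels match - hence the caveat about custom labels/tags. A generic (not kube-aws-specific) sketch of running CA against per-AZ ASGs, with the ASG names, sizes, and image tag being illustrative:

  # Excerpt of a cluster-autoscaler Deployment's container spec.
  # One --nodes=min:max:asg-name entry per single-AZ ASG; CA decides which
  # group to grow by simulating scheduling on a node sampled from each group.
  containers:
    - name: cluster-autoscaler
      image: k8s.gcr.io/cluster-autoscaler:v1.1.0    # illustrative tag
      command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --nodes=4:100:nodepool1-us-west-2a         # hypothetical ASG name
        - --nodes=4:100:nodepool2-us-west-2b         # hypothetical ASG name
        - --balance-similar-node-groups=true         # keep similar single-AZ groups evenly sized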

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 22, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 22, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
