5.x: Clarify documentation for autoscaler #809

Open
rodrigc opened this issue Aug 30, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@rodrigc
Contributor

rodrigc commented Aug 30, 2023

The existing docs for the autoscaler at https://oracle-terraform-modules.github.io/terraform-oci-oke/guide/workers_scaling.html could use some clarification.

In a private discussion, @hyder mentioned:

Let me explain the cluster autoscaler mechanics here

Cluster Autoscaler (CA) has 2 main requirements:
1. the CA pod must run on a worker node from a node pool that it is not itself managing, i.e. one that is not autoscaled
2. the CA pod must have the necessary permissions to perform cluster autoscaling actions

For (1), we use a dedicated node pool. By default, we set its size to 1 to keep costs down. From the perspective of the cluster autoscaler, this node pool is "unmanaged". Please do not confuse this with self-managed nodes.

For (2), we do a few things:
a. ensure that the cluster autoscaler pod always lands on this worker node by using taints and tolerations
b. assign Kubernetes labels that facilitate (a)
c. create a dynamic group that uses defined tags as rules to determine membership
d. apply defined tags to the unmanaged worker node from the node pool in (1), which makes worker nodes from this pool members of the dynamic group
e. create policies that give the dynamic group in (c) the ability to perform autoscaling actions

When creating worker pools, you then need to let the CA know which worker pools you want to be autoscaled.


For example, you may not want every pool in your cluster to autoscale, especially if their shapes are on the high end and cost more. This matters because OKE allows you to run worker pools with mixed performance characteristics.


So the autoscale parameter in each worker pool tells the autoscaler whether it has to manage that pool. By default it's false, I think, and you have to enable it explicitly.
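
One way to check points (a) and (b) on a running cluster is to print each node's taint keys next to its autoscaler label. This is only a rough sketch: it assumes `kubectl` is already configured for the cluster, the function name `show_autoscaler_placement` is made up for illustration, and the label key is the `oke.oraclecloud.com/cluster_autoscaler` one queried later in this thread. The taint keys that actually appear depend on how the module is configured.

```shell
# Hypothetical helper: print node name, taint keys, and the value of the
# oke.oraclecloud.com/cluster_autoscaler label for every node.
show_autoscaler_placement() {
  kubectl get nodes --output=custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key,CLUSTER_AUTOSCALER:.metadata.labels.oke\.oraclecloud\.com/cluster_autoscaler'
}

# Usage (requires access to the cluster):
#   show_autoscaler_placement
```

Nodes without taints or without the label show up as `<none>` in the corresponding column.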

@robo-cap

@rodrigc rodrigc added the enhancement New feature or request label Aug 30, 2023
@rodrigc
Contributor Author

rodrigc commented Aug 30, 2023

@robo-cap in the meeting we had today, I mentioned I came up with this query:

kubectl get nodes --output=custom-columns='NAME:.metadata.name,POOL:.metadata.labels.oke\.oraclecloud\.com/pool\.name,CLUSTER_AUTOSCALER:.metadata.labels.oke\.oraclecloud\.com/cluster_autoscaler'
NAME     POOL    CLUSTER_AUTOSCALER
node01   pool1   managed
node02   pool1   managed
node03   pool2   managed
node04   pool1   managed
node05   pool5   managed
node06   pool3   managed
node07   pool2   managed
node08   pool4   managed
node09   pool1   managed
node10   pool5   managed
node11   pool6   allowed
node12   pool1   managed
node13   pool1   managed
node14   pool4   managed
node15   pool3   managed
node16   pool5   managed
node17   pool5   managed
node18   pool4   managed
node19   pool2   managed
node20   pool6   allowed
node21   pool5   managed
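
To see at a glance which pools carry which label value, the node listing can be collapsed into one line per pool. A minimal sketch, assuming the three-column layout shown above (a header row, then node name, pool name, label value); `summarize_pools` is a hypothetical helper name, not part of kubectl or of this module.

```shell
# Hypothetical helper: read the three-column node table on stdin and print
# "<pool> <label value> <node count>", one line per combination.
summarize_pools() {
  # NR > 1 skips the header row; the final sort gives a stable order,
  # because awk's "for (key in ...)" iterates in arbitrary order.
  awk 'NR > 1 && NF >= 3 { count[$2 " " $3]++ }
       END { for (key in count) print key, count[key] }' | sort
}

# Usage (requires access to the cluster): pipe the kubectl query from this
# comment into the helper, e.g.
#   kubectl get nodes --output=custom-columns='...' | summarize_pools
```

For the 21 nodes listed above, this would report pool1 through pool5 as managed (with 6, 3, 2, 3, and 5 nodes respectively) and pool6 as allowed with 2 nodes.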

As part of clarifying the docs at https://oracle-terraform-modules.github.io/terraform-oci-oke/guide/workers_scaling.html, it would be good to explain what the values managed, allowed, and disabled for the label oke.oraclecloud.com/cluster_autoscaler actually do.
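
While the docs are being clarified, a label selector is a quick way to list the nodes that carry a given value of this label. A small sketch, assuming `kubectl` is configured for the cluster; `nodes_with_autoscaler_label` is a hypothetical helper name, and the three values are simply the ones named above, with no claim here about what each one does.

```shell
# Hypothetical helper: list the nodes whose
# oke.oraclecloud.com/cluster_autoscaler label equals the given value.
nodes_with_autoscaler_label() {
  kubectl get nodes \
    --selector="oke.oraclecloud.com/cluster_autoscaler=$1" \
    --output=name
}

# Usage (requires access to the cluster):
#   nodes_with_autoscaler_label managed
#   nodes_with_autoscaler_label allowed
#   nodes_with_autoscaler_label disabled
```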

It wasn't clear to me when reading the docs. Ali clarified in private conversations, but it would be good to have this in the docs.

@rodrigc
Contributor Author

rodrigc commented Aug 31, 2023

In a meeting I had on August 30, @robo-cap suggested that the improved autoscaler docs for this module
might benefit from a link to the documentation for the OCI cluster autoscaler at:
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/oci#cluster-autoscaler-for-oracle-cloud-infrastructure-oci

I think this is a good idea, since it will help users of this Terraform module who need to set up the autoscaler or debug autoscaler issues.
