NR-250703: cluster-autoscaler improvements for GCP #36

Open
wants to merge 2,581 commits into base: master

sachin-shankar

Add support for label-based (labels are called tags in Azure) auto-discovery of GCP Managed Instance Groups. Discovery also determines the min and max sizes for the MIG pool.

Add unit-tests for all the new code added.

Add GCP auto-discovery documentation.

Auto-Discovery Setup

To run a cluster-autoscaler which auto-discovers instance groups, use the --node-group-auto-discovery flag. There are 2 auto-discovery options to choose from.

NOTE - Only one of the 2 options can be used when configuring the --node-group-auto-discovery flag for cluster-autoscaler.

Auto-Discovery by Labels

For example, --node-group-auto-discovery=label:cluster-autoscaler-enabled=true,cluster-autoscaler-name=<YOUR CLUSTER NAME> will find all the instance groups whose instance templates carry those labels with those values.


NOTE

  • It is recommended to use a second label like cluster-autoscaler-name=<YOUR CLUSTER NAME> when cluster-autoscaler-enabled=true is used across many clusters, to prevent Instance Groups from different clusters from being recognized as node groups of this cluster.
  • No --nodes flags are passed to cluster-autoscaler because the node groups are discovered automatically by their labels.
  • No min/max values are provided in the flag when using this option. cluster-autoscaler reads the "min" and "max" labels on the Instance Group resource in GCP and adjusts the desired number of nodes within these limits.
  • If there are no min/max labels on the Instance Group resource, cluster-autoscaler uses the default min/max values of 0 and 1000 respectively.
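The label-based rules above can be sketched roughly as follows. This is an illustration only, not the code in this PR: parseLabelSpec, matchesLabels and sizeLimits are hypothetical helper names, and rejecting non-integer "min"/"max" label values is an assumption. The defaults of 0 and 1000 come from the notes above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Defaults described in the notes for groups without min/max labels.
const (
	defaultMinSize = 0
	defaultMaxSize = 1000
)

// parseLabelSpec (hypothetical) turns "label:k1=v1,k2=v2" into the set of
// labels a group must carry to be discovered.
func parseLabelSpec(spec string) (map[string]string, error) {
	const prefix = "label:"
	if !strings.HasPrefix(spec, prefix) {
		return nil, fmt.Errorf("spec %q does not start with %q", spec, prefix)
	}
	want := make(map[string]string)
	for _, pair := range strings.Split(strings.TrimPrefix(spec, prefix), ",") {
		key, value, found := strings.Cut(pair, "=")
		if !found || key == "" {
			return nil, fmt.Errorf("invalid label selector %q", pair)
		}
		want[key] = value
	}
	return want, nil
}

// matchesLabels reports whether every wanted label is present with the same value.
func matchesLabels(want, have map[string]string) bool {
	for key, value := range want {
		if got, ok := have[key]; !ok || got != value {
			return false
		}
	}
	return true
}

// sizeLimits reads the "min" and "max" labels and falls back to the defaults
// when a label is absent. Erroring on non-integer values is an assumption.
func sizeLimits(labels map[string]string) (minSize, maxSize int, err error) {
	minSize, maxSize = defaultMinSize, defaultMaxSize
	if raw, ok := labels["min"]; ok {
		if minSize, err = strconv.Atoi(raw); err != nil {
			return 0, 0, fmt.Errorf("invalid min label %q: %w", raw, err)
		}
	}
	if raw, ok := labels["max"]; ok {
		if maxSize, err = strconv.Atoi(raw); err != nil {
			return 0, 0, fmt.Errorf("invalid max label %q: %w", raw, err)
		}
	}
	return minSize, maxSize, nil
}

func main() {
	want, err := parseLabelSpec("label:cluster-autoscaler-enabled=true,cluster-autoscaler-name=demo")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Labels of an imaginary instance group; "demo" is a made-up cluster name.
	group := map[string]string{
		"cluster-autoscaler-enabled": "true",
		"cluster-autoscaler-name":    "demo",
		"min":                        "2",
		"max":                        "10",
	}
	if matchesLabels(want, group) {
		minSize, maxSize, err := sizeLimits(group)
		fmt.Println(minSize, maxSize, err)
	}
}
```

Keeping the cluster-name label in the selector is what prevents a group from another cluster, which only carries cluster-autoscaler-enabled=true, from matching.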

Auto-Discovery by NamePrefix

For example, --node-group-auto-discovery=mig:namePrefix=test-lemon-peel-mp,min=2,max=10 will internally use a Regular Expression to find all the instance groups whose name begins with test-lemon-peel-mp and set the minimum and maximum number of nodes to 2 and 10 respectively.


NOTE

  • Both min and max key/value pairs must be specified when using this option, with max > min; no defaults are applied.
  • To add more than one instance group when the groups do not share the same name prefix, use the --node-group-auto-discovery flag multiple times. Ex:
--node-group-auto-discovery=mig:namePrefix=test-lemon-peel-mp,min=2,max=10
--node-group-auto-discovery=mig:namePrefix=confab-nodes,min=2,max=10
  • The name prefixes must be statically configured before the cluster-autoscaler container is initialized, which makes this option less flexible.
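A minimal sketch of how a namePrefix spec could be parsed and validated, under the rules stated above (min and max both required, max > min, prefix matched through a regular expression). The migSpec type and parseMIGSpec function are hypothetical names for illustration, and the exact regular expression used by the PR is not shown here, so the anchored, quoted prefix below is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// migSpec is a hypothetical holder for one parsed "mig:" discovery spec.
type migSpec struct {
	Name    *regexp.Regexp // matches group names that begin with the prefix
	MinSize int
	MaxSize int
}

// parseMIGSpec (hypothetical) parses "mig:namePrefix=<p>,min=<n>,max=<m>".
func parseMIGSpec(spec string) (migSpec, error) {
	const prefix = "mig:"
	if !strings.HasPrefix(spec, prefix) {
		return migSpec{}, fmt.Errorf("spec %q does not start with %q", spec, prefix)
	}
	var out migSpec
	var haveMin, haveMax bool
	for _, pair := range strings.Split(strings.TrimPrefix(spec, prefix), ",") {
		key, value, found := strings.Cut(pair, "=")
		if !found || value == "" {
			return migSpec{}, fmt.Errorf("invalid key=value pair %q", pair)
		}
		switch key {
		case "namePrefix":
			// Assumption: anchor at the start and quote the prefix literally.
			re, err := regexp.Compile("^" + regexp.QuoteMeta(value))
			if err != nil {
				return migSpec{}, err
			}
			out.Name = re
		case "min":
			n, err := strconv.Atoi(value)
			if err != nil {
				return migSpec{}, fmt.Errorf("invalid min %q: %w", value, err)
			}
			out.MinSize, haveMin = n, true
		case "max":
			n, err := strconv.Atoi(value)
			if err != nil {
				return migSpec{}, fmt.Errorf("invalid max %q: %w", value, err)
			}
			out.MaxSize, haveMax = n, true
		default:
			return migSpec{}, fmt.Errorf("unsupported key %q", key)
		}
	}
	// No defaults with this option: all three keys are required.
	if out.Name == nil || !haveMin || !haveMax {
		return migSpec{}, fmt.Errorf("spec %q must set namePrefix, min and max", spec)
	}
	if out.MaxSize <= out.MinSize {
		return migSpec{}, fmt.Errorf("max (%d) must be greater than min (%d)", out.MaxSize, out.MinSize)
	}
	return out, nil
}

func main() {
	// One entry per occurrence of the --node-group-auto-discovery flag.
	flags := []string{
		"mig:namePrefix=test-lemon-peel-mp,min=2,max=10",
		"mig:namePrefix=confab-nodes,min=2,max=10",
	}
	for _, f := range flags {
		spec, err := parseMIGSpec(f)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(spec.Name.String(), spec.MinSize, spec.MaxSize)
	}
}
```

Parsing each flag occurrence independently mirrors the note that groups without a shared prefix need one flag each.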

BigDarkClown and others added 30 commits January 11, 2024 16:45
…wn-after-add-per-ng-poc

feat: support `--scale-down-delay-after-*` per nodegroup
Rancher: Fix error messages and expose underlying error.
Existing bucketing is inconsistent. Specifically, the second-to-last
bucket is [100, 1000), which is huge and makes it impossible to distinguish
between something that took 2m (120s) and something that took 15m (900s).
Use exponential buckets for function_duration_seconds
fix(kwok): prevent quitting when scaling down node group
…n_ds_v2

Allow draining when DaemonSet kind has custom API Group
…ealthy_metrics

feat:add node group health and back off metrics
mewa and others added 30 commits March 26, 2024 16:26
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
CA: Before we perform go test, synchronizing go modules
The grouping should be made by the schedulability equivalence
meaning we can introduce optimizations to the binpacking.

Introduce a benchmark that estimates capacity needed for 51k pods,
which can be grouped to two equivalence groups 50k and 1k.
Add a link to the sample manifest and update the image used in the
example.

Signed-off-by: Lennart Jern <lennart.jern@est.tech>
Bumps golang from 1.22.1 to 1.22.2.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
…vertical-pod-autoscaler/pkg/recommender/golang-1.22.2

Bump golang from 1.22.1 to 1.22.2 in /vertical-pod-autoscaler/pkg/recommender
…vertical-pod-autoscaler/pkg/updater/golang-1.22.2

Bump golang from 1.22.1 to 1.22.2 in /vertical-pod-autoscaler/pkg/updater
The optimization uses the fact that pods which are equivalent do not
need to be checked multiple times against already filled nodes.
This changes the time complexity from O(pods*nodes) to O(pods).
…policy-example

docs: precise AWS IAM policy example
Fix broken link in README.md to point to equinixmetal readme
Include helm chart version in cluster-autoscaler version matrix
…vertical-pod-autoscaler/pkg/admission-controller/golang-1.22.2

Bump golang from 1.22.1 to 1.22.2 in /vertical-pod-autoscaler/pkg/admission-controller
Add support for label-based (labels are called tags in Azure) auto-discovery of GCP Managed Instance Groups. Discovery also determines the min and max sizes for the MIG pool.

Add unit-tests for all the new code added.

Add GCP auto-discovery documentation.