
Improve Node Startup Latency #1099

Closed · bwagner5 opened this issue Nov 11, 2022 · 15 comments
Labels: enhancement (New feature or request)

@bwagner5
Contributor

Dynamically provisioning nodes quickly is important so that workloads can scale out in response to demand or recover in the event of infrastructure instability. This issue will track node startup latency improvement work on the EKS Optimized AL2 AMI.

Current work:

Cache Common Startup Container Images #938

There are several container images that are commonly needed to bootstrap a node (get it to a Ready state for pods):

  • pause
  • aws-node (VPC CNI) images (init and aws-node)
  • kube-proxy images (minimal and normal)
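For illustration, here is a minimal sketch of what an image-caching step during AMI builds could look like. This is not the exact change in #938; the account ID, region, and tags are example values for us-west-2 and will differ per region and version.

```bash
#!/usr/bin/env bash
# Sketch of an AMI-build step: pre-pull the bootstrap-critical images into
# containerd so the first boot needs no registry round-trips.
set -euo pipefail

REGION="us-west-2"
ECR_HOST="602401143452.dkr.ecr.${REGION}.amazonaws.com"   # EKS image registry for this region
PASSWORD="$(aws ecr get-login-password --region "${REGION}")"

for img in \
  "${ECR_HOST}/eks/pause:3.5" \
  "${ECR_HOST}/amazon-k8s-cni-init:v1.12.0" \
  "${ECR_HOST}/amazon-k8s-cni:v1.12.0" \
  "${ECR_HOST}/eks/kube-proxy:v1.24.7-minimal-eksbuild.2" \
  "${ECR_HOST}/eks/kube-proxy:v1.24.7-eksbuild.2"; do
  # Pull into containerd's k8s.io namespace, where the kubelet's CRI looks images up.
  sudo ctr --namespace k8s.io images pull --user "AWS:${PASSWORD}" "${img}"
done
```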

Disable Startup Yum Update Check #1074

There is currently a yum update run that blocks execution of the eks-bootstrap script. This update check generally finds 0 updates if you are updating AMIs frequently, but it takes 5-8 seconds because it hydrates the yum cache. The check also causes version skew across a cluster: the same AMI ID may run different software versions depending on when it was launched, which could cause problems with rollback and node churn.
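As a rough way to observe that cost on a stock node (a measurement sketch, not the fix itself): even when zero updates apply, hydrating the yum metadata cache dominates the run time.

```bash
# Simulate a cold cache, then time the update check the bootstrap was waiting on.
sudo yum clean all
time sudo yum check-update   # most of the time goes to fetching repo metadata, not applying updates
```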


Remove unnecessary Sleeps in the VPC CNI Initialization aws/amazon-vpc-cni-k8s#2104 (released in v1.12.0)

The VPC CNI had some unnecessary sleeps that resulted in 2-3 seconds of latency starting up. VPC CNI is required to be fully initialized before pods can be created on a node, so the initialization process should be as fast as possible.


Remove init-container from VPC CNI aws/amazon-vpc-cni-k8s#2137

The VPC CNI uses an init container to initialize some networking-related kernel settings that must be applied as a privileged container. The sequencing of the init container was adding latency on startup: the VPC CNI would generally take 9-10 seconds to fully initialize. The PR above removes the init container and runs that work as a regular container in the pod, which allows some parallelization of the initialization and removes the kubelet sequencing latency, bringing the VPC CNI's full initialization time down to 4 seconds. Half of the remaining latency is the container pulls, which is solved by the caching PR above (#938). Integrating the two PRs results in a 2-second full initialization of the VPC CNI.
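A quick way to see how the aws-node DaemonSet is laid out on your own cluster, i.e. whether the privileged setup still runs as an init container or as a regular container, which depends on the VPC CNI version installed:

```bash
# List init containers vs. regular containers in the aws-node DaemonSet spec.
kubectl get daemonset aws-node -n kube-system \
  -o jsonpath='{.spec.template.spec.initContainers[*].name}{"\n"}{.spec.template.spec.containers[*].name}{"\n"}'
```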


Add CLUSTER_ENDPOINT parameter to the VPC CNI to avoid kube-proxy race aws/amazon-vpc-cni-k8s#2138

With all the optimizations listed above, a new concern is a race between the VPC CNI and kube-proxy. The VPC CNI uses the kubernetes service cluster IP to reach the kube-apiserver. This wasn't a concern before because the VPC CNI's higher latency meant kube-proxy almost always won the race to initialize. After the optimizations, the VPC CNI loses the race about half of the time and then hangs on a 5-second timeout reaching the kube-apiserver. The whole race can be avoided by passing the CLUSTER_ENDPOINT (the kube-apiserver load balancer endpoint) to the VPC CNI to use for initialization. The VPC CNI still needs to wait on kube-proxy to finish before completing the CNI plugin initialization, but the work to get to that point can be parallelized.
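For illustration, a sketch of wiring that parameter up by hand. The CLUSTER_ENDPOINT variable name comes from the PR above, whether it is honored depends on the VPC CNI version you run, and `my-cluster` is a placeholder:

```bash
# Fetch the cluster's API server endpoint and hand it to the CNI directly, so its
# startup calls don't depend on kube-proxy having programmed the kubernetes
# service cluster IP yet.
ENDPOINT="$(aws eks describe-cluster --name my-cluster \
  --query 'cluster.endpoint' --output text)"
kubectl set env daemonset aws-node -n kube-system CLUSTER_ENDPOINT="${ENDPOINT}"
```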

@stevehipwell
Contributor

💯 for removing the yum update! If you can't completely remove it, could it at least be configurable?

@bwagner5
Contributor Author

bwagner5 commented Nov 22, 2022

Here's a timing chart that is pretty steady from a K8s 1.24 cluster with all of the enhancements mentioned above:

c6a.4xlarge - i-02ca66ee7f998e438

| Event | Time | Seconds |
| --- | --- | --- |
| Instance Request |  | 0 |
| Instance Created | 15:18:20 | 1 |
| VM Initialized | 15:18:30 | 11 |
| Cloud-Init Initial Starts | 15:18:35 | 16 |
| Network Target Start | 15:18:35 | 16 |
| Cloud-Init Config Starts | 15:18:36 | 17 |
| Cloud-Init Config z-Exits | 15:18:36 | 17 |
| Cloud-Init Final Starts | 15:18:36 | 17 |
| Cloud-Init Initial z-Exits | 15:18:36 | 17 |
| Network Target Ready | 15:18:36 | 17 |
| ContainerD Starts | 15:18:37 | 18 |
| Cloud-Init Final (user-data) z-Exits | 15:18:39 | 20 |
| Kubelet Starts | 15:18:39 | 20 |
| Kubelet Node Registration | 15:18:40 | 21 |
| AWS Node Container Starts | 15:18:41 | 22 |
| kube-proxy Started | 15:18:41 | 22 |
| VPC CNI Init Container Starts | 15:18:41 | 22 |
| VPC CNI Plugin Initialized | 15:18:44 | 25 |
| Node Ready | 15:18:50 | 31 |
| Pod Starts | 15:18:50 | 31 |

@john-zielke-snkeos

How did you measure these startup times? And what were the startup times before implementing the changes? I am using the Terraform Karpenter example to start NVIDIA GPU nodes and am currently seeing startup times of 2-3 minutes for a node to be ready using the default EKS AMI.

@bwagner5
Contributor Author

> How did you measure these startup times? And what were the startup times before implementing the changes? I am using the Terraform Karpenter example to start NVIDIA GPU nodes and am currently seeing startup times of 2-3 minutes for a node to be ready using the default EKS AMI.

The measurements were taken using some custom tooling that we'll be open sourcing very soon. You'll be able to run the tooling as a daemonset to capture timing metrics for your nodes in a standardized format like shown above.

GPU instance types often take much longer to boot within EC2. I have not focused on exotic instance types like baremetal and gpus. The timings above are for c6a.4xlarge, but I have tested on m5 and c6i with similar results. Notably, i instance types and instance types with d (disk) capabilities will take a little longer to boot as well.

I will update this issue once the timing tooling is available.
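Until that tooling is out, a rough approximation of the Kubernetes-side portion (it misses the EC2 provisioning and boot phases) is to diff a node's registration time against its Ready condition transition. The node name below is a placeholder:

```bash
NODE="ip-192-168-1-1.us-west-2.compute.internal"   # example node name
# When the node object was created (kubelet registration)
kubectl get node "${NODE}" -o jsonpath='{.metadata.creationTimestamp}{"\n"}'
# When the Ready condition last transitioned to its current state
kubectl get node "${NODE}" -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}{"\n"}'
```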

Here are timings from before the optimizations:

c5.xlarge - i-08f01b534e902dbf7

| Event | Time | Seconds |
| --- | --- | --- |
| Instance Request |  | 0 |
| Instance Created | 16:18:45 | 1 |
| VM Initialized | 16:18:56 | 12 |
| Cloud-Init Config Starts | 16:19:05 | 21 |
| Cloud-Init Initial Starts | 16:19:05 | 21 |
| Cloud-Init Initial z-Exits | 16:19:05 | 21 |
| Network Target Ready | 16:19:05 | 21 |
| Network Target Start | 16:19:05 | 21 |
| Cloud-Init Config z-Exits | 16:19:13 | 29 |
| Cloud-Init Final Starts | 16:19:13 | 29 |
| ContainerD Starts | 16:19:14 | 30 |
| Cloud-Init Final (user-data) z-Exits | 16:19:18 | 34 |
| Kubelet Starts | 16:19:18 | 34 |
| Kubelet Node Registration | 16:19:23 | 39 |
| kube-proxy Started | 16:19:26 | 42 |
| VPC CNI Init Container Starts | 16:19:28 | 44 |
| AWS Node Container Starts | 16:19:35 | 51 |
| VPC CNI Plugin Initialized | 16:19:39 | 55 |
| Node Ready | 16:19:44 | 60 |
| Pod Starts | 16:19:48 | 64 |

@bwagner5
Contributor Author

bwagner5 commented Dec 15, 2022

Here's a gif of startups taking around 25 seconds by cherry-picking a v1.26 kubelet change to v1.24 (along with all the optimizations mentioned above and auto-scaled with Karpenter).

demo

@stevehipwell
Contributor

@bwagner5 how does this compare to Bottlerocket? Also are we likely to see this change backported for EKS?

@FernandoMiguel

Big +1 on getting all these improvements to BR too

@bwagner5
Contributor Author

I still need to do testing on BR, but at least some of the improvements will carry over, like the VPC CNI ones. Some don't apply, like the yum updates. I'll have to see if we could include container caching in BR. I'll get back to you on backporting to EKS.

@kakarotbyte

@bwagner5
Other than the changes mentioned above, do I need to take network considerations (like using VPC endpoints) and bootstrap considerations (hardcoding the API URL, CA, and kube-dns IP) into account to achieve the 31-second results above?

@bwagner5
Contributor Author

I tested it with Karpenter which will automatically hardcode the API URL, CA Bundle, and kube-dns IP as params to the eks bootstrap.sh script. The EKS DescribeCluster call that occurs in the bootstrap.sh script shouldn't take much time though, so I suspect you can get similar results without hardcoding the params, at least for single node launches. You may run into rate limiting on the API call though when doing large node scale outs.
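For reference, hardcoding those values looks roughly like this in user data. The flag names are the standard bootstrap.sh options; the cluster name, endpoint, CA bundle, and DNS IP shown are placeholders:

```bash
# Supplying the cluster details up front lets bootstrap.sh skip its eks:DescribeCluster call.
/etc/eks/bootstrap.sh my-cluster \
  --apiserver-endpoint 'https://ABCDEF1234567890.gr7.us-west-2.eks.amazonaws.com' \
  --b64-cluster-ca 'LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t...' \
  --dns-cluster-ip '10.100.0.10'
```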

@kakarotbyte

Understood. Won't enabling ECR and S3 VPC endpoints help speed up downloading the aws-node and kube-proxy images?

@bwagner5
Contributor Author

@kakarotbyte You may see some improvement using the endpoints, but I wouldn't expect it to be significant (I haven't tested that, though). In my tests I used the AMI's cached images of aws-node and kube-proxy.

@armenr

armenr commented Dec 30, 2022

@kakarotbyte - I've been lucky enough to have the joy of exchanging ideas and observations with @bwagner5 .

I can tell you - from my extensive testing - that using those VPC endpoints terminated inside the VPC doesn't give you an appreciable improvement in provisioning speed.

But using a super-primitive and minimal HTTP client instead of curl/wget or the imds bash script to fetch and parse your instance metadata from IMDS endpoints can reduce startup latency time by at least 4 seconds (sometimes 6-8 seconds)...in cases where you can't hardcode the input values, and your bootstrap.sh script for EKS needs to go ask IMDS questions.
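For context, the baseline those lookups replace is the standard IMDSv2 pattern below; the savings come from swapping each separate curl process for a single minimal client, not from changing the endpoints themselves (the metadata path shown is just an example):

```bash
# IMDSv2: fetch a session token, then query metadata with it. bootstrap.sh
# performs several lookups like this when values aren't hardcoded.
TOKEN="$(curl -sS -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")"
curl -sS -H "X-aws-ec2-metadata-token: ${TOKEN}" \
  "http://169.254.169.254/latest/meta-data/instance-type"
```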

Using a custom-rolled AMI, built on a custom OS, with a custom kernel (we have really specific use-cases and latency requirements at the company I'm with)...+ using @bwagner5 's improvements, I have done some tests.

IF:

  • use my custom AMI + kernel (system is booted and ready in 2.1 seconds total)
  • use all of @bwagner5 's improvements
  • hard-code every possible value that the eks bootstrap.sh script is looking for
  • use a lightweight HTTP client instead of curl/wget/the imds script
  • cache all of the necessary docker images during AMI baking
  • use Karpenter instead of Cluster Autoscaler

EXPERIMENT:

  • Schedule a pod into the cluster with resource requests/limits that require Karpenter to provision a new node
  • Start the timer when you kubectl apply the "un-schedulable" Pod, and stop it when the Pod is scheduled and starts running

RESULT on m6i.large:

| Event | Time |
| --- | --- |
| New Node Ready | 26 seconds |
| New Pod Starts | 32 seconds |

@bwagner5
Contributor Author

bwagner5 commented Jan 4, 2023

FYI I've open sourced the node latency timing tool that I've been using to create the timing charts and emit metrics for my testing here: https://github.com/awslabs/node-latency-for-k8s

Would love feedback on how / if this works well on other OS distributions (I've only been using the eks-optimized AL2).

@armenr

armenr commented Apr 13, 2023

@bwagner5 - With whatever changes have already been implemented and merged/released, I'm seeing consistent "Node Ready" state at (or below) an average of ~26(ish) seconds on vanilla EKS AL2 nodes. I don't even need my own custom AMI anymore, to be honest. 👏👏
