-
Notifications
You must be signed in to change notification settings - Fork 741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce Startup Latency by moving the init container to a regular container #2137
Conversation
Need to run https://github.com/aws/amazon-vpc-cni-k8s/blob/master/Makefile#L342 and copy the new manifests here - https://github.com/aws/amazon-vpc-cni-k8s/tree/master/config/master. Self managed addons upgrades/downgrades will be reapplying the manifests but we need to confirm on managed addons upgrade/downgrade. |
Did we validate the IPv6 test suite? If not, can we do that as well.. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes overall look valid to me, and IPv6 integration tests passing gives more confidence. Can you confirm upgrade/downgrade is successful? That's my only other concern.
@bwagner5 can you update this branch? I can approve after that. As for coverage here, it sounds like you have run all of the ginkgo suites, and the integration suite will be run on merge, so if there is any issues with that, we can revert. |
@jdn5126 - We need to verify MAO upgrades/downgrades before merging. Also we need to get the manifest related changes reviewed by MAO team. |
I have tested upgrade and downgrade:
|
config/master/aws-k8s-cni.yaml
Outdated
securityContext: | ||
containers: | ||
- name: aws-vpc-cni-init | ||
image: "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni-init:v1.12.0-11-g69122330" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jayanthvn what should the tag be set to when making a change like this? If we keep it at v1.12.0
, then the manifest and image are incompatible, but I doubt we want to release a new tag. Do we guarantee that anyone cloning this should be able to run kubectl apply -f config/master/aws-k8s-cni.yaml
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we will have build a RC image and use that tag here since both manifest and the entry point is changing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating! After addressing these changes, can you also squash and rebase so that this is one commit? Since this is an important change, I think we want a clean and easy to read history here
We have to hold off on this change until we can resolve issues with the MAO upgrade path 😞 |
This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days |
Shhhh bot, not stale |
This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days |
No, please Mr. GitHub-bot. Still not stale 😬 |
This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days |
Would love to see this merged. Looks like it needs a rebase tho |
This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days |
/not stale |
This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days |
Pull request closed due to inactivity. |
What type of PR is this?
Which issue does this PR fix:
awslabs/amazon-eks-ami#1099
What does this PR do / Why do we need it:
If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:
Testing done on this change:
I installed the changes into my cluster (1.22) via helm and ran tests on the current production version (v1.12.0) and my change. Below are representative samples across 10 runs.
The "VPC CNI Plugin Initialized" event is when the CNI config file is moved to the
/etc/cni/net.d
dir, signaling to the container runtime that networking is ready.Previous aws-node startup timing (9 secs):
After this change startup timing (4 secs):
Automation added to e2e:
Will this PR introduce any new dependencies?:
No
Will this break upgrades or downgrades. Has updating a running cluster been tested?:
I have not tested updating a running cluster, but the changes are only on improving the startup sequence.
Does this change require updates to the CNI daemonset config files to work?:
Yes, this change requires moving the previous init-container under the regular containers list in the DaemonSet and involves adding an emptyDir volume and mapping in both containers.
Does this PR introduce any user-facing change?:
No
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.