🐛: ec2/byoip: fix EIP leak when creating machine #5039
base: main
/ok-to-test
DeviceIndex:              aws.Int64(0),
SubnetId:                 aws.String(i.SubnetID),
Groups:                   aws.StringSlice(i.SecurityGroupIDs),
AssociatePublicIpAddress: i.PublicIPOnLaunch,
Ok, I see you already set PublicIPOnLaunch to false if EIPPool is set.
Yes, it preserves the value from the machine spec and forces it to false when EIPPool is set:
cluster-api-provider-aws/pkg/cloud/services/ec2/instances.go
Lines 185 to 193 in 1313226
// Preserve user-defined PublicIp option.
input.PublicIPOnLaunch = scope.AWSMachine.Spec.PublicIP
// Public address from BYO Public IPv4 Pools need to be associated after launch (main machine
// reconciliate loop) preventing duplicated public IP. The map on launch is explicitly
// disabled in instances with PublicIP defined to true.
if scope.AWSMachine.Spec.ElasticIPPool != nil && scope.AWSMachine.Spec.ElasticIPPool.PublicIpv4Pool != nil {
	input.PublicIPOnLaunch = ptr.To(false)
}
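The decision quoted above can be isolated into a small, self-contained sketch. The types `MachineSpec` and `ElasticIPPool` and the helper `publicIPOnLaunch` are simplified, hypothetical stand-ins rather than the provider's real API. The point is the three-way outcome: unset, the user's value, or a forced false when a BYO pool is configured.

```go
package main

import "fmt"

// ElasticIPPool and MachineSpec are simplified, hypothetical stand-ins for the
// machine spec types; only the fields needed for this sketch are modeled.
type ElasticIPPool struct {
	PublicIpv4Pool *string
}

type MachineSpec struct {
	PublicIP      *bool
	ElasticIPPool *ElasticIPPool
}

// publicIPOnLaunch (hypothetical helper) returns the value to pass as
// AssociatePublicIpAddress. A BYO public IPv4 pool always forces false so the
// address can be associated later by the reconciliation loop.
func publicIPOnLaunch(spec MachineSpec) *bool {
	if spec.ElasticIPPool != nil && spec.ElasticIPPool.PublicIpv4Pool != nil {
		disabled := false
		return &disabled
	}
	// Otherwise preserve the user-defined option, which may be nil (unset).
	return spec.PublicIP
}

// describe renders a *bool so that nil is distinguishable from false.
func describe(v *bool) string {
	if v == nil {
		return "unset"
	}
	return fmt.Sprintf("%t", *v)
}

func main() {
	enabled := true
	pool := "ipv4pool-ec2-example" // hypothetical pool ID

	fmt.Println(describe(publicIPOnLaunch(MachineSpec{})))
	fmt.Println(describe(publicIPOnLaunch(MachineSpec{PublicIP: &enabled})))
	fmt.Println(describe(publicIPOnLaunch(MachineSpec{
		PublicIP:      &enabled,
		ElasticIPPool: &ElasticIPPool{PublicIpv4Pool: &pool},
	})))
}
```

Returning a pointer keeps "unset" distinct from an explicit false, which matters here because the two are sent to the API differently.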
The main problem at this point is that when the flag is left unset (the original code), AssociatePublicIpAddress seems to be assumed true at some point, so the EIP is created before the BYO IP reconciliation loop is reached.
Some scenarios triggered by these tests still need review:
--- FAIL: TestCreateInstance (0.01s)
--- FAIL: TestCreateInstance/when_multiple_subnets_match_filters,_subnets_in_the_cluster_vpc_are_preferred (0.00s)
--- FAIL: TestCreateInstance/with_a_subnet_outside_the_cluster_vpc (0.00s)
--- FAIL: TestCreateInstance/with_dedicated_tenancy_cloud-config (0.00s)
--- FAIL: TestCreateInstance/with_custom_placement_group_cloud-config (0.00s)
--- FAIL: TestCreateInstance/with_dedicated_tenancy_and_placement_group_ignition (0.00s)
--- FAIL: TestCreateInstance/with_custom_placement_group_and_partition_number (0.00s)
I need to review whether these failures are expected or whether they also point to a bug in the expected return values from the API.
The changes in pkg/cloud/services/ec2/instances_test.go fix the expected RunInstancesWithContext API call to adapt to this change. This does not seem to affect the use cases described in the tests, as the result is the same.
Force-pushed from b58e3e0 to ace1bee.
Force-pushed from ace1bee to d5882fa.
/test ?
/test pull-cluster-api-provider-aws-e2e
Premature failure.
/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e
Okay, the previous test failures were flakes. The latest run passes. The OpenShift e2e BYOIP test also passes the install step.
This PR is ready for review. PTAL?
/test pull-cluster-api-provider-aws-e2e-eks
@@ -541,7 +541,7 @@ func (r *AWSMachineReconciler) reconcileNormal(_ context.Context, machineScope *
 	// a BYOIP without duplication.
 	if pool := machineScope.GetElasticIPPool(); pool != nil {
 		if err := ec2svc.ReconcileElasticIPFromPublicPool(pool, instance); err != nil {
-			machineScope.Error(err, "failed to associate elastic IP address")
+			machineScope.Error(err, "failed to reconcile BYO Public IPv4")
What do you think about what I did with registering the instance with the LB: https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/5040/files#diff-0b559bbd149f0e6d54d789235423f66fa8906fdc6ee9c99b9e85db912912011eR622-R629. Just returning an error because the instance is not yet running might confuse users looking at the logs.
@r4f4 that's a great idea, especially now that this approach generates more warnings instead of leaks and failures. Not emitting error/warning messages for expected states would be much better.
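A minimal sketch of the idea discussed in this thread, assuming a hypothetical helper `reconcileBYOIP` and a simplified `Instance` type (neither is the provider's real code): an instance that is not yet running is treated as an expected state, logged at info level, and skipped without returning an error, while real association failures are still surfaced.

```go
package main

import (
	"errors"
	"fmt"
)

// Instance is a simplified, hypothetical view of an EC2 instance.
type Instance struct {
	ID    string
	State string // EC2 state name, e.g. "pending" or "running"
}

// reconcileBYOIP is a hypothetical helper. It reports whether the association
// was performed. An instance that is not yet running is an expected state: it
// is logged through info and no error is returned, so the caller can retry on
// the next reconciliation without polluting the logs.
func reconcileBYOIP(inst Instance, associate func(instanceID string) error, info func(msg string)) (bool, error) {
	if inst.State != "running" {
		info(fmt.Sprintf("instance %s is %s, skipping BYO Public IPv4 association until it is running", inst.ID, inst.State))
		return false, nil
	}
	if err := associate(inst.ID); err != nil {
		return false, fmt.Errorf("failed to reconcile BYO Public IPv4: %w", err)
	}
	return true, nil
}

func main() {
	info := func(msg string) { fmt.Println("INFO:", msg) }
	succeed := func(string) error { return nil }
	fail := func(string) error { return errors.New("association rejected") }

	// Pending instance: skipped, no error, associate is never called.
	fmt.Println(reconcileBYOIP(Instance{ID: "i-123", State: "pending"}, fail, info))
	// Running instance: association is attempted.
	fmt.Println(reconcileBYOIP(Instance{ID: "i-123", State: "running"}, succeed, info))
	// Running instance with a real failure: the error is returned.
	fmt.Println(reconcileBYOIP(Instance{ID: "i-123", State: "running"}, fail, info))
}
```

Passing the logger and the association call as functions keeps the sketch independent of any SDK, and makes the skip path easy to test.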
What type of PR is this?
/kind bug
What this PR does / why we need it:
By default, the instance creation flow creates an EIP for instances even when the BYO IP flow is set. The BYO IPv4 flow creates and associates the EIP with the instance after it is created, so this fix avoids paying for an additional (Amazon-provided) EIP at instance creation when the machine is configured to use a BYO IPv4 pool.
Furthermore, the fix adds checks to prevent duplicated EIPs in the BYO IP reconciliation loop. The checks cover two cases: running the EIP association repeatedly while the EIP is already associated, and failures logged when the EIP association runs prematurely, before the instance is ready (e.g. an EC2 instance in the pending state).
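The duplicate-prevention check described above can be sketched as an idempotent "ensure" step. The `Address` type and the `ensureAddress` helper are hypothetical and do not reflect the provider's actual implementation; `allocate` stands in for "allocate from the pool and associate".

```go
package main

import "fmt"

// Address is a simplified, hypothetical view of an Elastic IP.
type Address struct {
	AllocationID string
	InstanceID   string // empty when the address is not associated
}

// ensureAddress is a hypothetical helper. It returns the address already
// associated with the instance if there is one, and only calls allocate
// otherwise, so repeated reconciliations do not create duplicates.
func ensureAddress(instanceID string, existing []Address, allocate func(instanceID string) (Address, error)) (Address, error) {
	for _, addr := range existing {
		if addr.InstanceID == instanceID {
			return addr, nil
		}
	}
	return allocate(instanceID)
}

func main() {
	var addresses []Address
	allocations := 0
	allocate := func(instanceID string) (Address, error) {
		allocations++
		addr := Address{
			AllocationID: fmt.Sprintf("eipalloc-%d", allocations), // made-up ID
			InstanceID:   instanceID,
		}
		addresses = append(addresses, addr)
		return addr, nil
	}

	// Simulate three reconciliation passes for the same instance.
	for i := 0; i < 3; i++ {
		addr, err := ensureAddress("i-123", addresses, allocate)
		fmt.Println(addr.AllocationID, err)
	}
	fmt.Println("allocations:", allocations)
}
```

Across the three simulated passes only the first one allocates; the later passes find the existing association and return it unchanged.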
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #5038
Special notes for your reviewer:
Checklist:
Release note: