Prevent leaking EIP when creating machines with BYO IPv4 Pool #5038

Open
mtulio opened this issue Jun 27, 2024 · 2 comments · May be fixed by #5039
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@mtulio
Contributor

mtulio commented Jun 27, 2024

/kind bug

What steps did you take and what happened:

In some situations, a machine deployed with a BYO IPv4 pool leaks Elastic IPs (EIPs):

  • 1/ When the instance is created, the AssociatePublicIpAddress flag must be set to false on the primary network interface. Otherwise the instance is created with an Amazon-provided public IP, which it keeps until the reconciliation loop reaches the BYO IP step and allocates and associates the custom EIP with the instance.
  • 2/ The machine reconciliation loop hits a race condition, or an inconsistency in the AWS API, that makes the controller create two EIPs for each machine, alternating* between them until reconciliation finishes. One EIP is left leaked (unused/disassociated) for the lifetime of the instance. The delete flow removes both.

*Alternating is expected, because the algorithm looks up the EIP by role and tries to reuse disassociated EIPs.
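
To illustrate situation 1/, here is a minimal sketch of forcing the flag off whenever a BYO pool is configured. It is not the CAPA implementation: `networkInterfaceSpec` and `primaryInterface` are hypothetical names, the struct is a simplified stand-in for the primary network interface section of a RunInstances request, and the pool ID is a placeholder.

```go
package main

import "fmt"

// networkInterfaceSpec is a hypothetical, simplified stand-in for the
// primary network interface section of an EC2 RunInstances request.
type networkInterfaceSpec struct {
	DeviceIndex              int
	AssociatePublicIPAddress *bool
}

// primaryInterface builds the device-index-0 interface. When a BYO IPv4 pool
// is configured, it explicitly disables the Amazon-provided public IP so the
// only public address the instance receives is the EIP from the custom pool.
func primaryInterface(wantPublicIP bool, byoPoolID string) networkInterfaceSpec {
	assign := wantPublicIP
	if byoPoolID != "" {
		assign = false
	}
	return networkInterfaceSpec{DeviceIndex: 0, AssociatePublicIPAddress: &assign}
}

func main() {
	// "ipv4pool-ec2-example" is a placeholder pool ID, not a real one.
	ni := primaryInterface(true, "ipv4pool-ec2-example")
	fmt.Println(*ni.AssociatePublicIPAddress)
}
```

Setting the flag explicitly, rather than leaving it unset, avoids depending on the subnet's default public IP behavior.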

Furthermore, the following failure occurs because the BYO IP reconciliation loop tries to associate the EIP with an instance that is not yet running:

time="2024-05-08T15:49:33-03:00" level=debug msg="I0508 15:49:33.785472 2878400 recorder.go:104] 
\"Failed to associate Elastic IP for \\\"ec2-i-03de70744825f25c5\\\": InvalidInstanceID: 
The pending instance 'i-03de70744825f25c5' is not in a valid state for this operation.\\n\\tstatus code: 
400, request id: 7582391c-b35e-44b9-8455-e68663d90fed\" logger=\"events\" type=\"Warning\" 
object=[...]\"name\":\"mrb-byoip-32-kbcz9\",\"[...] reason=\"FailedAssociateEIP\""

time="2024-05-08T15:49:33-03:00" level=debug msg="E0508 15:49:33.803742 2878400 controller.go:329] \"Reconciler error\" err=<"

time="2024-05-08T15:49:33-03:00" level=debug msg="\tfailed to reconcile EIP: failed to associate Elastic IP 
\"eipalloc-08faccab2dbb28d4f\" to instance \"i-03de70744825f25c5\": 
InvalidInstanceID: The pending instance 'i-03de70744825f25c5' is not in a valid state for this operation."

What did you expect to happen:

  • The machine is created successfully and allocates a single EIP when using a BYO IPv4 pool
  • The machine reconciliation loop waits for the instance to leave the pending state before trying to associate the EIP, so that expected behavior does not produce error messages in the logs
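
The two expectations above could be sketched roughly as follows. This is an illustrative sketch, not the fix proposed in the linked pull request: `fakeCloud`, `eip`, `errRequeue`, and `reconcileEIP` are hypothetical names, the in-memory `fakeCloud` stands in for the EC2 API, and it is assumed to hold only the EIPs that belong to this machine's role.

```go
package main

import (
	"errors"
	"fmt"
)

// errRequeue is a hypothetical sentinel telling the caller to retry later
// without recording a warning event.
var errRequeue = errors.New("instance not running yet, requeue")

// eip is a hypothetical, simplified view of an Elastic IP.
type eip struct {
	AllocationID string
	InstanceID   string // empty while the EIP is disassociated
}

// fakeCloud is an in-memory stand-in for the EC2 API, used only to keep
// this sketch runnable.
type fakeCloud struct {
	eips      []*eip
	allocated int
}

// allocate creates a new, disassociated EIP and counts the allocation.
func (c *fakeCloud) allocate() *eip {
	c.allocated++
	e := &eip{AllocationID: fmt.Sprintf("eipalloc-%d", c.allocated)}
	c.eips = append(c.eips, e)
	return e
}

// reconcileEIP ensures that exactly one EIP is associated with the instance.
func reconcileEIP(c *fakeCloud, instanceID, state string) (*eip, error) {
	// Associating with a pending instance fails with InvalidInstanceID,
	// so skip the API call and wait for the next reconcile.
	if state != "running" {
		return nil, errRequeue
	}
	// Prefer the EIP already associated with this instance, then a
	// disassociated one, and allocate only as a last resort.
	var free *eip
	for _, e := range c.eips {
		if e.InstanceID == instanceID {
			return e, nil
		}
		if e.InstanceID == "" && free == nil {
			free = e
		}
	}
	if free == nil {
		free = c.allocate()
	}
	free.InstanceID = instanceID
	return free, nil
}

func main() {
	c := &fakeCloud{}
	_, err := reconcileEIP(c, "i-example", "pending")
	fmt.Println(err)
	// Repeated reconciles of a running instance keep returning the same EIP.
	for i := 0; i < 3; i++ {
		e, _ := reconcileEIP(c, "i-example", "running")
		fmt.Println(e.AllocationID, c.allocated)
	}
}
```

Checking for an existing association before looking for a free EIP is what keeps repeated reconciles from alternating between two allocations in this sketch. Whether the real race has the same cause is not established in this issue.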

Anything else you would like to add:

Environment:

  • Cluster-api-provider-aws version: v2.5.2
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release): RHCOS
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 27, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If CAPA/CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mtulio
Contributor Author

mtulio commented Jun 27, 2024

cc @r4f4

@mtulio mtulio linked a pull request Jun 27, 2024 that will close this issue
5 tasks