OCPBUGS-44925: aws: add ec2:AllocateAddress perm requirement. #9234

Open
wants to merge 1 commit into
base: master

r4f4
Contributor

@r4f4 r4f4 commented Nov 22, 2024

It's needed by CAPA when Ipv4Pools are supplied.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 22, 2024
@openshift-ci-robot
Contributor

@r4f4: This pull request references Jira Issue OCPBUGS-44925, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.18.0) matches configured target version for branch (4.18.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @gpei

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

It's needed by CAPA when Ipv4Pools are supplied.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Nov 22, 2024
Contributor

openshift-ci bot commented Nov 22, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from r4f4. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@r4f4
Contributor Author

r4f4 commented Nov 22, 2024

/label platform/aws

@r4f4
Contributor Author

r4f4 commented Nov 23, 2024

/hold

We are still missing another permission: ec2:AssociateAddress

time="2024-11-22T22:27:02Z" level=debug msg="E1122 22:27:02.785017     333 awsmachine_controller.go:543] \"Failed to reconcile BYO Public IPv4\" err=<"
time="2024-11-22T22:27:02Z" level=debug msg="\tfailed to reconcile Elastic IP: failed to associate Elastic IP \"eipalloc-0d9343e1bba507e66\" to instance \"i-0c664692a59d18dc5\": UnauthorizedOperation: You are not authorized to perform this operation. User: arn:aws:iam::460538899914:user/ci-op-ty3fb51s-e9af7-minimal-perm is not authorized to perform: ec2:AssociateAddress on resource: arn:aws:ec2:us-east-1:460538899914:elastic-ip/eipalloc-0d9343e1bba507e66 because no identity-based policy allows the ec2:AssociateAddress action. Encoded authorization failure message: jAhD8NZ_EO7bmq89o1YFJPdSyOJL0KSBMQh9r8DDvmX1GOASHHe1scQITr_dIA5P2rKWAPT-a54UTIind4Pqh4z4x-vXRLRk-k0Vq4u61G2CalS22C-Vw_oQhmiITgr9llWVtLP0SwsKYMT0uWxOlvlfqmwZ8BNw3bcgzP8W2N8wZnwB6pDW5BoPg7Zx-OgPd3rth36YPMawV8RW1B-LUY4aVsfWUmZfwfQXChsDesd39LClcPExlFh__cV8hwF4TYHJDruc6vqtwSdFhTyCq3ibWNAlutg-3ptOEM7zRx33USs4uTqLxdYLj4n-AaPdtj-ishlFEh0aZiyl6QmBvaecUTq4v2hUwyAssKdlwZIpjv7zoRYBw59qrBiksPkTQDOP-3cnLxIix6ZwX0nkDwCR3qG5ZwppzRAPpMYgOU03Uo9r3RMbB_pr9h0b6amdBBOilkYmnHIAk8_vWBvhBoBXblPc4LgbUv-ZB62g0oKM0GqwNJPp8JOaFMMSrL82cf2hxZ_a1Bv4sf2WwIoE7HY23Su7dE_KE8jwmhchRMPmb4nRVlyED-Vb39Tn14CZeWt4WFYZb2F6XBRXixuqCvcC-vxf2StrnUvlfczQA_bw1GqV8_0_6kvxAQvxOU7zCId4lQ3-cpCcfGh5Qeh3UwX5D1dDzeKCpqXbCnjT5mhn35Ani7CK7XpGTOWzK5VZu7unuau_n2L5292OQu2xbNPwgJTYpf_7nFwPRYVjE6RM_ZCU65TAJ_umlRpKbERYoahrBEpcJCVB2Z3WSzaaMfHvUYvOY8fKv6SglOCJjphoyLn70jkfZjLY5FuvxBwrlUfCqbPMFWn514b1ZY01o--5-v77NAQtIHmLaAvLv_pU2wKKa9g9qsjxnCPUMLIFxmgCUyKDT7nbBK9PUPipa4I8EEzWnw"
time="2024-11-22T22:27:02Z" level=debug msg="\t\tstatus code: 403, request id: b2c73727-f7fb-4ffd-b812-9484cae2ac11"
time="2024-11-22T22:27:02Z" level=debug msg=" >"
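
When triaging logs like the one above, it can help to pull out the denied IAM action mechanically. The following is a minimal sketch, not part of the installer: the helper name `denied_actions` and the regular expression are assumptions made for illustration, and the sample string is an abbreviated copy of the message above.

```python
import re

# Matches the phrase AWS uses when naming the denied action, e.g.
# "... is not authorized to perform: ec2:AssociateAddress on resource ..."
# (pattern is an assumption based on the log excerpt above)
DENIED_ACTION = re.compile(r"is not authorized to perform: ([A-Za-z0-9-]+:[A-Za-z0-9*]+)")


def denied_actions(log_text):
    """Return the distinct IAM actions named in 'not authorized' messages,
    in order of first appearance."""
    seen = []
    for action in DENIED_ACTION.findall(log_text):
        if action not in seen:
            seen.append(action)
    return seen


# Abbreviated excerpt of the error logged by awsmachine_controller.go
sample = (
    "failed to associate Elastic IP: UnauthorizedOperation: You are not "
    "authorized to perform this operation. User: arn:aws:iam::460538899914:"
    "user/ci-op-ty3fb51s-e9af7-minimal-perm is not authorized to perform: "
    "ec2:AssociateAddress on resource: arn:aws:ec2:us-east-1:460538899914:"
    "elastic-ip/eipalloc-0d9343e1bba507e66 because no identity-based policy "
    "allows the ec2:AssociateAddress action."
)

print(denied_actions(sample))  # ['ec2:AssociateAddress']
```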

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 23, 2024
`ec2:AllocateAddress` and `ec2:AssociateAddress` are needed by CAPA when
Ipv4Pools are supplied to allocate a new IP from the pool and associate
that IP with an instance.
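
As a rough illustration of the commit message above, the sketch below checks an IAM policy document for the two actions it names. It is not the installer's permission-group code: the helper names, the `Sid`, and the `"Resource": "*"` scope are hypothetical, and the check only compares exact action names (no wildcard expansion).

```python
# The two actions named in the commit message above.
REQUIRED_FOR_IPV4_POOLS = ["ec2:AllocateAddress", "ec2:AssociateAddress"]


def allowed_actions(policy):
    """Collect actions granted by Allow statements (exact names only)."""
    actions = set()
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        value = stmt.get("Action", [])
        # "Action" may be a single string or a list of strings.
        actions.update([value] if isinstance(value, str) else value)
    return actions


def missing_actions(policy, required):
    """Return the required actions that the policy does not grant."""
    granted = allowed_actions(policy)
    return [action for action in required if action not in granted]


# Hypothetical policy granting only the first permission, mirroring the
# situation described in the /hold comment.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ByoPublicIpv4",  # hypothetical statement ID
            "Effect": "Allow",
            "Action": ["ec2:AllocateAddress"],
            "Resource": "*",  # assumption; a real policy may scope this
        }
    ],
}

print(missing_actions(policy, REQUIRED_FOR_IPV4_POOLS))  # ['ec2:AssociateAddress']
```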
Contributor

openshift-ci bot commented Nov 23, 2024

@r4f4: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-shared-vpc-edge-zones | 85617f6 | link | false | /test e2e-aws-ovn-shared-vpc-edge-zones |
| ci/prow/e2e-aws-ovn-edge-zones | 85617f6 | link | false | /test e2e-aws-ovn-edge-zones |
| ci/prow/okd-scos-e2e-aws-ovn | 85617f6 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-external-aws-ccm | 85617f6 | link | false | /test e2e-external-aws-ccm |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@r4f4
Contributor Author

r4f4 commented Nov 23, 2024

@mtulio any idea why this is happening?

@mtulio
Contributor

mtulio commented Nov 25, 2024

> @mtulio any idea why this is happening?

@r4f4 there is a problem in the machine manifest: the instance type added to the MachineSet manifest, m6i.xlarge, is not supported in the zone:

$ aws ec2 describe-instance-type-offerings --location-type availability-zone \
    --filters Name=location,Values=us-west-2-wl1-sfo-wlz-1 \
    --region us-west-2 --query 'InstanceTypeOfferings[].InstanceType'
[
    "t3.xlarge",
    "g4dn.2xlarge",
    "t3.medium",
    "r5.2xlarge"
]

This is happening because the permission ec2:DescribeInstanceTypeOfferings is missing:

level=warning msg=unable to select instanceType on the zone[us-west-2-lax-1b] from the preferred \
list: [m6i.xlarge m5.xlarge r5.xlarge c5.2xlarge m5.2xlarge c5d.2xlarge r5.2xlarge]. \
You must update the MachineSet manifest: UnauthorizedOperation: You are not authorized to perform this operation. \
User: arn:aws:iam::460538899914:user/ci-op-nrkwfijt-e9af7-minimal-perm is not authorized to perform: \
ec2:DescribeInstanceTypeOfferings because no identity-based policy allows the \
ec2:DescribeInstanceTypeOfferings action

@r4f4
Contributor Author

r4f4 commented Nov 25, 2024

@mtulio that should've been added by #9114
edit: is that permission always needed when specifying edge machine pools? If so we should add it to the edge permission group in #9230

@mtulio
Contributor

mtulio commented Nov 25, 2024

> @mtulio that should've been added by #9114
> edit: is that permission always needed when specifying edge machine pools? If so we should add it to the edge permission group in #9230

@r4f4 the ec2:DescribeInstanceTypeOfferings permission is used by default when no instance type is set on a machine pool (CP, worker, or edge): the installer discovers the "best" supported instance type for the pool based on the target region (for general pools) or zone (for edge zones), using that API's filters. It is not an edge-specific feature.
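
The discovery step described here can be pictured as "take the first preferred type that the location offers". The sketch below is only an illustration of that idea, not the installer's implementation: `pick_instance_type` is a hypothetical name, the preferred list is copied from the warning quoted earlier in the thread, and the offerings are the ones returned by the `aws ec2 describe-instance-type-offerings` call shown earlier (the two come from different zones and are combined here purely as an example).

```python
from typing import Iterable, Optional, Set


def pick_instance_type(preferred: Iterable[str], offered: Set[str]) -> Optional[str]:
    """Return the first preferred instance type available in the location,
    or None when none of them is offered."""
    for instance_type in preferred:
        if instance_type in offered:
            return instance_type
    return None


# Preferred list as printed in the installer warning.
preferred = [
    "m6i.xlarge", "m5.xlarge", "r5.xlarge", "c5.2xlarge",
    "m5.2xlarge", "c5d.2xlarge", "r5.2xlarge",
]

# Offerings reported for us-west-2-wl1-sfo-wlz-1.
offered = {"t3.xlarge", "g4dn.2xlarge", "t3.medium", "r5.2xlarge"}

print(pick_instance_type(preferred, offered))  # r5.2xlarge
```

With these inputs, m6i.xlarge is skipped because the zone does not offer it, which is the mismatch that left the edge MachineSet with an unsupported type when the lookup was not permitted.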

@r4f4
Contributor Author

r4f4 commented Nov 25, 2024

> @r4f4 the ec2:DescribeInstanceTypeOfferings permission is used by default when no instance type is set on a machine pool (CP, worker, or edge): the installer discovers the "best" supported instance type for the pool based on the target region (for general pools) or zone (for edge zones), using that API's filters. It is not an edge-specific feature.

@mtulio that perm is not required in the non-edge case and we just display a warning that we could not find a preferred instance type. If the edge node cannot work with the default instance type, there should be a better default or further validation.

@mtulio
Contributor

mtulio commented Nov 25, 2024

> @mtulio that perm is not required in the non-edge case and we just display a warning that we could not find a preferred instance type.

@r4f4 I am interpreting this warning (which, imo, could be treated as a failure in certain situations, such as CP or worker node pools, to prevent a later failure) as a required permission for control plane and worker pools. The installer will always call getInstanceTypeZoneInfo() when no instance type is set in the pool (master, worker), as this is the default path for IPI, right? Am I missing something? Do we have a CI test with this scenario (default install, without setting custom instance types)?

@r4f4
Contributor Author

r4f4 commented Nov 25, 2024

> @r4f4 I am interpreting this warning (which, imo, could be treated as a failure in certain situations, such as CP or worker node pools, to prevent a later failure) as a required permission for control plane and worker pools. The installer will always call getInstanceTypeZoneInfo() when no instance type is set in the pool (master, worker), as this is the default path for IPI, right? Am I missing something? Do we have a CI test with this scenario (default install, without setting custom instance types)?

It's not required; it's optional. If this call fails, we proceed with the hardcoded default instance types in the installer (master, worker).

@r4f4
Contributor Author

r4f4 commented Nov 25, 2024

> Do we have a CI test with this scenario (default install, without setting custom instance types)?

AFAIK we do not: the way the steps are written, we always set an instance type in the install-config.yaml.

@mtulio
Contributor

mtulio commented Nov 25, 2024

> It's not required; it's optional. If this call fails, we proceed with the hardcoded default instance types in the installer (master, worker).

My interpretation is that it is required since, afaik, we don't expect the default path to fail :)

Furthermore, this function was introduced a long time ago, even before edge zones, to get the best instance type in most regions while still covering regions where AWS takes time to roll out new instance generations. For example, m6i.xlarge took some time to become available in eu-west-2, which supported only the 5th generation. Should most users be penalized with more expensive, slower instance types in most regions because some regions do not support the newer ones?

@r4f4
Contributor Author

r4f4 commented Nov 25, 2024

> My interpretation is that it is required since, afaik, we don't expect the default path to fail :)

If we want it to be required, we have to remove the warning and actually fail the install. But that's not the case today and the warning was a design choice to make the permission optional.
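
The difference between the two positions in this exchange can be sketched as a single switch: warn and fall back to a default (today's behavior, as described above) versus failing outright. This is a hedged illustration only: `resolve_instance_type`, the `required` flag, and the use of `PermissionError` are hypothetical, and the default value is an assumption taken from the first entry of the preferred list quoted earlier.

```python
import logging
from typing import Callable

log = logging.getLogger("installer-sketch")

# Assumption for illustration; the real hardcoded defaults live in the installer.
DEFAULT_INSTANCE_TYPE = "m6i.xlarge"


def resolve_instance_type(
    discover: Callable[[], str],
    default: str = DEFAULT_INSTANCE_TYPE,
    required: bool = False,
) -> str:
    """Try to discover an instance type; on a permission error either
    re-raise (required) or warn and return the default (optional)."""
    try:
        return discover()
    except PermissionError as err:
        if required:
            # "Required" semantics: surface the error and stop.
            raise
        # "Optional" semantics: warn and continue with the default.
        log.warning("unable to select instanceType from the preferred list: %s", err)
        return default


def denied() -> str:
    """Stand-in for a lookup that lacks ec2:DescribeInstanceTypeOfferings."""
    raise PermissionError("not authorized to perform: ec2:DescribeInstanceTypeOfferings")


print(resolve_instance_type(denied))  # m6i.xlarge
```

Calling `resolve_instance_type(denied, required=True)` raises instead of returning, which corresponds to removing the warning and failing the install.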

@r4f4
Contributor Author

r4f4 commented Nov 26, 2024

@mtulio I suggested making the perm required for the edge case because an edge mpool misconfiguration doesn't cause the install to fail. We only noticed the issue due to the node-readiness check step added by the multi-arch team for the day0/day2 multi-arch nodes. Alternatively, because it doesn't fail installs, we can keep the perm optional and cluster admins can edit/fix the edge mpools after the fact.

Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. platform/aws

3 participants