Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter managed non-ASG nodes by tag #669

Merged
merged 14 commits into from
Aug 18, 2022
Merged

Filter managed non-ASG nodes by tag #669

merged 14 commits into from
Aug 18, 2022

Conversation

AustinSiu
Copy link
Contributor

@AustinSiu AustinSiu commented Aug 9, 2022

Issue #, if available: #654

Description of changes:

  • Replaced fields check-asg-tag-before-draining and managed-asg-tag with check-tag-before-draining and managed-tag, updated docs to mark them as deprecated, and removed them from charts
  • Removed ASG api calls to fetch ASG info when determining whether a node should be managed by NTH.
  • Fixed bug where the presence of the managedAsgTag/managedTag on an instance wouldn't be checked correctly before deferring to ASG lookup

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@AustinSiu AustinSiu requested a review from a team as a code owner August 9, 2022 15:30
@AustinSiu AustinSiu changed the title Restore ability to filter managed non-ASG nodes by tag Filter managed non-ASG nodes by tag Aug 9, 2022
Copy link
Contributor

@bwagner5 bwagner5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

README.md Outdated Show resolved Hide resolved
Copy link
Contributor

@snay2 snay2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as you address Brandon's comment (add back the old names for the config values and mark them as deprecated), I am good with you merging this.

I recommend we use a minor version bump when we release (i.e., v1.17.0), since this may potentially break customers who are using ASG without propagating tags to their instances.

pkg/config/config.go Outdated Show resolved Hide resolved
pkg/config/config.go Outdated Show resolved Hide resolved
Copy link
Contributor

@bwagner5 bwagner5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think localstack doesn't like that there's not a Value when specifying the tag.

test/e2e/asg-lifecycle-sqs-test Outdated Show resolved Hide resolved
test/e2e/ec2-state-change-sqs-test Outdated Show resolved Hide resolved
test/e2e/rebalance-recommendation-sqs-test Outdated Show resolved Hide resolved
test/e2e/scheduled-change-event-sqs-test Outdated Show resolved Hide resolved
test/e2e/spot-interruption-sqs-test Outdated Show resolved Hide resolved
@snay2 snay2 mentioned this pull request Aug 15, 2022
snay2
snay2 previously approved these changes Aug 17, 2022
Copy link
Contributor

@snay2 snay2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested these changes on EKS and verified that they work as expected: NTH no longer calls ASG and does not throw errors when ASG permissions are revoken from its IAM role. NTH checks the tag of the node that receives the ITN, even when that node is not associated with an ASG.

Here's how I tested:

  1. Create an EKS cluster with a managed node group
  2. Create an SQS queue and EventBridge rules using the CloudFormation template in this repo
  3. Modify the node IAM role to add the following inline policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:CompleteLifecycleAction",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeTags",
                "sqs:DeleteMessage",
                "sqs:ReceiveMessage"
            ],
            "Resource": "*"
        }
    ]
}
  1. Install NTH v1.16.5 into the cluster in Queue Processor mode
curl -L https://github.com/aws/aws-node-termination-handler/releases/download/v1.16.5/all-resources-queue-processor.yaml -o nth-1.16.5.yaml
# Open the YAML file and update QUEUE_URL value
kubectl apply -f ./nth-1.16.5.yaml
  1. Launch an EC2 instance:
simple-ec2 launch --capacity-type spot --tags aws-node-termination-handler/managed=
  1. Send that instance a Spot ITN:
ec2-spot-interrupter --interactive
  1. Verify that NTH receives the message from the queue but does not act on it

Now to test it with the new code.

  1. Modify the node IAM role to remove permissions to ASG from the inline policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sqs:DeleteMessage",
                "sqs:ReceiveMessage"
            ],
            "Resource": "*"
        }
    ]
}
  1. Build NTH locally with these changes (make build-docker-images), push that container to a private ECR repo, and install that version of NTH into the EKS cluster
make build-docker-images
docker tag amazon/aws-node-termination-handler:<TAG_OF_LATEST_BUILD> <ECR_REPO_URL>:v1.16.5-gc7b6f88-linux-amd64
docker push <ECR_REPO_URL>:v1.16.5-gc7b6f88-linux-amd64
cp nth-1.16.5.yaml nth-my-build.yaml
# Edit nth-my-build.yaml to point to the ECR image we just pushed
kubectl apply -f ./nth-my-build.yaml
  1. Launch and EC2 instance and sent it a Spot ITN, as before
  2. Verify that NTH received the message and attempts to act on it, and that NTH emits no error messages about lack of permissions to access ASG. ✅

@snay2 snay2 merged commit 5fa4dc4 into aws:main Aug 18, 2022
snay2 added a commit to snay2/aws-node-termination-handler that referenced this pull request Aug 30, 2022
snay2 added a commit to snay2/aws-node-termination-handler that referenced this pull request Aug 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants