Move from @1debit to @chime #94

Merged: 3 commits, May 9, 2024
Changes from 2 commits
2 changes: 1 addition & 1 deletion CODEOWNERS
@@ -1 +1 @@
* @1debit/infrastructure-eng @1debit/security
* @chime/maintainers
44 changes: 21 additions & 23 deletions README.md
@@ -2,16 +2,16 @@

NAT Gateways are dead. Long live NAT instances!

Built and released with 💚 by <a href="https://chime.com"><img src="/assets/Chime_company_logo.png" alt="Chime Engineering" width="146"/></a>
Built and released with 💚 by <a href="https://chime.com"><img src="/assets/Chime_company_logo.png" alt="Chime Engineering" width="60"/></a>

[![GitHub Actions](https://github.com/1debit/alternat/workflows/Build/badge.svg)](https://github.com/1debit/alternat/actions)
[![GitHub Actions](https://github.com/chime/terraform-aws-alternat/workflows/Build/badge.svg)](https://github.com/chime/terraform-aws-alternat/actions)


## Background

On AWS, [NAT devices](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat.html) are required for accessing the Internet from private VPC subnets. Usually, the best option is a NAT gateway, a fully managed NAT service. The [pricing structure of NAT gateway](https://aws.amazon.com/vpc/pricing/) includes charges of $.045 per hour per NAT Gateway, plus **$.045 per GB** processed. The former charge is reasonable at about $32.40 per month. However, the latter charge can be *extremely* expensive for larger traffic volumes.
On AWS, [NAT devices](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat.html) are required for accessing the Internet from private VPC subnets. Usually, the best option is a NAT gateway, a fully managed NAT service. The [pricing structure of NAT gateway](https://aws.amazon.com/vpc/pricing/) includes charges of $0.045 per hour per NAT Gateway, plus **$0.045 per GB** processed. The former charge is reasonable at about $32.40 per month. However, the latter charge can be *extremely* expensive for larger traffic volumes.

In addition to the direct NAT Gateway charges, there are also Data Transfer charges for outbound traffic leaving AWS (known as egress traffic). The cost varies depending on destination and volume, ranging from $0.09/GB to $0.01 per GB (after a free tier of 100GB). That’s right: traffic traversing the NAT Gateway is first charged for processing, then charged again for egress to the Internet.
In addition to the direct NAT Gateway charges, there are also Data Transfer charges for outbound traffic leaving AWS (known as egress traffic). The cost varies depending on destination and volume, ranging from $0.01 to $0.09 per GB (after a free tier of 100GB). That’s right: traffic traversing the NAT Gateway is first charged for processing, then charged again for egress to the Internet.

Consider, for instance, the cost of sending 1PB to and from the Internet through a NAT Gateway - not an unusual amount for some use cases - is $75,604. Many customers may be dealing with far less than 1PB, but the cost can be high even at relatively lower traffic volumes. This drawback of NAT gateway is [widely](https://www.lastweekinaws.com/blog/the-aws-managed-nat-gateway-is-unpleasant-and-not-recommended/) [lamented](https://www.cloudforecast.io/blog/aws-nat-gateway-pricing-and-cost/) [among](https://www.vantage.sh/blog/nat-gateway-vpc-endpoint-savings) [AWS users](https://www.stephengrier.com/reducing-the-cost-of-aws-nat-gateways/).
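
The two direct NAT Gateway charges quoted above can be checked with a short calculation. This is a minimal sketch, not part of the README: the function names are hypothetical, a 30-day (720-hour) month is assumed, and Data Transfer (egress) charges are deliberately left out because they are tiered by destination and volume.

```python
from decimal import Decimal

HOURLY_RATE = Decimal("0.045")   # USD per NAT Gateway hour (rate quoted above)
PER_GB_RATE = Decimal("0.045")   # USD per GB processed (rate quoted above)
HOURS_PER_MONTH = 720            # assumption: a 30-day month


def nat_gateway_hourly_charge(gateways=1):
    """Monthly fixed charge for running the given number of NAT Gateways."""
    return HOURLY_RATE * HOURS_PER_MONTH * gateways


def nat_gateway_processing_charge(gb_processed):
    """Data processing charge for traffic through a NAT Gateway (egress excluded)."""
    return PER_GB_RATE * Decimal(gb_processed)


print(f"{nat_gateway_hourly_charge():.2f}")            # 32.40
print(f"{nat_gateway_processing_charge(10_000):.2f}")  # 450.00
```

The fixed charge stays flat, while the processing charge grows linearly with traffic, which is why the per-GB rate dominates at larger volumes.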

@@ -41,7 +41,7 @@ Features:

Read on to learn more about alterNAT.

## Architecture overview
## Architecture Overview

![Architecture diagram](/assets/architecture.png)

@@ -52,7 +52,7 @@ The two main elements of the NAT instance solution are:

Both are deployed by the Terraform module located in [`modules/terraform-aws-alternat`](modules/terraform-aws-alternat).

### NAT instance Auto Scaling Group and standby NAT Gateway
### NAT Instance Auto Scaling Group and Standby NAT Gateway

The solution deploys an Auto Scaling Group (ASG) for each provided public subnet. Each ASG contains a single instance. When the instance boots, the [user data](modules/terraform-aws-alternat/alternat.sh.tftpl) initializes the instance to do the NAT stuff.

@@ -65,7 +65,7 @@ By default, the ASGs are configured with a [maximum instance lifetime](https://d

The standby NAT Gateway is a safety measure. It is only used if the NAT instance is actively being replaced, either due to the maximum instance lifetime or due to some other failure scenario.

### replace-route Lambda Function
### `replace-route` Lambda Function

The purpose of [the replace-route Lambda Function](functions/replace-route) is to update the route table of the private subnets to route through the standby NAT gateway. It does this in response to two events:

@@ -96,7 +96,7 @@ For our use case, and for many others, this limitation is acceptable. Many clien

The Internet is unreliable by design, so failure modes such as connection loss should be a consideration in any resilient system.

### Edge cases
### Edge Cases

As described above, alterNAT uses the [`ReplaceRoute` API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_ReplaceRoute.html) (among others) to switch the route in the event of a NAT instance failure or Auto Scaling termination event. One possible failure scenario could occur where the EC2 control plane is for some reason not functional (e.g. an outage within AWS) and a NAT instance fails at the same time. The replace-route function may be unable to automatically switch the route to the NAT Gateway because the control plane is down. One mitigation would be to attempt to manually replace the route for the impacted subnet(s) using the CLI or console. However, if the control plane is in fact down and no APIs are working, waiting until the issue is resolved may be the only option.
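
The manual mitigation mentioned above could also be scripted. The following is a minimal sketch, not the code of the `replace-route` function: the helper name `fail_over_to_nat_gateway` and its parameters are hypothetical, and it assumes a boto3 EC2 client (or any object with a compatible `replace_route` method) is passed in. It only uses the `ReplaceRoute` parameters `RouteTableId`, `DestinationCidrBlock`, and `NatGatewayId`.

```python
def fail_over_to_nat_gateway(ec2_client, route_table_ids, nat_gateway_id,
                             cidr="0.0.0.0/0"):
    """Point the given route in each route table at the standby NAT Gateway.

    Returns a list of (route_table_id, error_message) pairs for tables that
    could not be updated, e.g. because the EC2 control plane is unavailable.
    """
    failed = []
    for route_table_id in route_table_ids:
        try:
            ec2_client.replace_route(
                RouteTableId=route_table_id,
                DestinationCidrBlock=cidr,
                NatGatewayId=nat_gateway_id,
            )
        except Exception as error:  # broad on purpose: this is only a sketch
            failed.append((route_table_id, str(error)))
    return failed
```

With boto3 installed, it might be called as `fail_over_to_nat_gateway(boto3.client("ec2"), ["rtb-..."], "nat-...")`. If the control plane is down, every call fails and the returned list names the route tables that still need attention once the APIs recover.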

@@ -107,9 +107,9 @@ There are two ways to deploy alterNAT:
- By building a Docker image and using AWS Lambda support for containers
- By using AWS Lambda runtime for Python directly

Use this project directly, as provided, or draw inspiration from it and use only the parts you need. We cut [releases](https://github.com/1debit/alternat/releases) following the [Semantic Versioning](https://semver.org/) method. We recommend pinning to our tagged releases or using the short commit SHA if you decide to use this repo directly.
Use this project directly, as provided, or draw inspiration from it and use only the parts you need. We cut [releases](https://github.com/chime/terraform-aws-alternat/releases) following the [Semantic Versioning](https://semver.org/) method. We recommend pinning to our tagged releases or using the short commit SHA if you decide to use this repo directly.

### Building and pushing the container image
### Building and Pushing the Container Image

Build and push the container image using the [`Dockerfile`](Dockerfile).

@@ -120,7 +120,7 @@ docker build . -t <your_registry_url>/<your_repo:<release tag or short git commi
docker push <your_registry_url>/<your_repo:<release tag or short git commit sha>
```

### Use the Terraform module
### Use the Terraform Module

Start by reviewing the available [input variables](modules/terraform-aws-alternat/variables.tf). Example usage:

@@ -143,7 +143,7 @@ data "aws_subnet" "subnet" {
}

module "alternat_instances" {
  source = "git::https://github.com/1debit/alternat.git//modules/terraform-aws-alternat?ref=v0.3.3"
  source = "git::https://github.com/chime/terraform-aws-alternat.git//modules/terraform-aws-alternat?ref=v0.3.3"

  alternat_image_uri = "0123456789012.dkr.ecr.us-east-1.amazonaws.com/alternat-functions-lambda"
  alternat_image_tag = "v0.3.3"
@@ -207,7 +207,7 @@ AlterNATively, you can remove the NAT Gateways and their EIPs from your existing

While we'd like for this to be available on the Terraform Registry, it requires a specific repo naming convention and folder structure that we do not want to adopt.

### Other considerations
### Other Considerations

- Read [the Amazon EC2 instance network bandwidth page](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html) carefully. In particular:

@@ -244,9 +244,7 @@ While we'd like for this to be available on the Terraform Registry, it requires
nat_gateway_id = "nat-..."
```



## Future work
## Future Work

We would like this benefit to benefit as many users as possible. Possible future enhancements include:

@@ -257,7 +255,7 @@ We would like this benefit to benefit as many users as possible. Possible future

## Contributing

[Issues](https://github.com/1debit/alternat/issues) and [pull requests](https://github.com/1debit/alternat/pulls) are most welcome!
[Issues](https://github.com/chime/terraform-aws-alternat/issues) and [pull requests](https://github.com/chime/terraform-aws-alternat/pulls) are most welcome!

alterNAT is intended to be a safe, welcoming space for collaboration. Contributors are expected to adhere to the [Contributor Covenant code of conduct](CODE_OF_CONDUCT.md).

@@ -266,39 +264,39 @@ alterNAT is intended to be a safe, welcoming space for collaboration. Contributo

To test locally, install the AWS SAM CLI client:

```
```shell
brew tap aws/tap
brew install aws-sam-cli
```

Build sam and invoke the functions:

```
```shell
sam build
sam local invoke <FUNCTION NAME> -e <event_filename>.json
```

Example:

```
```shell
cd functions/replace-route
sam local invoke AutoScalingTerminationFunction -e sns-event.json
sam local invoke ConnectivityTestFunction -e cloudwatch-event.json
```


## Making actual calls to AWS for testing
## Making Actual Calls to AWS for Testing

In the first terminal

```
```shell
cd functions/replace-route
sam build && sam local start-lambda # This will start up a docker container running locally
```

In a second terminal, invoke the function back in terminal one:

```
```shell
cd functions/replace-route
aws lambda invoke --function-name "AutoScalingTerminationFunction" --endpoint-url "http://127.0.0.1:3001" --region us-east-1 --cli-binary-format raw-in-base64-out --payload file://./sns-event.json --no-verify-ssl out.txt
aws lambda invoke --function-name "ConnectivityTestFunction" --endpoint-url "http://127.0.0.1:3001" --region us-east-1 --cli-binary-format raw-in-base64-out --payload file://./cloudwatch-event.json --no-verify-ssl out.txt
4 changes: 2 additions & 2 deletions docs/0.2.0-migration-guide.md
@@ -20,7 +20,7 @@ Previouly, using the alternat module with the open source [`terraform-aws-vpc` m

```
module "alternat" {
  source = "git@github.com:1debit/alternat.git//modules/terraform-aws-alternat?ref=v0.1.3"
  source = "git@github.com:chime/terraform-aws-alternat.git//modules/terraform-aws-alternat?ref=v0.1.3"

  alternat_image_uri = "012345678901.dkr.ecr.us-west-2.amazonaws.com/alternat"
  alternat_image_tag = "v0.1.3"
@@ -53,7 +53,7 @@ locals {
}

module "alternat" {
  source = "git@github.com:1debit/alternat.git//modules/terraform-aws-alternat?ref=v0.2.0"
  source = "git@github.com:chime/terraform-aws-alternat.git//modules/terraform-aws-alternat?ref=v0.2.0"

  alternat_image_uri = "188238883601.dkr.ecr.us-west-2.amazonaws.com/alternat"
  alternat_image_tag = "v0.2.0"
2 changes: 1 addition & 1 deletion functions/replace-route/app.py
@@ -35,7 +35,7 @@


# Overrides socket.getaddrinfo to perform IPv4 lookups
# See https://github.com/1debit/alternat/issues/87
# See https://github.com/chime/terraform-aws-alternat/issues/87
def disable_ipv6():
    prv_getaddrinfo = socket.getaddrinfo
    def getaddrinfo_ipv4(*args):
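
The hunk above cuts off before the body of `disable_ipv6`. For readers unfamiliar with the technique named in its comment, here is a minimal standalone sketch of overriding `socket.getaddrinfo` so that lookups request IPv4 results only. It is an illustration under stated assumptions, not the repository's implementation: the name `force_ipv4_lookups` and the choice to return the original function are hypothetical.

```python
import socket


def force_ipv4_lookups():
    """Wrap socket.getaddrinfo so every lookup asks for IPv4 (AF_INET) results.

    Returns the previous getaddrinfo so a caller can restore it later.
    """
    original_getaddrinfo = socket.getaddrinfo

    def getaddrinfo_ipv4(host, port, family=0, type=0, proto=0, flags=0):
        # Ignore the caller's requested address family and always use IPv4.
        return original_getaddrinfo(host, port, socket.AF_INET, type, proto, flags)

    socket.getaddrinfo = getaddrinfo_ipv4
    return original_getaddrinfo
```

Because the override is process-wide, it affects every library that resolves names through the `socket` module, so it fits best in a short-lived process such as a Lambda invocation.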