
feat(examples): local zones examples #314

Merged · 10 commits · Jun 22, 2021
41 changes: 41 additions & 0 deletions examples/deadline/Local-Zone/README.md
@@ -0,0 +1,41 @@
# RFDK Sample Application - Local Zones
**@jusiskin** (Contributor) commented on Apr 9, 2021:
This example does a great job of showing how to deploy the Workers to a local zone - great work!

I think one thing that might be missing though is an explanation of why this is important/useful. RFDK users want this so they can have a low-latency connection between their AWS infrastructure and their infrastructure hosted outside AWS. It might help to spend some time at the beginning of this README setting the stage here. It may also help to add some guidance, links, and next steps to point users in the right direction for how they'd take this example and proceed to the next step of connecting their infrastructure outside of AWS.

One good resource we could link to is our developer guide documentation on connecting to a RFDK render farm

Contributor:

Thanks for adding documentation for this in the top-level README.

As discussed offline, let's add a "next steps" section to the Python/TS READMEs that list the steps required to connect an on-premise networked file-system to the Workers. Even if we don't have all of the steps completely detailed, for now having a checklist would help guide readers to research and complete them. These are the steps that I foresee here:

  1. Follow the connecting to your render farm dev guide page to modify the NetworkTier with the VPN connection.
  2. Ensure there is a route from the worker subnet to the on-prem file-system
  3. Ensure the VPN endpoint's security group has ingress rules to allow NFS port traffic from the Workers
  4. Add user-data to mount the NFS on the compute tier
  5. (optional) set up path mapping rules in Deadline


If you have large asset files that your Worker instances need to access from your on-prem infrastructure, deploying the Workers to a geographically close AWS Local Zone can reduce the latency and increase the speed of your renders. This example walks you through setting up your workers in a local zone while leaving the rest of the render farm in standard availability zones. Amazon has currently launched a local zone in Los Angeles that is part of the us-west-2 region, with more on the way. For more information on where local zones are available, how to get access, and what services they provide, refer to the [AWS Local Zones about page](https://aws.amazon.com/about-aws/global-infrastructure/localzones/).

Before deploying your farm, you may want to read our [Connecting to the Render Farm](https://docs.aws.amazon.com/rfdk/latest/guide/connecting-to-render-farm.html#connecting-with-site-to-site-vpn) developer guide for guidance on how to create a connection from your local network to the farm using something like a VPN. All of the techniques listed in the guide require changes to the networking tier of your RFDK app to allow the connection. After your connection is set up, you will be able to configure your network file server to be available on your workers, so any local assets you have can be transferred as needed by the jobs they perform.

---

_**Note:** This application is an illustrative example to showcase some of the capabilities of the RFDK. **It is not intended to be used for production render farms**, which should be built with more consideration of the security and operational needs of the system._

---

## Architecture

This example app assumes you're familiar with the general architecture of an RFDK render farm. If not, please refer to the [All-In-AWS-Infrastructure-Basic](../All-In-AWS-Infrastructure-Basic/README.md) example for the basics.

### Components

#### Network Tier

The network tier sets up a [VPC](https://aws.amazon.com/vpc/) that spans across all of the standard availability zones and local zones that are used, but the NAT Gateway for the VPC is only added to the standard zones, as it is not available in any local zones at this time. In this tier we override the Stack's `availabilityZones()` method, which returns the list of availability zones the Stack can use. It's by this mechanism that we control which zones the VPC will be deployed to.
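A minimal sketch of that override is shown below. It assumes the zone lists are read from the same `config` module used by the other tiers; the import path and construct names are illustrative, not the exact contents of this example's `network_tier.py`:

```python
from typing import List

from aws_cdk.core import Construct, Stack
from aws_cdk.aws_ec2 import Vpc

from . import config  # hypothetical: wherever your zone lists live


class NetworkTier(Stack):
    def __init__(self, scope: Construct, stack_id: str, **kwargs):
        super().__init__(scope, stack_id, **kwargs)
        # The VPC will span every zone returned by availability_zones below.
        self.vpc = Vpc(self, 'Vpc', max_azs=len(self.availability_zones))

    @property
    def availability_zones(self) -> List[str]:
        # Overriding this property controls which zones the Stack (and any
        # VPC created inside it) will use: the standard zones plus the
        # local zones from the app's configuration.
        return (config.config.availability_zones_standard
                + config.config.availability_zones_local)
```

Because CDK consults this property when laying out VPC subnets, no other change is needed to steer the VPC into the chosen zones.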

#### Security Tier

This holds the root CA certificate used for signing any certificates required by the farm, such as the one used by the render queue.

#### Service Tier

The service tier contains the repository and render queue, both of which are provided the selection of standard availability zone subnets to be deployed into. The DocumentDB and EFS filesystem are not available in the local zones at this time, so the repository cannot be moved there. Since the repository needs to be in a standard availability zone, there isn't any benefit to moving the render queue to a local zone.

#### Compute Tier

This tier holds the worker fleet and its health monitor. The health monitor contains a [Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) used to perform application-level health checks and the worker fleet contains an [Auto Scaling Group](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroup.html). Currently, these services are available in all launched local zones, so the construct can be placed in those zones.

## Typescript

[Continue to Typescript specific documentation.](ts/README.md)

## Python

[Continue to Python specific documentation.](python/README.md)
14 changes: 14 additions & 0 deletions examples/deadline/Local-Zone/python/.gitignore
@@ -0,0 +1,14 @@
*.swp
package-lock.json
__pycache__
.pytest_cache
.env
*.egg-info
venv
build

# CDK asset staging directory
.cdk.staging
cdk.out
cdk.context.json
stage
84 changes: 84 additions & 0 deletions examples/deadline/Local-Zone/python/README.md
@@ -0,0 +1,84 @@
# RFDK Sample Application - Local Zones - Python

## Overview
[Back to overview](../README.md)

## Instructions

---
**NOTE**

These instructions assume that your working directory is `examples/deadline/Local-Zone/python/` relative to the root of the AWS-RFDK package.

---

1. This sample app on the `mainline` branch may contain features that have not yet been officially released, and may not be available in the `aws-rfdk` package installed through pip from PyPI. To work from an example of the latest release, please switch to the `release` branch. If you would like to try out unreleased features, you can stay on `mainline` and follow the instructions for building, packing, and installing the `aws-rfdk` from your local repository.

2. Install the dependencies of the sample app:

```bash
pip install -r requirements.txt
```

3. If working on the `release` branch, this step can be skipped. If working on `mainline`, navigate to the base directory where the build and packaging scripts are, then run them and install the result over top of the `aws-rfdk` version that was installed in the previous step:
```bash
# Navigate to the root directory of the RFDK repository
pushd ../../../..
# Enter the Docker container to run the build and pack scripts
./scripts/rfdk_build_environment.sh
./build.sh
./pack.sh
# Exit the Docker container
exit
# Navigate back to the example directory
popd
pip install ../../../../dist/python/aws-rfdk-<version>.tar.gz
```

4. You must read and accept the [AWS Thinkbox End-User License Agreement (EULA)](https://www.awsthinkbox.com/end-user-license-agreement) to deploy and run Deadline. To do so, change the value of the `accept_aws_thinkbox_eula` in `package/lib/config.py` like this:

```py
self.accept_aws_thinkbox_eula: AwsThinkboxEulaAcceptance = AwsThinkboxEulaAcceptance.USER_ACCEPTS_AWS_THINKBOX_EULA
```

5. Change the value of the `deadline_version` variable in `package/lib/config.py` to specify the desired version of Deadline to be deployed to your render farm. RFDK is compatible with Deadline versions 10.1.9.x and later. To see the available versions of Deadline, consult the [Deadline release notes](https://docs.thinkboxsoftware.com/products/deadline/10.1/1_User%20Manual/manual/release-notes.html). It is recommended to use the latest version of Deadline available when building your farm, but to pin this version when the farm is ready for production use. For example, to pin to the latest `10.1.15.x` release of Deadline, use:

```python
self.deadline_version: str = '10.1.15'
```

6. Change the value of the `deadline_client_linux_ami_map` variable in `package/lib/config.py` to include the region + AMI ID mapping of your EC2 AMI(s) with the Deadline Worker installed. You can use the following AWS CLI command to look up AMIs, replacing `<region>` and `<version>` to match the AWS region and Deadline version you're looking for:

```bash
aws --region <region> ec2 describe-images --owners 357466774442 --filters "Name=name,Values=*Worker*" "Name=name,Values=*<version>*" --query 'Images[*].[ImageId, Name]' --output text
```

7. Also in `package/lib/config.py`, set the `availability_zones_standard` and `availability_zones_local` values to the availability zones you want to use. All values must belong to the same region. You must use at least two standard zones and at least one local zone; more of either may be used if you'd like.
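For example, to deploy across two standard zones in Oregon plus the Los Angeles local zone, the values in `package/lib/config.py` might look like the following (the zone names are illustrative; substitute zones you have opted in to):

```python
# All zones must be in the same region (us-west-2 in this illustration).
self.availability_zones_standard: List[str] = ['us-west-2a', 'us-west-2b']
self.availability_zones_local: List[str] = ['us-west-2-lax-1a']
```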

8. To gain the benefits of putting your workers in a local zone close to your asset server, you will want to set up a connection from your local network to the one you're creating in AWS.
1. You should start by reading through the [Connecting to the Render Farm](https://docs.aws.amazon.com/rfdk/latest/guide/connecting-to-render-farm.html) documentation and implementing one of the methods for connecting your network to your AWS VPC described there.
2. With whichever option you choose, you'll want to make sure you are propagating the worker subnets to your local network. All the options in the document show how to propagate all the private subnets, which will include the ones used by the workers.
3. Ensure your worker fleet's security group allows traffic from your network on the ports your NFS requires to be open. The documentation shows how to [allow connections to the Render Queue](https://docs.aws.amazon.com/rfdk/latest/guide/connecting-to-render-farm.html#allowing-connection-to-the-render-queue), which you may also want to enable if you plan on connecting any of your local machines to your render farm. You would want to do something similar for the worker fleet. For example, ports `22` and `2049` are commonly required for NFS, so this code could be added to the `ComputeTier`:

```python
# The customer-prefix-cidr-range needs to be replaced by the CIDR range for your local network that you used when configuring the VPC connection
self.worker_fleet.connections.allow_from(Peer.ipv4('customer-prefix-cidr-range'), Port.tcp(22))
self.worker_fleet.connections.allow_from(Peer.ipv4('customer-prefix-cidr-range'), Port.udp(22))
self.worker_fleet.connections.allow_from(Peer.ipv4('customer-prefix-cidr-range'), Port.tcp(2049))
self.worker_fleet.connections.allow_from(Peer.ipv4('customer-prefix-cidr-range'), Port.udp(2049))
```

4. Add user-data to mount the NFS on the compute tier. This can be provided in the `UserDataProvider` in the `ComputeTier`.
5. (optional) Set up [path mapping rules in Deadline](https://docs.thinkboxsoftware.com/products/deadline/10.1/1_User%20Manual/manual/cross-platform.html).
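Step 4 above can be sketched by extending the `UserDataProvider` in `package/lib/compute_tier.py`; the file-server hostname, export path, and mount point below are hypothetical placeholders for your own infrastructure:

```python
class UserDataProvider(InstanceUserDataProvider):
    def pre_worker_configuration(self, host) -> None:
        # Hypothetical NFS host and export; replace with your own values.
        host.user_data.add_commands(
            'mkdir -p /mnt/assets',
            'mount -t nfs -o nfsvers=4.1 assets.onprem.example:/exports/assets /mnt/assets',
        )
```

These commands run on each Worker instance before the Deadline Worker is configured, so the share is available by the time jobs start rendering.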

9. Deploy all the stacks in the sample app:

```bash
cdk deploy "*"
```

10. Once you are finished with the sample app, you can tear it down by running:

```bash
cdk destroy "*"
```
3 changes: 3 additions & 0 deletions examples/deadline/Local-Zone/python/cdk.json
@@ -0,0 +1,3 @@
{
"app": "python -m package.app"
}
Empty file.
94 changes: 94 additions & 0 deletions examples/deadline/Local-Zone/python/package/app.py
@@ -0,0 +1,94 @@
#!/usr/bin/env python3

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

import os

from aws_cdk.core import (
App,
Environment
)
from aws_cdk.aws_ec2 import (
MachineImage
)

from .lib import (
config,
network_tier,
security_tier,
service_tier,
compute_tier
)


def main():
# ------------------------------
# Validate Config Values
# ------------------------------
if not config.config.key_pair_name:
print('EC2 key pair name not specified. You will not have SSH access to the render farm.')

# ------------------------------
# Application
# ------------------------------
app = App()

if 'CDK_DEPLOY_ACCOUNT' not in os.environ and 'CDK_DEFAULT_ACCOUNT' not in os.environ:
raise ValueError('You must define either CDK_DEPLOY_ACCOUNT or CDK_DEFAULT_ACCOUNT in the environment.')
if 'CDK_DEPLOY_REGION' not in os.environ and 'CDK_DEFAULT_REGION' not in os.environ:
raise ValueError('You must define either CDK_DEPLOY_REGION or CDK_DEFAULT_REGION in the environment.')
env = Environment(
account=os.environ.get('CDK_DEPLOY_ACCOUNT', os.environ.get('CDK_DEFAULT_ACCOUNT')),
region=os.environ.get('CDK_DEPLOY_REGION', os.environ.get('CDK_DEFAULT_REGION'))
)

# ------------------------------
# Network Tier
# ------------------------------
network = network_tier.NetworkTier(
app,
'NetworkTier',
env=env
)

# ------------------------------
# Security Tier
# ------------------------------
security = security_tier.SecurityTier(
app,
'SecurityTier',
env=env
)

# ------------------------------
# Service Tier
# ------------------------------
service_props = service_tier.ServiceTierProps(
vpc=network.vpc,
availability_zones=config.config.availability_zones_standard,
root_ca=security.root_ca,
dns_zone=network.dns_zone,
deadline_version=config.config.deadline_version,
accept_aws_thinkbox_eula=config.config.accept_aws_thinkbox_eula
)
service = service_tier.ServiceTier(app, 'ServiceTier', props=service_props, env=env)

# ------------------------------
# Compute Tier
# ------------------------------
deadline_client_image = MachineImage.generic_linux(config.config.deadline_client_linux_ami_map)
compute_props = compute_tier.ComputeTierProps(
vpc=network.vpc,
availability_zones=config.config.availability_zones_local,
render_queue=service.render_queue,
worker_machine_image=deadline_client_image,
key_pair_name=config.config.key_pair_name,
)
_compute = compute_tier.ComputeTier(app, 'ComputeTier', props=compute_props, env=env)

app.synth()


if __name__ == '__main__':
main()
Empty file.
111 changes: 111 additions & 0 deletions examples/deadline/Local-Zone/python/package/lib/compute_tier.py
@@ -0,0 +1,111 @@
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

from dataclasses import dataclass
from typing import (
List,
Optional
)

from aws_cdk.core import (
Construct,
Stack,
StackProps
)
from aws_cdk.aws_ec2 import (
IMachineImage,
InstanceClass,
InstanceSize,
InstanceType,
IVpc,
SubnetSelection,
SubnetType
)

from aws_rfdk import (
HealthMonitor,
SessionManagerHelper
)
from aws_rfdk.deadline import (
InstanceUserDataProvider,
IRenderQueue,
WorkerInstanceFleet
)


@dataclass
class ComputeTierProps(StackProps):
"""
Properties for ComputeTier
"""
# The VPC to deploy resources into.
vpc: IVpc
# The availability zones the worker instances will be deployed to. This can include your local
# zones, but they must belong to the same region as the standard zones used in other stacks in
# this application.
availability_zones: List[str]
# The IRenderQueue that Deadline Workers connect to.
render_queue: IRenderQueue
# The IMachineImage to use for Workers (needs Deadline Client installed).
worker_machine_image: IMachineImage
# The name of the EC2 keypair to associate with Worker nodes.
key_pair_name: Optional[str]


class UserDataProvider(InstanceUserDataProvider):
def __init__(self, scope: Construct, stack_id: str):
super().__init__(scope, stack_id)

def pre_worker_configuration(self, host) -> None:
# Add code here for mounting your NFS to the workers
host.user_data.add_commands("echo preWorkerConfiguration")


class ComputeTier(Stack):
"""
The compute tier consists of the worker fleets. We'll be deploying the workers into the
local zones we're using.
"""
def __init__(self, scope: Construct, stack_id: str, *, props: ComputeTierProps, **kwargs):
"""
Initializes a new instance of ComputeTier
:param scope: The Scope of this construct.
:param stack_id: The ID of this construct.
:param props: The properties of this construct.
:param kwargs: Any kwargs that need to be passed on to the parent class.
"""
super().__init__(scope, stack_id, **kwargs)

# We can put the health monitor and worker fleet in all of the local zones we're using
subnets = SubnetSelection(
availability_zones=props.availability_zones,
subnet_type=SubnetType.PRIVATE,
one_per_az=True
)

# We can put the health monitor in all of the local zones we're using for the worker fleet
self.health_monitor = HealthMonitor(
self,
'HealthMonitor',
vpc=props.vpc,
vpc_subnets=subnets,
deletion_protection=False
)

self.worker_fleet = WorkerInstanceFleet(
self,
'WorkerFleet',
vpc=props.vpc,
vpc_subnets=subnets,
render_queue=props.render_queue,
# Not all instance types will be available in local zones. For a list of the instance types
# available in each local zone, you can refer to:
# https://aws.amazon.com/about-aws/global-infrastructure/localzones/features/#AWS_Services
# BURSTABLE3 is a T3, the third generation of burstable instances
instance_type=InstanceType.of(InstanceClass.BURSTABLE3, InstanceSize.LARGE),
worker_machine_image=props.worker_machine_image,
health_monitor=self.health_monitor,
key_name=props.key_pair_name,
user_data_provider=UserDataProvider(self, 'UserDataProvider')
)
SessionManagerHelper.grant_permissions_to(self.worker_fleet)