This directory contains Terraform configuration for deploying a complete, end-to-end stack for running Metaflow on AWS, using Terraform modules from terraform-aws-metaflow.
This repo contains configuration only for non-Metaflow-specific resources, such as the AWS VPC infrastructure and an Amazon SageMaker notebook instance; the Metaflow-specific parts are provided by reusable modules from terraform-aws-metaflow.
Note: the reusable Terraform module itself includes a couple of full "start-from-scratch" examples of:
- a minimal Metaflow stack (using AWS Batch for compute and AWS Step Functions for orchestration)
- a Kubernetes-based Metaflow stack (using AWS EKS for compute, and Argo Workflows for orchestration)
Download and install Terraform 0.14.x or later, and make sure AWS credentials are configured in your environment.
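For example, credentials can be supplied through the standard AWS environment variables (the profile name below is a hypothetical placeholder):

```bash
# Use a named profile from ~/.aws/credentials, plus a default region
export AWS_PROFILE=my-metaflow-profile
export AWS_DEFAULT_REGION=us-west-2
```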
The `infra` sub-project provides example networking infrastructure for the Metaflow service. For more details, see its README.
Copy `example.tfvars` to `prod.tfvars` (or whatever environment name you prefer) and update the environment name and region as needed. These variables are used to construct unique names for infrastructure resources.
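As a minimal sketch, `prod.tfvars` might look like the following (the variable names `env` and `aws_region` are assumptions; check the sub-project's `variables.tf` for the exact names):

```hcl
# prod.tfvars -- hypothetical values; adjust to your environment
env        = "prod"       # short environment name, used as a prefix for resource names
aws_region = "us-west-2"  # AWS region to deploy into
```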
To deploy, first initialize Terraform:

```bash
cd infra && terraform init
```

Then apply the configuration:

```bash
terraform apply --var-file prod.tfvars
```
The `metaflow` sub-project uses modules from terraform-aws-metaflow to provision the Metaflow service, AWS Step Functions, and AWS Batch resources.
Copy `example.tfvars` to `prod.tfvars` (or whatever environment name you prefer) and update the environment name and region as needed. These variables are used to construct unique names for infrastructure resources.
By default, the Metadata API has basic authentication enabled, but it is exposed to the public internet via Amazon API Gateway. To further restrict access to the API, the `access_list_cidr_blocks` variable can be set to specify the IPs or network CIDR blocks that are allowed to access the endpoint, blocking all other access. Additionally, the `enable_step_functions` flag can be set to `false` to skip provisioning the AWS Step Functions infrastructure.
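As a sketch, these settings could be added to `prod.tfvars` like so (the CIDR block is a hypothetical example value):

```hcl
# Allow Metadata API access only from this network range, blocking all other traffic
access_list_cidr_blocks = ["203.0.113.0/24"]

# Skip provisioning the AWS Step Functions infrastructure
enable_step_functions = false
```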
To deploy, first initialize Terraform:

```bash
cd metaflow && terraform init
```

Then apply the configuration:

```bash
terraform apply --var-file prod.tfvars
```
Once Terraform finishes applying, configure Metaflow to use the new stack:

```bash
metaflow configure import ./metaflow_config_<env>_<region>.json
```
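To sanity-check the imported settings, you can print the active configuration back out (assuming a Metaflow CLI version that supports `metaflow configure show`):

```bash
# Display the configuration values Metaflow will use
metaflow configure show
```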
A custom container image can be used by setting the variable `enable_custom_batch_container_registry` to `true`. This will provision an Amazon ECR registry, and the generated Metaflow configuration will have `METAFLOW_BATCH_CONTAINER_IMAGE` and `METAFLOW_BATCH_CONTAINER_REGISTRY` set to point to the private Amazon ECR repository. The container image must then be pushed into the repository before the first flow can be executed.

To do this, first copy the output of `metaflow_batch_container_image`.
Then log in to the Amazon ECR repository:

```bash
aws ecr get-login-password | docker login --username AWS --password-stdin <ecr-repository-name>
```

Pull the appropriate image from Docker Hub. In this case, we are using `continuumio/miniconda3:latest`:

```bash
docker pull continuumio/miniconda3
```

Tag the image:

```bash
docker tag continuumio/miniconda3:latest <ecr-repository-name>
```

Push the image:

```bash
docker push <ecr-repository-name>
```
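Putting the steps together, the full sequence might look like the following sketch, where the registry URI is a hypothetical placeholder for the copied `metaflow_batch_container_image` output:

```bash
# Hypothetical value copied from the metaflow_batch_container_image output
ECR_IMAGE=123456789012.dkr.ecr.us-west-2.amazonaws.com/metaflow-batch:latest

# Authenticate Docker against the registry host (the part before the first slash)
aws ecr get-login-password | docker login --username AWS --password-stdin "${ECR_IMAGE%%/*}"

# Pull, tag, and push the base image
docker pull continuumio/miniconda3
docker tag continuumio/miniconda3:latest "$ECR_IMAGE"
docker push "$ECR_IMAGE"
```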
The `sagemaker-notebook` sub-project provisions an optional Jupyter notebook with access to the Metaflow API.
Copy `example.tfvars` to `prod.tfvars` (or whatever environment name you prefer) and update the environment name and region as needed. These variables are used to construct unique names for infrastructure resources.
To deploy, first initialize Terraform:

```bash
cd sagemaker-notebook && terraform init
```

Then apply the configuration:

```bash
terraform apply --var-file prod.tfvars
```
The Amazon SageMaker notebook URL is output as `SAGEMAKER_NOTEBOOK_URL`. Open it in a browser to access the notebook.
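Since the URL is a standard Terraform output, it can be retrieved again later with `terraform output`:

```bash
cd sagemaker-notebook && terraform output SAGEMAKER_NOTEBOOK_URL
```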