AWS ECS
AWS can be a big, scary place. This page tries to demystify the process of creating and running your own Faktory service within AWS's Elastic Container Service.
Faktory is a stateful database process (i.e. it contains your persistent job data) so your application will have one Faktory server instance combined with many Faktory worker instances. The worker instances are the language-specific processes which fetch and execute jobs from the Faktory server.
Both Faktory OSS and Faktory Enterprise publish Docker images for each release. We're going to run the stock Docker image within ECS.
We need to create a task which configures all of the resources necessary for the Docker image to run. The task definition is quite complex and can be highly specific to your environment and specific application. See https://docs.aws.amazon.com/AmazonECS/latest/userguide/create-task-definition.html
- Go to the AWS Console > ECS > Task definitions and select "Create a new Task Definition".
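- I'd recommend the Fargate launch type: AWS provisions compute to match the CPU and memory you request for the task, so you don't have to pick or manage EC2 instances yourself. As your needs grow, you raise the task size.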
- Give it a name like `$APP-faktory-task`.
- All pending job data must fit in RAM, so tune the RAM carefully based on expected scale. This will depend on how many jobs you expect to be enqueued at once, the number of jobs scheduled to run in the future, failed jobs awaiting retry, etc. A small app might need 0.5GB/0.25vCPU (Fargate's smallest size; 0.5 vCPU requires at least 1GB), a large busy app might need 8GB/4vCPU.
- Add a container:
  - To run Faktory OSS, you can use the image name `contribsys/faktory:latest`. For commercial users, you'll use `docker.contribsys.com/contribsys/faktory-ent:latest`. Replace `latest` with a specific version if you want precise control. Commercial users: you can use the private repository authentication feature with your credentials.
  - Add soft and hard memory limits based on your memory config above.
  - Map TCP ports 7419 and 7420.
  - Set container start/stop timeouts to 60 seconds.
  - Environment variables:
    - Set FAKTORY_ENV to `production` or `staging` depending on your environment.
    - Set FAKTORY_PASSWORD to a value that your clients know.
    - Commercial users: set FAKTORY_LICENSE to the value in your access email.
  - Mount a persistent, read/write filesystem to `/var/lib/faktory` so that Faktory's datafile is saved across reboots. A reboot can occur for good (e.g. an upgrade) or for bad (hardware error, bug, etc).
  - Mount a read-only filesystem to `/etc/faktory` for Faktory's runtime configuration.
Much of this setup overlaps heavily with the advice on the Docker wiki page.
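
For the memory sizing step in the task definition, a back-of-the-envelope estimate can help you pick a starting point. The sketch below just multiplies a pending job count by an average payload size and a fudge factor. The job counts, the 1 KiB average payload and the 2x overhead multiplier are all assumptions for illustration, not numbers measured from Faktory, so measure your own payloads and watch real memory usage before settling on a size.

```go
package main

import "fmt"

// estimateMiB returns a rough memory estimate, in MiB, for holding pendingJobs
// jobs of avgJobBytes each. overhead is a guessed multiplier for bookkeeping
// beyond the raw payload; it is an assumption, not a Faktory constant.
func estimateMiB(pendingJobs, avgJobBytes int, overhead float64) float64 {
	return float64(pendingJobs) * float64(avgJobBytes) * overhead / (1024 * 1024)
}

func main() {
	// Hypothetical workload: jobs enqueued at once, scheduled for later, and awaiting retry.
	enqueued, scheduled, retries := 50000, 20000, 5000
	pending := enqueued + scheduled + retries

	fmt.Printf("~%.1f MiB for %d pending jobs\n", estimateMiB(pending, 1024, 2.0), pending)
}
```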
Create a Cluster based on the task created above.
- Go to ECS > Clusters > Create cluster.
- Select `Networking only`.
- Give it a name like `$APP-faktory-$ENVIRONMENT-cluster`.
- In the cluster you just created, on the Tasks tab, click `Run new Task`.
- Select Launch type `FARGATE`.
- Select your Faktory task if it isn't already selected.
- Number of tasks: 1.
- Configure your VPC/network as necessary. Make sure your selected Security Group allows traffic to ports 7419 and 7420.
- Run that task!
As an exercise for the reader:
- Open the CloudWatch logs for the task. Verify that you see "Listening" log messages and no errors.
- Open up the Web UI, port 7420, in your browser. If it times out, you've likely got VPC/security group/network issues.
- Create a dynamic DNS entry for your Faktory server.
- Connect a Faktory client, push a job and verify it appears in the Web UI.
If you want to make regular backups of your datafile, you can mount the read/write filesystem above and `cp` the `data/redis.db` file in a cron job to an S3 bucket or some other storage. Keep in mind that Faktory's datafile is typically quite small because queues are meant to be empty most of the time. Only if you schedule a lot of jobs or have a lot of failed jobs should you see larger file sizes.
```go
// Create the cluster
faktoryCluster := ecs.NewCluster(as.Stack, jsii.String("FaktoryCluster"), &ecs.ClusterProps{
	Vpc:               as.vpc,
	ContainerInsights: jsii.Bool(true),
})

// Create task definition
faktoryTaskDef := ecs.NewFargateTaskDefinition(as.Stack, jsii.String("FaktoryTaskDef"), &ecs.FargateTaskDefinitionProps{
	Cpu:            jsii.Number(props.faktoryCPU),
	MemoryLimitMiB: jsii.Number(props.faktoryMemoryLimitMiB),
})

// Fetch faktory license which was manually created
faktoryLicense := secretsmanager.Secret_FromSecretCompleteArn(
	as.Stack,
	jsii.String("permanent/sandbox/FaktoryLicense"),
	jsii.String("<license-arn>"),
)

// Create docker image asset from local build to pass to fargate
faktoryImageAsset := ecs.ContainerImage_FromAsset(
	jsii.String("../cmd/worker"),
	&ecs.AssetImageProps{File: jsii.String("Dockerfile.faktory")},
)
```
```go
// Add the Faktory container
faktoryContainer := faktoryTaskDef.AddContainer(jsii.String("FaktoryContainer"), &ecs.ContainerDefinitionOptions{
	Image:     faktoryImageAsset,
	Essential: jsii.Bool(true),
	Environment: &map[string]*string{
		"FAKTORY_ENV":      jsii.String(props.env),
		"FAKTORY_PASSWORD": jsii.String(faktoryPassword),
	},
	LinuxParameters: ecs.NewLinuxParameters(as.Stack, jsii.String("FaktoryContainerLinuxParams"), &ecs.LinuxParametersProps{
		InitProcessEnabled: jsii.Bool(true), // https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html#ecs-exec-considerations
	}),
	Logging: ecs.LogDriver_AwsLogs(&ecs.AwsLogDriverProps{
		StreamPrefix: stackName,
		LogRetention: logs.RetentionDays_THREE_MONTHS, // TODO: determine optimal value
	}),
	Secrets: &map[string]ecs.Secret{
		"FAKTORY_LICENSE": ecs.Secret_FromSecretsManager(faktoryLicense, nil),
	},
	StartTimeout: cdk.Duration_Seconds(jsii.Number(60)), // Recommended values from the Faktory wiki
	StopTimeout:  cdk.Duration_Seconds(jsii.Number(60)), // Recommended values from the Faktory wiki
	PortMappings: &[]*ecs.PortMapping{
		{
			ContainerPort: jsii.Number(faktoryJobsPort),
			HostPort:      jsii.Number(faktoryJobsPort),
		},
		{
			ContainerPort: jsii.Number(faktoryWebPort),
			HostPort:      jsii.Number(faktoryWebPort),
		},
	},
})
```
```go
// Add a CloudWatch Agent sidecar for Faktory metrics statsd collection
// https://github.com/contribsys/faktory/wiki/Ent-Metrics
// https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/deploy_servicelens_CloudWatch_agent_deploy_ECS.html#deploy_servicelens_CloudWatch_agent_deploy_ECS_definition_Fargate
// https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-custom-metrics-statsd.html
// https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html
cwAgentConfig := fmt.Sprintf(`{"metrics":{"namespace":"%s-faktory-metrics","metrics_collected":{"statsd":{"service_address":":%d","metrics_collection_interval":30,"metrics_aggregation_interval":30}}}}`, *stackName, statsdPort)
faktoryTaskDef.AddContainer(jsii.String("CWAgentSidecar"), &ecs.ContainerDefinitionOptions{
	Image:     ecs.ContainerImage_FromRegistry(jsii.String("public.ecr.aws/cloudwatch-agent/cloudwatch-agent:latest"), &ecs.RepositoryImageProps{}),
	Essential: jsii.Bool(true),
	Environment: &map[string]*string{
		"CW_CONFIG_CONTENT": jsii.String(cwAgentConfig),
	},
	Logging: ecs.LogDriver_AwsLogs(&ecs.AwsLogDriverProps{
		StreamPrefix: stackName,
		LogRetention: logs.RetentionDays_THREE_MONTHS,
	}),
	PortMappings: &[]*ecs.PortMapping{
		{
			ContainerPort: jsii.Number(statsdPort),
			HostPort:      jsii.Number(statsdPort),
		},
	},
})
cwMetricsPolicy := iam.NewPolicyStatement(&iam.PolicyStatementProps{
	Effect: iam.Effect_ALLOW,
	Actions: &[]*string{
		jsii.String("cloudwatch:PutMetricData"),
	},
	Resources: &[]*string{
		jsii.String("*"),
	},
})
faktoryTaskDef.AddToTaskRolePolicy(cwMetricsPolicy)
```
```go
// Create service via ecspatterns
faktorySvc := ecspatterns.NewNetworkMultipleTargetGroupsFargateService(as.Stack, jsii.String("Faktory"), &ecspatterns.NetworkMultipleTargetGroupsFargateServiceProps{
	Cluster:              faktoryCluster,
	TaskDefinition:       faktoryTaskDef,
	EnableExecuteCommand: jsii.Bool(true),
	PropagateTags:        ecs.PropagatedTagSource_SERVICE,
	LoadBalancers: &[]*ecspatterns.NetworkLoadBalancerProps{
		{
			Name: jsii.String("FaktoryNLB"),
			Listeners: &[]*ecspatterns.NetworkListenerProps{
				{
					Name: jsii.String("FaktoryJobsListener"),
					Port: jsii.Number(faktoryJobsPort),
				},
				{
					Name: jsii.String("FaktoryWebListener"),
					Port: jsii.Number(faktoryWebPort),
				},
			},
			PublicLoadBalancer: jsii.Bool(false),
		},
	},
	TargetGroups: &[]*ecspatterns.NetworkTargetProps{
		{
			ContainerPort: jsii.Number(faktoryJobsPort),
			Listener:      jsii.String("FaktoryJobsListener"),
		},
		{
			ContainerPort: jsii.Number(faktoryWebPort),
			Listener:      jsii.String("FaktoryWebListener"),
		},
	},
})
```
```go
// Open up ingress to the cluster so that NLB health checks can pass
// NOTE: This is more permissive than strictly necessary, but currently there doesn't appear to
// be a better solution for NLBs. See for more info:
// https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-register-targets.html
// https://github.com/aws/aws-cdk/issues/1490
faktorySG := (*faktorySvc.Service().Connections().SecurityGroups())[0] // We assume only 1 SG
faktorySG.AddIngressRule(
	ec2.Peer_Ipv4(as.vpc.VpcCidrBlock()),
	ec2.Port_TcpRange(jsii.Number(faktoryJobsPort), jsii.Number(faktoryWebPort)),
	jsii.String("Allows access to faktory from anywhere within the VPC CIDR block (needed for NLB health checks)"),
	jsii.Bool(false),
)

// Define the health checks
for _, faktoryTG := range *faktorySvc.TargetGroups() {
	if int(*faktoryTG.DefaultPort()) == faktoryWebPort {
		faktoryTG.SetHealthCheck(&elb.HealthCheck{
			Protocol:         elb.Protocol_HTTP,
			HealthyHttpCodes: jsii.String("200"),
			Path:             jsii.String("/health"),
			Port:             jsii.String(strconv.Itoa(faktoryWebPort)),
		})
	}
	if int(*faktoryTG.DefaultPort()) == faktoryJobsPort {
		faktoryTG.SetHealthCheck(&elb.HealthCheck{
			Protocol: elb.Protocol_TCP,
			Port:     jsii.String(strconv.Itoa(faktoryJobsPort)),
		})
	}
}
```
```go
// ************************************************************************
// SET UP FAKTORY EFS VOLUME FOR PERSISTENT STORAGE
// ************************************************************************

// Create FS encryption key
faktoryFSEncKey := kms.NewKey(as.Stack, jsii.String("FaktoryFSEncryptionKey"), &kms.KeyProps{
	Alias:             jsii.String(fmt.Sprintf("efs-%s-faktory-fs", *stackName)),
	EnableKeyRotation: jsii.Bool(false), // NOTE: key rotation can lead to exponential KMS cost increases
})

// Create the file system
faktoryFS := efs.NewFileSystem(as.Stack, jsii.String("FaktoryFS"), &efs.FileSystemProps{
	Vpc:                         as.vpc,
	EnableAutomaticBackups:      jsii.Bool(false),
	Encrypted:                   jsii.Bool(true),
	KmsKey:                      faktoryFSEncKey,
	LifecyclePolicy:             efs.LifecyclePolicy_AFTER_90_DAYS,
	OutOfInfrequentAccessPolicy: efs.OutOfInfrequentAccessPolicy_AFTER_1_ACCESS,
	PerformanceMode:             efs.PerformanceMode_GENERAL_PURPOSE,
	SecurityGroup:               faktorySG,
	ThroughputMode:              efs.ThroughputMode_ELASTIC,
	VpcSubnets:                  &ec2.SubnetSelection{Subnets: as.vpc.PrivateSubnets()},
	RemovalPolicy:               cdk.RemovalPolicy_DESTROY,
})

// Create the FS access point
faktoryFSAP := faktoryFS.AddAccessPoint(jsii.String("FaktoryFSAccessPoint"), &efs.AccessPointOptions{
	Path: jsii.String("/var/lib/faktory"),
	CreateAcl: &efs.Acl{
		OwnerGid:    jsii.String("1000"),
		OwnerUid:    jsii.String("1000"),
		Permissions: jsii.String("755"),
	},
	PosixUser: &efs.PosixUser{
		Gid: jsii.String("1000"),
		Uid: jsii.String("1000"),
	},
})

// Open up FS ingress from the faktory service
faktorySG.AddIngressRule(
	faktorySG,
	ec2.Port_Tcp(jsii.Number(2049)), // NFS service
	jsii.String("Allows access to EFS NFS service from faktory"),
	jsii.Bool(false),
)

// Define the volume
faktoryFSConfig := ecs.EfsVolumeConfiguration{
	FileSystemId: faktoryFS.FileSystemId(),
	AuthorizationConfig: &ecs.AuthorizationConfig{
		AccessPointId: faktoryFSAP.AccessPointId(),
		Iam:           jsii.String("ENABLED"),
	},
	TransitEncryption: jsii.String("ENABLED"),
}
faktoryFSVolume := ecs.Volume{
	Name:                   jsii.String("FaktoryFSVolume"),
	EfsVolumeConfiguration: &faktoryFSConfig,
}

// Add the volume to the task (reusing the definition above) and mount it
faktoryTaskDef.AddVolume(&faktoryFSVolume)
faktoryContainer.AddMountPoints(&ecs.MountPoint{
	ContainerPath: jsii.String("/var/lib/faktory"),
	ReadOnly:      jsii.Bool(false),
	SourceVolume:  faktoryFSVolume.Name,
})
```