Gregory M Kohler edited this page Nov 14, 2023 · 5 revisions

AWS can be a big, scary place. This page tries to demystify the process of creating and running your own Faktory service within AWS's Elastic Container Service.

Background

Faktory is a stateful database process (i.e. it contains your persistent job data), so your application will have one Faktory server instance combined with many Faktory worker instances. The worker instances are the language-specific processes which fetch and execute jobs from the Faktory server.

Faktory OSS and Faktory Enterprise both publish Docker images for each release. We're going to run the stock Docker image within ECS.

Step 1 - Create a Faktory Server task

We need to create a task definition that configures all of the resources the Docker image needs to run. Task definitions are fairly complex and can be highly specific to your environment and application. See https://docs.aws.amazon.com/AmazonECS/latest/userguide/create-task-definition.html for details.

  1. Go to the AWS Console > ECS > Task definitions and select "Create a new Task Definition".
  2. I'd recommend the Fargate launch type so AWS provisions and manages the underlying compute for you, and you don't have to pick or resize EC2 instances as your needs grow.
  3. Give it a name like $APP-faktory-task.
  4. All pending job data must fit in RAM, so size the task's memory carefully for your expected scale. This depends on how many jobs you expect to be enqueued at once, the number of jobs scheduled to run in the future, failed jobs awaiting retry, etc. A small app might need 0.5GB/0.5vCPU; a large, busy app might need 8GB/4vCPU.
  5. Add a container:
  • To run Faktory OSS, you can use the image name contribsys/faktory:latest. For commercial users, you'll use docker.contribsys.com/contribsys/faktory-ent:latest. Replace latest with a specific version if you want precise control. Commercial users: you can use the private repository authentication feature with your credentials.
  • Add soft and hard memory limits based on your memory config above.
  • Map TCP ports 7419 and 7420.
  • Set container start/stop timeouts to 60 seconds.
  • Environment variables
    • Set FAKTORY_ENV to production or staging depending on your environment.
    • Set FAKTORY_PASSWORD to a value that your clients know.
    • Commercial users: set FAKTORY_LICENSE to the value in your access email.
  • Mount a persistent, read/write filesystem to /var/lib/faktory so that Faktory's datafile is saved across reboots. A reboot can occur for good reasons (e.g. an upgrade) or bad ones (hardware error, bug, etc).
  • Mount a read-only filesystem to /etc/faktory for Faktory's runtime configuration.

Much of this setup overlaps heavily with the advice on the Docker wiki page.
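
The memory sizing in step 4 is back-of-the-envelope arithmetic, and it can be sketched as below. This is only a rough illustration: the QueueLoad type, the EstimateMiB function, the average job size and the safety factor are all assumptions made for this example, not values or APIs from Faktory. Measure your own workload before settling on a number.

```go
package main

import "fmt"

// QueueLoad describes the jobs that must be held in memory at one time.
// Every field is an estimate you supply; none of them come from Faktory.
type QueueLoad struct {
	Enqueued    int // jobs waiting in queues at peak
	Scheduled   int // jobs scheduled to run in the future
	Retries     int // failed jobs awaiting retry
	AvgJobBytes int // assumed average size of one serialized job
}

// EstimateMiB returns a rough RAM estimate in MiB: the number of pending jobs
// times the average job size, multiplied by a safety factor for overhead.
func EstimateMiB(l QueueLoad, safetyFactor float64) float64 {
	jobs := l.Enqueued + l.Scheduled + l.Retries
	bytes := float64(jobs) * float64(l.AvgJobBytes) * safetyFactor
	return bytes / (1024 * 1024)
}

func main() {
	// Hypothetical workload; replace with numbers observed in your own app.
	load := QueueLoad{Enqueued: 100_000, Scheduled: 50_000, Retries: 10_000, AvgJobBytes: 1024}
	fmt.Printf("estimated RAM: %.1f MiB\n", EstimateMiB(load, 2.0))
}
```

Round the result up to the next memory size your task type supports, and leave headroom for traffic spikes.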

Step 2 - Create Faktory Cluster

Create a Cluster based on the task created above.

  1. Go to ECS > Clusters > Create cluster.
  2. Select Networking only.
  3. Give it a name like $APP-faktory-$ENVIRONMENT-cluster.

Step 3 - Run your Faktory task

  1. In the cluster you just created, on the Tasks tab, click Run new Task.
  2. Select Launch type FARGATE.
  3. Select your Faktory task if it isn't already selected.
  4. Number of tasks: 1.
  5. Configure your VPC/network as necessary. Make sure your selected Security Group allows traffic to ports 7419 and 7420.
  6. Run that task!

Step 4 - Verify

As an exercise for the reader:

  1. Open the CloudWatch logs for the task. Verify you see no errors and "Listening" log messages.
  2. Open up the Web UI, port 7420, in your browser. If it times out, you've likely got VPC/security group/network issues.
  3. Create a dynamic DNS entry for your Faktory server.
  4. Connect a Faktory client, push a job and verify it appears in the Web UI.

Notes

Backups

If you want to make regular backups of your datafile, you can mount the read/write filesystem above and cp the data/redis.db file in a cron job to an S3 bucket or some other storage. Keep in mind that Faktory's datafile is typically quite small because queues are meant to be empty most of the time. Only if you schedule a lot of jobs or have a lot of failed jobs should you see larger file sizes.

Setup using CDK (in Go)

	// Create the cluster
	faktoryCluster := ecs.NewCluster(as.Stack, jsii.String("FaktoryCluster"), &ecs.ClusterProps{
		Vpc:               as.vpc,
		ContainerInsights: jsii.Bool(true),
	})

	// Create task definition
	faktoryTaskDef := ecs.NewFargateTaskDefinition(as.Stack, jsii.String("FaktoryTaskDef"), &ecs.FargateTaskDefinitionProps{
		Cpu:            jsii.Number(props.faktoryCPU),
		MemoryLimitMiB: jsii.Number(props.faktoryMemoryLimitMiB),
	})

	// Fetch faktory license which was manually created
	faktoryLicense := secretsmanager.Secret_FromSecretCompleteArn(
		as.Stack,
		jsii.String("permanent/sandbox/FaktoryLicense"),
		jsii.String("<license-arn>"),
	)

	// Create docker image asset from local build to pass to fargate
	faktoryImageAsset := ecs.ContainerImage_FromAsset(
		jsii.String("../cmd/worker"),
		&ecs.AssetImageProps{File: jsii.String("Dockerfile.faktory")},
	)

	// Add the Faktory container
	faktoryContainer := faktoryTaskDef.AddContainer(jsii.String("FaktoryContainer"), &ecs.ContainerDefinitionOptions{
		Image:     faktoryImageAsset,
		Essential: jsii.Bool(true),
		Environment: &map[string]*string{
			"FAKTORY_ENV":      jsii.String(props.env),
			"FAKTORY_PASSWORD": jsii.String(faktoryPassword),
		},
		LinuxParameters: ecs.NewLinuxParameters(as.Stack, jsii.String("FaktoryContainerLinuxParams"), &ecs.LinuxParametersProps{
			InitProcessEnabled: jsii.Bool(true), // https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html#ecs-exec-considerations
		}),
		Logging: ecs.LogDriver_AwsLogs(&ecs.AwsLogDriverProps{
			StreamPrefix: stackName,
			LogRetention: logs.RetentionDays_THREE_MONTHS, // TODO: determine optimal value
		}),
		Secrets: &map[string]ecs.Secret{
			"FAKTORY_LICENSE": ecs.Secret_FromSecretsManager(faktoryLicense, nil),
		},
		StartTimeout: cdk.Duration_Seconds(jsii.Number(60)), // Recommended values from the Faktory wiki
		StopTimeout:  cdk.Duration_Seconds(jsii.Number(60)), // Recommended values from the Faktory wiki
		PortMappings: &[]*ecs.PortMapping{
			{
				ContainerPort: jsii.Number(faktoryJobsPort),
				HostPort:      jsii.Number(faktoryJobsPort),
			},
			{
				ContainerPort: jsii.Number(faktoryWebPort),
				HostPort:      jsii.Number(faktoryWebPort),
			},
		},
	})

	// Add a CloudWatch Agent sidecar for Faktory metrics statsd collection
	// https://github.com/contribsys/faktory/wiki/Ent-Metrics
	// https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/deploy_servicelens_CloudWatch_agent_deploy_ECS.html#deploy_servicelens_CloudWatch_agent_deploy_ECS_definition_Fargate
	// https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-custom-metrics-statsd.html
	// https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html
	cwAgentConfig := fmt.Sprintf(`{"metrics":{"namespace":"%s-faktory-metrics","metrics_collected":{"statsd":{"service_address":":%d","metrics_collection_interval":30,"metrics_aggregation_interval":30}}}}`, *stackName, statsdPort)
	faktoryTaskDef.AddContainer(jsii.String("CWAgentSidecar"), &ecs.ContainerDefinitionOptions{
		Image:     ecs.ContainerImage_FromRegistry(jsii.String("public.ecr.aws/cloudwatch-agent/cloudwatch-agent:latest"), &ecs.RepositoryImageProps{}),
		Essential: jsii.Bool(true),
		Environment: &map[string]*string{
			"CW_CONFIG_CONTENT": jsii.String(cwAgentConfig),
		},
		Logging: ecs.LogDriver_AwsLogs(&ecs.AwsLogDriverProps{
			StreamPrefix: stackName,
			LogRetention: logs.RetentionDays_THREE_MONTHS,
		}),
		PortMappings: &[]*ecs.PortMapping{
			{
				ContainerPort: jsii.Number(statsdPort),
				HostPort:      jsii.Number(statsdPort),
			},
		},
	})
	cwMetricsPolicy := iam.NewPolicyStatement(&iam.PolicyStatementProps{
		Effect: iam.Effect_ALLOW,
		Actions: &[]*string{
			jsii.String("cloudwatch:PutMetricData"),
		},
		Resources: &[]*string{
			jsii.String("*"),
		},
	})
	faktoryTaskDef.AddToTaskRolePolicy(cwMetricsPolicy)

	// Create service via ecspatterns
	faktorySvc := ecspatterns.NewNetworkMultipleTargetGroupsFargateService(as.Stack, jsii.String("Faktory"), &ecspatterns.NetworkMultipleTargetGroupsFargateServiceProps{
		Cluster:              faktoryCluster,
		TaskDefinition:       faktoryTaskDef,
		EnableExecuteCommand: jsii.Bool(true),
		PropagateTags:        ecs.PropagatedTagSource_SERVICE,
		LoadBalancers: &[]*ecspatterns.NetworkLoadBalancerProps{
			{
				Name: jsii.String("FaktoryNLB"),
				Listeners: &[]*ecspatterns.NetworkListenerProps{
					{
						Name: jsii.String("FaktoryJobsListener"),
						Port: jsii.Number(faktoryJobsPort),
					},
					{
						Name: jsii.String("FaktoryWebListener"),
						Port: jsii.Number(faktoryWebPort),
					},
				},
				PublicLoadBalancer: jsii.Bool(false),
			},
		},
		TargetGroups: &[]*ecspatterns.NetworkTargetProps{
			{
				ContainerPort: jsii.Number(faktoryJobsPort),
				Listener:      jsii.String("FaktoryJobsListener"),
			},
			{
				ContainerPort: jsii.Number(faktoryWebPort),
				Listener:      jsii.String("FaktoryWebListener"),
			},
		},
	})

	// Open up ingress to the cluster so that NLB health checks can pass
	// NOTE: This is more permissive than strictly necessary, but currently there doesn't appear to
	//       be a better solution for NLBs. See for more info:
	//       https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-register-targets.html
	//       https://github.com/aws/aws-cdk/issues/1490
	faktorySG := (*faktorySvc.Service().Connections().SecurityGroups())[0] // We assume only 1 SG
	faktorySG.AddIngressRule(
		ec2.Peer_Ipv4(as.vpc.VpcCidrBlock()),
		ec2.Port_TcpRange(jsii.Number(faktoryJobsPort), jsii.Number(faktoryWebPort)),
		jsii.String("Allows access to faktory from anywhere within the VPC CIDR block (needed for NLB health checks)"),
		jsii.Bool(false),
	)

	// Define the health checks
	for _, faktoryTG := range *faktorySvc.TargetGroups() {
		if int(*faktoryTG.DefaultPort()) == faktoryWebPort {
			faktoryTG.SetHealthCheck(&elb.HealthCheck{
				Protocol:         elb.Protocol_HTTP,
				HealthyHttpCodes: jsii.String("200"),
				Path:             jsii.String("/health"),
				Port:             jsii.String(strconv.Itoa(faktoryWebPort)),
			})
		}
		if int(*faktoryTG.DefaultPort()) == faktoryJobsPort {
			faktoryTG.SetHealthCheck(&elb.HealthCheck{
				Protocol: elb.Protocol_TCP,
				Port:     jsii.String(strconv.Itoa(faktoryJobsPort)),
			})
		}
	}

	// ************************************************************************
	// SET UP FAKTORY EFS VOLUME FOR PERSISTENT STORAGE
	// ************************************************************************
	// Create FS encryption key
	faktoryFSEncKey := kms.NewKey(as.Stack, jsii.String("FaktoryFSEncryptionKey"), &kms.KeyProps{
		Alias:             jsii.String(fmt.Sprintf("efs-%s-faktory-fs", *stackName)),
		EnableKeyRotation: jsii.Bool(false), // NOTE: key rotation can lead to exponential KMS cost increases
	})

	// Create the file system
	faktoryFS := efs.NewFileSystem(as.Stack, jsii.String("FaktoryFS"), &efs.FileSystemProps{
		Vpc:                         as.vpc,
		EnableAutomaticBackups:      jsii.Bool(false),
		Encrypted:                   jsii.Bool(true),
		KmsKey:                      faktoryFSEncKey,
		LifecyclePolicy:             efs.LifecyclePolicy_AFTER_90_DAYS,
		OutOfInfrequentAccessPolicy: efs.OutOfInfrequentAccessPolicy_AFTER_1_ACCESS,
		PerformanceMode:             efs.PerformanceMode_GENERAL_PURPOSE,
		SecurityGroup:               faktorySG,
		ThroughputMode:              efs.ThroughputMode_ELASTIC,
		VpcSubnets:                  &ec2.SubnetSelection{Subnets: as.vpc.PrivateSubnets()},
		RemovalPolicy:               cdk.RemovalPolicy_DESTROY,
	})

	// Create the FS access point
	faktoryFSAP := faktoryFS.AddAccessPoint(jsii.String("FaktoryFSAccessPoint"), &efs.AccessPointOptions{
		Path: jsii.String("/var/lib/faktory"),
		CreateAcl: &efs.Acl{
			OwnerGid:    jsii.String("1000"),
			OwnerUid:    jsii.String("1000"),
			Permissions: jsii.String("755"),
		},
		PosixUser: &efs.PosixUser{
			Gid: jsii.String("1000"),
			Uid: jsii.String("1000"),
		},
	})

	// Open up FS ingress from the faktory service
	faktorySG.AddIngressRule(
		faktorySG,
		ec2.Port_Tcp(jsii.Number(2049)), // NFS service
		jsii.String("Allows access to EFS NFS service from faktory"),
		jsii.Bool(false),
	)

	// Define the volume
	faktoryFSConfig := ecs.EfsVolumeConfiguration{
		FileSystemId: jsii.String(*faktoryFS.FileSystemId()),
		AuthorizationConfig: &ecs.AuthorizationConfig{
			AccessPointId: faktoryFSAP.AccessPointId(),
			Iam:           jsii.String("ENABLED"),
		},
		TransitEncryption: jsii.String("ENABLED"),
	}
	faktoryFSVolume := ecs.Volume{
		Name:                   jsii.String("FaktoryFSVolume"),
		EfsVolumeConfiguration: &faktoryFSConfig,
	}

	// Add/mount the volume (reuses faktoryFSVolume instead of repeating the literal)
	faktoryTaskDef.AddVolume(&faktoryFSVolume)
	faktoryContainer.AddMountPoints(&ecs.MountPoint{
		ContainerPath: jsii.String("/var/lib/faktory"),
		ReadOnly:      jsii.Bool(false),
		SourceVolume:  faktoryFSVolume.Name,
	})
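
The snippet above refers to faktoryJobsPort, faktoryWebPort and statsdPort without defining them, and it builds the CloudWatch Agent configuration with a hand-written format string. The sketch below shows one way those pieces might look. Ports 7419 and 7420 come from this page; 8125 is the conventional StatsD port and is only an assumption here, as is the cwAgentConfig helper name. Building the JSON with encoding/json avoids quoting mistakes, and the result is equivalent to the inline string apart from key order.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Ports referenced by the CDK snippet. 7419 and 7420 are the Faktory ports
// named on this page; 8125 is an assumed StatsD port.
const (
	faktoryJobsPort = 7419
	faktoryWebPort  = 7420
	statsdPort      = 8125
)

// cwAgentConfig builds the CloudWatch Agent StatsD configuration as JSON.
// encoding/json emits map keys in sorted order, so the output is stable.
func cwAgentConfig(stackName string, port int) (string, error) {
	cfg := map[string]any{
		"metrics": map[string]any{
			"namespace": stackName + "-faktory-metrics",
			"metrics_collected": map[string]any{
				"statsd": map[string]any{
					"service_address":              fmt.Sprintf(":%d", port),
					"metrics_collection_interval":  30,
					"metrics_aggregation_interval": 30,
				},
			},
		},
	}
	b, err := json.Marshal(cfg)
	return string(b), err
}

func main() {
	// "demo" is a hypothetical stack name.
	cfg, err := cwAgentConfig("demo", statsdPort)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg)
}
```

The returned string can be passed to the CW_CONFIG_CONTENT environment variable in place of the fmt.Sprintf result.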