- Create the EKS infrastructure using Terraform (TF)
- Prepare data and upload it to S3
- Automatically serve the S3 data through an FSx for Lustre file system
- Launch training jobs using Helm
- Test trained models using JupyterLab running as a K8s service (access the endpoint via the load balancer IP)
- Automated CI for container image builds
- Automated CI for Terraform (TF) builds
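For the data-prep step, here is a minimal sketch of uploading a prepared local dataset to S3 while keeping its directory layout. The bucket, prefix, and the `upload_dataset` / `plan_uploads` helpers are hypothetical. The uploader is passed in as a function so the sketch runs without AWS credentials; its argument order matches boto3's `S3.Client.upload_file(Filename, Bucket, Key)`.

```python
from pathlib import Path
from typing import Callable, List, Tuple

# Hypothetical uploader signature (local_file, bucket, key) -> None; boto3's
# S3 client method upload_file(Filename, Bucket, Key) fits this shape.
Uploader = Callable[[str, str, str], None]


def plan_uploads(data_dir: str, prefix: str) -> List[Tuple[str, str]]:
    """Pair every file under data_dir with an S3 key that keeps its relative path."""
    root = Path(data_dir)
    base = prefix.strip("/")
    plan = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()  # S3 keys use "/" on every OS
            plan.append((str(path), f"{base}/{rel}" if base else rel))
    return plan


def upload_dataset(data_dir: str, bucket: str, prefix: str, upload: Uploader) -> int:
    """Upload the prepared dataset and return the number of files sent."""
    plan = plan_uploads(data_dir, prefix)
    for local_file, key in plan:
        upload(local_file, bucket, key)
    return len(plan)
```

With boto3 installed, something like `upload_dataset("prepared", "my-training-bucket", "datasets/v1", boto3.client("s3").upload_file)` would send the files; the directory and bucket names are only illustrative.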
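When FSx for Lustre is linked to an S3 bucket (for example through a data repository association), objects under the linked prefix appear as files under a file-system path, and their content is loaded from S3 on first access. The sketch below maps an S3 URI to the path a training pod would read. It assumes the file system is mounted at `/fsx` in the pod; the mount point, the `fs_path` argument, and the `s3_to_fsx_path` helper are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def s3_to_fsx_path(s3_uri: str, linked_s3_root: str,
                   mount_point: str = "/fsx", fs_path: str = "/") -> str:
    """Map an S3 object URI to its path on an FSx for Lustre mount (assumed layout)."""
    obj, root = urlparse(s3_uri), urlparse(linked_s3_root)
    if obj.scheme != "s3" or root.scheme != "s3":
        raise ValueError("expected s3:// URIs")
    if obj.netloc != root.netloc:
        raise ValueError("object is not in the linked bucket")
    prefix = root.path.strip("/")
    key = obj.path.lstrip("/")
    if prefix:
        if not key.startswith(prefix + "/"):
            raise ValueError("object is outside the linked S3 prefix")
        key = key[len(prefix) + 1:]
    # The linked prefix shows up under mount_point/fs_path with the same relative layout.
    return posixpath.join(mount_point, fs_path.strip("/"), key)
```

Training code can then read the returned path with ordinary file I/O instead of calling the S3 API.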
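Launching a training job with Helm amounts to installing a release of a chart with per-run values. A minimal sketch that builds the `helm install` command with `--namespace` and `--set` overrides; the chart path, release name, and value keys such as `trainer.epochs` and `data.path` are hypothetical and depend on how the training chart is written.

```python
import shlex
import subprocess
from typing import Dict, List


def helm_install_args(release: str, chart: str, namespace: str,
                      values: Dict[str, str]) -> List[str]:
    """Build a `helm install` command for one training run."""
    args = ["helm", "install", release, chart, "--namespace", namespace]
    # Sorted for a deterministic command; the keys depend on the (hypothetical) chart.
    for key in sorted(values):
        args += ["--set", f"{key}={values[key]}"]
    return args


def launch_training_job(release: str, chart: str, namespace: str,
                        values: Dict[str, str], execute: bool = False) -> int:
    """Print the command, and only run it when execute=True."""
    args = helm_install_args(release, chart, namespace, values)
    print(shlex.join(args))
    if not execute:
        return 0
    return subprocess.run(args, check=True).returncode
```

Using one release per run keeps each job separately inspectable and removable with `helm uninstall`.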
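To reach JupyterLab through the load balancer, the external address can be read from the Service object returned by `kubectl get svc <name> -o json`. On EKS, AWS load balancers usually report a DNS `hostname` rather than an `ip` in `status.loadBalancer.ingress`, so this sketch accepts either. The `/lab` path (no custom base URL) and the use of the first listed port are assumptions.

```python
import json
from typing import Any, Dict, Optional


def jupyter_endpoint(service: Dict[str, Any], port_name: Optional[str] = None) -> str:
    """Return the external JupyterLab URL from a Service's JSON representation."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    if not ingress:
        raise RuntimeError("load balancer address not assigned yet")
    # AWS load balancers usually populate "hostname"; other providers may set "ip".
    host = ingress[0].get("ip") or ingress[0].get("hostname")
    ports = service["spec"]["ports"]
    if port_name is not None:
        ports = [p for p in ports if p.get("name") == port_name]
        if not ports:
            raise ValueError(f"no port named {port_name!r}")
    port = ports[0]["port"]
    scheme = "https" if port == 443 else "http"
    # Assumes JupyterLab is served at its default "/lab" path.
    return f"{scheme}://{host}:{port}/lab"


def endpoint_from_kubectl_json(raw: str) -> str:
    """Parse the output of `kubectl get svc <name> -o json`."""
    return jupyter_endpoint(json.loads(raw))
```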
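A CI job for the Terraform code commonly checks formatting, validates, and plans on every change, and applies the saved plan only on the main branch. A sketch of that command sequence, assuming Terraform CLI 0.14 or later (for the global `-chdir` option); the `infra` directory and the `tfplan` file name are hypothetical.

```python
import subprocess
from typing import List


def terraform_ci_steps(workdir: str, apply: bool = False) -> List[List[str]]:
    """Return the Terraform commands a CI job would run, in order."""
    tf = ["terraform", f"-chdir={workdir}"]
    steps = [
        tf + ["fmt", "-check", "-recursive"],          # fail if files are unformatted
        tf + ["init", "-input=false"],                 # never prompt in CI
        tf + ["validate"],
        tf + ["plan", "-input=false", "-out=tfplan"],  # save the plan inside workdir
    ]
    if apply:
        # Apply exactly the saved plan, e.g. only on merges to main.
        steps.append(tf + ["apply", "-input=false", "tfplan"])
    return steps


def run_steps(steps: List[List[str]]) -> None:
    """Run each command, stopping at the first failure."""
    for cmd in steps:
        subprocess.run(cmd, check=True)
```

Applying a saved plan file means CI deploys exactly what was reviewed in the plan output.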
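For the image-build CI, one common pattern is to tag each image with the Git commit it was built from and push it to Amazon ECR, whose repository URIs have the form `<account>.dkr.ecr.<region>.amazonaws.com/<repo>`. The sketch only constructs the commands. The account ID, region, and repository name are placeholders, and registry login (for example with `aws ecr get-login-password`) is assumed to happen earlier in the job.

```python
import subprocess
from typing import List


def ecr_image_ref(account_id: str, region: str, repo: str, git_sha: str) -> str:
    """Build an ECR image reference tagged with the short commit SHA."""
    if len(git_sha) < 7:
        raise ValueError("expected a full or abbreviated commit SHA")
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repo}:{git_sha[:7]}"


def image_ci_steps(image_ref: str, context: str = ".") -> List[List[str]]:
    """Docker commands to build the training image and push it to the registry."""
    return [
        ["docker", "build", "-t", image_ref, context],
        ["docker", "push", image_ref],
    ]


def run_steps(steps: List[List[str]]) -> None:
    """Run each command, stopping at the first failure."""
    for cmd in steps:
        subprocess.run(cmd, check=True)
```

Commit-based tags make it easy to pass the exact image into a training run, for example as a Helm value.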