AWS summit seoul 2019 - How to scale your ml job with Amazon EKS Demo

hongkunyoo/eks-ml

eks-ml

AWS summit seoul 2019 - How to scale your ml job with Amazon EKS

Scripts description

  • setup.sh: setup script to run before the demo
  • train.py: model training script
  • train-oom.py: training script that demonstrates an out-of-memory (OOM) error
  • example.yaml: example Kubernetes Job
  • oom.yaml: example of a Job that gets OOM-killed
  • experiments.yaml: list of experiments to train
  • run-experiments.py: Python script that runs the experiments
  • Dockerfile: builds the Docker image hongkunyoo/eks-ml
  • Dockerfile.oom: builds the OOM training image hongkunyoo/eks-ml:oom

1. Configuration

Install awscli and configure ACCESS_KEY and SECRET_KEY. The user should have the following permissions:

  • EKS full access
  • CloudFormation full access
  • EC2 full access
  • IAM full access

These permissions are broadly administrative. Use them at your own risk.
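
Before running the setup, it can help to confirm that the keys are actually configured. The following is a minimal sketch, not part of this repository: it assumes the keys live either in the standard AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables or in the default profile of ~/.aws/credentials. The helper name credential_source is hypothetical, and the check only reports where keys were found, not whether they carry the permissions listed above.

```python
import configparser
import os
from pathlib import Path


def credential_source(env=None, credentials_file=None, profile="default"):
    """Return where AWS keys were found: "env", "file", or None.

    Hypothetical helper for illustration; it does not validate the keys.
    """
    env = os.environ if env is None else env
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "env"

    # Assumption: the shared credentials file is in its default location.
    path = Path(credentials_file) if credentials_file else Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is skipped silently
    if parser.has_section(profile):
        section = parser[profile]
        if section.get("aws_access_key_id") and section.get("aws_secret_access_key"):
            return "file"
    return None


if __name__ == "__main__":
    print(credential_source() or "no credentials found")
```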

2. Setup

Run setup.sh with sudo.

sudo ./setup.sh

3. Hello world

Apply the first example job.

kubectl apply -f example.yaml
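
The contents of example.yaml are not reproduced here, so the sketch below only illustrates the general shape of a Kubernetes batch/v1 Job of the kind this step applies. The job name, the command, and the memory limit are assumptions for illustration; only the image name hongkunyoo/eks-ml comes from this README. The manifest is printed as JSON, which kubectl also accepts.

```python
import json


def build_job(name, image, command, memory_limit="1Gi"):
    """Return a batch/v1 Job manifest as a plain dict (illustrative only)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed training run
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "train",
                            "image": image,
                            "command": command,
                            # A container exceeding this limit is OOM-killed.
                            "resources": {"limits": {"memory": memory_limit}},
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Assumed name and command; the real example.yaml may differ.
    job = build_job("hello-train", "hongkunyoo/eks-ml", ["python", "train.py"])
    print(json.dumps(job, indent=2))
```

Saving the printed output to a file and passing it to kubectl apply -f would submit it the same way as a YAML manifest. A memory limit set below what the training script needs is one way to reproduce the kind of failure that oom.yaml demonstrates.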

4. Run all

Run all experiments

python run-experiments.py
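
The source of run-experiments.py is not shown here. As a rough sketch of the idea, a script of this kind could turn each experiment into its own Job and submit it to the cluster. Everything below is an assumption made for illustration: the experiment schema, the --lr and --epochs flags, and the helper names job_name, to_job and submit. The real experiments.yaml and train.py may look quite different.

```python
import json
import re
import subprocess
import sys

# Hypothetical experiment list; the real experiments.yaml schema is not shown.
EXPERIMENTS = [
    {"name": "lr_0.01", "args": {"lr": 0.01, "epochs": 5}},
    {"name": "lr_0.1", "args": {"lr": 0.1, "epochs": 5}},
]


def job_name(experiment_name):
    """Make a Kubernetes-safe name: lowercase alphanumerics and '-'."""
    cleaned = re.sub(r"[^a-z0-9]+", "-", experiment_name.lower()).strip("-")
    return f"train-{cleaned}"[:63].rstrip("-")


def to_job(experiment, image="hongkunyoo/eks-ml"):
    """Turn one experiment into a Job manifest, passing its args as CLI flags."""
    flags = []
    for key, value in sorted(experiment["args"].items()):
        flags += [f"--{key}", str(value)]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name(experiment["name"])},
        "spec": {
            "backoffLimit": 0,
            "template": {"spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "train",
                    "image": image,
                    # Assumed entrypoint and flags.
                    "command": ["python", "train.py", *flags],
                }],
            }},
        },
    }


def submit(job):
    """Pipe the manifest to kubectl on stdin; needs a configured cluster."""
    subprocess.run(["kubectl", "apply", "-f", "-"],
                   input=json.dumps(job), text=True, check=True)


if __name__ == "__main__":
    for exp in EXPERIMENTS:
        job = to_job(exp)
        if "--submit" in sys.argv:
            submit(job)
        else:
            print(json.dumps(job))  # dry run: only show what would be applied
```

Running the sketch without arguments prints the manifests; adding --submit applies them. Giving each experiment its own Job lets the cluster schedule the runs independently.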
