This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

Docker

Follow these instructions to set up and run our provided Docker image.

Set Up Docker Engine and Docker Compose

You'll need to install Docker Engine on your development system. Note that while Docker Engine is free to use, Docker Desktop may require you to purchase a license. See the Docker Engine Server installation instructions for details.

To build and run this workload inside a Docker container, you also need Docker Compose. If it is not installed, follow the official Docker Compose installation documentation, or install the CLI plugin manually:

# Install the Compose CLI plugin for the current user
DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
mkdir -p "$DOCKER_CONFIG/cli-plugins"
curl -SL https://github.com/docker/compose/releases/download/v2.24.5/docker-compose-linux-x86_64 -o "$DOCKER_CONFIG/cli-plugins/docker-compose"
chmod +x "$DOCKER_CONFIG/cli-plugins/docker-compose"
# Confirm that Docker picks up the plugin
docker compose version
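
The download URL above is hard-coded for linux-x86_64. As a minimal sketch for other machines, the URL can be derived from the platform instead. This assumes release assets are named docker-compose-<os>-<arch> with a lowercase OS name, which matches the URL above but should be checked against the release page. The helper name compose_url is hypothetical.

```shell
# Hypothetical helper: build the Compose release URL for a version, OS, and architecture.
# Assumes assets are named docker-compose-<os>-<arch> with a lowercase OS name.
compose_url() {
    version="$1"
    os="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
    arch="$3"
    printf 'https://github.com/docker/compose/releases/download/%s/docker-compose-%s-%s\n' \
        "$version" "$os" "$arch"
}

# Print the URL for the current machine.
compose_url v2.24.5 "$(uname -s)" "$(uname -m)"
```

On some platforms uname -m may report a name that differs from the published asset name, so confirm that the printed URL exists before passing it to curl.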

Set Up Docker Image

Build the Docker images locally, or pull the prebuilt images.

cd docker
docker compose build

OR

docker pull intel/intel-extension-for-transformers:1.4.0
docker pull intel/intel-extension-for-transformers:devel-1.4.0
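
The two tags above differ only by the devel- prefix. A small sketch that builds the image reference from a variant and a version may help when scripting pulls. The helper name itrex_image and the variant label "runtime" are assumptions made for illustration; only the two tags listed above come from this document.

```shell
# Hypothetical helper: map a variant and a version to an image reference.
# "runtime" is an illustrative label for the non-devel image, not an official name.
itrex_image() {
    variant="$1"
    version="$2"
    repo="intel/intel-extension-for-transformers"
    case "$variant" in
        devel)   printf '%s:devel-%s\n' "$repo" "$version" ;;
        runtime) printf '%s:%s\n' "$repo" "$version" ;;
        *)       printf 'unknown variant: %s\n' "$variant" >&2; return 1 ;;
    esac
}

# Print both image references for version 1.4.0.
itrex_image runtime 1.4.0
itrex_image devel 1.4.0
```

The result can be passed straight to a pull, for example docker pull "$(itrex_image devel 1.4.0)".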

Use Docker Image

Use Intel Extension for Transformers without a local installation by running commands in the provided Docker image through Docker Compose.

docker compose run devel
docker compose run devel python setup.py sdist
docker compose run devel python tests/<test_mytest>.py
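
Each docker compose run call leaves a stopped container behind unless it is removed. A minimal sketch of a wrapper that adds the --rm flag is shown below. The function name run_in_devel is hypothetical; devel is the service name used in the commands above.

```shell
# Hypothetical wrapper: run a command in the devel service and remove the
# container when the command exits (--rm is a docker compose run flag).
run_in_devel() {
    docker compose run --rm devel "$@"
}

# Example calls, left commented so that sourcing this file has no side effects:
# run_in_devel python setup.py sdist
# run_in_devel python tests/<test_mytest>.py
```

Passing the arguments through "$@" keeps paths that contain spaces intact.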

Kubernetes

1. Install Helm

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 && \
chmod 700 get_helm.sh && \
./get_helm.sh

2. Set Up the Training Operator

Install the standalone operator from GitHub or Artifact Hub, or use an existing Kubeflow installation that already includes it.

kubectl apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone"

OR

helm repo add cowboysysop https://cowboysysop.github.io/charts/
helm install <release name> cowboysysop/training-operator
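
Before deploying a job, it can be useful to confirm that the operator registered its custom resource. The sketch below assumes the operator installs a CRD named pytorchjobs.kubeflow.org, which is the name used by the Kubeflow Training Operator but is not stated in this document. The function name operator_ready is hypothetical.

```shell
# Hypothetical check: succeed only if the PyTorchJob CRD is present in the cluster.
# The CRD name is an assumption based on the Kubeflow Training Operator.
operator_ready() {
    kubectl get crd pytorchjobs.kubeflow.org >/dev/null 2>&1
}

# Example call, left commented so that sourcing this file has no side effects:
# operator_ready && echo "training operator CRD found"
```

A non-zero exit status means either the CRD is missing or kubectl could not reach the cluster.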

3. Deploy ITREX Distributed Job

For customization options, see the chart README. Replace the ... placeholder below with the --set overrides you need.

export NAMESPACE=kubeflow
helm install --namespace ${NAMESPACE} --set ... itrex-distributed ./chart

4. View Progress

To view your workflow's progress, inspect the PyTorchJob resource:

kubectl get -o yaml pytorchjob itrex-distributed -n ${NAMESPACE}
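
The YAML output shows the job state at a single point in time. To block until the job finishes, one option is kubectl wait, sketched below. It assumes the PyTorchJob reports a Succeeded condition on completion, as the Kubeflow Training Operator does. The function name wait_for_job and the 600s default timeout are illustrative choices.

```shell
# Hypothetical helper: block until the named PyTorchJob reports the Succeeded
# condition, or until the timeout expires (the 600s default is an arbitrary choice).
wait_for_job() {
    job="$1"
    namespace="$2"
    timeout="${3:-600s}"
    kubectl wait --for=condition=Succeeded "pytorchjob/$job" \
        -n "$namespace" --timeout="$timeout"
}

# Example call, left commented so that sourcing this file has no side effects:
# wait_for_job itrex-distributed "${NAMESPACE}"
```

A job that fails never reaches the Succeeded condition, so the call returns a non-zero status only after the timeout. Inspect the YAML output above in that case.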