Fix README
andreyvelich committed Sep 16, 2020
1 parent edc119c commit e5dde94
Showing 2 changed files with 20 additions and 23 deletions.
35 changes: 17 additions & 18 deletions examples/v1beta1/tekton/README.md
@@ -1,41 +1,40 @@
# Katib examples with Tekton integration

Here you can find examples of using Katib with [Tekton](https://github.com/tektoncd/pipeline).
Check [here](https://github.com/tektoncd/pipeline/blob/master/docs/install.md#installing-tekton-pipelines-on-kubernetes) how to install Tekton on your cluster.

**Note** that you must modify Tekton [`nop`](https://github.com/tektoncd/pipeline/tree/master/cmd/nop) image to run Tekton pipelines. `Nop` images is used to stop sidecar containers after main container is completed. Metrics collector must be not stopped after training container is finished. To avoid this problem, `nop` image should be equal to metrics collector sidecar image.
Check [here](https://github.com/tektoncd/pipeline/blob/master/docs/install.md#installing-tekton-pipelines-on-kubernetes)
how to install Tekton on your cluster.

For example, if you are using [StdOut](https://www.kubeflow.org/docs/components/hyperparameter-tuning/experiment/#metrics-collector) metrics collector, `nop` image must be equal to `gcr.io/kubeflow-images-public/katib/v1beta1/file-metrics-collector`.
**Note** that you must modify Tekton [`nop`](https://github.com/tektoncd/pipeline/tree/master/cmd/nop)
image to run Tekton pipelines. `Nop` image is used to stop sidecar containers after main container
is completed. Metrics collector should not be stopped after training container is finished.
To avoid this problem, set `nop` image to metrics collector sidecar image.

After deploying Tekton on your cluster, run the command below to modify the `nop` image.
For example, if you are using
[StdOut](https://www.kubeflow.org/docs/components/hyperparameter-tuning/experiment/#metrics-collector) metrics collector,
`nop` image must be equal to `gcr.io/kubeflow-images-public/katib/v1beta1/file-metrics-collector`.

After deploying Tekton on your cluster, run the command below to modify the `nop` image:

```bash
kubectl patch deploy tekton-pipelines-controller -n tekton-pipelines --type='json' \
-p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args/9", "value": "gcr.io/kubeflow-images-public/katib/v1beta1/file-metrics-collector"}]'
-p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args/9", "value": "gcr.io/kubeflow-images-public/katib/v1beta1/file-metrics-collector"}]'
```
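
The patch above hard-codes both the image and the argument position (`args/9`). As a minimal sketch, the same JSON patch can be assembled from variables so it can be reviewed before it is applied. The variable names `NOP_IMAGE`, `ARG_INDEX`, and `PATCH` are illustrative and not part of Katib or Tekton, and the index `9` is copied from the command above; it may differ in other Tekton releases, so inspect the controller's arguments first.

```bash
#!/usr/bin/env bash
# Sketch only: variable names are hypothetical; values are taken from the README command above.
NOP_IMAGE="gcr.io/kubeflow-images-public/katib/v1beta1/file-metrics-collector"
ARG_INDEX=9  # assumption: position of the nop image value in the controller's args list

# Optional check (requires cluster access): list the controller's args to confirm the index.
# kubectl get deploy tekton-pipelines-controller -n tekton-pipelines \
#   -o jsonpath='{.spec.template.spec.containers[0].args}'

# Build the JSON patch (a single "replace" operation) from the variables.
PATCH=$(printf '[{"op": "replace", "path": "/spec/template/spec/containers/0/args/%s", "value": "%s"}]' \
  "$ARG_INDEX" "$NOP_IMAGE")

# Print the patch for review before applying it.
echo "$PATCH"

# Apply it (requires cluster access):
# kubectl patch deploy tekton-pipelines-controller -n tekton-pipelines --type='json' -p="$PATCH"
```

With the values shown, the printed patch is identical to the one passed to `-p` in the command above.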

Check that Tekton controller's pod was restarted:

```
kubectl get pods -n tekton-pipelines
```

Expected output:
```bash
$ kubectl get pods -n tekton-pipelines

```
NAME READY STATUS RESTARTS AGE
tekton-pipelines-controller-7fcb6c6cd4-p8zf2 1/1 Running 0 2m2s
tekton-pipelines-webhook-7f9888f9b-7d6mr 1/1 Running 0 12h
```

Check that `nop` image was modified:

```
kubectl get pod <tekton-controller-pod-name> -n tekton-pipelines- -o yaml | grep katib/v1beta1/file-metrics-collector
```

Expected output:
```bash
$ kubectl get pod <tekton-controller-pod-name> -n tekton-pipelines -o yaml | grep katib/v1beta1/file-metrics-collector

```
- gcr.io/kubeflow-images-public/katib/v1beta1/file-metrics-collector
- gcr.io/kubeflow-images-public/katib/v1beta1/file-metrics-collector
```
8 changes: 3 additions & 5 deletions examples/v1beta1/tekton/pipeline-run.yaml
@@ -1,9 +1,8 @@
# This example shows how you can use Tekton Pipelines in Katib.
# PipelineRun shows how you can transfer parameters from one Task to another and run HP job.
# This example shows how you can use Tekton Pipelines in Katib, transfer parameters from one Task to another and run HP job.
# It uses simple random algorithm and tunes only learning rate.
# Pipeline contains 2 Tasks: the first is data-preprocessing, the second is model-training.
# First Task shows how you can prepare your training data (simply divide number of training examples) before running HP job.
# Number of examples is transferred to the second Task.
# First Task shows how you can prepare your training data (here: simply divide number of training examples) before running HP job.
# Number of training examples is transferred to the second Task.
# Second Task is the actual training, into which the metrics collector sidecar is injected.
# Note that for this example Tekton controller's nop image must be equal to StdOut metrics collector image.
apiVersion: "kubeflow.org/v1beta1"
@@ -99,6 +98,5 @@ spec:
command:
- "python3"
- "/opt/mxnet-mnist/mnist.py"
- "--batch-size=64"
- "--num-examples=$(params.num-examples)"
- "--lr=$(params.lr)"