
Helm charts refactor proposal #139

Merged 16 commits on Sep 19, 2022
12 changes: 9 additions & 3 deletions .travis.yml
@@ -36,6 +36,7 @@ script:
# fail fast
- set -e
- export MAKE_ARGS=--no-print-directory
- export VM_TYPE=minikube
# Open SSH
# - echo travis:$sshpassword | sudo chpasswd
# - sudo sed -i 's/ChallengeResponseAuthentication no/ChallengeResponseAuthentication yes/' /etc/ssh/sshd_config
@@ -48,9 +49,14 @@ script:
- make $MAKE_ARGS gen-certs
- make $MAKE_ARGS build
- make $MAKE_ARGS docker-build
- make $MAKE_ARGS create-volumes
# deploy services
- make $MAKE_ARGS deploy
# initialize helm tiller
- helm init
- while ! (kubectl get pods --all-namespaces | grep tiller-deploy | grep '1/1'); do sleep 5; done
# Prepare any local/static volume as the shared file system and deploy all the helper micro-services for ffdl
- helm install docs/helm-charts/ffdl-helper-0.1.1.tgz --set prometheus.deploy=false,localstorage=true --wait
# Deploy all the core ffdl services.
- export IMAGE_TAG=user-$(whoami)
- helm install docs/helm-charts/ffdl-core-0.1.1.tgz --set trainer.version=${IMAGE_TAG},restapi.version=${IMAGE_TAG},lcm.version=${IMAGE_TAG},trainingdata.version=${IMAGE_TAG},databroker.tag=${IMAGE_TAG},databroker.version=${IMAGE_TAG},webui.version=${IMAGE_TAG} --wait
# submit a test job
- make $MAKE_ARGS test-submit-minikube-ci
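The tiller wait loop in this hunk polls forever if `tiller-deploy` never reaches `1/1`, which would stall the CI job until the platform-level timeout. A minimal sketch of a bounded variant follows. The `wait_for` and `tiller_ready` helpers and the `MAX_TRIES` and `INTERVAL` variables are hypothetical names introduced here for illustration and are not part of this PR.

```shell
# wait_for CMD...: retry CMD every INTERVAL seconds, giving up after MAX_TRIES failures.
# (Hypothetical helper; defaults of 60 tries and 5 seconds are assumptions.)
wait_for() {
  tries=0
  until "$@"; do
    tries=$((tries + 1))
    if [ "$tries" -ge "${MAX_TRIES:-60}" ]; then
      echo "timed out waiting for: $*" >&2
      return 1
    fi
    sleep "${INTERVAL:-5}"
  done
}

# Same readiness check as the loop in .travis.yml, wrapped in a function.
tiller_ready() {
  kubectl get pods --all-namespaces | grep tiller-deploy | grep -q '1/1'
}

# Demonstration with a command that succeeds immediately.
wait_for true && echo "ready"
```

In the CI script this could be used as `wait_for tiller_ready`, so that a tiller pod that never starts fails the build with a clear message instead of hanging.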

4 changes: 0 additions & 4 deletions Chart.yaml

This file was deleted.

2 changes: 1 addition & 1 deletion Makefile
@@ -20,7 +20,7 @@ WHOAMI ?= $(shell whoami)
IMAGE_TAG ?= user-$(WHOAMI)
TEST_SAMPLE ?= tf-model
# VM_TYPE is "vagrant", "minikube" or "none"
VM_TYPE ?= minikube
VM_TYPE ?= none
HAS_STATIC_VOLUMES?=false
TEST_USER = test-user
SET_LOCAL_ROUTES ?= 0
65 changes: 30 additions & 35 deletions README.md
@@ -19,10 +19,8 @@ To know more about the architectural details, please read the [design document](
* `helm`: The Kubernetes package manager (https://helm.sh)
* `docker`: The Docker command-line interface (https://www.docker.com/)
* `S3 CLI`: The [command-line interface](https://aws.amazon.com/cli/) to configure your Object Storage
* An existing Kubernetes cluster (e.g., [Kubeadm-DIND](https://github.com/kubernetes-sigs/kubeadm-dind-cluster#using-preconfigured-scripts) for local testing).
* An existing Kubernetes cluster (e.g., [Kubeadm-DIND](https://github.com/kubernetes-sigs/kubeadm-dind-cluster#using-preconfigured-scripts) for local testing, or follow the appropriate instructions for standing up your Kubernetes cluster using [IBM Cloud Public](https://github.com/IBM/container-journey-template/blob/master/README.md) or [IBM Cloud Private](https://github.com/IBM/deploy-ibm-cloud-private/blob/master/README.md)). The minimum capacity requirement for FfDL is 4GB of memory and 3 CPUs.
<!-- For Minikube, use the command `make minikube` to start Minikube and set up local network routes. Minikube **v0.25.1** is tested with Travis CI. -->
* Follow the appropriate instructions for standing up your Kubernetes cluster using [IBM Cloud Public](https://github.com/IBM/container-journey-template/blob/master/README.md) or [IBM Cloud Private](https://github.com/IBM/deploy-ibm-cloud-private/blob/master/README.md)
* The minimum capacity requirement for FfDL is 4GB Memory and 3 CPUs.

## Usage Scenarios

@@ -37,8 +35,8 @@ To know more about the architectural details, please read the [design document](
## Steps

1. [Quick Start](#1-quick-start)
- 1.1 [Installation using Kubeadm-DIND](#11-installation-using-kubeadm-dind)
- 1.2 [Installation using Kubernetes Cluster](#12-installation-using-kubernetes-cluster)
- 1.1 [Installation using Kubernetes Cluster](#11-installation-using-kubernetes-cluster)
- 1.2 [Installation using Kubeadm-DIND](#12-installation-using-kubeadm-dind)
2. [Test](#2-test)
3. [Monitoring](#3-monitoring)
4. [Development](#4-development)
@@ -48,57 +46,55 @@ To know more about the architectural details, please read the [design document](

## 1. Quick Start

There are multiple installation paths for installing FfDL into an existing Kubernetes cluster. Below are the steps for quick install. If you want to follow more detailed step by step instructions , please visit [the detailed installation guide](docs/detailed-installation-guide.md)
There are multiple installation paths for installing FfDL into an existing Kubernetes cluster. Below are the steps for a quick install. If you want more detailed step-by-step instructions, please visit [the detailed installation guide](docs/detailed-installation-guide.md).

> If you are using bash shell, you can modify the necessary environment variables in `env.txt` and export all of them using the following commands
> ```shell
> source env.txt
> export $(cut -d= -f1 env.txt)
> ```
* You need to initialize tiller with `helm init` before running the following commands.
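
The `env.txt` pattern quoted above works in two steps: sourcing the file sets the variables in the current shell, and `export` then marks each variable name for child processes such as `make` and `helm`. The sketch below illustrates this with a temporary stand-in file. The two keys are taken from the install commands in this README, but the real contents of `env.txt` are not shown in this diff, so the file body here is an assumption. The pattern also assumes the file holds only plain `KEY=value` lines with no comments.

```shell
# Hypothetical stand-in for env.txt (real file contents are not shown in this diff).
env_file="$(mktemp)"
cat > "$env_file" <<'EOF'
NAMESPACE=default
SHARED_VOLUME_STORAGE_CLASS=ibmc-file-gold
EOF

# Step 1: read the assignments into the current shell ("." is the portable form of "source").
. "$env_file"

# Step 2: cut keeps the text left of "=", i.e. the variable names; export marks them
# for child processes. The substitution is left unquoted so each name is a separate word.
export $(cut -d= -f1 "$env_file")

# A child process now sees the value.
child_view="$(sh -c 'echo "$NAMESPACE"')"
echo "child sees NAMESPACE=$child_view"

rm -f "$env_file"
```

Without the `export` step, the variables would exist only in the interactive shell and the later `make` and `helm` invocations would not inherit them.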

### 1.1 Installation using Kubeadm-DIND
### 1.1 Installation using Kubernetes Cluster

To install FfDL to any proper Kubernetes cluster, make sure `kubectl` points to the right namespace,
then deploy the platform services:

If you have [Kubeadm-DIND](https://github.com/kubernetes-sigs/kubeadm-dind-cluster#using-preconfigured-scripts) installed on your machine, use these commands to deploy the FfDL platform:
``` shell
export VM_TYPE=dind
export PUBLIC_IP=localhost
export SHARED_VOLUME_STORAGE_CLASS="";
export NAMESPACE=default # If your namespace does not exist yet, please create the namespace `kubectl create namespace $NAMESPACE` before running the make commands below
export SHARED_VOLUME_STORAGE_CLASS="ibmc-file-gold" # Change the storage class to what's available on your Cloud Kubernetes Cluster.

make deploy-plugin
make quickstart-deploy
helm install ibmcloud-object-storage-plugin --name ibmcloud-object-storage-plugin --repo https://ibm.github.io/FfDL/helm-charts --set namespace=$NAMESPACE # Configure s3 driver on the cluster
helm install ffdl-helper --name ffdl-helper --repo https://ibm.github.io/FfDL/helm-charts --set namespace=$NAMESPACE,shared_volume_storage_class=$SHARED_VOLUME_STORAGE_CLASS --wait # Deploy all the helper micro-services for ffdl
helm install ffdl-core --name ffdl-core --repo https://ibm.github.io/FfDL/helm-charts --set namespace=$NAMESPACE,lcm.shared_volume_storage_class=$SHARED_VOLUME_STORAGE_CLASS --wait # Deploy all the core ffdl services.
```

### 1.2 Installation using Kubernetes Cluster

To install FfDL to any proper Kubernetes cluster, make sure `kubectl` points to the right namespace,
then deploy the platform services:
> Note: For PUBLIC_IP, put down one of your Cluster Public IP that can access your Cluster's NodePorts. For IBM Cloud, you can get your Public IP with `bx cs workers <cluster_name>`.
### 1.2 Installation using Kubeadm-DIND

If you have [Kubeadm-DIND](https://github.com/kubernetes-sigs/kubeadm-dind-cluster#using-preconfigured-scripts) installed on your machine, use these commands to deploy the FfDL platform:
``` shell
export VM_TYPE=none
export PUBLIC_IP=<Cluster Public IP>
export NAMESPACE=default # If your namespace does not exist yet, please create the namespace `kubectl create namespace $NAMESPACE` before running the make commands below
export SHARED_VOLUME_STORAGE_CLASS=""
export NAMESPACE=default

# Change the storage class to what's available on your Cloud Kubernetes Cluster.
export SHARED_VOLUME_STORAGE_CLASS="ibmc-file-gold";
./bin/s3_driver.sh # Copy the s3 drivers to each of the DIND node
helm install ibmcloud-object-storage-plugin --name ibmcloud-object-storage-plugin --repo https://ibm.github.io/FfDL/helm-charts --set namespace=$NAMESPACE,cloud=false
helm install ffdl-helper --name ffdl-helper --repo https://ibm.github.io/FfDL/helm-charts --set namespace=$NAMESPACE,shared_volume_storage_class=$SHARED_VOLUME_STORAGE_CLASS,localstorage=true --wait
helm install ffdl-core --name ffdl-core --repo https://ibm.github.io/FfDL/helm-charts --set namespace=$NAMESPACE,lcm.shared_volume_storage_class=$SHARED_VOLUME_STORAGE_CLASS --wait

make deploy-plugin
make quickstart-deploy
# Forward the necessary microservices from the DIND cluster to your localhost.
./bin/dind-port-forward.sh
```

## 2. Test

To submit a simple example training job that is included in this repo (see `etc/examples` folder):
> Note: Set PUBLIC_IP to one of your cluster's public IPs that can reach the cluster's NodePorts. You can list the node IPs with `kubectl get nodes -o wide`.
> For IBM Cloud, you can get your Public IP with `bx cs workers <cluster_name>`.

``` shell
export PUBLIC_IP=<Cluster Public IP> # Put down localhost if you are running with Kubeadm-DIND
make test-push-data-s3
make test-job-submit
```

## 3. Monitoring

The platform ships with a simple Grafana monitoring dashboard. The URL is printed out when running the `deploy` make target.
The platform ships with a simple Grafana monitoring dashboard. The URL is printed out when running the `status` make target.

## 4. Development

@@ -107,12 +103,11 @@ Please refer to the [developer guide](docs/developer-guide.md) for more details.
## 5. Clean Up
If you want to remove FfDL from your cluster, simply use the following commands.
```shell
helm delete $(helm list | grep ffdl | awk '{print $1}' | head -n 1)
helm delete --purge ffdl-core ffdl-helper
```
If you want to remove the storage driver and pvc from your cluster, run:
If you want to remove the storage driver from your cluster, run:
```shell
kubectl delete pvc static-volume-1
helm delete $(helm list | grep ibmcloud-object-storage-plugin | awk '{print $1}' | head -n 1)
helm delete --purge ibmcloud-object-storage-plugin
```
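
The clean-up commands above name each release explicitly, which assumes the releases were installed under the names used in the Quick Start. As a hedged sketch, the wrapper below purges releases one at a time and keeps going if one of them is absent, reporting how many failed. `purge_releases` is a hypothetical helper introduced here for illustration; it only wraps the `helm delete --purge` call already shown in this README.

```shell
# purge_releases NAME...: purge each named release, continuing past failures.
# Returns the number of releases that could not be purged.
# (Hypothetical helper, not part of this PR.)
purge_releases() {
  failed=0
  for release in "$@"; do
    if helm delete --purge "$release"; then
      echo "purged $release"
    else
      echo "could not purge $release (it may not be installed)" >&2
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}
```

A call such as `purge_releases ffdl-core ffdl-helper ibmcloud-object-storage-plugin` would then cover both clean-up steps in one pass.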
For Kubeadm-DIND, you also need to stop your forwarded ports. Note that the command below kills all forwarded ports created with `kubectl`.
```shell
4 changes: 2 additions & 2 deletions bin/s3_driver.sh
@@ -3,8 +3,8 @@
declare -a arrNodes=($(docker ps --format '{{.Names}}' | grep "kube-node-\|kube-master"))
for node in "${arrNodes[@]}"
do
docker cp $FFDL_PATH/bin/ibmc-s3fs $node:/root/ibmc-s3fs
docker cp $FFDL_PATH/bin/s3fs $node:/usr/local/bin/s3fs
docker cp ./bin/ibmc-s3fs $node:/root/ibmc-s3fs
docker cp ./bin/s3fs $node:/usr/local/bin/s3fs
docker exec -i $node /bin/bash <<_EOF
apt-get -y update
apt-get -y install s3fs