Create ensembler web service #165

Merged
44 commits
4c50590
Add skeleton class for pyfunc ensembler
deadlycoconuts Feb 4, 2022
9cd9384
Refactor ensembler class in pyfunc to accept both batch and live requ…
deadlycoconuts Feb 4, 2022
97d97bd
Add supporting classes for live pyfunc ensembler
deadlycoconuts Feb 4, 2022
c39a7ac
Add preprocessing methods for live ensembler
deadlycoconuts Feb 6, 2022
77331ff
Update PyFunc ensembler in SDK to utilise returned treatment_config
deadlycoconuts Feb 6, 2022
3cab352
Modify predict method in SDK PyFunc to allow backward compatibility w…
deadlycoconuts Feb 6, 2022
c863c08
Set output from prediction to be a list-like object
deadlycoconuts Feb 6, 2022
dbaf6e1
Remove redundant header names for features in PyFunc
deadlycoconuts Feb 7, 2022
062d55e
Rename PyFuncEnsembler to PyFuncEnsemblerRunner to remove overloaded …
deadlycoconuts Feb 7, 2022
0e296b1
Rename references to renamed PyFuncEnsemblerRunner
deadlycoconuts Feb 7, 2022
b0fca59
Add docstrings to various methods
deadlycoconuts Feb 7, 2022
9cec5da
Add README template
deadlycoconuts Feb 7, 2022
2cf245e
Add base files for containerisation
deadlycoconuts Feb 8, 2022
4281458
Make container use a multi-stage build that use a venv derived from a…
deadlycoconuts Feb 8, 2022
5162f5f
Rename preprocess method to make it appear private
deadlycoconuts Feb 8, 2022
096d277
Add gitignore file
deadlycoconuts Feb 8, 2022
686a3cc
Add test for preprocessing method for pyfunc_ensembler_runner
deadlycoconuts Feb 8, 2022
5f96038
Cleanup some testing configurations
deadlycoconuts Feb 8, 2022
0690e40
Rename test sample data to improve consistency in naming
deadlycoconuts Feb 8, 2022
1aa4803
Remove test request
deadlycoconuts Feb 8, 2022
472572a
Add additional tests for web service
deadlycoconuts Feb 11, 2022
2f0438d
Add files for containerisation
deadlycoconuts Feb 11, 2022
ddb43f4
Rename live-ensembler to real-time-ensembler
deadlycoconuts Feb 11, 2022
ea81728
Add github workflow for real-time-ensembler
deadlycoconuts Feb 11, 2022
854553e
Edit typo in workflow
deadlycoconuts Feb 11, 2022
78f58c5
Edit typo in readme file
deadlycoconuts Feb 11, 2022
06b769a
Add changes missed out by rebasing
deadlycoconuts Feb 11, 2022
8c2ec33
Edit typo in exception message
deadlycoconuts Feb 11, 2022
440ba05
Separate dockerfiles into a base and app file
deadlycoconuts Feb 11, 2022
b7381e7
Edit typo in dockerfile
deadlycoconuts Feb 15, 2022
0d7fe79
Rename real-time ensembler module and mentions to pyfunc-ensembler-se…
deadlycoconuts Feb 15, 2022
d5b9a40
Rename batch-ensembler module and mentions with pyfunc-ensembler-job
deadlycoconuts Feb 15, 2022
31c6f36
Rename remnants of ensemblers with old naming convention
deadlycoconuts Feb 15, 2022
ac5fcec
Add new pyfunc-ensembler-service engine to Turing CI
deadlycoconuts Feb 15, 2022
9a36f04
Replace vanilla debian image with its slim version
deadlycoconuts Feb 15, 2022
0324af1
Clean up dockerfiles to utilise env variables
deadlycoconuts Feb 15, 2022
48b4e30
Replace redundant run.sh script by running webservice from dockerfile
deadlycoconuts Feb 15, 2022
48074e0
Remove redundant entries in .gitignore
deadlycoconuts Feb 15, 2022
c8f9b8b
Rename batch ensembler to pyfunc-ensembler-job
deadlycoconuts Feb 15, 2022
1bce96a
Revamp pyfunc implementation to avoid dataframe manipulations for rea…
deadlycoconuts Feb 15, 2022
03ac53a
Remove redundant imports
deadlycoconuts Feb 15, 2022
4e9841a
Replace incorrect env variables in dockerfiles
deadlycoconuts Feb 16, 2022
9360625
Refactor pyfunc predict method to use helper methods dependent on inp…
deadlycoconuts Feb 16, 2022
57d956f
Rewrite help tags for arg parser
deadlycoconuts Feb 16, 2022
@@ -1,25 +1,25 @@
name: engines/batch-ensembler
name: engines/pyfunc-ensembler-job

on:
# Automatically run CI on Release and Pre-Release tags and main branch
# (only if there are changes to relevant paths)
push:
tags:
- "batch-ensembler/v[0-9]+.[0-9]+.[0-9]+*"
- "pyfunc-ensembler-job/v[0-9]+.[0-9]+.[0-9]+*"
branches:
- main
paths:
- ".github/workflows/batch-ensembler.yaml"
- "engines/batch-ensembler/**"
- ".github/workflows/pyfunc-ensembler-job.yaml"
- "engines/pyfunc-ensembler-job/**"
- "sdk/**"

# Automatically run CI on branches, that have active PR opened
pull_request:
branches:
- main
paths:
- ".github/workflows/batch-ensembler.yaml"
- "engines/batch-ensembler/**"
- ".github/workflows/pyfunc-ensembler-job.yaml"
- "engines/pyfunc-ensembler-job/**"
- "sdk/**"

# To make it possible to trigger e2e CI workflow for any arbitrary git ref
@@ -50,13 +50,13 @@ jobs:
- name: Cache Conda environment
uses: actions/cache@v2
with:
path: engines/batch-ensembler/env
path: engines/pyfunc-ensembler-job/env
key: |
conda-${{ hashFiles('engines/batch-ensembler/environment.yaml') }}-${{ hashFiles('engines/batch-ensembler/requirements.txt') }}-${{ hashFiles('engines/batch-ensembler/requirements.dev.txt') }}
conda-${{ hashFiles('engines/pyfunc-ensembler-job/environment.yaml') }}-${{ hashFiles('engines/pyfunc-ensembler-job/requirements.txt') }}-${{ hashFiles('engines/pyfunc-ensembler-job/requirements.dev.txt') }}
restore-keys: conda-

- name: Run Tests
working-directory: engines/batch-ensembler
working-directory: engines/pyfunc-ensembler-job
run: |
make setup
make test
@@ -70,7 +70,7 @@ jobs:
- id: release-rules
uses: ./.github/actions/release-rules
with:
prefix: batch-ensembler/
prefix: pyfunc-ensembler-job/

publish:
# Automatically publish release and pre-release artifacts.
@@ -103,13 +103,13 @@ jobs:

- name: Build Docker Image
id: build
working-directory: engines/batch-ensembler
working-directory: engines/pyfunc-ensembler-job
env:
DOCKER_REGISTRY: ghcr.io/${{ github.repository }}
run: |
set -o pipefail
make build-image | tee output.log
echo "::set-output name=ensembler-image::$(sed -n 's%Building docker image: \(.*\)%\1%p' output.log)"
echo "::set-output name=pyfunc-ensembler-job::$(sed -n 's%Building docker image: \(.*\)%\1%p' output.log)"

- name: Publish Batch Ensembler Docker Image
run: docker push ${{ steps.build.outputs.ensembler-image }}
- name: Publish Pyfunc Ensembler Job Docker Image
run: docker push ${{ steps.build.outputs.pyfunc-ensembler-job }}
109 changes: 109 additions & 0 deletions .github/workflows/pyfunc-ensembler-service.yaml
@@ -0,0 +1,109 @@
name: engines/pyfunc-ensembler-service

on:
# Automatically run CI on Release and Pre-Release tags and main branch
# (only if there are changes to relevant paths)
push:
tags:
- "pyfunc-ensembler-service/v[0-9]+.[0-9]+.[0-9]+*"
branches:
- main
paths:
- ".github/workflows/pyfunc-ensembler-service.yaml"
- "engines/pyfunc-ensembler-service/**"
- "sdk/**"

# Automatically run CI on branches, that have active PR opened
pull_request:
branches:
- main
paths:
- ".github/workflows/pyfunc-ensembler-service.yaml"
- "engines/pyfunc-ensembler-service/**"
- "sdk/**"

# To make it possible to trigger e2e CI workflow for any arbitrary git ref
workflow_dispatch:

jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2

- name: Setup Python
uses: actions/setup-python@v2
with:
python-version: 3.8

- name: Setup Conda
uses: conda-incubator/setup-miniconda@v2
with:
auto-update-conda: true

- name: Cache Conda environment
uses: actions/cache@v2
with:
path: engines/pyfunc-ensembler-service/env
key: |
conda-${{ hashFiles('engines/pyfunc-ensembler-service/environment.yaml') }}-${{ hashFiles('engines/pyfunc-ensembler-service/requirements.txt') }}-${{ hashFiles('engines/pyfunc-ensembler-service/requirements.dev.txt') }}
restore-keys: conda-

- name: Run Tests
working-directory: engines/pyfunc-ensembler-service
run: |
make setup
make test

release-rules:
runs-on: ubuntu-latest
outputs:
release-type: ${{ steps.release-rules.outputs.release-type }}
steps:
- uses: actions/checkout@v2
- id: release-rules
uses: ./.github/actions/release-rules
with:
prefix: pyfunc-ensembler-service/

publish:
# Automatically publish release and pre-release artifacts.
#
# As for dev releases, make it possible to publish artifacts
# manually by approving 'deployment' in the 'manual' environment.
#
# Dev build can be released either from the 'main' branch or
# by running this workflow manually with `workflow_dispatch` event.
if: >-
contains('release,pre-release', needs.release-rules.outputs.release-type)
|| ( github.event_name != 'pull_request' )
|| ( github.event.pull_request.head.repo.full_name == github.repository )
environment: ${{ needs.release-rules.outputs.release-type == 'dev' && 'manual' || '' }}
runs-on: ubuntu-latest
needs:
- release-rules
- test
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0

- name: Log in to the Container registry
uses: docker/login-action@v1
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}

- name: Build Docker Image
id: build
working-directory: engines/pyfunc-ensembler-service
env:
DOCKER_REGISTRY: ghcr.io/${{ github.repository }}
run: |
set -o pipefail
make build-image | tee output.log
echo "::set-output name=pyfunc-ensembler-service-image::$(sed -n 's%Building docker image: \(.*\)%\1%p' output.log)"

- name: Publish Pyfunc Ensembler Service Docker Image
run: docker push ${{ steps.build.outputs.pyfunc-ensembler-service-image }}
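The `Build Docker Image` step above recovers the image reference by filtering the `make build-image` output through `sed`. As a rough illustration of what that expression does, here is a Python sketch. The function name `extract_image_ref` and the sample log lines (including the version `v0.0.1`) are made up for this example and are not part of the PR.

```python
import re

# Same pattern as the sed expression 's%Building docker image: \(.*\)%\1%p'.
PATTERN = re.compile(r"Building docker image: (.*)")


def extract_image_ref(log_text: str) -> str:
    """Approximate `sed -n 's%...%\\1%p'`: keep only lines where the substitution applied."""
    printed = []
    for line in log_text.splitlines():
        # Replace the first match on the line with its captured group.
        new_line, count = PATTERN.subn(r"\1", line, count=1)
        if count:
            printed.append(new_line)
    return "\n".join(printed)


# Hypothetical log, shaped like the echo output of the Makefile targets.
sample_log = (
    "turing-pyfunc-ensembler-service version: v0.0.1\n"
    "Building docker image: ghcr.io/gojek/turing/pyfunc-ensembler-service:v0.0.1\n"
    "Sending build context to Docker daemon\n"
)
image_ref = extract_image_ref(sample_log)
```

Since every matching line is printed, the step output is only a single image reference if the build log contains exactly one `Building docker image:` line.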
12 changes: 8 additions & 4 deletions .github/workflows/turing.yaml
@@ -10,9 +10,11 @@ on:
- main
paths-ignore:
- "docs/**"
- "engines/batch-ensembler/**"
- "engines/pyfunc-ensembler-job/**"
- "engines/pyfunc-ensembler-service/**"
- "sdk/**"
- ".github/workflows/batch-ensembler.yaml"
- ".github/workflows/pyfunc-ensembler-job.yaml"
- ".github/workflows/pyfunc-ensembler-service.yaml"
- ".github/workflows/sdk.yaml"
- ".github/workflows/helm-chart.yaml"
- ".github/workflows/cluster-init.yaml"
@@ -23,9 +25,11 @@ on:
- main
paths-ignore:
- "docs/**"
- "engines/batch-ensembler/**"
- "engines/pyfunc-ensembler-job/**"
- "engines/pyfunc-ensembler-service/**"
- "sdk/**"
- ".github/workflows/batch-ensembler.yaml"
- ".github/workflows/pyfunc-ensembler-job.yaml"
- ".github/workflows/pyfunc-ensembler-service.yaml"
- ".github/workflows/sdk.yaml"
- ".github/workflows/helm-chart.yaml"

4 changes: 2 additions & 2 deletions README.md
@@ -180,10 +180,10 @@ BatchEnsemblingConfig:
BuildNamespace: default
BuildTimeoutDuration: 20m
DestinationRegistry: ghcr.io
BaseImageRef: ghcr.io/gojek/turing/batch-ensembler:latest
BaseImageRef: ghcr.io/gojek/turing/pyfunc-ensembler-job:latest
KanikoConfig:
BuildContextURI: git://github.com/gojek/turing.git#refs/heads/main
DockerfileFilePath: engines/batch-ensembler/app.Dockerfile
DockerfileFilePath: engines/pyfunc-ensembler-job/app.Dockerfile
Image: gcr.io/kaniko-project/executor
ImageVersion: v1.6.0
ResourceRequestsLimits:
4 changes: 2 additions & 2 deletions api/config-dev.yaml
@@ -19,10 +19,10 @@ BatchEnsemblingConfig:
BuildNamespace: default
BuildTimeoutDuration: 20m
DestinationRegistry: ghcr.io
BaseImageRef: ghcr.io/gojek/turing/batch-ensembler:latest
BaseImageRef: ghcr.io/gojek/turing/pyfunc-ensembler-job:latest
KanikoConfig:
BuildContextURI: git://github.com/gojek/turing.git#refs/heads/main
DockerfileFilePath: engines/batch-ensembler/app.Dockerfile
DockerfileFilePath: engines/pyfunc-ensembler-job/app.Dockerfile
Image: gcr.io/kaniko-project/executor
ImageVersion: v1.6.0
ResourceRequestsLimits:
4 changes: 2 additions & 2 deletions api/turing/config/example.yaml
@@ -42,10 +42,10 @@ BatchEnsemblingConfig:
BuildNamespace: default
BuildTimeoutDuration: 20m
DestinationRegistry: ghcr.io
BaseImageRef: ghcr.io/gojek/turing/batch-ensembler:latest
BaseImageRef: ghcr.io/gojek/turing/pyfunc-ensembler-job:latest
KanikoConfig:
BuildContextURI: git://github.com/gojek/turing.git#refs/heads/main
DockerfileFilePath: engines/batch-ensembler/app.Dockerfile
DockerfileFilePath: engines/pyfunc-ensembler-job/app.Dockerfile
Image: gcr.io/kaniko-project/executor
ImageVersion: v1.6.0
ResourceRequestsLimits:
@@ -1,6 +1,6 @@
SHELL := /bin/bash

APP_NAME := batch-ensembler
APP_NAME := pyfunc-ensembler-job
CONDA_ENV_NAME ?= $(APP_NAME)
ACTIVATE_ENV = source $$(conda info --base)/etc/profile.d/conda.sh ; conda activate ./env/$(CONDA_ENV_NAME)

@@ -39,4 +39,4 @@ build-image: version
.PHONY: version
version:
$(eval VERSION=$(if $(OVERWRITE_VERSION),$(OVERWRITE_VERSION),v$(shell ../../scripts/vertagen/vertagen.sh -p ${APP_NAME}/)))
@echo "turing-batch-ensembler version:" $(VERSION)
@echo "turing-pyfunc-ensembler-job version:" $(VERSION)
@@ -1,4 +1,4 @@
name: batch-ensembler
name: pyfunc-ensembler-job
dependencies:
- python=3.8
- pip=21.0.1
File renamed without changes.
@@ -18,7 +18,7 @@
]

setuptools.setup(
name='batch-ensembler',
name='pyfunc-ensembler-job',
packages=setuptools.find_packages(),
install_requires=requirements,
dev_requirements=dev_requirements,
8 changes: 8 additions & 0 deletions engines/pyfunc-ensembler-service/.dockerignore
@@ -0,0 +1,8 @@
.gitignore
.dockerignore

env/
tests/

.mypy_cache/
.pytest_cache/
6 changes: 6 additions & 0 deletions engines/pyfunc-ensembler-service/.gitignore
@@ -0,0 +1,6 @@
env/
.coverage
**/mlruns/
**/__pycache__

ensembler/*
16 changes: 16 additions & 0 deletions engines/pyfunc-ensembler-service/Dockerfile
@@ -0,0 +1,16 @@
FROM continuumio/miniconda3 AS builder

RUN wget -qO- https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-367.0.0-linux-x86_64.tar.gz | tar xzf -
ENV PATH=$PATH:/google-cloud-sdk/bin
ENV CONDA_ENV_NAME=${CONDA_ENV_NAME}
ENV APP_NAME=${APP_NAME}

COPY . .
COPY ./temp-deps/sdk ./../../sdk

RUN conda env create -f ./environment.yaml && \
conda env update --name ${CONDA_ENV_NAME} --file /ensembler/conda.yaml && \
rm -rf /root/.cache

# Install conda-pack:
RUN conda install -c conda-forge conda-pack
33 changes: 33 additions & 0 deletions engines/pyfunc-ensembler-service/Makefile
@@ -0,0 +1,33 @@
SHELL := /bin/bash

APP_NAME := pyfunc-ensembler-service
CONDA_ENV_NAME ?= $(APP_NAME)
ACTIVATE_ENV = source $$(conda info --base)/etc/profile.d/conda.sh ; conda activate $(CONDA_ENV_NAME)

.PHONY: setup
setup: $(CONDA_ENV_NAME)
$(CONDA_ENV_NAME):
@conda env update -f environment.yaml --prune
$(ACTIVATE_ENV) && pip install -r requirements.dev.txt

.PHONY: test
test:
@$(ACTIVATE_ENV) && \
python -m pytest \
--cov=pyfunc_ensembler_runner \
--cov-report term-missing \
-W ignore

.PHONY: build-image
build-image: version
@mkdir -p temp-deps
@cp -r ../../sdk temp-deps/
@$(eval IMAGE_TAG = $(if $(DOCKER_REGISTRY),$(DOCKER_REGISTRY)/,)${APP_NAME}:${VERSION})
@echo "Building docker image: ${IMAGE_TAG}"
@docker build . --tag ${IMAGE_TAG}
@rm -rf temp-deps

.PHONY: version
version:
$(eval VERSION=$(if $(OVERWRITE_VERSION),$(OVERWRITE_VERSION),v$(shell ../../scripts/vertagen/vertagen.sh -p ${APP_NAME}/)))
@echo "turing-pyfunc-ensembler-service version:" $(VERSION)
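The `version` and `build-image` targets above compose the image tag from three inputs: an optional `OVERWRITE_VERSION`, the output of `vertagen.sh`, and an optional `DOCKER_REGISTRY`. The Python sketch below restates that logic for readers less familiar with Make's `$(if ...)` function. The function names `resolve_version` and `image_tag` and the example values are hypothetical, and the sketch does not run `vertagen.sh` itself.

```python
from typing import Optional


def resolve_version(vertagen_output: str, overwrite_version: Optional[str] = None) -> str:
    """Mirror the `version` target: OVERWRITE_VERSION wins, else 'v' + script output."""
    return overwrite_version if overwrite_version else f"v{vertagen_output}"


def image_tag(app_name: str, version: str, docker_registry: Optional[str] = None) -> str:
    """Mirror IMAGE_TAG: prefix with '<registry>/' only when a registry is set."""
    prefix = f"{docker_registry}/" if docker_registry else ""
    return f"{prefix}{app_name}:{version}"


# Example values are assumptions for illustration only.
local_tag = image_tag("pyfunc-ensembler-service", resolve_version("0.1.0"))
ci_tag = image_tag(
    "pyfunc-ensembler-service",
    resolve_version("0.1.0", overwrite_version="v9.9.9"),
    docker_registry="ghcr.io/gojek/turing",
)
```

With no registry set, the tag stays local (`pyfunc-ensembler-service:<version>`), which is what a plain `make build-image` on a developer machine would produce under these assumptions.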
27 changes: 27 additions & 0 deletions engines/pyfunc-ensembler-service/README.md
@@ -0,0 +1,27 @@
# PyFuncEnsembler Server for Real-Time Experiments

PyFuncEnsemblerRunner is a tool for deploying user-defined ensemblers (for use with Turing routers), written in
MLflow's `pyfunc` flavour.

## Usage
To run the ensembler as a webservice:
```bash
python -m pyfunc_ensembler_runner --mlflow_ensembler_dir $ENSEMBLER_DIR [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]

arguments:
--mlflow_ensembler_dir <path/to/ensembler/dir/> Path to the ensembler folder containing the mlflow files
--log-level <DEBUG||INFO||WARNING||ERROR||CRITICAL> Set the logging level
-h, --help Show this help message and exit
```

## Docker Image Building

To build a Docker image locally, first download the model artifacts from MLflow's model registry:
```bash
gsutil cp -r gs://[bucket-name]/mlflow/[project_id]/[run_id]/artifacts/ensembler .
```

To build the docker image, run the following:
```bash
make build-image
```
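The Usage section of the README above documents two command-line options, `--mlflow_ensembler_dir` and a log-level flag (shown as both `-l` and `--log-level`). The PR's actual parser is not visible in this diff, so the following is only a minimal `argparse` sketch of an interface matching that description. Treating the directory as required, accepting both spellings of the log-level flag, and defaulting to `INFO` are all assumptions.

```python
import argparse
import logging

LOG_LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser matching the options listed in the README."""
    parser = argparse.ArgumentParser(prog="pyfunc_ensembler_runner")
    parser.add_argument(
        "--mlflow_ensembler_dir",
        required=True,  # assumption: the README does not say whether it is required
        help="Path to the ensembler folder containing the mlflow files",
    )
    parser.add_argument(
        "-l",
        "--log-level",
        dest="log_level",
        choices=LOG_LEVELS,
        default="INFO",  # assumption: the default level is not stated in the README
        help="Set the logging level",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--mlflow_ensembler_dir", "./ensembler", "-l", "DEBUG"])
logging.basicConfig(level=getattr(logging, args.log_level))
```

Restricting `choices` to the five standard level names means a typo such as `-l debg` fails at parse time rather than when logging is configured.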