E. Pasqua, B. Uyarer @ Delivery Hero
Orchestrating, scheduling and monitoring ML pipelines is a big challenge.
Apache Airflow can be your ally for handling this complexity.
Apache Airflow is an open-source project, written in Python, for programmatically authoring, scheduling and monitoring batch executions of tasks.
You can design your pipelines according to your own logic: decide which actions to perform, retry them if errors occur, skip tasks if dependencies are not met, monitor execution and inspect logs through a friendly and powerful web UI, and a lot more.
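To give a flavour of what this looks like in code, here is a minimal sketch of a DAG definition. The file name, DAG id and task callables are hypothetical, and the import paths follow the Airflow 1.10 API that matches the Docker image used below:

```python
# dags/example_pipeline.py -- a minimal illustrative sketch, not part of the workshop repo.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.10 import path


def extract():
    print("fetching data...")


def train():
    print("training model...")


default_args = {
    "owner": "airflow",
    "retries": 2,                          # retry a failed task twice...
    "retry_delay": timedelta(minutes=5),   # ...waiting 5 minutes in between
}

with DAG(
    dag_id="example_pipeline",
    default_args=default_args,
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",            # run once per day
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    train_task = PythonOperator(task_id="train", python_callable=train)

    extract_task >> train_task             # train runs only after extract succeeds
```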
In this workshop we’ll go over basic Airflow concepts and we’ll set up an instance for orchestrating a training and an inference pipeline for a machine learning model.
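As a rough preview of where we are heading, the sketch below expresses a training and an inference pipeline as two separate DAGs. The DAG ids, schedules, file path and the pickle-based model handoff are illustrative assumptions, not the workshop's actual code:

```python
# Illustrative shape of a training and an inference pipeline as two DAGs.
import pickle
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

MODEL_PATH = "/usr/local/airflow/model.pkl"  # hypothetical shared location


def train_model():
    model = {"weights": [0.1, 0.2]}          # stand-in for a real fit() call
    with open(MODEL_PATH, "wb") as f:
        pickle.dump(model, f)                # persist the trained model


def run_inference():
    with open(MODEL_PATH, "rb") as f:
        model = pickle.load(f)               # load the latest trained model
    print("predicting with", model)


# The training pipeline runs weekly and produces a model artifact.
with DAG("train_pipeline", start_date=datetime(2019, 1, 1),
         schedule_interval="@weekly") as train_dag:
    PythonOperator(task_id="train", python_callable=train_model)

# The inference pipeline runs daily and consumes the latest artifact.
with DAG("inference_pipeline", start_date=datetime(2019, 1, 1),
         schedule_interval="@daily") as inference_dag:
    PythonOperator(task_id="predict", python_callable=run_inference)
```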
What to expect:
- It assumes no previous Airflow knowledge.
- The main goal is to build a basic training and inference pipeline with Airflow.
- It is not about a particular model or ML method.
- It is not an advanced Airflow workshop.
- It is not suitable for Python beginners.
Prerequisites:
- Docker installed.
- Any editor (Sublime, PyCharm, Vim, Atom).
Setup:
- Verify that Docker works properly:
docker run hello-world
- Ensure that at least 4 GB of RAM is allocated to the Docker Engine. (This can be done via the Docker desktop app, under Preferences; restart Docker after changing the setting.)
- Download the Airflow Docker image:
docker pull puckel/docker-airflow
- Clone the repository under the $HOME directory:
git clone https://github.com/deliveryhero/pyconde2019-airflow-ml-workshop
Note: instructions for installing and setting up Airflow without Docker are provided as appendix files (Mac OS X Airflow Setup, Ubuntu Airflow Setup).
During the tutorial, however, we assume that everyone follows the steps for the containerised version of Airflow.
You can find the official Airflow documentation at https://airflow.apache.org/.