This project demonstrates a complete ML project and its development flow, from initial exploration to continuous deployment at scale. The example is based on a Kaggle competition whose goal is to predict the correct trip fare using the public NYC Taxi dataset.
This example is intended to explain and demonstrate the overall MLOps flow by using the MLRun MLOps orchestration framework. It is not designed to dive into the individual components or models.
It is recommended to fork this repo into your GitHub account and clone it into your development environment.
The ML application development and productization flow consists of the following steps (demonstrated through notebooks):
- Exploratory data analysis (EDA) and modeling.
- Data and model pipeline development (data preparation, training, evaluation, and so on).
- Application & serving pipeline development (intercept requests, process data, run inference, and so on).
- Scaling and automation (run at scale, hyper-parameter tuning, monitoring, pipeline automation, and so on).
- Continuous operations (automated tests, CI/CD integration, upgrades, retraining, live ops, and so on).
You can find the Python source code under `/src` and the tests under `/tests`.
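To give a sense of how these steps are orchestrated, the sketch below loads an MLRun project and triggers its automated workflow. It is a minimal, illustrative example: the project name, workflow name, and argument are assumptions, not the exact identifiers used in this repo.

```python
import mlrun

# Load (or create) the MLRun project from the current directory;
# the context directory holds the project spec and source code.
project = mlrun.get_or_create_project("nyc-taxi-fares", context="./")

# Run the project's automated workflow (data prep -> train -> evaluate -> deploy).
# "main" and the "dataset" argument are illustrative names, not this repo's exact ones.
run = project.run(name="main", arguments={"dataset": "<path-or-url-to-taxi-data>"}, watch=True)
```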
This project can run in different development environments:
- Local computer (using PyCharm, VSCode, Jupyter, etc.)
- Inside GitHub Codespaces
- Amazon SageMaker Studio and Studio Lab (free edition), or other managed Jupyter environments
The project works with the MLRun service. You can deploy the MLRun service (API, DB, UI, and execution environment) over Docker or, preferably, over Kubernetes.
The `make mlrun-docker` command launches a local MLRun service using Docker Compose (the MLRun UI can be viewed at http://localhost:8060). Alternatively, edit the `mlrun.env` file to configure a remote MLRun service (over Kubernetes). For resource-constrained environments without Docker, you can start the MLRun service as a local process (no UI) with the `make mlrun-api` command.
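For reference, remote configuration in `mlrun.env` comes down to a handful of variables. The snippet below is illustrative only: the exact keys depend on your deployment, and all values are placeholders.

```
# API endpoint of the remote MLRun service (over Kubernetes)
MLRUN_DBPATH=https://mlrun-api.<your-cluster>
# Credentials, if your deployment requires them (e.g., an Iguazio cluster)
V3IO_USERNAME=<your-username>
V3IO_ACCESS_KEY=<your-access-key>
```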
First, install the package dependencies and set up the environment.
Using pip (install the requirements):
`make install-requirements`
Your environment should include `MLRUN_ENV_FILE=<absolute path to the ./mlrun.env file>` (pointing to the `mlrun.env` file in this repo). See the MLRun client setup instructions for details.
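If you prefer to configure the client from code rather than the shell, recent MLRun versions can load the env file directly; a minimal sketch, assuming your installed version provides `mlrun.set_env_from_file`:

```python
import mlrun

# Read mlrun.env and export its variables (MLRUN_DBPATH, credentials, etc.)
# into the current process environment before connecting to the service.
mlrun.set_env_from_file("./mlrun.env")
```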
Using conda (create the `mlrun` conda env and install the packages and environment variables in it):
`make conda-env`
`conda activate mlrun`
Make sure all your tasks and notebooks use the `mlrun` Python environment!
Next, start or connect to the MLRun service: start a local Docker MLRun service by running `make mlrun-docker`, or edit the `DBPATH` and credentials in the `mlrun.env` file to use a remote MLRun service.
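Either way, you can verify from Python that the client reaches the service. A small sketch, assuming the local Docker service exposes the API on its default port 8080 (the UI runs on 8060):

```python
import mlrun

# Point the client at the local service API endpoint.
mlrun.set_environment("http://localhost:8080")

# A simple round-trip: list the projects known to the service.
db = mlrun.get_run_db()
print(db.list_projects())
```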
This project is configured to run "as is" inside GitHub Codespaces (see the config files under `/.devcontainer`). After the Codespaces environment starts, you need to start a local MLRun service or connect to a remote one.
- For a minimal, local MLRun service (no UI), run: `make mlrun-api`
- For a local Docker installation (requires a configuration of 8 CPUs or larger), run: `make mlrun-docker`. To view the MLRun UI, open the Ports tab and browse to the MLRun UI port.
- For a remote MLRun service, edit the `DBPATH` and credentials in the `mlrun.env` file.
The local MLRun service must be started every time the Codespaces environment is restarted.
First, load this project into SageMaker through the SageMaker UI.
After the project is loaded, open a console terminal, enter the project directory (using the `cd` command), and type:
`make conda-env`
For a minimal setup, run the MLRun service as a local process (no UI):
`conda activate mlrun && make mlrun-api`
To use a remote MLRun service, edit the `DBPATH` and credentials in the `mlrun.env` file.
Make sure all your tasks and notebooks use the `mlrun` Python environment!