# Project Overview

(back to main README)

## ML pipeline structure

This project defines an ML pipeline for automated retraining and batch inference of an ML model on tabular data.

See the full pipeline structure below. The stacks README contains additional details on how ML pipelines are tested and deployed across the dev, staging, and prod environments shown below.

*MLOps Stacks diagram*

## Code structure

This project contains the following components:

| Component | Description |
| --- | --- |
| ML Code | Example ML project code, with unit-tested Python modules and notebooks, using MLflow Recipes |
| ML Resource Config as Code | ML pipeline resource config (training and batch inference job schedules, etc.) defined through Terraform |
| CI/CD | Azure DevOps Pipelines to test and deploy ML code and resources |
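As a rough illustration of the "ML Resource Config as Code" component, a scheduled retraining job might be declared with the Databricks Terraform provider along the following lines. This is a minimal sketch, not this repo's actual config: the job name, notebook path, and cron expression are hypothetical.

```hcl
# Sketch of a scheduled model-training job using the Databricks Terraform provider.
# All names and paths below are illustrative, not taken from this repo.
resource "databricks_job" "model_training" {
  name = "model-training-job"

  # Hypothetical notebook path; the real path would point at a notebook
  # under this repo's `notebooks` directory.
  notebook_task {
    notebook_path = "/Repos/my-project/notebooks/Train"
  }

  # Retrain daily at 09:00 UTC (Quartz cron syntax).
  schedule {
    quartz_cron_expression = "0 0 9 * * ?"
    timezone_id            = "UTC"
  }
}
```

Keeping the schedule in Terraform rather than configuring it by hand in the workspace is what lets the same job definition be reviewed, tested, and promoted across staging and prod.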

These components are contained in the following files:

```
├── steps              <- MLflow recipe steps (Python modules) implementing ML pipeline logic, e.g. model training
│                         and evaluation. Most development work happens here. See
│                         https://mlflow.org/docs/latest/pipelines.html for details.
│
├── notebooks          <- Databricks notebooks that run the MLflow recipe, i.e. run the logic in `steps`. Used to
│                         drive code execution on Databricks for CI/CD. In most cases, you do not need to modify
│                         these notebooks.
│
├── recipe.yaml        <- The main recipe configuration file that declaratively defines the attributes and behavior
│                         of each recipe step, such as the input dataset to use for training a model or the
│                         performance criteria for promoting a model to production.
│
├── profiles           <- Environment-specific (e.g. dev vs. test vs. prod) configurations for MLflow recipe execution.
│
├── requirements.txt   <- Specifies Python dependencies for ML code (for example: model training, batch inference).
│
├── tests              <- Unit tests for the modules under `steps`.
│
├── .azure             <- Configuration folder for CI/CD using Azure DevOps Pipelines. The CI/CD workflows run the
│                         notebooks under `notebooks` to test and deploy model training code.
│
└── databricks-config  <- ML resource (ML jobs, MLflow models) config definitions expressed as code, across staging/prod.
    ├── staging
    └── prod
```
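For orientation, a `recipe.yaml` for a regression recipe typically looks something like the sketch below. This follows MLflow Recipes conventions but is not copied from this repo: the target column, dataset location, and threshold values are illustrative assumptions.

```yaml
# Illustrative recipe.yaml sketch (values are hypothetical, not from this repo).
recipe: "regression/v1"
target_col: "fare_amount"                  # assumed target column
primary_metric: "root_mean_squared_error"

steps:
  ingest:
    using: parquet
    location: "./data/sample.parquet"      # hypothetical dataset location
  split:
    split_ratios: [0.75, 0.125, 0.125]     # train / validation / test
  train:
    using: custom
    estimator_method: estimator_fn
  evaluate:
    validation_criteria:                   # performance criteria for promotion
      - metric: root_mean_squared_error
        threshold: 10
  register:
    allow_non_validated_model: false
```

Each key under `steps` configures one of the Python modules in the `steps` directory, which is how the recipe stays declarative while the step logic lives in ordinary, unit-testable code.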

## Next Steps

See the main README for additional links on how to work with this repo.