# Project Overview

(back to main README)

## ML pipeline structure

This project defines an ML pipeline for automated retraining and batch inference of an ML model on tabular data.

See the full pipeline structure below. The stacks README contains additional details on how ML pipelines are tested and deployed across the dev, staging, and prod environments shown below.

*MLOps Stacks diagram*

## Code structure

This project contains the following components:

| Component | Description |
| --- | --- |
| ML Code | Example ML project code, with unit-tested Python modules and notebooks, using MLflow Recipes |
| ML Resource Config as Code | ML pipeline resource config (training and batch inference job schedules, etc.) defined through Terraform |
| CI/CD | Azure DevOps Pipelines to test and deploy ML code and resources |
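As a rough illustration of the "ML Resource Config as Code" component, a scheduled retraining job might be declared with the Databricks Terraform provider along the following lines. This is a minimal sketch, not this repo's actual config: the job name, notebook path, and cron expression are hypothetical.

```hcl
# Sketch of a scheduled model-training job using the Databricks Terraform provider.
# All names and paths below are illustrative, not taken from this repo.
resource "databricks_job" "model_training" {
  name = "model-training-job"

  # Hypothetical notebook path; the real path would point at a notebook
  # under this repo's `notebooks` directory.
  notebook_task {
    notebook_path = "/Repos/my-project/notebooks/Train"
  }

  # Retrain daily at 09:00 UTC (Quartz cron syntax).
  schedule {
    quartz_cron_expression = "0 0 9 * * ?"
    timezone_id            = "UTC"
  }
}
```

Keeping the schedule in Terraform rather than configuring it by hand in the workspace is what lets the same job definition be reviewed, tested, and promoted across staging and prod.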

These components are contained in the following files:

```
├── steps              <- MLflow recipe steps (Python modules) implementing ML pipeline logic, e.g. model training
│                         and evaluation. Most development work happens here. See
│                         https://mlflow.org/docs/latest/pipelines.html for details.
│
├── notebooks          <- Databricks notebooks that run the MLflow recipe, i.e. run the logic in `steps`. Used to
│                         drive code execution on Databricks for CI/CD. In most cases, you do not need to modify
│                         these notebooks.
│
├── recipe.yaml        <- The main recipe configuration file that declaratively defines the attributes and behavior
│                         of each recipe step, such as the input dataset to use for training a model or the
│                         performance criteria for promoting a model to production.
│
├── profiles           <- Environment-specific (e.g. dev vs. test vs. prod) configurations for MLflow recipe execution.
│
├── requirements.txt   <- Specifies Python dependencies for ML code (for example: model training, batch inference).
│
├── tests              <- Unit tests for the modules under `steps`.
│
├── .azure             <- Configuration folder for CI/CD using Azure DevOps Pipelines. The CI/CD workflows run the
│                         notebooks under `notebooks` to test and deploy model training code.
│
└── databricks-config  <- ML resource (ML jobs, MLflow models) config definitions expressed as code, across staging/prod.
    ├── staging
    └── prod
```
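For orientation, a `recipe.yaml` for a regression recipe typically looks something like the sketch below. This follows MLflow Recipes conventions but is not copied from this repo: the target column, dataset location, and threshold values are illustrative assumptions.

```yaml
# Illustrative recipe.yaml sketch (values are hypothetical, not from this repo).
recipe: "regression/v1"
target_col: "fare_amount"                  # assumed target column
primary_metric: "root_mean_squared_error"

steps:
  ingest:
    using: parquet
    location: "./data/sample.parquet"      # hypothetical dataset location
  split:
    split_ratios: [0.75, 0.125, 0.125]     # train / validation / test
  train:
    using: custom
    estimator_method: estimator_fn
  evaluate:
    validation_criteria:                   # performance criteria for promotion
      - metric: root_mean_squared_error
        threshold: 10
  register:
    allow_non_validated_model: false
```

Each key under `steps` configures one of the Python modules in the `steps` directory, which is how the recipe stays declarative while the step logic lives in ordinary, unit-testable code.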

## Next Steps

See the main README for additional links on how to work with this repo.