A set of reference implementations for various ML related tasks

ccozad/ml-reference-designs

Introduction

This repo contains a set of reference designs for various ML topics. A few examples of the types of things you can learn here:

Direct a large language model to answer based only on context from documents (RAG Pipeline)

Intelligently select complex objects in images (Segmented Image)

Predict future values in data that varies over time (Time Series Prediction)

Industry domain problems

Examples in this repo cover the following industry domain problems:

  • Accounting
    • Receipt processing
  • Botany
    • Group observations into n groups based on equal variance
  • Customer Service
    • Context-aware chatbots
  • Medical
    • Breast cancer diagnosis
  • Real Estate
    • Price prediction
  • Retail
    • Product image classification
  • Technology
    • Deploy machine learning models to production
    • Compose workflows involving large language models (LLMs)
    • Store and search for embedding data in vectorstores
    • Intelligently select complex objects in images
    • Expand capabilities of large language models with custom tool calling
  • Transportation
    • Seasonal airline traffic prediction
  • Zoology
    • Group observations based on data density

Setup

Jupyter requirement

Some of the examples in this repo are meant to be run interactively using JupyterLab or Jupyter Notebook. See https://jupyter.org/install for installation instructions.

Examples that only have script files will have a README file with instructions.

Virtual environment

To avoid conflicts with your local environment, create a virtual environment and run the notebook within this environment.

Windows

python -m venv .venv
.venv\Scripts\activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
python -m ipykernel install --user --name=virtualenv

Most advances in machine learning are happening on Linux, targeting Nvidia GPUs with CUDA support. Some advanced models such as Llama 3 may not work well (or at all) on Windows machines.
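
Before opening a notebook, it can be useful to confirm that Python is running inside the virtual environment and, if PyTorch is installed, whether it can see a CUDA device. The following is a minimal sketch and is not part of this repo; the function names in_virtualenv and cuda_status are hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def cuda_status() -> str:
    """Describe whether PyTorch is importable and can see a CUDA device."""
    try:
        import torch  # optional: only present after the pip install step
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "PyTorch is installed, but no CUDA device was detected"


if __name__ == "__main__":
    print("Inside virtual environment:", in_virtualenv())
    print(cuda_status())
```

Run the script with the environment activated. If it reports that no CUDA device was detected, examples that rely on GPU acceleration may run slowly or not at all.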

Working with LLMs

Workbook examples that include LLMs are more complex than other examples and require additional setup work.

Llama 3

Selecting the new kernel

After launching JupyterLab with the command jupyter lab, select the virtualenv kernel.
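
If the virtualenv kernel does not appear in the launcher, listing the kernels Jupyter has registered can help narrow down the problem. The sketch below assumes the jupyter_client package is importable (it is a dependency of ipykernel); registered_kernels and kernel_is_registered are hypothetical helper names, and the kernel name virtualenv comes from the --name=virtualenv argument in the setup commands.

```python
def registered_kernels() -> dict:
    """Return a mapping of kernel name -> resource directory, or {} if unavailable."""
    try:
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        # jupyter_client is missing, so no kernels can be discovered from here.
        return {}
    return KernelSpecManager().find_kernel_specs()


def kernel_is_registered(name: str, specs: dict) -> bool:
    """Check whether a kernel name is among the registered kernel specs."""
    # Compare case-insensitively to be forgiving about capitalisation.
    return name.lower() in {key.lower() for key in specs}


if __name__ == "__main__":
    specs = registered_kernels()
    for kernel_name, resource_dir in sorted(specs.items()):
        print(f"{kernel_name}: {resource_dir}")
    print("virtualenv registered:", kernel_is_registered("virtualenv", specs))
```

If virtualenv is missing from the output, re-running the ipykernel install command from inside the activated environment is a reasonable first step.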

Additional resources

For additional background see https://www.linkedin.com/pulse/how-use-virtual-environment-inside-jupyter-lab-sina-khoshgoftar

Contents

Getting Started

Various simple examples for getting started with different frameworks

Feature Engineering

Various recipes for common feature engineering tasks.

Image Processing

Recipes for working with images

Regression

Various examples that deal with predicting a value based on inputs

Classification

Various examples that deal with placing inputs into one or more categories

Clustering

Various examples that deal with grouping data points by a similarity metric.

Time Series

Various examples that deal with time-based data

Computer Vision

Various examples that deal with processing image data.

Large Language Models

Examples that interact with large language models: models with billions of parameters that are often trained across many commercial-grade GPUs for millions of GPU hours.

Claude 3.5

LangChain

Llama 3

Phi-3

Deployment

Various tasks that deal with using trained models
