This repo contains a set of reference designs for various machine learning (ML) topics. A few examples of what you can learn here:
- Direct a large language model to answer based only on context from documents
- Intelligently select complex objects in images
- Predict future values in data that varies over time
Examples in this repo cover the following industry domain problems:
- Accounting
- Receipt processing
- Botany
- Group observations into n groups based on equal variance
- Customer Service
- Context-aware chatbots
- Medical
- Breast cancer diagnosis
- Real Estate
- Price prediction
- Retail
- Product image classification
- Technology
- Deploy machine learning models to production
- Compose workflows involving large language models (LLMs)
- Store and search embedding data in vector stores
- Intelligently select complex objects in images
- Expand capabilities of large language models with custom tool calling
- Transportation
- Seasonal airline traffic prediction
- Zoology
- Group observations based on data density
Some of the examples in this repo are meant to be run interactively using JupyterLab or Jupyter Notebook. See https://jupyter.org/install for installation instructions.
Examples that only have script files will have a README file with instructions.
To avoid conflicts with your local environment, create a virtual environment and run the notebook within this environment.
python -m venv .venv
.venv\Scripts\activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
python -m ipykernel install --user --name=virtualenv
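To confirm that a notebook or script is really running inside the virtual environment created above, a quick check like the following can help. This is a minimal sketch using only the standard library; the function name `in_virtual_environment` is made up for illustration and is not part of this repo.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside an environment created with `python -m venv`, sys.prefix points
    # at the environment while sys.base_prefix points at the base install.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual environment active:", in_virtual_environment())
```

Running the same check in a notebook cell shows whether the selected kernel is using the environment's interpreter.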
Most advances in machine learning are happening on Linux, targeting Nvidia GPUs with CUDA support. Some advanced models such as Llama 3 may not work well (or at all) on Windows machines.
Workbook examples that include LLMs are more complex than other examples and require additional setup work.
Llama 3
- Download the model weights from https://huggingface.co/meta-llama/Meta-Llama-3-8B (requires an access request that Meta staff must approve, which may take 24 hours or more)
- Model weights are several gigabytes in size; store them on a drive with sufficient space
- Clone the model code https://github.com/meta-llama/llama3
- Change to the directory with the model code and pip install the model and dependencies
pip install -e .
Then select the virtualenv kernel after launching Jupyter Lab with the command jupyter lab
For additional background see https://www.linkedin.com/pulse/how-use-virtual-environment-inside-jupyter-lab-sina-khoshgoftar
Various simple examples for getting started with different frameworks
- Terminology
- PyTorch
- TensorFlow
Various recipes for common feature engineering tasks.
- Pandas essentials
- Handle missing data
- Convert class labels to numbers
- Imbalanced classification
- Choose Fourier features
- Kaggle predict home price feature prep
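As an illustration of the "convert class labels to numbers" recipe, the idea can be sketched in plain Python. This is a minimal sketch, not the code from the notebook; the `encode_labels` helper and the sample labels are made up for the example. Libraries such as scikit-learn provide `LabelEncoder` for the same task.

```python
def encode_labels(labels):
    """Map each distinct class label to an integer code.

    Codes are assigned in sorted label order so the mapping is reproducible.
    """
    classes = sorted(set(labels))
    label_to_code = {label: code for code, label in enumerate(classes)}
    codes = [label_to_code[label] for label in labels]
    return codes, label_to_code


# Hypothetical example labels, not data from this repo.
species = ["setosa", "virginica", "setosa", "versicolor"]
codes, mapping = encode_labels(species)
print(codes)    # [0, 2, 0, 1]
print(mapping)  # {'setosa': 0, 'versicolor': 1, 'virginica': 2}
```

Keeping the returned mapping around makes it possible to translate model predictions back into the original label names.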
Recipes for working with images
Various examples that deal with predicting a value based on inputs
Various examples that deal with placing inputs into one or more categories
Various examples that deal with grouping data points by a similarity metric.
- Cluster seed types using K-Means and scikit-learn
- Cluster penguin species using DBSCAN and scikit-learn
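For readers new to clustering, the core loop behind K-Means (Lloyd's algorithm) can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the starting centroids are passed in by hand to keep the result deterministic, and the sample points are made up. The notebooks themselves use scikit-learn, whose `KMeans` picks starting centroids differently by default.

```python
import math


def nearest(point, centroids):
    """Return the index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def kmeans(points, centroids, iterations=10):
    """Cluster points with Lloyd's algorithm, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return [nearest(p, centroids) for p in points], centroids


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

DBSCAN takes a different approach: it grows clusters from dense neighborhoods, so the number of groups does not have to be chosen in advance.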
Various examples that deal with time-based data
Various examples that deal with processing image data.
- Image classification with PyTorch and Fashion MNIST
- Image segmentation using the Meta Segment Anything Model and OpenCV
Examples that interact with large language models, which have billions of parameters and are often trained across many commercial-grade GPUs for many millions of hours.
Various tasks that deal with using trained models