Welcome to your newly generated "ZenML LLM PEFT Finetuning" project! This is a great way to get hands-on with ZenML using a production-like template. The project contains a collection of ZenML steps, pipelines, and other artifacts and resources that can serve as a solid starting point for finetuning open-source LLMs using ZenML.
Using these pipelines, we can run data preparation and model finetuning with a single command, using YAML files for configuration and letting ZenML take care of tracking our metadata and containerizing our pipelines.
This project heavily relies on the PEFT project by the amazing people at Hugging Face and the microsoft/phi-2 model from the amazing people at Microsoft.
In this project, we provide a predefined configuration file to finetune models on the ViGGO dataset. Before we're able to run any pipeline, we need to set up our environment as follows:
# Set up a Python virtual environment, if you haven't already
python3 -m venv .venv
source .venv/bin/activate
# Install requirements
pip install -r requirements.txt
Warning
All steps of this pipeline call clean_gpu_memory(force=True) at the beginning. This ensures that GPU memory is properly cleared after previous steps.
This can affect other GPU processes running in the same environment, so if you don't want to clean the GPU memory between steps, remove those utility calls from all steps.
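For reference, a GPU cleanup utility of this kind typically amounts to Python garbage collection plus emptying PyTorch's CUDA cache. The sketch below is illustrative only and is not the project's actual implementation:

```python
import gc

import torch


def clean_gpu_memory(force: bool = False) -> None:
    """Illustrative sketch of a GPU memory cleanup helper (not the project's code)."""
    if not force and not torch.cuda.is_available():
        return
    gc.collect()                     # drop unreachable Python objects still holding tensors
    if torch.cuda.is_available():
        torch.cuda.empty_cache()     # release cached CUDA memory back to the driver
        torch.cuda.ipc_collect()     # clean up CUDA IPC handles left by dead processes
```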
The easiest way to get started with just a single command is to run the finetuning pipeline with the orchestrator_finetune.yaml configuration file, which will run data preparation, model finetuning, evaluation with ROUGE metrics, and model promotion:
python run.py --config orchestrator_finetune.yaml
When running the pipeline like this, the trained model will be stored in the ZenML artifact store.
Tip
To finetune the Llama 3.1 base model, please use the alternative configuration files provided in the configs folder: llama3-1_finetune_remote.yaml for a remote finetune and llama3-1_finetune_local.yaml for a local finetune.
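For example, a local Llama 3.1 finetune is started the same way as before, just pointing at the alternative configuration file:

```shell
python run.py --config llama3-1_finetune_local.yaml
```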
Do you want to benefit from multi-GPU training with Distributed Data Parallel (DDP)? Then you can use other configuration files prepared for this purpose.
For example, orchestrator_finetune.yaml can finetune Microsoft's Phi-2, powered by Hugging Face Accelerate, on all GPUs available in the environment. To do so, just call:
python run.py --config orchestrator_finetune.yaml --accelerate
Under the hood, the finetuning step will spin up the accelerated job using the step code, which will run on all available GPUs.
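As a rough illustration of this mechanism, the sketch below wraps a training step with ZenML's run_with_accelerate helper from the Hugging Face integration. The import path, arguments, and step signature here are assumptions for illustration and are not copied from this project's finetune step:

```python
from zenml import step
from zenml.integrations.huggingface.steps import run_with_accelerate  # assumption: available in recent ZenML versions


# Hypothetical step: the real finetuning step in steps/finetune.py has its own signature.
@run_with_accelerate(num_processes=2, multi_gpu=True)  # launches the step body via `accelerate launch`
@step
def finetune(base_model_id: str, dataset_dir: str) -> str:
    """Finetune the base model on all available GPUs and return the output path."""
    ...
```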
To finetune an LLM on remote infrastructure, you can either use a remote orchestrator or a remote step operator. Follow these steps to set up a complete remote stack:
- Register the orchestrator (or step operator) and make sure to configure it so that the finetuning step has access to a GPU with at least 24 GB of VRAM. Check out our docs for more details.
- To access GPUs with this amount of VRAM, you might need to increase your GPU quota (AWS, GCP, Azure).
- The GPU instance that your finetuning runs on will have a specific version of the CUDA drivers installed. If that CUDA version is not compatible with the one provided by the default Docker image of the finetuning pipeline, you will need to override the image in the configuration file (see the sketch after this list). See here for a list of available PyTorch images.
- Register a remote artifact store and container registry.
- Register a stack with all these components:
zenml stack register llm-finetuning-stack -o <ORCHESTRATOR_NAME> \
    -a <ARTIFACT_STORE_NAME> \
    -c <CONTAINER_REGISTRY_NAME> \
    [-s <STEP_OPERATOR_NAME>]
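Once the stack is registered, set it as active and launch the remote run (remote_finetune.yaml is the step-operator configuration listed in the project structure below):

```shell
zenml stack set llm-finetuning-stack
python run.py --config remote_finetune.yaml
```

If the default Docker image's CUDA version does not match your instance's drivers, the pipeline's Docker settings in the configuration file are the usual place to override the base image. A minimal sketch, assuming ZenML's standard settings.docker.parent_image option; the image tag is only an example and should be chosen to match your drivers:

```yaml
settings:
  docker:
    parent_image: pytorch/pytorch:2.2.2-cuda12.1-cudnn8-runtime  # example tag; match your CUDA drivers
    requirements: requirements.txt
```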
To finetune an LLM using your own datasets, consider adjusting the prepare_data step to match your needs:
- This step loads, tokenizes, and stores the dataset from an external source to the artifact store defined in the ZenML Stack.
- The dataset can be loaded from Hugging Face by adjusting the dataset_name parameter in the configuration file. By default, the step code expects the dataset to have at least three splits: train, validation, and test. If your dataset uses different split naming, you'll need to make the necessary adjustments (see the sketch after this list).
- If you want to retrieve the dataset from other sources, you'll need to create the relevant code and prepare the splits in a Hugging Face dataset format for further processing.
- Tokenization occurs in the utility function generate_and_tokenize_prompt. It has a default way of formatting the inputs before passing them into the model. If this default logic doesn't fit your use case, you'll also need to adjust this function.
- The return value is the path to the stored datasets (by default, the train, val, and test_raw splits). Note: the test set is not tokenized here and will be tokenized later during evaluation.
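As a rough sketch of the kind of adjustment described above, the snippet below loads a dataset from Hugging Face, maps its splits onto the train/validation/test layout the step expects, and applies a simple prompt-formatting and tokenization function. The dataset ID, column names, and prompt template are placeholders, not the project's actual prepare_data logic:

```python
from datasets import DatasetDict, load_dataset
from transformers import AutoTokenizer

# Placeholder names throughout: replace the dataset ID and column names with your own.
raw = load_dataset("your-org/your-dataset")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# Map your dataset's split names onto the layout the step expects.
dataset = DatasetDict(
    {
        "train": raw["train"],
        "validation": raw["dev"],  # e.g. a split named "dev" becomes "validation"
        "test": raw["test"],
    }
)


def generate_and_tokenize_prompt(sample):
    # Hypothetical prompt template; the project's utility formats inputs its own way.
    prompt = f"Input: {sample['input']}\nOutput: {sample['output']}"
    return tokenizer(prompt, truncation=True, max_length=512)


tokenized = dataset.map(generate_and_tokenize_prompt)
```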
The project loosely follows the recommended ZenML project structure:
.
├── configs # pipeline configuration files
│ ├── orchestrator_finetune.yaml # default local or remote orchestrator configuration
│ └── remote_finetune.yaml # default step operator configuration
├── materializers
│ └── directory_materializer.py # custom materializer to push whole directories to the artifact store and back
├── pipelines # `zenml.pipeline` implementations
│ └── train.py # Finetuning and evaluation pipeline
├── steps # logically grouped `zenml.steps` implementations
│ ├── evaluate_model.py # evaluate base and finetuned models using Rouge metrics
│ ├── finetune.py # finetune the base model
│ ├── log_metadata.py # helper step to ensure that model metadata is always logged
│ ├── prepare_datasets.py # load and tokenize dataset
│ └── promote.py # promote good models to target environment
├── utils # utility functions
│ ├── callbacks.py # custom callbacks
│ ├── loaders.py # loaders for models and data
│ ├── logging.py # logging helpers
│ └── tokenizer.py # load and tokenize
├── .dockerignore
├── README.md # this file
├── requirements.txt # extra Python dependencies
└── run.py # CLI tool to run pipelines on ZenML Stack