From b3b196724dfb391b29ca55304aaf3fbcc00dc53f Mon Sep 17 00:00:00 2001
From: filipw
Date: Thu, 13 Feb 2025 21:03:44 +0100
Subject: [PATCH] updated the Fine-tuning with MLX instructions
---
 md/04.Fine-tuning/FineTuning_MLX.md | 270 +++++++++++++---------------
 1 file changed, 120 insertions(+), 150 deletions(-)

# **Fine-tuning Phi-3 with Apple MLX Framework**

Fine-tuning with **LoRA** can be performed using the **Apple MLX framework** via the command line.
For more details on how the MLX framework works, refer to [Inference Phi-3 with Apple MLX Framework](../03.Inference/MLX_Inference.md).

## 1. Data preparation

By default, the **MLX framework** requires training, validation, and test data in **JSONL** format. It is combined with **LoRA** to perform fine-tuning.

### JSONL Data Format

```json
{"text": "<|user|>\nWhen were iron maidens commonly used? <|end|>\n<|assistant|> \nIron maidens were never commonly used <|end|>"}
{"text": "<|user|>\nWhat did humans evolve from? <|end|>\n<|assistant|> \nHumans and apes evolved from a common ancestor <|end|>"}
{"text": "<|user|>\nIs 91 a prime number? <|end|>\n<|assistant|> \nNo, 91 is not a prime number <|end|>"}
```

Our example dataset is based on [TruthfulQA's data](https://github.com/sylinrl/TruthfulQA/blob/main/TruthfulQA.csv). However, this dataset is relatively small, so the fine-tuning results may not be optimal. We recommend using higher-quality datasets tailored to your specific use case for better results.

Each sample follows the **Phi-3 prompt template** shown above.

### Download data

You can download the dataset from this [link](../../code/04.Finetuning/mlx/).
Make sure to place all `.jsonl` files (`train.jsonl`, `valid.jsonl`, and `test.jsonl`) inside the ***data*** folder.
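If you would rather build these files yourself than download them, the sketch below shows one possible way to convert TruthfulQA's CSV into Phi-3-formatted `train.jsonl`, `valid.jsonl`, and `test.jsonl` files. This is a minimal sketch, not part of the original instructions: the CSV column names (`Question`, `Best Answer`), the input path, and the 80/10/10 split are assumptions you may need to adjust.

```python
# Minimal sketch: build the data/ folder expected by mlx_lm.lora from TruthfulQA.csv.
# Assumptions: the CSV has "Question" and "Best Answer" columns, lives in the current
# directory, and an 80/10/10 train/valid/test split is acceptable.
import csv
import json
import random
from pathlib import Path


def to_phi3_sample(question: str, answer: str) -> dict:
    # Wrap a Q/A pair in the Phi-3 chat template used by the example dataset.
    return {"text": f"<|user|>\n{question} <|end|>\n<|assistant|> \n{answer} <|end|>"}


def write_jsonl(path: Path, rows: list) -> None:
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def main() -> None:
    data_dir = Path("data")
    data_dir.mkdir(exist_ok=True)

    with open("TruthfulQA.csv", newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        samples = [to_phi3_sample(r["Question"], r["Best Answer"]) for r in reader]

    random.seed(0)
    random.shuffle(samples)

    # MLX expects train.jsonl, valid.jsonl, and test.jsonl inside the data folder.
    n = len(samples)
    train_end, valid_end = int(n * 0.8), int(n * 0.9)
    write_jsonl(data_dir / "train.jsonl", samples[:train_end])
    write_jsonl(data_dir / "valid.jsonl", samples[train_end:valid_end])
    write_jsonl(data_dir / "test.jsonl", samples[valid_end:])


if __name__ == "__main__":
    main()
```

Run it once from the folder that contains `TruthfulQA.csv`; it creates the `data` folder used by the training command in the next section.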
## 2. Fine-tuning in your terminal

Run the following command in your terminal:

```bash
python -m mlx_lm.lora --model microsoft/Phi-3-mini-4k-instruct --train --data ./data --iters 1000
```

> **💡 Note:**
> This performs LoRA fine-tuning with the MLX framework; QLoRA is not used here.
> To customize the training run, set the parameters in a config file such as `lora_config.yaml`:
>
> ```yaml
> # The path to the local model directory or Hugging Face repo.
> model: "microsoft/Phi-3-mini-4k-instruct"
> # Whether or not to train (boolean)
> train: true
>
> # Directory with {train, valid, test}.jsonl files
> data: "data"
>
> # The PRNG seed
> seed: 0
>
> # Number of layers to fine-tune
> lora_layers: 32
>
> # Minibatch size.
> batch_size: 1
>
> # Iterations to train for.
> iters: 1000
>
> # Number of validation batches, -1 uses the entire validation set.
> val_batches: 25
>
> # Adam learning rate.
> learning_rate: 1e-6
>
> # Number of training steps between loss reporting.
> steps_per_report: 10
>
> # Number of training steps between validations.
> steps_per_eval: 200
>
> # Load path to resume training with the given adapter weights.
> resume_adapter_file: null
>
> # Save/load path for the trained adapter weights.
> adapter_path: "adapters"
>
> # Save the model every N iterations.
> save_every: 1000
>
> # Evaluate on the test set after training
> test: false
>
> # Number of test set batches, -1 uses the entire test set.
> test_batches: 100
>
> # Maximum sequence length.
> max_seq_length: 2048
>
> # Use gradient checkpointing to reduce memory use.
> grad_checkpoint: true
>
> # LoRA parameters can only be specified in a config file
> lora_parameters:
>   # The layer keys to apply LoRA to.
>   # These will be applied for the last lora_layers
>   keys: ["o_proj","qkv_proj"]
>   rank: 64
>   scale: 1
>   dropout: 0.1
> ```
>
> In that case, run this command in the terminal instead:
>
> ```bash
> python -m mlx_lm.lora --config lora_config.yaml
> ```

## 3. Run the Fine-Tuned Adapter to Test

You can run the fine-tuned adapter in the terminal using the following command:

```bash
python -m mlx_lm.generate --model microsoft/Phi-3-mini-4k-instruct --adapter-path ./adapters --max-tokens 2048 --prompt "Why do chameleons change colors?" --eos-token "<|end|>"
```

To compare the results, run the original model without the fine-tuned adapter:

```bash
python -m mlx_lm.generate --model microsoft/Phi-3-mini-4k-instruct --max-tokens 2048 --prompt "Why do chameleons change colors?" --eos-token "<|end|>"
```

Comparing the two outputs shows the effect of fine-tuning.
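The same check can be scripted. Below is a minimal sketch that assumes the `mlx-lm` Python package is installed (`pip install mlx-lm`) and exposes the `load`/`generate` helpers; it loads the base model together with the adapters from `./adapters` and generates a completion, mirroring the first command above.

```python
# Minimal sketch: test the LoRA adapters from Python instead of the CLI.
# Assumption: the mlx-lm package provides load() and generate() as documented.
from mlx_lm import load, generate

# Paths mirror the commands above; adjust if you changed --adapter-path.
model, tokenizer = load("microsoft/Phi-3-mini-4k-instruct", adapter_path="./adapters")

# Build the prompt with the same Phi-3 template used in the training data,
# since the Python helper may not apply a chat template automatically.
prompt = "<|user|>\nWhy do chameleons change colors? <|end|>\n<|assistant|>"

response = generate(model, tokenizer, prompt=prompt, max_tokens=2048, verbose=True)
print(response)
```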
## 4. Merge Adapters to Generate a New Model

To merge the fine-tuned adapters back into the base model, run the following command:

```bash
python -m mlx_lm.fuse --model microsoft/Phi-3-mini-4k-instruct
```

## 5. Run Inference on the Merged Model

After merging the adapters, you can run inference on the newly generated model using the following command:

```bash
python -m mlx_lm.generate --model ./fused_model --max-tokens 2048 --prompt "What is the happiest place on Earth?" --eos-token "<|end|>"
```

The merged model can now be used with any framework that supports the **SafeTensors** format.

🎉 **Congratulations!** You’ve successfully mastered fine-tuning with the **MLX Framework**!

## 6. (Optional) Running Quantized Fine-Tuned Models Using Ollama

First, configure your `llama.cpp` environment:

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
pip install -r requirements.txt
```

Now convert the merged model from the SafeTensors format to GGUF, the format used by `llama.cpp` and Ollama:

```bash
python convert_hf_to_gguf.py 'Your merged model path' --outfile phi-3-mini-ft.gguf --outtype q8_0
```

> **💡 Note:**
> 1. `convert_hf_to_gguf.py` can export the model in **f32**, **f16**, **bf16**, or **q8_0** format; lower-bit quantizations such as **q4_0** or **q4_1** are produced afterwards with llama.cpp's `llama-quantize` tool.
> 2. If the merged model is missing `tokenizer.model`, download it from [Hugging Face](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct).

### Set Up the Ollama Modelfile

If you haven’t installed Ollama yet, refer to the [Ollama QuickStart Guide](../02.QuickStart/Ollama_QuickStart.md).

Create a `Modelfile` with the following content:

```txt
FROM ./phi-3-mini-ft.gguf
PARAMETER stop "<|end|>"
```

### Run the Model in Your Terminal

Execute the following commands to create and run the fine-tuned model in Ollama:

```bash
ollama create phi3ft -f Modelfile
ollama run phi3ft "Why do chameleons change colors?"
```
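Once the model has been created, it can also be queried programmatically. The sketch below is a minimal example that assumes the `requests` package is installed and that the Ollama server is running locally on its default port (11434); it sends the same question to the `phi3ft` model through Ollama's REST API.

```python
# Minimal sketch: query the phi3ft model created above via Ollama's local REST API.
# Assumptions: "requests" is installed and Ollama is serving on localhost:11434.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "phi3ft",
        "prompt": "Why do chameleons change colors?",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["response"])
```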