This directory contains code for instruction/chat tuning of LongLLaMA models. Using this code, we tuned LongLLaMA-3Bv1.1 on a single A100 80GB GPU in 44 hours. For tuning, we used the OpenOrca (instructions) and zetavg/ShareGPT-Processed (chat) datasets. We call the resulting model LongLLaMA-Instruct-3Bv1.1 and provide a Colab demo of it.
For more about LongLLaMA see the paper Focused Transformer: Contrastive Training for Context Scaling.
Required packages are listed in requirements.txt.
Example configs are provided in:
- example_inst_ft_3b_low_budget.sh - instruction tuning only, smaller context
- example_instchat_ft_3bv1.1_low_budget.sh - instruction and chat tuning; this is the config used for LongLLaMA-Instruct-3Bv1.1 (the chat prompt was inspired by LongChat)
To tune the model, run one of the scripts from the repo root directory. The tuning process is managed with the Hugging Face Trainer.
For example, to create your own LongLLaMA-Instruct-3Bv1.1, run ./instruction_fine_tuning/example_instchat_ft_3bv1.1_low_budget.sh.
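For orientation, these scripts ultimately drive fine_tuning.py through the standard Hugging Face pattern: parse TrainingArguments together with the extra parameters described in arguments.py and hand everything to the Trainer. The sketch below only illustrates that pattern; the ExtraArguments fields, the model name, and the toy dataset are placeholders, not the repo's actual code.

```python
# Minimal sketch of the Hugging Face Trainer pattern that the launch scripts rely on.
# The ExtraArguments fields, model name, and toy dataset are placeholders; the real
# parameters are described in arguments.py.
from dataclasses import dataclass, field

from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    HfArgumentParser,
    LlamaTokenizer,
    Trainer,
    TrainingArguments,
)


@dataclass
class ExtraArguments:
    # Hypothetical stand-ins for the non-Hugging Face parameters in arguments.py.
    model_name: str = field(default="syzymon/long_llama_3b_v1_1")
    max_length: int = field(default=2048)


def main():
    parser = HfArgumentParser((TrainingArguments, ExtraArguments))
    training_args, extra_args = parser.parse_args_into_dataclasses()

    tokenizer = LlamaTokenizer.from_pretrained(extra_args.model_name)
    model = AutoModelForCausalLM.from_pretrained(
        extra_args.model_name, trust_remote_code=True
    )

    # Toy dataset so the sketch is self-contained; in this repo the real data
    # comes from data_processing.py (OpenOrca + ShareGPT).
    texts = ["### Instruction:\nSay hi.\n### Response:\nHi!"] * 16
    encodings = tokenizer(texts, truncation=True, max_length=extra_args.max_length)
    dataset = Dataset.from_dict(dict(encodings))

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()


if __name__ == "__main__":
    main()
```

In practice, the provided shell scripts pass all of these settings as command-line flags (e.g. --output_dir), so you should not need to write such a launcher yourself.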
The main files are:
- arguments.py - see this file for the description of the additional (non-Hugging Face) parameters
- data_processing.py - data processing: filtering, mixing chat and instruction data, padding, etc. (a sketch of the mixing step follows after this list)
- fine_tuning.py - main script that runs the trainer
- misc/trainer_state_of_LongLLaMA-Instruct-3v1.1.json - tuning log for LongLLaMA-Instruct-3Bv1.1
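As a rough illustration of the mixing step in data_processing.py (not its actual implementation), one can interleave the two processed sources with fixed sampling probabilities. The toy records and the 0.75/0.25 ratio below are assumptions:

```python
# Illustrative sketch of mixing instruction and chat data into one training
# stream; this is not the exact data_processing.py logic, and the toy records
# and the 0.75/0.25 mixing ratio are assumptions.
from datasets import Dataset, interleave_datasets

# Toy stand-ins for processed OpenOrca (instruction) and ShareGPT (chat) data,
# already flattened to a common "text" field.
instructions = Dataset.from_dict(
    {"text": [f"### Instruction {i}\n### Response {i}" for i in range(100)]}
)
chat = Dataset.from_dict(
    {"text": [f"USER: hi {i}\nASSISTANT: hello {i}" for i in range(100)]}
)

# Filtering step: drop empty (or, in practice, overly long) examples.
instructions = instructions.filter(lambda ex: len(ex["text"]) > 0)

# Interleave the two sources according to fixed sampling probabilities.
mixed = interleave_datasets([instructions, chat], probabilities=[0.75, 0.25], seed=42)

print(mixed[0]["text"])
print(mixed[1]["text"])
```

In the real pipeline, the two sources are OpenOrca and zetavg/ShareGPT-Processed, formatted with the instruction and chat prompts before mixing.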
The code is available under Apache License, Version 2.0.
Note that for fine-tuning we used OpenOrca and zetavg/ShareGPT-Processed datasets. Those datasets contain outputs of GPT models, which can affect the licensing of the models trained on them.
Note that the fine-tuning scripts are intended for models previously fine-tuned with FoT. In particular, we do not use the FoT method during instruction fine-tuning. To maintain the model's ability to utilize long context, we randomly decide (for short inputs) how much of the data is loaded into memory and how much stays in the last context window. We achieve this by randomly padding the input. One may think of this as a modified version of FoT without negatives and with only the current and previous context.
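A minimal sketch of this idea on a single tokenized example is given below; the context window size, the pad token id, and padding on the right side are assumptions made for illustration and do not reproduce the exact data_processing.py logic:

```python
# Illustrative sketch of randomly splitting a short example between memory and
# the last context window via padding. The context size, pad id, and padding
# side are assumptions, not the exact data_processing.py logic.
import random
from typing import Dict, List

LAST_CONTEXT_LENGTH = 2048  # assumed size of the last (local) context window
PAD_TOKEN_ID = 0            # assumed pad token id


def randomly_pad(input_ids: List[int]) -> Dict[str, List[int]]:
    """Pad a short example so that a random-sized prefix ends up in memory."""
    n = len(input_ids)
    assert n <= LAST_CONTEXT_LENGTH
    # Randomly decide how many of the example's tokens fall outside the last
    # context window and therefore have to be read from memory.
    to_memory = random.randint(0, n)
    total_length = LAST_CONTEXT_LENGTH + to_memory

    # Pad so that the first `to_memory` tokens precede the last context window.
    padded = input_ids + [PAD_TOKEN_ID] * (total_length - n)
    # Attention is masked for the padding tokens.
    attention_mask = [1] * n + [0] * (total_length - n)
    return {"input_ids": padded, "attention_mask": attention_mask}


example = randomly_pad(list(range(1, 513)))  # a 512-token toy example
print(len(example["input_ids"]) - LAST_CONTEXT_LENGTH, "tokens land in memory")
```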
Sometimes the Hugging Face Trainer picks a logger by default. If you run into problems, you can set the logger manually by adding --report_to "tensorboard" inside the script.
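On the Python side, this flag corresponds to the report_to field of TrainingArguments; a quick sanity check (the output_dir below is just a placeholder):

```python
# Quick check of what --report_to "tensorboard" maps to; output_dir is a placeholder.
from transformers import TrainingArguments

args = TrainingArguments(output_dir="output", report_to=["tensorboard"])
print(args.report_to)  # ['tensorboard']
```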
If you plan to use this codebase with different models, please pay attention to how the padding is applied. Note also that attention is masked for padding tokens.