The data collection process is illustrated below:
We prompted GPT-3.5 with captions from 3K images and descriptions of 22 visual tasks. This produced 66K instructions, each corresponding to a specific visual task and a visual foundation model (tool). We then eliminated duplicate instructions and retained 41K sound instructions. To teach the model to use tools in a predefined manner, we followed the prompt format used in Visual ChatGPT and converted these instructions into a conversational format. In parallel, we generated negative data without tool usage by randomly sampling 3K instructions from alpaca_gpt4_data and converting them to the same format. Using the resulting 71K instructions, we fine-tuned Vicuna with LoRA to obtain GPT4Tools, which can automatically decide on, control, and use distinct tools in a conversation.
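The deduplication and negative-sampling steps described above can be sketched roughly as follows. This is an illustration only: the helper names `deduplicate` and `sample_negatives` are hypothetical, and matching duplicates by lowercased, whitespace-normalized text is an assumption, not necessarily the filter the repository applies.

```python
import random


def deduplicate(instructions):
    """Keep the first occurrence of each instruction.

    Assumption: two instructions count as duplicates when they match
    after lowercasing and collapsing whitespace.
    """
    seen = set()
    unique = []
    for text in instructions:
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


def sample_negatives(pool, k, seed=0):
    """Randomly pick up to k tool-free instructions (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


# Toy data standing in for the generated tool instructions.
tool_instructions = deduplicate([
    "Generate an image of a cat.",
    "generate an image  of a cat.",  # duplicate after normalization
    "Detect the edges of the image.",
])

# Toy data standing in for instructions drawn from alpaca_gpt4_data.
negatives = sample_negatives(["Explain what LoRA is.", "Write a haiku."], k=1)

dataset = tool_instructions + negatives
print(len(tool_instructions), len(negatives), len(dataset))
```

Fixing the random seed keeps the sampled negatives reproducible across runs.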
Each sample has the following format:
{
'instruction': xxx,
'input': xxx,
'output': xxx,
}
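A quick way to sanity-check samples against this format is sketched below. It assumes each data file is a JSON array of such objects, which is not stated above; `validate_sample` and `load_samples` are hypothetical helpers, not part of the repository.

```python
import json

# The three keys named in the sample format above.
REQUIRED_KEYS = ("instruction", "input", "output")


def validate_sample(sample):
    """Return the list of required keys missing from one sample."""
    return [key for key in REQUIRED_KEYS if key not in sample]


def load_samples(path):
    """Load a data file, assuming it holds a JSON array of samples."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of samples")
    return data


# In-memory example with placeholder values.
good = {"instruction": "xxx", "input": "xxx", "output": "xxx"}
bad = {"instruction": "xxx"}

print(validate_sample(good))
print(validate_sample(bad))
```

To check a downloaded file, call `load_samples("gpt4tools_71k.json")` and run `validate_sample` over the result.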
Data file name | Size | OneDrive | Google Drive |
---|---|---|---|
gpt4tools_71k.json | 229 MB | link | link |
gpt4tools_val_seen.json | -- | link | link |
gpt4tools_test_unseen.json | -- | link | link |
- gpt4tools_71k.json contains the 71K instruction-following samples we used for fine-tuning the GPT4Tools model.
- gpt4tools_val_seen.json is the manually cleaned instruction data used for validation. It includes instructions related to the tools that appear in gpt4tools_71k.json.
- gpt4tools_test_unseen.json is the cleaned instruction data used for testing. It includes instructions related to some tools that are absent from gpt4tools_71k.json.
When generating data with GPT-3.5, set your OpenAI API key in the OPENAI_API_KEY environment variable.
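Before launching a long generation run, it can help to confirm that the key is actually visible to the process. The check below is a minimal sketch; `require_api_key` is a hypothetical helper and is not part of the GPT4Tools scripts.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from the environment or fail with a clear message."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before generating data")
    return key


# Example with an explicit mapping so the check can be tried without a real key.
print(require_api_key({"OPENAI_API_KEY": "sk-placeholder"}))
```

Calling `require_api_key()` with no arguments reads the real process environment.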
- Raw Data Generation
python3 gpt4tools/data/get_instruction.py \
--caption-path <your_caption_data_path> \
--instruction-path <instruction_data_path>
- Cleaning and Instructional Data Construction
python3 gpt4tools/data/generate_annoations.py \
--input-path <instruction_data_path> \
--output-path <annotations_path> \
--caption-path <your_caption_data_path> \
--alpaca-path <your_alpaca_instruction_path> \
--filter \
--complement \
--insert-alpaca