Dataset

The data collection process works as follows.
We prompted GPT-3.5 with captions from 3K images and descriptions of 22 visual tasks. This produced 66K instructions, each corresponding to a specific visual task and a visual foundation model (tool). We then removed duplicate instructions, retaining 41K sound instructions. To teach the model to use tools in a predefined manner, we followed the prompt format of Visual ChatGPT and converted these instructions into a conversational format. In parallel, we generated negative data (samples without tool usage) by randomly sampling 3K instructions from alpaca_gpt4_data and converting them to the same format. Using the resulting 71K instructions, we fine-tuned Vicuna with LoRA to obtain GPT4Tools, which can automatically decide on, control, and use distinct tools in a conversation.
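
The deduplication and negative-sampling steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names deduplicate and sample_negatives are hypothetical, the case and whitespace normalization is an assumed notion of "duplicate", and the actual filtering logic in the repository scripts may differ.

```python
import random


def deduplicate(instructions):
    """Drop repeated instruction strings, keeping the first occurrence of each.

    Assumption: two instructions count as duplicates when they match after
    lowercasing and collapsing whitespace. The real pipeline may use a
    different criterion.
    """
    seen = set()
    unique = []
    for text in instructions:
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


def sample_negatives(pool, count, seed=0):
    """Randomly pick up to `count` items from `pool` (e.g. Alpaca instructions).

    A fixed seed keeps the selection reproducible; the seed value is an
    assumption made for this sketch.
    """
    rng = random.Random(seed)
    return rng.sample(pool, min(count, len(pool)))
```

In this sketch, the tool-usage instructions would pass through deduplicate, and the result would be combined with the output of sample_negatives before conversion to the conversational format.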

Each sample has the following format:

{
    'instruction': xxx,
    'input': xxx,
    'output': xxx,
}
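
A quick way to sanity-check a downloaded file against this format is sketched below. It assumes the file is a single JSON document whose top level is a list of such samples, which the text above does not state explicitly; the helper names missing_keys and load_samples are hypothetical.

```python
import json

# The three fields shown in the sample format above.
REQUIRED_KEYS = ("instruction", "input", "output")


def missing_keys(sample):
    """Return the required keys that are absent from one sample dict."""
    return [key for key in REQUIRED_KEYS if key not in sample]


def load_samples(path):
    """Load a data file and verify that every sample has the required keys.

    Assumption: the file holds a top-level JSON list of sample dicts.
    """
    with open(path, encoding="utf-8") as f:
        samples = json.load(f)
    if not isinstance(samples, list):
        raise ValueError("expected a top-level JSON list of samples")
    for index, sample in enumerate(samples):
        absent = missing_keys(sample)
        if absent:
            raise ValueError(f"sample {index} is missing keys: {absent}")
    return samples
```

For example, load_samples("gpt4tools_71k.json") would return the list of samples, or raise ValueError if the file does not match the assumed layout.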

Download

Data file name	Size	OneDrive	Google Drive
gpt4tools_71k.json	229 MB	link	link
gpt4tools_val_seen.json	--	link	link
gpt4tools_test_unseen.json	--	link	link

gpt4tools_71k.json contains the 71K instruction-following samples we used to fine-tune the GPT4Tools model.
gpt4tools_val_seen.json is the manually cleaned instruction data used for validation; its instructions involve the same tools as gpt4tools_71k.json.
gpt4tools_test_unseen.json is the cleaned instruction data used for testing; its instructions involve some tools that are absent from gpt4tools_71k.json.

Generation

Generation with GPT-3.5 requires an OpenAI API key, which must be set in the environment variable OPENAI_API_KEY.
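
A small pre-flight check can catch a missing key before any requests are made. The sketch below is not part of the repository; require_openai_key is a hypothetical helper that reads only the OPENAI_API_KEY variable named above.

```python
import os


def require_openai_key(env=None):
    """Return the API key from the environment, or raise if it is not set.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the generation scripts"
        )
    return key
```

Calling require_openai_key() at the start of a run fails early with a clear message instead of partway through generation.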

Raw Data Generation

python3 gpt4tools/data/get_instruction.py \
        --caption-path <your_caption_data_path> \
        --instruction-path <instruction_data_path>

Cleaning and Instructional Data Construction

python3 gpt4tools/data/generate_annoations.py \
        --input-path <instruction_data_path> \
        --output-path <annotations_path> \
        --caption-path <your_caption_data_path> \
        --alpaca-path <your_alpaca_instruction_path> \
        --filter \
        --complement \
        --insert-alpaca
