ZI-ISAG: Zero-Indexing Internet-Search-Augmented Generation

ZI-ISAG is an internet-search-augmented generation paradigm for LLMs. Key features include:

  • Dynamically integrates the latest online information, without maintaining any index for any fixed corpus.
  • Its core component is an extractor LLM that accurately and efficiently extracts relevant information from tagged search content.
  • The extractor LLM outputs only the TAGs associated with the relevant content, rather than copying the text itself, which significantly improves the efficiency of the extraction process (see the sketch below).
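As a minimal sketch of the TAG-only extraction idea (the tag format, example chunks, and mapping below are illustrative assumptions, not the exact format used by ZI-ISAG):

# Illustrative only: the real tag format and extraction prompt are defined by ZI-ISAG.
chunks = [
    "Stock X closed at $12.3 today.",       # becomes TAG-1
    "Unrelated navigation text.",           # becomes TAG-2
    "Analysts expect earnings next week.",  # becomes TAG-3
]
tagged_content = "\n".join(f"[TAG-{i + 1}] {c}" for i, c in enumerate(chunks))

# The extractor LLM sees the query plus tagged_content and returns only the TAGs
# of the relevant chunks (a short output), instead of copying the chunk text.
extractor_output = "TAG-1, TAG-3"
relevant = [chunks[int(t.split("-")[1]) - 1] for t in extractor_output.split(", ")]
print(relevant)  # the texts behind TAG-1 and TAG-3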

This project was made possible thanks to a collaboration with the partners shown in 01_hkust_fdu_tsinghua (logo image).


Content

Paradigm

(Figure: zi-isag-paradigm — overview of the ZI-ISAG paradigm)

Extractor LLM

(Figure: extractor-llm-case — example case of the extractor LLM)

Environment

We use conda to manage our environment, with CUDA 12.1 and cuDNN 9.1.0.

Create and activate env zi-isag:

conda create -p /YOUR_PATH_TO_CONDA_ENV/zi-isag python=3.10.14
conda activate /YOUR_PATH_TO_CONDA_ENV/zi-isag

Install the dependencies from requirements.txt:

python -m pip install -r requirements.txt
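Optionally, verify that the environment sees the GPUs (this assumes PyTorch is installed via requirements.txt, as the DeepSpeed training scripts require):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"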

Instruction Set Construction

The construction includes the following steps:

  • Split and Tag
  • Extract Info-Tag
  • Double Check
  • Portion and Construct

All details of these steps can be found in data/construct.py.
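As a rough sketch of the Split and Tag step (the chunking rule and tag format here are simplified assumptions; data/construct.py is the authoritative implementation):

# Illustrative sketch only; see data/construct.py for the actual construction logic.
def split_and_tag(page_text: str, chunk_size: int = 200) -> str:
    """Split raw web content into word chunks and prefix each chunk with a TAG."""
    words = page_text.split()
    chunks = [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]
    return "\n".join(f"[TAG-{i + 1}] {chunk}" for i, chunk in enumerate(chunks))

# The later steps then prompt an LLM to extract the Info-TAGs relevant to a query,
# double-check them, and portion the results into training instructions.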

Train

Training was conducted on four NVIDIA H800 GPUs, each with 80 GB of memory, using the DeepSpeed parallel training system. An example can be seen in train/sft/train_zi-isag.sh:

deepspeed --include localhost:0,1,2,3 main_no_padding.py \
   --data_path  ../../data/instruction_set/zi-isag/example \
   --data_split 1,0,0 \
   --model_name_or_path YOUR_MODEL_PATH \
   --max_seq_len 16384 \
   --learning_rate 5e-6 \
   --weight_decay 0. \
   --num_train_epochs 3  \
   --gradient_accumulation_steps 4 \
   --lr_scheduler_type cosine_with_min_lr \
   --num_warmup_steps 0 \
   --seed 1234 \
   --gradient_checkpointing \
   --zero_stage 3 \
   --deepspeed \
   --output_dir ./model_ckpts/ \
   --exp_name YOUR_EXP_NAME \
   --tensorboard_path ./save_tensorboard \
   --data_output_path ./YOUR_DATA_OUTPUT \
   --enable_tensorboard

Note that --per_device_train_batch_size is set to 1 by default, so no padding is needed. If you want to use a --per_device_train_batch_size larger than 1, edit train/sft/train_zi-isag.sh and the create_dataset_split function in train/utils/data/data_utils_no_padding.py, along with any other modifications you may require.
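The intuition: with a per-device batch size of 1, each sample is its own batch and sequences never need to be padded to a common length; with larger batches, a collate function must pad (or pack) the samples. A minimal sketch of the difference (illustrative only, not the project's collate code):

import torch

def collate_no_padding(batch):
    # Batch size 1: return the single sample's token ids unchanged; no padding needed.
    assert len(batch) == 1
    return torch.tensor(batch[0]).unsqueeze(0)

def collate_with_padding(batch, pad_id=0):
    # Batch size > 1: pad every sample up to the longest sequence in the batch.
    max_len = max(len(x) for x in batch)
    padded = [x + [pad_id] * (max_len - len(x)) for x in batch]
    return torch.tensor(padded)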
