INFERCEPT: Efficient Intercept Support for Augmented Large Language Model Inference

This repo contains the implementation of InferCept. Please refer to our paper for more details.

Instructions

To install InferCept in your environment:

# After cloning the repo
cd infercept/
pip install -e .

To enable the serving system to intercept augmentation calls, register your aug-stop token in vllm/utils.py. You can register multiple stop strings at once:

def get_api_stop_strings() -> List[str]:
    # Generation is paused whenever one of these strings is emitted,
    # so the augmentation (e.g. an external tool or API) can run.
    return ["<stop token 1>", "<stop token 2>"]
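
Once the stop strings are registered, generation pauses at each augmentation call so the external tool can run before decoding resumes. The snippet below is a minimal sketch of such an interception loop, assuming InferCept preserves vLLM's standard offline Python API (LLM, SamplingParams); the model name, the <stop token 1> marker, and the call_external_tool helper are placeholders for your own setup, and for brevity the sketch does not distinguish an EOS stop from the aug-stop string.

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")  # placeholder model
params = SamplingParams(max_tokens=256, stop=["<stop token 1>"])

def call_external_tool(text: str) -> str:
    # Hypothetical augmentation; replace with your calculator, search API, etc.
    return "tool result for: " + text

prompt = "Question: ...\n"
for _ in range(4):  # bound the number of augmentation rounds
    out = llm.generate([prompt], params)[0].outputs[0]
    prompt += out.text
    if out.finish_reason != "stop":
        break  # hit max_tokens; no augmentation requested
    # Paused at a stop string: run the tool, append its output, resume decoding.
    prompt += call_external_tool(out.text) + "\n"

print(prompt)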

To reproduce the paper's results, see the exps folder.

Citation

If you use InferCept for your research, please cite our paper:

@inproceedings{abhyankar2024infer,
  title={INFERCEPT: Efficient Intercept Support for Augmented Large Language Model Inference},
  author={Reyna Abhyankar and Zijian He and Vikranth Srivatsa and Hao Zhang and Yiying Zhang},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024},
  month=jul,
  address={Vienna, Austria},
}
