Linyi Yang*1,2 Shuibai Zhang*1 Zhuohao Yu*3
Guangsheng Bao2 Yidong Wang1,3 Jindong Wang4 Ruochen Xu4 Wei Ye1
Xing Xie4 Weizhu Chen4 Yue Zhang†1,2
*: Co-first Authors †: Corresponding Authors
1 School of Engineering, Westlake University, 2 Westlake Institute for Advanced Study,
3 Peking University, 4 Microsoft
This is the official repository for Supervised Knowledge Makes Large Language Models Better In-context Learners.
Paper: Supervised Knowledge Makes Large Language Models Better In-context Learners
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. Recent progress in large-scale generative models has further expanded their use in real-world language applications. However, the critical challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. While previous in-context learning research has focused on enhancing models to adhere to users' specific instructions and quality expectations, and to avoid undesired outputs, little to no work has explored the use of task-Specific fine-tuned Language Models (SLMs) to improve LLMs' in-context learning during the inference stage. Our primary contribution is a simple yet effective framework that enhances the reliability of LLMs: it 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks. Using our proposed plug-in method, enhanced versions of Llama 2 and ChatGPT surpass their original versions in generalizability and factuality. We offer a comprehensive suite of resources, including 16 curated datasets, prompts, model checkpoints, and LLM outputs across 9 distinct tasks. Our empirical analysis sheds light on the advantages of incorporating discriminative models into LLMs and highlights the potential of our methodology in fostering more reliable LLMs.
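At a glance, the plug-in works by running a task-specific fine-tuned SLM on each input and injecting its prediction and confidence into the LLM prompt as supervised knowledge before the LLM makes its final decision. The sketch below only illustrates this idea; the model path, prompt wording, and `query_llm` helper are placeholders rather than the repository's actual code.

```python
# A minimal sketch of the SuperContext-style plug-in, assuming a Hugging Face
# text-classification SLM and a placeholder LLM call. Model path, prompt
# wording, and query_llm are illustrative assumptions, not this repo's code.
from transformers import pipeline

# Task-specific fine-tuned SLM (discriminative model); the path is a placeholder.
slm = pipeline("text-classification", model="path/to/task-specific-slm")

def query_llm(prompt: str) -> str:
    """Placeholder for a call to the LLM (e.g., Llama 2 or ChatGPT) via your API of choice."""
    raise NotImplementedError

def supercontext_predict(text: str) -> str:
    # 1) Obtain the SLM's prediction and confidence for the input.
    slm_out = slm(text)[0]  # e.g. {"label": "positive", "score": 0.93}
    label, confidence = slm_out["label"], slm_out["score"]

    # 2) Insert the SLM output into the in-context prompt as supervised knowledge.
    prompt = (
        "Classify the following input.\n"
        f"Input: {text}\n"
        f"A task-specific fine-tuned model predicts '{label}' "
        f"with confidence {confidence:.2f}.\n"
        "Taking this auxiliary prediction into account, give your final label."
    )

    # 3) The LLM makes the final decision at inference time.
    return query_llm(prompt)
```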
This repository contains:
- The code for implementing SuperContext
- The prompts used for each task
- The model weights of the pre-trained language models
- The code and configs for fine-tuning models
- [2024/01/23] SuperContext was accepted to ICLR 2024!
- [2023/12/27] Our paper was featured in Hugging Face's daily paper selection on Twitter.
We welcome contributions to SuperContext. If you'd like to contribute, please follow these steps:
- Fork the repository.
- Create a new branch with your changes.
- Submit a pull request with a clear description of your changes.
@inproceedings{yang2024supervised,
title={Supervised Knowledge Makes Large Language Models Better In-context Learners},
author={Linyi Yang and Shuibai Zhang and Zhuohao Yu and Guangsheng Bao and Yidong Wang and Jindong Wang and Ruochen Xu and Wei Ye and Xing Xie and Weizhu Chen and Yue Zhang},
booktitle={The Twelfth International Conference on Learning Representations (ICLR 2024)},
year={2024}
}
For model weights of LLaMA-based models, we follow the LLaMA license. See MODEL_LICENSE.
The rest of this repo is under Apache License 2.0. See LICENSE.