Lemur is an openly accessible language model optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents. As language models continue to evolve from conversational chatbots to functional agents that can act in the real world, they need both strong language understanding and the ability to execute actions. Lemur balances natural language and coding skills to enable agents to follow instructions, reason for tasks, and take grounded actions.
Please refer to our paper and code for more details:
- [Paper] Lemur: Harmonizing Natural Language and Code for Language Agents
- [Blog] Introducing Lemur: Open Foundation Models for Language Agents
- [18 October, 2023]: π We open-sourced code for OpenAgents: An Open Platform for Language Agents in the Wild.
- [11 October, 2023]: π We released the research paper and codebase. We will continue updating this repository.
- [23 August, 2023]: π We released the weights of
OpenLemur/lemur-70b-v1
, andOpenLemur/lemur-70b-chat-v1
! Check it out in HuggingFace Hub.
We released our models on the HuggingFace Hub:
Most existing open-source models specialize in either natural language or code. Lemur combines both strengths by:
- Pretraining Llama-2-70B on a 90B token corpus with the 10:1 ratio of code to text and obtaining Lemur-70B-v1
- Instruction tuning Lemur-70B-v1 on 300K examples covering both text and code and obtaining Lemur-70B-Chat-v1
This two-stage training produces state-of-the-art performance averaged across diverse language and coding benchmarks, surpassing other available open-source models and narrowing the gap between open-source and commercial models on agent abilities.
First, we have to install all the libraries listed in requirements.txt
conda create -n xchat python=3.10
conda activate xchat
conda install pytorch==2.0.1 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install -c "nvidia/label/cuda-11.8.0" cuda-nvcc
Then, install the xchat package:
git clone git@github.com:OpenLemur/Lemur.git
cd Lemur
pip install -e .
For the base model lemur-70b-v1, you can use it in this way:
Click me
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("OpenLemur/lemur-70b-v1")
model = AutoModelForCausalLM.from_pretrained("OpenLemur/lemur-70b-v1", device_map="auto", load_in_8bit=True)
# Text Generation Example
prompt = "The world is "
input = tokenizer(prompt, return_tensors="pt")
output = model.generate(**input, max_length=50, num_return_sequences=1)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
# Code Generation Example
prompt = """
def factorial(n):
if n == 0:
return 1
"""
input = tokenizer(prompt, return_tensors="pt")
output = model.generate(**input, max_length=200, num_return_sequences=1)
generated_code = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_code)
We instruction-finetune lemur-70b-v1 model with ChatML format to obtain lemur-70b-chat-v1. You can use lemur-70b-chat-v1 in this way:
Click me
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("OpenLemur/lemur-70b-chat-v1")
model = AutoModelForCausalLM.from_pretrained("OpenLemur/lemur-70b-chat-v1", device_map="auto", load_in_8bit=True)
# Text Generation Example
prompt = """<|im_start|>system
You are a helpful, respectful, and honest assistant.
<|im_end|>
<|im_start|>user
What's a lemur's favorite fruit?<|im_end|>
<|im_start|>assistant
"""
input = tokenizer(prompt, return_tensors="pt")
output = model.generate(**input, max_length=50, num_return_sequences=1)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
# Code Generation Example
prompt = """<|im_start|>system
Below is an instruction that describes a task. Write a response that appropriately completes the request.
<|im_end|>
<|im_start|>user
Write a Python function to merge two sorted lists into one sorted list without using any built-in sort functions.<|im_end|>
<|im_start|>assistant
"""
input = tokenizer(prompt, return_tensors="pt")
output = model.generate(**input, max_length=200, num_return_sequences=1)
generated_code = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_code)
We evaluated Lemur across:
- 8 language and code datasets like MMLU, BBH, GSM8K, HumanEval, and Spider to validate balanced capabilities
- 13 interactive agent datasets to test skills like tool usage, adapting to feedback from environments or humans, and exploring partially observable digital or physical environments.
We build the evaluation suite based on open-instruct. We will keep updating more tasks and models.
Currently, we support the following tasks:
- [β ] MMLU
- [β ] BBH
- [β ] GSM8K
- [β ] HumanEval
- [β ] MBPP
- [π§] Spider
- [π§] MultiPL-E
- [π§] DS-1000
- [π§] ...
We use the evaluation frameworks provided by MINT, InterCode, and WebArena to evaluate interactive agent skills.
We use vLLM to serve the Lemur model. However, the official FastChat codebase does not yet support Lemur-Chat. Therefore, we provide a docker to serve vLLM for Lemur. Please refer to vllm_lemur.sh for more detailed information.
bash scripts/deploy/vllm_lemur.sh
We fork MINT codebase to share the configs we used. Please refer to this config folder for more details. Please run vllm with vllm_lemur.sh
script.
We fork WebArena codebase to enable vLLM evaluation. To run the evaluation on WebArena, please refer to our forked WebArena codebase.
We fork InterCode codebase and do modifications to enable Lemur evaluation. Please refer to this script folder for more details. Please run text-generation-inference
with tgi_lemur.sh
script.
If you find our work helpful, please cite us:
@misc{xu2023lemur,
title={Lemur: Harmonizing Natural Language and Code for Language Agents},
author={Yiheng Xu and Hongjin Su and Chen Xing and Boyu Mi and Qian Liu and Weijia Shi and Binyuan Hui and Fan Zhou and Yitao Liu and Tianbao Xie and Zhoujun Cheng and Siheng Zhao and Lingpeng Kong and Bailin Wang and Caiming Xiong and Tao Yu},
year={2023},
eprint={2310.06830},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
The Lemur project is an open collaborative research effort between XLang Lab and Salesforce Research. We thank the following institutions for their gift support: