🏰 LLM Zoo

As new animal species are being discovered in the world of natural language processing (NLP) 🌍 every day, it becomes necessary to establish a zoo 🦁 to accommodate them.

This project collects the following information about open- and closed-source LLMs released since ChatGPT:

Release time
Model size
Languages supported
Domain
Training data
Links to resources: GitHub, HuggingFace, Demo, Paper, Official blog

📰 News

[2023.05.03] First release! We will regularly update 🔄 the repository to keep track of the latest LLMs. We welcome 👐 any contributions to this project. Please feel free to open an issue or submit a pull request to include new LLMs or update the information of existing LLMs 🙏.

📖 Open-Sourced LLMs

Release Time	Model	Version	Size	Backbone	Langs	Domain	Training Data	GitHub	HF	Paper	Demo	Official Blog
2023.02.27	LLaMA	llama-7b/13b/33b/65b	7B/13B/33B/65B	-	en	General	1T tokens (English CommonCrawl, C4, GitHub, Wikipedia, Gutenberg and Books3, ArXiv, Stack Exchange)	[link]	[link]	[link]	-	[link]
2023.03.13	Alpaca	alpaca-7b/13b	7B/13B	LLaMA	en	General	52k instruction-following data generated by InstructGPT [link]	[link]	[link]	-	[link]	[link]
2023.03.13	Vicuna	vicuna-7b/13b-delta-v1.1	7B/13B	LLaMA	en	General	70K samples from ShareGPT	[link]	[link]	-	[link]	[link]
2023.03.14	ChatGLM	chatglm-6b	6B	GLM	zh, en	General	supervised fine-tuning, feedback bootstrap, and reinforcement learning with human feedback	[link]	[link]	-	-	[link]
2023.03.14	ChatGLM	chatglm-130b	130B	GLM	zh, en	General	supervised fine-tuning, feedback bootstrap, and reinforcement learning with human feedback	[link]	-	[link]	[link]	[link]
2023.03.16	Guanaco	-	7B	LLaMA	ja, zh, en, de	General	multilingual datasets [link]	[link]	[link]	-	-	-
2023.03.24	Dolly	dolly-v1-6b	6B	GPT-J-6B	en	General	52k Stanford Alpaca instruction-following data [link]	-	[link]	-	-	[link]
2023.03.24	ChatDoctor	-	7B	LLaMA	en	Medicine	52K Stanford Alpaca [link], 100K HealthCareMagic [link], 10K icliniq [link], 5K GenMedGPT-5k [link]	[link]	-	[link]	[link]	-
2023.03.25	LuoTuo	Chinese-alpaca-lora	7B	LLaMA	zh, en	General	Translated 52k Stanford Alpaca instruction-following data [link], Guanaco [link]	[link]	[link]	-	-	-
2023.03.26	BELLE	BELLE-7B-0.2M/0.6M/1M/2M	7B	BLOOMZ-7B1-mt	zh, en	General	0.2M/0.6M/1M/2M Chinese data [link], 52k Stanford Alpaca instruction-following data [link]	[link]	[link]	[link]	-	-
2023.03.28	Linly (伶荔)	Linly-Chinese-LLaMA 7b/13b/33b	7B/13B/33B	LLaMA	zh	General	Chinese-English parallel corpora [link], Chinese Wikipedia, community interaction, news data [link], scientific literature [link]	[link]	[link]	-	-	-
2023.03.28	Linly (伶荔)	Linly-ChatFlow 7b/13b	7B/13B	LLaMA	zh	General	BELLE [link], pCLUE [link], CSL [link], GuanacoDataset [link], Chain-of-Thought [link], news_commentary [link], firefly [link]	[link]	[link]	-	-	[link]
2023.04.01	BAIZE	baize-7B/13B/30B	7B/13B/30B	LLaMA	en	General	52K Stanford Alpaca [link], 54K Quora [link], 57K Stack Overflow [link]	[link]	[link]	[link]	[link]	-
2023.04.03	Koala	-	13B	LLaMA	en	General	ShareGPT, HC3 [link], OIG [link], Stanford Alpaca [link], Anthropic HH [link], OpenAI WebGPT [link], OpenAI Summarization [link]	-	[link]	-	[link]	[link]
2023.04.03	BAIZE	baize-healthcare-7b	7B	LLaMA	en	Medicine	54K Quora [link], 47K medical dialogs [link]	[link]	[link]	-	-	-
2023.04.06	Firefly (流萤)	firefly-1b4/2b6	1.4B/2.6B	BLOOM-ZH	zh	General	Chinese question-answering pairs [link], [link]	[link]	[link]	-	-	-
2023.04.08	Phoenix	Phoenix-chat-7b	7B	BLOOMZ	multi	General	conversation data [link]	[link]	[link]	-	-	-
2023.04.09	Phoenix	Phoenix-inst-chat-7b	7B	BLOOMZ	multi	General	conversation data [link], instruction data	[link]	[link]	-	-	-
2023.04.10	Chimera	chimera-chat-7b/13b	7B/13B	LLaMA	latin	General	conversation data [link]	[link]	[link]	-	-	-
2023.04.11	Chimera	chimera-inst-chat-7b/13b	7B/13B	LLaMA	latin	General	conversation data [link], instruction data	[link]	[link]	-	-	-
2023.04.12	Dolly	dolly-v2-12b	12B	pythia-12b	en	General	15k human-generated prompt/response pairs [link]	[link]	[link]	-	-	[link]
2023.04.14	MedAlpaca	medalpaca 7b/13b	7B/13B	LLaMA	en	Medicine	question-answering pairs from flashcards, WikiDoc, Stack Exchange and ChatDoctor	[link]	[link]	[link]	-	-
2023.04.19	BELLE	BELLE-LLaMA-7B/13B-2M	7B/13B	LLaMA	zh, en	General	2M Chinese data [link], 52k Stanford Alpaca instruction-following data [link]	[link]	[link]	[link]	-	-
2023.04.21	MOSS	moss-moon-003-base	16B	CodeGen	zh, en	General	100B Chinese tokens and 20B English tokens	[link]	[link]	-	[link]	[link]
2023.04.21	MOSS	moss-moon-003-sft	16B	moss-moon-003-base	zh, en	General	1.1M multi-turn conversational data (generated from ChatGPT) [link]	[link]	[link]	-	[link]	[link]
2023.04.21	MOSS	moss-moon-003-sft-plugin	16B	moss-moon-003-base	zh, en	General	1.1M multi-turn conversational data [link], 300K plugin-augmented data (generated by InstructGPT) [link]	[link]	[link]	-	[link]	[link]
2023.04.22	HuggingChat	oasst-sft-6-llama-30b	30B	LLaMA	multi	General	human-generated, human-annotated assistant-style conversation corpus consisting of 161k messages in 35 languages [link]	[link]	[link]	-	[link]	-
2023.06.19	KnowLM	zhixi-13b	13B	LLaMA	zh, en	General	human-generated, machine-generated and Knowledge Graph-generated in Chinese and English [link]	[link]	[link]	-	-	-
2023.06.21	BayLing (百聆)	BayLing-7b/13b	7B/13B	LLaMA	zh, en	General	160K human-generated, machine-generated multi-turn interactive translation corpus, Alpaca instructions and ShareGPT conversations [link]	[link]	[link]	[link]	[link]	[link]
2023.07.18	LLaMA 2	llama-2-7b/13b/70b-(chat)	7B/13B/70B	-	en	General	2T tokens (mostly English, a new mix of data from publicly available sources)	[link]	[link]	[link]	-	[link]
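
The table is tab-separated, so it can be filtered with a few lines of Python. The snippet below is a minimal sketch rather than part of this project: the helper name `filter_rows` is hypothetical, and `SAMPLE` holds three rows copied from the table with only the first seven columns kept (the training-data and link columns are dropped for brevity).

```python
import csv
import io

# Illustrative sample: three rows copied from the table, keeping only the
# first seven columns (training-data and link columns are omitted).
SAMPLE = (
    "Release Time\tModel\tVersion\tSize\tBackbone\tLangs\tDomain\n"
    "2023.03.24\tChatDoctor\t-\t7B\tLLaMA\ten\tMedicine\n"
    "2023.04.12\tDolly\tdolly-v2-12b\t12B\tpythia-12b\ten\tGeneral\n"
    "2023.04.14\tMedAlpaca\tmedalpaca 7b/13b\t7B/13B\tLLaMA\ten\tMedicine\n"
)


def filter_rows(tsv_text, column, value):
    """Return the rows (as dicts) whose `column` equals `value`, ignoring case."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    wanted = value.strip().lower()
    return [row for row in reader if row[column].strip().lower() == wanted]


if __name__ == "__main__":
    # List the medical-domain models in the sample.
    for row in filter_rows(SAMPLE, "Domain", "medicine"):
        print(row["Release Time"], row["Model"], row["Size"])
```

The same helper should work on the full table if it is saved as tab-separated text with the header row intact, since `csv.DictReader` takes column names from the first line.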

📕 Closed-Sourced LLMs

Release Time	Model	Version	Size	Langs	Domain	Demo	Official Blog	Paper
2022.11.30	ChatGPT	gpt-3.5-turbo	-	multi	general	[link]	[link]	-
2023.03.14	Claude	Claude Instant / Claude-v1	-	multi	general	[link]	[link]	-
2023.03.14	GPT	gpt-4	-	multi	general	[link]	[link]	[link]
2023.03.16	Ernie Bot (文心一言)	-	-	zh, en	general	[link]	[link]	-
2023.03.21	Bard	-	-	multi	general	[link]	[link]	-
2023.03.30	BloombergGPT	-	50B	en	finance	-	[link]	[link]
2023.04.11	Tongyi Qianwen (通义千问)	-	-	multi	general	[link]	[link]	-
2023.07.07	OmModel (欧姆大模型)	-	-	multi	general	[link]	[link]	-
2023.07.11	Claude 2	Claude-v2	-	multi	general	-	[link]	[link]

🏗 TODO List

Include open-sourced LLMs
Include closed-sourced LLMs
Include a systematic review of common training data
Include interesting use cases of various LLMs
Include the performance of LLMs on various evaluation tasks

📝 Citation

If you find this repository useful, please consider citing it.

@software{li2023llmzoo,
  title = {LLM Zoo},
  author = {Li, Xingxuan and Zhang, Wenxuan and Bing, Lidong},
  url = {https://github.com/DAMO-NLP-SG/LLM-Zoo},
  year = {2023}
}
