This is a collection of papers, blog posts, and projects about research agents powered by large language models (LLMs).
This repository is continuously updated to track resources on LLM research agents.
Format:
- [time] [title](paper link)
- author1, author2, and author3...
- publisher
- code
- experimental environments and datasets
- [Aug 2024] The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
- Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, David Ha
- https://github.com/SakanaAI/AI-Scientist
- [Apr 2024] ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
- Jinheon Baek, Sujay Kumar Jauhar, Silviu Cucerzan, Sung Ju Hwang
- [Sep 2024] Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
- Chenglei Si, Diyi Yang, Tatsunori Hashimoto
- https://github.com/NoviScl/AI-Researcher
- [Oct 2024] Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents
- Long Li, Weiwen Xu, Jiayan Guo, Ruochen Zhao, Xingxuan Li, Yuqian Yuan, Boqiang Zhang, Yuming Jiang, Yifei Xin, Ronghao Dang, Deli Zhao, Yu Rong, Tian Feng, Lidong Bing
- https://github.com/DAMO-NLP-SG/CoI-Agent
- [Oct 2024] SciPIP: An LLM-based Scientific Paper Idea Proposer
- Wenxiao Wang, Lihui Gu, Liye Zhang, Yunxiang Luo, Yi Dai, Chen Shen, Liang Xie, Binbin Lin, Xiaofei He, Jieping Ye
- https://github.com/cheerss/SciPIP
- [Jun 2024] AutoSurvey: Large Language Models Can Automatically Write Surveys
- Yidong Wang, Qi Guo, Wenjin Yao, Hongbo Zhang, Xin Zhang, Zhen Wu, Meishan Zhang, Xinyu Dai, Min Zhang, Qingsong Wen, Wei Ye, Shikun Zhang, Yue Zhang
- [Dec 2024] LLMs for Literature Review: Are we there yet?
- Shubham Agarwal, Gaurav Sahu, Abhay Puri, Issam H. Laradji, Krishnamurthy DJ Dvijotham, Jason Stanley, Laurent Charlin, Christopher Pal
- [Oct 2023] SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
- Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik Narasimhan
- https://github.com/swe-bench/SWE-bench
- [Oct 2023] MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation
- Qian Huang, Jian Vora, Percy Liang, Jure Leskovec
- https://github.com/snap-stanford/MLAgentBench
- [Feb 2024] Benchmarking Data Science Agents
- Yuge Zhang, Qiyang Jiang, Xingyu Han, Nan Chen, Yuqing Yang, Kan Ren
- [Jul 2024] OpenHands: An Open Platform for AI Software Developers as Generalist Agents
- Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, Graham Neubig
- https://github.com/All-Hands-AI/OpenHands
- [Sep 2024] DSBench: How Far Are Data Science Agents to Become Data Science Experts?
- Liqiang Jing, Zhehui Huang, Xiaoyang Wang, Wenlin Yao, Wenhao Yu, Kaixin Ma, Hongming Zhang, Xinya Du, Dong Yu
- https://github.com/LiqiangJing/DSBench
- [Oct 2024] MLE-Bench: Evaluating Machine Learning Agents on Machine Learning Engineering
- Jun Shern Chan, Neil Chowdhury, Oliver Jaffe, James Aung, Dane Sherburn, Evan Mays, Giulio Starace, Kevin Liu, Leon Maksin, Tejal Patwardhan, Lilian Weng, Aleksander Mądry
- https://github.com/openai/mle-bench
- [Oct 2024] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
- Ziru Chen, Shijie Chen, Yuting Ning, Qianheng Zhang, Boshi Wang, Botao Yu, Yifei Li, Zeyi Liao, Chen Wei, Zitong Lu, Vishal Dey, Mingyi Xue, Frazier N. Baker, Benjamin Burns, Daniel Adu-Ampratwum, Xuhui Huang, Xia Ning, Song Gao, Yu Su, Huan Sun
- [Nov 2024] RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts
- Hjalmar Wijk, Tao Lin, Joel Becker, Sami Jawhar, Neev Parikh, Thomas Broadley, Lawrence Chan, Michael Chen, Josh Clymer, Jai Dhyani, Elena Ericheva, Katharyn Garcia, Brian Goodrich, Nikola Jurkovic, Megan Kinniment, Aron Lajko, Seraphina Nix, Lucas Sato, William Saunders, Maksym Taran, Ben West, Elizabeth Barnes
- [Jun 2024] Let’s Get to the Point: LLM-Supported Planning, Drafting, and Revising of Research-Paper Blog Posts
- Marissa Radensky, Daniel S. Weld, Joseph Chee Chang, Pao Siangliulue, Jonathan Bragg
- [May 2024] Is LLM a Reliable Reviewer? A Comprehensive Evaluation of LLM on Automatic Paper Reviewing Tasks
- Ruiyang Zhou, Lu Chen, Kai Yu
- [Dec 2024] Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review
- Rui Ye, Xianghe Pang, Jingyi Chai, Jiaao Chen, Zhenfei Yin, Zhen Xiang, Xiaowen Dong, Jing Shao, Siheng Chen
- gpt-researcher: an autonomous agent designed for comprehensive web and local research on any given task.
- AI-Scientist: Fully Automated Open-Ended Scientific Discovery.
- MLE-Agent: Your intelligent companion for seamless AI engineering and research.
- Data-Agent: a comprehensive toolkit designed for efficient data operations.
- Code-Agent: an agent framework in which the LLM writes its actions as executable code rather than plain tool calls (see the minimal sketch after this list).
- AIDE: the Machine Learning Engineer Agent.
- OpenHands: a platform for software development agents powered by AI.
- VisionAgent: an agent framework that generates code to solve vision tasks.
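
Several of the projects above (e.g. Code-Agent and AIDE) follow a "write actions as code" loop: the model emits a Python snippet, the host executes it, and the captured output is fed back to the model as the next observation. Below is a minimal, hypothetical sketch of that loop in plain Python; `call_llm` and `run_code_agent` are illustrative names, not part of any project listed here, and `call_llm` is a stand-in for whatever model client you use.

```python
import contextlib
import io
import re


def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (OpenAI, Anthropic, a local model, etc.).
    It is expected to return text containing a ```python ...``` block."""
    raise NotImplementedError("plug in your own LLM client here")


CODE_BLOCK = re.compile(r"```python\n(.*?)```", re.DOTALL)


def run_code_agent(task: str, max_steps: int = 5) -> str:
    """Minimal 'actions as code' loop: ask the model for code, execute it,
    and append the captured stdout to the history as the next observation."""
    history = (
        f"Task: {task}\n"
        "Write Python code to make progress. Print anything you learn.\n"
    )
    namespace: dict = {}  # persists across steps so variables survive
    for _ in range(max_steps):
        reply = call_llm(history)
        match = CODE_BLOCK.search(reply)
        if match is None:  # no code block -> treat the reply as the final answer
            return reply
        code = match.group(1)
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, namespace)  # NOTE: sandbox this in any real system
            observation = buffer.getvalue()
        except Exception as exc:
            observation = f"Error: {exc!r}"
        history += f"\n```python\n{code}```\nObservation:\n{observation}\n"
    return history
```

Real frameworks layer sandboxed execution, curated tool imports, and structured memory on top of this basic loop.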