This is the official repo for the SIGIR 2024 tutorial "Empowering LLMs: Tool Learning with Real-World Interactions". More details can be found at https://rulegreen.github.io/services/tools-meet-llm/
We track recent progress in LLM-based tool learning. Works are listed following the structure of the tutorial, and the list is updated continuously. Feel free to open an issue to add new works!
table of contents
-
What Are Tools Anyway? A Survey from the Language Model Perspective
2024/03
-
The Rise and Potential of Large Language Model Based Agents: A Survey
-
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security
definition and scope of tools
relevant cognitive tools
-
TPE: Towards Better Compositional Reasoning over Conceptual Tools with Multi-persona Collaboration 🔥🔥🔥
-
Large Language Models as Source Planner for Personalized Knowledge-grounded Dialogues 🔥🔥🔥
-
Meta-Reasoning: Monitoring and Control of Thinking and Reasoning 🔥🔥🔥🔥🔥 personally like this
relevant physical tools
-
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs 🔥🔥🔥 important work, also for dialogues
-
Toolformer: Language Models Can Teach Themselves to Use Tools
-
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
-
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
see above
-
Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees 🔥🔥🔥
-
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
-
ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases multi-modal
-
ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
-
ART: Automatic multi-step reasoning and tool-use for large language models
-
Toolformer: Language Models Can Teach Themselves to Use Tools
-
WizardLM: Empowering Large Language Models to Follow Complex Instructions
-
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
-
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
-
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
-
AdaPlanner: Adaptive Planning from Feedback with Language Models
-
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
-
Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach
-
User Behavior Simulation with Large Language Model based Agents
-
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
-
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback
-
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
-
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
-
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
-
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
-
Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization
-
AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning
-
SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents
-
CToolEval: A Chinese Benchmark for LLM-Powered Agent Evaluation in Real-World API Interactions (Chinese benchmark)
-
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
-
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use
-
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
-
Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization
-
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing (first to use external feedback from tools to critique and refine LLM outputs?) [code]
-
Reflexion: Language Agents with Verbal Reinforcement Learning
-
Chat with the Environment: Interactive Multimodal Perception Using Large Language Models
-
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
-
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
-
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents
-
ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models
-
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
-
Making Language Models Better Tool Learners with Execution Feedback
-
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets
-
CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models
-
EasyTool: Enhancing LLM-based Agents with Concise Tool Instruction (optimizes tool documentation)
-
ToolVerifier: Generalization to New Tools via Self-Verification (fine-tunes a model to select tools based on their descriptions and poses questions to self-refine its decisions)
-
CRUXEval: A Benchmark for Code Reasoning Understanding and Execution
-
ToolRerank: Adaptive and Hierarchy-Aware Reranking for Tool Retrieval
-
Empowering Large Language Model Agents through Action Learning
-
UniMS-RAG: A Unified Multi-source Retrieval-Augmented Generation for Personalized Dialogue Systems 🔥🔥🔥
-
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection 🔥🔥🔥🔥🔥
-
UniRetriever: Multi-task Candidates Selection for Various Context-Adaptive Conversational Retrieval
-
Active Retrieval Augmented Generation
EMNLP 2023
🔥🔥🔥 interesting and useful; may be applicable to dialogues
-
Learning Retrieval Augmentation for Personalized Dialogue Generation
EMNLP 2023
-
PK-ICR: Persona-Knowledge Interactive Multi-Context Retrieval for Grounded Dialogue
EMNLP 2023
-
Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
ACL 2023
-
Self-Knowledge Guided Retrieval Augmentation for Large Language Models
EMNLP 2023
-
A Solution-based LLM API-using Methodology for Academic Information Seeking
-
Voyager: An Open-Ended Embodied Agent with Large Language Models
2023.05
-
SCIENCEWORLD: Is your Agent Smarter than a 5th Grader?
2022.03
Interactive Env
-
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
2020.10
-
On the Tool Manipulation Capability of Open-source Large Language Models
-
Code as Policies: Language Model Programs for Embodied Control
-
Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks
-
Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents
-
Language Models Meet World Models: Embodied Experiences Enhance Language Models
-
FacTool: Factuality Detection in Generative AI - A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios [code]
definition and scope of tools
- Multi-modal and Multi-agent Tool Learning
- Safe, Trustworthy and Personalized Tool Learning
- Emerging Trends and Future Opportunities
-
AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction 🔥🔥🔥🔥🔥
-
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
-
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
-
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
-
Scaling Large-Language-Model-based Multi-Agent Collaboration
multi-agent
-
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
multi-modal
-
WebArena: A Realistic Web Environment for Building Autonomous Agents
multi-modal
-
SecGPT: An Execution Isolation Architecture for LLM-Based Systems 🔥🔥🔥🔥
-
Towards Tool Use Alignment of Large Language Models
safety and autonomy
-
ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages
ACL 2024
-
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agent
-
Metacognitive Retrieval-Augmented Large Language Models (tool conflicts)
-
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving 🔥🔥🔥 [code]
@inproceedings{toolmeetllm,
author = {Wang, Hongru and Qin, Yujia and Lin, Yankai and Pan, Jeff Z. and Wong, Kam-Fai},
title = {Empowering Large Language Models: Tool Learning for Real-World Interaction},
year = {2024},
isbn = {9798400704314},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3626772.3661381},
doi = {10.1145/3626772.3661381},
abstract = {Since the advent of large language models (LLMs), the field of tool learning has remained very active in solving various tasks in practice, including but not limited to information retrieval. This half-day tutorial provides basic concepts of this field and an overview of recent advancements with several applications. In specific, we start with some foundational components and architecture of tool learning (i.e., cognitive tool and physical tool), and then we categorize existing studies in this field into tool-augmented learning and tool-oriented learning, and introduce various learning methods to empower LLMs this kind of capability. Furthermore, we provide several cases about when, what, and how to use tools in different applications. We end with some open challenges and several potential research directions for future studies. We believe this tutorial is suited for both researchers at different stages (introductory, intermediate, and advanced) and industry practitioners who are interested in LLMs and tool learning.},
booktitle = {Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval},
pages = {2983--2986},
numpages = {4},
keywords = {language agents, large language models, tool learning},
location = {Washington DC, USA},
series = {SIGIR '24}
}