From 05cb66aa8fd67851df2639e7a1a0c4c59a68fc73 Mon Sep 17 00:00:00 2001
From: Dries Smit
Date: Thu, 15 Dec 2022 13:53:56 +0200
Subject: [PATCH 1/3] feat: Add first two papers.

---
 Research Papers/Shallow learning/README.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Research Papers/Shallow learning/README.md b/Research Papers/Shallow learning/README.md
index b3e9800..5ad8def 100644
--- a/Research Papers/Shallow learning/README.md
+++ b/Research Papers/Shallow learning/README.md
@@ -246,6 +246,8 @@ We introduce a compact graph-theoretic representation for multi-party game theor
Marginal contribution nets: A compact representation scheme for coalitional games by Ieong S, Shoham Y. In Proceedings of the 6th ACM Conference on Electronic Commerce, 2005. We present a new approach to representing coalitional games based on rules that describe the marginal contributions of the agents. This representation scheme captures characteristics of the interactions among the agents in a natural and concise manner. We also develop efficient algorithms for two of the most important solution concepts, the Shapley value and the core, under this representation. The Shapley value can be computed in time linear in the size of the input. The emptiness of the core can be determined in time exponential only in the treewidth of a graphical interpretation of our representation.
-
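The linear-time Shapley claim above is easy to see for MC-nets whose rules contain only positive literals: by linearity of the Shapley value, each rule splits its value equally among the agents in its pattern. A minimal sketch under that restriction (the rule set below is invented for illustration; rules with negative literals need an extra correction term):

```python
from collections import defaultdict

def shapley_mc_nets(rules):
    """Shapley values for an MC-net whose rules use only positive literals.

    Each rule is (pattern, value): a coalition gains `value` whenever it
    contains every agent in `pattern`. By linearity of the Shapley value,
    each rule contributes value / len(pattern) to every agent it mentions,
    so a single pass over the rules suffices (linear in the input size).
    """
    phi = defaultdict(float)
    for pattern, value in rules:
        share = value / len(pattern)
        for agent in pattern:
            phi[agent] += share
    return dict(phi)

# Invented rule set: {a} -> 1, {a, b} -> 2, {b, c} -> -1
print(shapley_mc_nets([({"a"}, 1.0), ({"a", "b"}, 2.0), ({"b", "c"}, -1.0)]))
# {'a': 2.0, 'b': 0.5, 'c': -0.5}
```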
+
Tunably decentralized algorithms for cooperative target observation Luke, Sean and Sullivan, Keith and Panait, Liviu and Balan, Gabriel. 4th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2005), 2005. Multi-agent problem domains may require distributed algorithms for a variety of reasons: local sensors, limitations of communication, and availability of distributed computational resources. In the absence of these constraints, centralized algorithms are often more efficient, simply because they are able to take advantage of more information. We introduce a variant of the cooperative target observation domain which is free of such constraints. We propose two algorithms, inspired by K-means clustering and hill-climbing respectively, which are scalable in degree of decentralization. Neither algorithm consistently outperforms the other across all problem domain settings. Surprisingly, we find that hill-climbing is sensitive to degree of decentralization, while K-means is not. We also experiment with a combination of the two algorithms which draws strength from each.
-
+
Multi-attribute coalitional games by Ieong S, Shoham Y. In Proceedings of the 7th ACM Conference on Electronic Commerce, 2006. We study coalitional games where the value of cooperation among the agents is solely determined by the attributes the agents possess, with no assumption as to how these attributes jointly determine this value. This framework allows us to model diverse economic interactions by picking the right attributes. We study the computational complexity of two coalitional solution concepts for these games — the Shapley value and the core. We show how the positive results obtained in this paper imply comparable results for other games studied in the literature.
-
Bayesian Coalitional Games by Ieong S, Shoham Y. AAAI, 2008. We introduce Bayesian Coalitional Games (BCGs), a generalization of classical coalitional games to settings with uncertainties. We define the semantics of BCG using the partition model, and generalize the notion of payoffs to contracts among agents. To analyze these games, we extend the solution concept of the core under three natural interpretations—ex ante, ex interim, and ex post—which coincide with the classical definition of the core when there is no uncertainty. In the special case where agents are risk-neutral, we show that checking for core emptiness under all three interpretations can be simplified to linear feasibility problems similar to that of their classical counterpart.
-
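For intuition on the "linear feasibility" remark, this is the classical (no-uncertainty) core check that the risk-neutral case reduces to: the core is non-empty exactly when the LP below attains v(N). The sketch enumerates every coalition, so it is exponential in the number of agents and only illustrates the LP formulation, not the compact representations used in these papers.

```python
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

def core_is_nonempty(n, v, tol=1e-9):
    """Classical coalitional game: the core is non-empty iff
    min { sum_i x_i : sum_{i in S} x_i >= v(S) for every coalition S }
    equals v(N) (it can never be smaller)."""
    agents = list(range(n))
    coalitions = [frozenset(c) for r in range(1, n + 1)
                  for c in combinations(agents, r)]
    # linprog expects A_ub @ x <= b_ub, so negate the >= constraints.
    A_ub = [[-1.0 if i in S else 0.0 for i in agents] for S in coalitions]
    b_ub = [-v(S) for S in coalitions]
    res = linprog(c=np.ones(n), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n)
    return res.success and res.fun <= v(frozenset(agents)) + tol

# Invented example: 3-player majority game (any pair is worth 1).
v = lambda S: 1.0 if len(S) >= 2 else 0.0
print(core_is_nonempty(3, v))  # False: the majority game has an empty core
```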
@@ -330,6 +332,8 @@ We introduce a compact graph-theoretic representation for multi-party game theor
Towards a relation between learning agents and evolutionary dynamics by Karl Tuyls, Tom Lenaerts, Katja Verbeeck, Sam Maes. BNAIC, 2002. Modeling learning agents in the context of Multi-agent Systems requires insight in the type and form of interactions with the environment and other agents in the system. Usually, these agents are modeled similar to the different players in a standard game theoretical model. In this paper we examine whether evolutionary game theory, and more specifically the replicator dynamics, is an adequate theoretical model for the study of the dynamics of reinforcement learning agents in a multi-agent system. As a first step in this direction we extend the results of [1, 9] to a more general reinforcement learning framework, i.e. Learning Automata.
-
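As a reminder of the object the paper links learning automata to, here is the two-population replicator dynamics, Euler-integrated on matching pennies (the game, step size, and starting strategies are our toy choices, not the paper's setup):

```python
import numpy as np

def replicator_step(x, y, A, B, dt=0.01):
    """One Euler step of the two-population replicator dynamics:
    dx_i/dt = x_i * ((A y)_i - x.A.y), and symmetrically for y with B."""
    fx = A @ y                       # expected payoff of each row action
    fy = B.T @ x                     # expected payoff of each column action
    x = x + dt * x * (fx - x @ fx)
    y = y + dt * y * (fy - y @ fy)
    return x / x.sum(), y / y.sum()  # renormalise against numerical drift

# Matching pennies: trajectories orbit the mixed equilibrium (0.5, 0.5)
# rather than converging to it.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A
x, y = np.array([0.8, 0.2]), np.array([0.3, 0.7])
for _ in range(5000):
    x, y = replicator_step(x, y, A, B)
print(x, y)
```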
+
Guaranteeing Coevolutionary Objective Measures Sean Luke, R. Paul Wiegand. FOGA, 2002. The task of understanding the dynamics of coevolutionary algorithms or comparing performance between such algorithms is complicated by the fact that the internal fitness measures are subjective. Though several techniques have been proposed to use external or objective measures to help in analysis, there are clearly properties of fitness payoff, like intransitivity, for which these techniques are ineffective. We feel that a principled approach to this problem is to first establish the theoretical bounds to guarantee objective measures in one CEA model; from there one can later examine the effects of deviating from the assumptions made by these bounds. To this end, we present a model of competitive fitness assessment with a single population and non-parametric selection (such as tournament selection), and show minimum conditions and examples under which an objective measure exists, and when the dynamics of the coevolutionary algorithm are identical to those of a traditional EA.
-
+
When Evolving Populations is Better than Coevolving Individuals: The Blind Mice Problem by Thomas Miconi. IJCAI, 2003. This paper is about the evolutionary design of multi-agent systems. An important part of recent research in this domain has been focusing on collaborative coevolutionary methods. We expose possible drawbacks of these methods, and show that for a non-trivial problem called the "blind mice" problem, a classical GA approach in which whole populations are evaluated, selected and crossed together (with a few tweaks) finds an elegant and non-intuitive solution more efficiently than cooperative coevolution. The difference in efficiency grows with the number of agents within the simulation. We propose an explanation for this poorer performance of cooperative coevolution, based on the intrinsic fragility of the evaluation process. This explanation is supported by theoretical and experimental arguments.
-
Exploring the Explorative Advantage of the Cooperative Coevolutionary (1+1) EA by Thomas Jansen, R. Paul Wiegand. GECCO, 2003. Using a well-known cooperative coevolutionary function optimization framework, a very simple cooperative coevolutionary (1+1) EA is defined. This algorithm is investigated in the context of expected optimization time. The focus is on the impact the cooperative coevolutionary approach has and on the possible advantage it may have over more traditional evolutionary approaches. Therefore, a systematic comparison between the expected optimization times of this coevolutionary algorithm and the ordinary (1+1) EA is presented. The main result is that separability of the objective function alone is not sufficient to make the cooperative coevolutionary approach beneficial. By presenting a clearly structured example function and analyzing the algorithms’ performance, it is shown that the cooperative coevolutionary approach comes with new explorative possibilities. This can lead to an immense speed-up of the optimization.
-
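A toy version of the kind of cooperative coevolutionary (1+1) EA analysed above (the block schedule and mutation probability are our simplifications, not the paper's exact definitions):

```python
import random

def cc_one_plus_one_ea(f, n, k, steps=10_000, seed=0):
    """Cooperative coevolutionary (1+1) EA sketch on n-bit strings.

    The genome is split into k equal blocks ("species"). Round-robin, one
    block is mutated (each of its bits flips with probability 1/n), the
    offspring is evaluated in the context of the other blocks' current
    values, and it replaces its parent block if the result is no worse.
    """
    rng = random.Random(seed)
    block = n // k
    x = [rng.randint(0, 1) for _ in range(n)]
    for t in range(steps):
        lo = (t % k) * block
        child = x[:]
        for i in range(lo, lo + block):
            if rng.random() < 1.0 / n:
                child[i] ^= 1
        if f(child) >= f(x):
            x = child
    return x

# Invented use on OneMax, a separable function that matches the decomposition.
onemax = lambda bits: sum(bits)
print(sum(cc_one_plus_one_ea(onemax, n=40, k=4)), "/ 40 ones")
```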
@@ -609,3 +613,4 @@ The papers below were found to be difficult to categorise and therefore are pres
+

From 7023b034441b107f1b953ccd8e85f7a3946cd6f1 Mon Sep 17 00:00:00 2001
From: Dries Smit
Date: Fri, 16 Dec 2022 04:22:33 +0200
Subject: [PATCH 2/3] feat: Add first page of shallow learning papers.

---
 Research Papers/Shallow learning/README.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/Research Papers/Shallow learning/README.md b/Research Papers/Shallow learning/README.md
index 5ad8def..e75c57a 100644
--- a/Research Papers/Shallow learning/README.md
+++ b/Research Papers/Shallow learning/README.md
@@ -445,6 +445,8 @@ Halpern begins by surveying possible formal systems for representing uncertainty
Learning to Communicate and Act using Hierarchical Reinforcement Learning by Mohammad Ghavamzadeh, Sridhar Mahadevan. AAMAS, 2004. In this paper, we address the issue of rational communication behavior among autonomous agents. The goal is for agents to learn a policy to optimize the communication needed for proper coordination, given the communication cost. We extend our previously reported cooperative hierarchical reinforcement learning (HRL) algorithm to include communication decisions and propose a new multiagent HRL algorithm, called COM-Cooperative HRL. In this algorithm, we define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Those levels of the hierarchy which include cooperative subtasks are called cooperation levels. Coordination skills among agents are learned faster by sharing information at the cooperation levels, rather than the level of primitive actions. We add a communication level to the hierarchical decomposition of the problem below each cooperation level. Before making a decision at a cooperative subtask, agents decide if it is worthwhile to perform a communication action. A communication action has a certain cost and provides each agent at a certain cooperation level with the actions selected by the other agents at the same level. We demonstrate the efficacy of the COM-Cooperative HRL algorithm as well as the relation between the communication cost and the learned communication policy using a multiagent taxi domain.
-
+
Hierarchical Multi-Agent Reinforcement Learning Mohammad Ghavamzadeh, Sridhar Mahadevan and Rajbala Makar. Autonomous Agents and Multi-Agent Systems, 2006. In this paper we investigate the use of hierarchical reinforcement learning to speed up the acquisition of cooperative multi-agent tasks. We extend the MAXQ framework to the multi-agent case. Each agent uses the same MAXQ hierarchy to decompose a task into sub-tasks. Learning is decentralized, with each agent learning three interrelated skills: how to perform subtasks, which order to do them in, and how to coordinate with other agents. Coordination skills among agents are learned by using joint actions at the highest level(s) of the hierarchy. The Q nodes at the highest level(s) of the hierarchy are configured to represent the joint task-action space among multiple agents. In this approach, each agent only knows what other agents are doing at the level of sub-tasks, and is unaware of lower level (primitive) actions. This hierarchical approach allows agents to learn coordination faster by sharing information at the level of sub-tasks, rather than attempting to learn coordination taking into account primitive joint state-action values. We apply this hierarchical multi-agent reinforcement learning algorithm to a complex AGV scheduling task and compare its performance and speed with other learning approaches, including flat multi-agent, single agent using MAXQ, selfish multiple agents using MAXQ (where each agent acts independently without communicating with the other agents), as well as several well-known AGV heuristics like "first come first serve", "highest queue first" and "nearest station first". We also compare the tradeoffs in learning speed vs. performance of modeling joint action values at multiple levels in the MAXQ hierarchy.
-
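Stripped of the full MAXQ machinery, the coordination idea is that only the top-level value function ranges over joint subtask choices while everything below stays per-agent. A rough sketch of such a top level (the flat Q-learning update and all names here are our simplification, not the authors' algorithm):

```python
import random
from collections import defaultdict

class TopLevelCoordinator:
    """Top level of a hierarchical agent: values are indexed by the other
    agents' announced subtasks, so coordination is learned over subtasks
    rather than over primitive joint actions."""
    def __init__(self, subtasks, alpha=0.1, gamma=0.95, eps=0.1):
        self.subtasks = subtasks
        self.q = defaultdict(float)   # (state, others_subtasks, my_subtask)
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def choose(self, state, others):
        if random.random() < self.eps:
            return random.choice(self.subtasks)
        return max(self.subtasks, key=lambda s: self.q[(state, others, s)])

    def update(self, state, others, subtask, reward, next_state, next_others):
        # One SMDP-style update applied when the chosen subtask terminates;
        # `reward` is the return accumulated while executing the subtask.
        best_next = max(self.q[(next_state, next_others, s)]
                        for s in self.subtasks)
        key = (state, others, subtask)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
```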
+
### Incomplete Information Games
@@ -511,6 +513,15 @@ Halpern begins by surveying possible formal systems for representing uncertainty
### Robotic Teams
+
Automatic programming of behavior-based robots using reinforcement learning Sridhar Mahadevan and Jonathan Connell. AAAI-91 Proceedings, 1991. This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error using a performance feedback function as reinforcement. Two algorithms for behavior learning are described that combine Q learning, a well-known scheme for propagating reinforcement values temporally across actions, with statistical clustering and Hamming distance, two ways of propagating reinforcement values spatially across states. A real behavior-based robot called OBELIX is described that learns several component behaviors in an example task involving pushing boxes. A simulator for the box pushing task is also used to gather data on the learning techniques. A detailed experimental study using the real robot and the simulator suggests two conclusions.
+1. The learning techniques are able to learn the individual behaviors, sometimes outperforming a handcoded program.
+
+2. Using a behavior-based architecture speeds up reinforcement learning by converting the problem of learning a complex task into that of learning a simpler set of special-purpose reactive subtasks.
-
+ +
Reward Functions for Accelerated Learning Maja J Mataric. ICML'94: Proceedings of the Eleventh International Conference on Machine Learning, 1994. This paper discusses why traditional reinforcement learning methods, and algorithms applied to those models, result in poor performance in situated domains characterized by multiple goals, noisy state, and inconsistent reinforcement. We propose a methodology for designing reinforcement functions that take advantage of implicit domain knowledge in order to accelerate learning in such domains. The methodology involves the use of heterogeneous reinforcement functions and progress estimators, and applies to learning in domains with a single agent or with multiple agents. The methodology is experimentally validated on a group of mobile robots learning a foraging task.
-
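The shaped reinforcement described above can be read as a weighted sum of event-based reward terms and progress-estimator terms. A small sketch of that reading (the formulation, weights, and names are invented for illustration, not Mataric's exact functions):

```python
def shaped_reward(events, progress_deltas, event_weights, progress_weights):
    """Heterogeneous reinforcement plus progress estimators: discrete events
    pay fixed amounts, and each progress estimator pays in proportion to the
    measured progress toward its goal since the previous step."""
    r = sum(event_weights[e] for e in events)
    r += sum(progress_weights[g] * d for g, d in progress_deltas.items())
    return r

# Invented foraging step: the robot moved 0.3 m closer to home while carrying
# a puck, but briefly interfered with a teammate.
event_weights = {"dropped_puck_at_home": 1.0, "interference": -0.2}
progress_weights = {"homing_with_puck": 0.5}
print(shaped_reward(["interference"], {"homing_with_puck": 0.3},
                    event_weights, progress_weights))  # -0.05
```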
+ +
Learning to Behave Socially Mataric, M. Robotics and Autonomous Systems, 1997. This paper discusses the challenges of learning to behave socially in the dynamic, noisy, situated and embodied mobile multi-robot domain. Using the methodology for synthesizing basis behaviors as a substrate for generating a large repertoire of higher-level group interactions, in this paper we describe how, given the substrate, greedy agents can learn social rules that benefit the group as a whole. We describe three sources of reinforcement and show their effectiveness in learning non-greedy social rules. We then demonstrate the learning approach on a group of four mobile robots learning to yield and share information in a foraging task.
-
+
Learning Roles: Behavioral Diversity in Robot Teams by Tucker Balch. AAAI, 1997. This paper describes research investigating behavioral specialization in learning robot teams. Each agent is provided a common set of skills (motor schema-based behavioral assemblages) from which it builds a task-achieving strategy using reinforcement learning. The agents learn individually to activate particular behavioral assemblages given their current situation and a reward signal. The experiments, conducted in robot soccer simulations, evaluate the agents in terms of performance, policy convergence, and behavioral diversity. The results show that in many cases, robots will automatically diversify by choosing heterogeneous behaviors. The degree of diversification and the performance of the team depend on the reward structure. When the entire team is jointly rewarded or penalized (global reinforcement), teams tend towards heterogeneous behavior. When agents are provided feedback individually (local reinforcement), they converge to identical policies.
-
Reinforcement Learning in the Multi-Robot Domain by Maja J. Mataric. Autonomous Robots, 1997. This paper describes a formulation of reinforcement learning that enables learning in noisy, dynamic environments such as in the complex concurrent multi-robot learning domain. The methodology involves minimizing the learning space through the use of behaviors and conditions, and dealing with the credit assignment problem through shaped reinforcement in the form of heterogeneous reinforcement functions and progress estimators. We experimentally validate the approach on a group of four mobile robots learning a foraging task.
-
@@ -552,6 +563,9 @@ using Extended Optimal Response by Nobuo Suematsu, Akira Hayashi. AAMAS, 200
### Theses
+
Interaction and intelligent behavior Mataric, Maja J. Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994. This thesis addresses situated, embodied agents interacting in complex domains. It focuses on two problems: 1) synthesis and analysis of intelligent group behavior, and 2) learning in complex group environments. Behaviors are proposed as the appropriate level for control and learning. Basic behaviors are introduced as building blocks for synthesizing and analyzing system behavior. The thesis describes the process of selecting such basic behaviors, formally specifying them, algorithmically implementing them, and empirically evaluating them. All of the proposed ideas are validated with a group of up to 20 mobile robots using a basic behavior set consisting of: avoidance, following, aggregation, dispersion, and homing. The set of basic behaviors acts as a substrate for achieving more complex high-level goals and tasks. Two behavior combination operators are introduced, and verified by combining subsets of the above basic behavior set to implement collective flocking and foraging. A methodology is introduced for automatically constructing higher-level behaviors by learning to select among the basic behavior set. A novel formulation of reinforcement learning is proposed that makes behavior selection learnable in noisy, uncertain multi-agent environments with stochastic dynamics. It consists of using conditions and behaviors for more robust control and a minimized state-space, and a reinforcement shaping methodology that enables principled embedding of domain knowledge with two types of shaping functions: heterogeneous reward functions and progress estimators. The methodology outperforms two alternatives when tested on a collection of robots learning to forage. The proposed formulation enables and accelerates learning in complex multi-robot domains. The generality of the approach makes it compatible with the existing reinforcement learning algorithms, allowing it to accelerate learning in a variety of domains and applications. The presented methodologies and results are aimed at extending our understanding of synthesis, analysis, and learning of group behavior.
-
+
Layered Learning in Multi-Agent Systems by Peter Stone. PhD thesis, 1998. Multi-agent systems in complex, real-time domains require agents to act effectively both autonomously and as part of a team. This dissertation addresses multi-agent systems consisting of teams of autonomous agents acting in real-time, noisy, collaborative, and adversarial environments. Because of the inherent complexity of this type of multi-agent system, this thesis investigates the use of machine learning within multi-agent systems. The dissertation makes four main contributions to the fields of Machine Learning and Multi-Agent Systems. First, the thesis defines a team member agent architecture within which a flexible team structure is presented, allowing agents to decompose the task space into flexible roles and allowing them to smoothly switch roles while acting. Team organization is achieved by the introduction of a locker-room agreement as a collection of conventions followed by all team members. It defines agent roles, team formations, and pre-compiled multi-agent plans. In addition, the team member agent architecture includes a communication paradigm for domains with single-channel, low-bandwidth, unreliable communication. The communication paradigm facilitates team coordination while being robust to lost messages and active interference from opponents. Second, the thesis introduces layered learning, a general-purpose machine learning paradigm for complex domains in which learning a mapping directly from agents' sensors to their actuators is intractable. Given a hierarchical task decomposition, layered learning allows for learning at each level of the hierarchy, with learning at each level directly affecting learning at the next higher level. Third, the thesis introduces a new multi-agent reinforcement learning algorithm, namely team-partitioned, opaque-transition reinforcement learning (TPOT-RL). TPOT-RL is designed for domains in which agents cannot necessarily observe the state changes when other team members act. It exploits local, action-dependent features to aggressively generalize its input representation for learning and partitions the task among the agents, allowing them to simultaneously learn collaborative policies by observing the long-term effects of their actions. Fourth, the thesis contributes a fully functioning multi-agent system that incorporates learning in a real-time, noisy domain with teammates and adversaries. Detailed algorithmic descriptions of the agents' behaviors as well as their source code are included in the thesis. Empirical results validate all four contributions within the simulated robotic soccer domain. The generality of the contributions is verified by applying them to the real robotic soccer, and network routing domains. Ultimately, this dissertation demonstrates that by learning portions of their cognitive processes, selectively communicating, and coordinating their behaviors via common knowledge, a group of independent agents can work towards a common goal in a complex, real-time, noisy, collaborative, and adversarial environment.
-
Multiagent Learning in the Presence of Agents with Limitations by Michael Bowling. Thesis, 2003. Learning to act in a multiagent environment is a challenging problem. Optimal behavior for one agent depends upon the behavior of the other agents, which are learning as well. Multiagent environments are therefore non-stationary, violating the traditional assumption underlying single-agent learning. In addition, agents in complex tasks may have limitations, such as physical constraints or designer-imposed approximations of the task that make learning tractable. Limitations prevent agents from acting optimally, which complicates the already challenging problem. A learning agent must effectively compensate for its own limitations while exploiting the limitations of the other agents. My thesis research focuses on these two challenges, namely multiagent learning and limitations, and includes four main contributions. First, the thesis introduces the novel concepts of a variable learning rate and the WoLF (Win or Learn Fast) principle to account for other learning agents. The WoLF principle is capable of making rational learning algorithms converge to optimal policies, and by doing so achieves two properties, rationality and convergence, which had not been achieved by previous techniques. The converging effect of WoLF is proven for a class of matrix games, and demonstrated empirically for a wide range of stochastic games. Second, the thesis contributes an analysis of the effect of limitations on the game-theoretic concept of Nash equilibria. The existence of equilibria is important if multiagent learning techniques, which often depend on the concept, are to be applied to realistic problems where limitations are unavoidable. The thesis introduces a general model for the effect of limitations on agent behavior, which is used to analyze the resulting impact on equilibria. The thesis shows that equilibria do exist for a few restricted classes of games and limitations, but even well-behaved limitations do not preserve the existence of equilibria, in general. Third, the thesis introduces GraWoLF, a general-purpose, scalable, multiagent learning algorithm. GraWoLF combines policy gradient learning techniques with the WoLF variable learning rate. The effectiveness of the learning algorithm is demonstrated in both a card game with an intractably large state space, and an adversarial robot task. These two tasks are complex and agent limitations are prevalent in both. Fourth, the thesis describes the CMDragons robot soccer team strategy for adapting to an unknown opponent. The strategy uses a notion of plays as coordinated team plans. The selection of team plans is the decision point for adapting the team to its current opponent, based on the outcome of previously executed plays. The CMDragons were the first RoboCup robot team to employ online learning to autonomously alter its behavior during the course of a game. These four contributions demonstrate that it is possible to effectively learn to act in the presence of other learning agents in complex domains when agents may have limitations. The introduced learning techniques are proven effective in a class of small games, and demonstrated empirically across a wide range of settings that increase in complexity.
-
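A compact sketch of the variable-learning-rate idea behind WoLF (win or learn fast), in its policy-hill-climbing form: move the policy slowly when the current policy is doing better than the historical average policy, and quickly otherwise. Parameter values and the projection step are our choices, not the thesis's exact formulation.

```python
import random
from collections import defaultdict

class WoLFPHC:
    """WoLF policy hill-climbing sketch with two policy step sizes."""
    def __init__(self, actions, alpha=0.1, delta_win=0.01, delta_lose=0.04, gamma=0.9):
        self.actions = actions
        self.q = defaultdict(float)
        self.pi = defaultdict(lambda: 1.0 / len(actions))
        self.avg_pi = defaultdict(lambda: 1.0 / len(actions))
        self.visits = defaultdict(int)
        self.alpha, self.dw, self.dl, self.gamma = alpha, delta_win, delta_lose, gamma

    def act(self, s):
        r, acc = random.random(), 0.0
        for a in self.actions:
            acc += self.pi[(s, a)]
            if r <= acc:
                return a
        return self.actions[-1]

    def update(self, s, a, reward, s_next):
        # Ordinary Q-learning step.
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (reward + self.gamma * best_next - self.q[(s, a)])
        # Track the average policy seen in state s so far.
        self.visits[s] += 1
        for b in self.actions:
            self.avg_pi[(s, b)] += (self.pi[(s, b)] - self.avg_pi[(s, b)]) / self.visits[s]
        # WoLF: small step when "winning" (current policy beats the average policy).
        cur = sum(self.pi[(s, b)] * self.q[(s, b)] for b in self.actions)
        avg = sum(self.avg_pi[(s, b)] * self.q[(s, b)] for b in self.actions)
        delta = self.dw if cur > avg else self.dl
        # Hill-climb toward the greedy action and renormalise.
        greedy = max(self.actions, key=lambda b: self.q[(s, b)])
        for b in self.actions:
            step = delta if b == greedy else -delta / (len(self.actions) - 1)
            self.pi[(s, b)] = max(0.0, self.pi[(s, b)] + step)
        total = sum(self.pi[(s, b)] for b in self.actions)
        for b in self.actions:
            self.pi[(s, b)] /= total
```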
From 6a79e83c9cdfbf3203f278083de458baffe9963e Mon Sep 17 00:00:00 2001
From: Dries
Date: Fri, 20 Jan 2023 08:39:00 +0200
Subject: [PATCH 3/3] feat: Update conference names.

---
 Research Papers/Shallow learning/README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Research Papers/Shallow learning/README.md b/Research Papers/Shallow learning/README.md
index 6907756..c0acc0b 100644
--- a/Research Papers/Shallow learning/README.md
+++ b/Research Papers/Shallow learning/README.md
@@ -304,7 +304,7 @@ This paper investigates a relatively new direction in Multiagent Reinforcement L
Marginal contribution nets: A compact representation scheme for coalitional games by Ieong S, Shoham Y. In Proceedings of the 6th ACM Conference on Electronic Commerce, 2005. We present a new approach to representing coalitional games based on rules that describe the marginal contributions of the agents. This representation scheme captures characteristics of the interactions among the agents in a natural and concise manner. We also develop efficient algorithms for two of the most important solution concepts, the Shapley value and the core, under this representation. The Shapley value can be computed in time linear in the size of the input. The emptiness of the core can be determined in time exponential only in the treewidth of a graphical interpretation of our representation.
-
-
Tunably decentralized algorithms for cooperative target observation Luke, Sean and Sullivan, Keith and Panait, Liviu and Balan, Gabriel. 4th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2005), 2005. Multi-agent problem domains may require distributed algorithms for a variety of reasons: local sensors, limitations of communication, and availability of distributed computational resources. In the absence of these constraints, centralized algorithms are often more efficient, simply because they are able to take advantage of more information. We introduce a variant of the cooperative target observation domain which is free of such constraints. We propose two algorithms, inspired by K-means clustering and hill-climbing respectively, which are scalable in degree of decentralization. Neither algorithm consistently outperforms the other across all problem domain settings. Surprisingly, we find that hill-climbing is sensitive to degree of decentralization, while K-means is not. We also experiment with a combination of the two algorithms which draws strength from each.
-
+
Tunably decentralized algorithms for cooperative target observation Luke, Sean and Sullivan, Keith and Panait, Liviu and Balan, Gabriel. AAMAS, 2005. Multi-agent problem domains may require distributed algorithms for a variety of reasons: local sensors, limitations of communication, and availability of distributed computational resources. In the absence of these constraints, centralized algorithms are often more efficient, simply because they are able to take advantage of more information. We introduce a variant of the cooperative target observation domain which is free of such constraints. We propose two algorithms, inspired by K-means clustering and hill-climbing respectively, which are scalable in degree of decentralization. Neither algorithm consistently outperforms the other across all problem domain settings. Surprisingly, we find that hill-climbing is sensitive to degree of decentralization, while K-means is not. We also experiment with a combination of the two algorithms which draws strength from each.
-
Multi-attribute coalitional games by Ieong S, Shoham Y. In Proceedings of the 7th ACM Conference on Electronic Commerce, 2006. We study coalitional games where the value of cooperation among the agents is solely determined by the attributes the agents possess, with no assumption as to how these attributes jointly determine this value. This framework allows us to model diverse economic interactions by picking the right attributes. We study the computational complexity of two coalitional solution concepts for these games — the Shapley value and the core. We show how the positive results obtained in this paper imply comparable results for other games studied in the literature.
-
@@ -532,7 +532,7 @@ Halpern begins by surveying possible formal systems for representing uncertainty
Learning to Communicate and Act using Hierarchical Reinforcement Learning by Mohammad Ghavamzadeh, Sridhar Mahadevan. AAMAS, 2004. In this paper, we address the issue of rational communication behavior among autonomous agents. The goal is for agents to learn a policy to optimize the communication needed for proper coordination, given the communication cost. We extend our previously reported cooperative hierarchical reinforcement learning (HRL) algorithm to include communication decisions and propose a new multiagent HRL algorithm, called COM-Cooperative HRL. In this algorithm, we define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Those levels of the hierarchy which include cooperative subtasks are called cooperation levels. Coordination skills among agents are learned faster by sharing information at the cooperation levels, rather than the level of primitive actions. We add a communication level to the hierarchical decomposition of the problem below each cooperation level. Before making a decision at a cooperative subtask, agents decide if it is worthwhile to perform a communication action. A communication action has a certain cost and provides each agent at a certain cooperation level with the actions selected by the other agents at the same level. We demonstrate the efficacy of the COM-Cooperative HRL algorithm as well as the relation between the communication cost and the learned communication policy using a multiagent taxi domain.
-
-
Hierarchical Multi-Agent Reinforcement Learning Mohammad Ghavamzadeh, Sridhar Mahadevan and Rajbala Makar. Autonomous Agents and Multi-Agent Systems, 2006. In this paper we investigate the use of hierarchical reinforcement learning to speed up the acquisition of cooperative multi-agent tasks. We extend the MAXQ framework to the multi-agent case. Each agent uses the same MAXQ hierarchy to decompose a task into sub-tasks. Learning is decentralized, with each agent learning three interrelated skills: how to perform subtasks, which order to do them in, and how to coordinate with other agents. Coordination skills among agents are learned by using joint actions at the highest level(s) of the hierarchy. The Q nodes at the highest level(s) of the hierarchy are configured to represent the joint task-action space among multiple agents. In this approach, each agent only knows what other agents are doing at the level of sub-tasks, and is unaware of lower level (primitive) actions. This hierarchical approach allows agents to learn coordination faster by sharing information at the level of sub-tasks, rather than attempting to learn coordination taking into account primitive joint state-action values. We apply this hierarchical multi-agent reinforcement learning algorithm to a complex AGV scheduling task and compare its performance and speed with other learning approaches, including flat multi-agent, single agent using MAXQ, selfish multiple agents using MAXQ (where each agent acts independently without communicating with the other agents), as well as several well-known AGV heuristics like "first come first serve", "highest queue first" and "nearest station first". We also compare the tradeoffs in learning speed vs. performance of modeling joint action values at multiple levels in the MAXQ hierarchy.
-
+
Hierarchical Multi-Agent Reinforcement Learning Mohammad Ghavamzadeh, Sridhar Mahadevan and Rajbala Makar. AAMAS, 2006. In this paper we investigate the use of hierarchical reinforcement learning to speed up the acquisition of cooperative multi-agent tasks. We extend the MAXQ framework to the multi-agent case. Each agent uses the same MAXQ hierarchy to decompose a task into sub-tasks. Learning is decentralized, with each agent learning three interrelated skills: how to perform subtasks, which order to do them in, and how to coordinate with other agents. Coordination skills among agents are learned by using joint actions at the highest level(s) of the hierarchy. The Q nodes at the highest level(s) of the hierarchy are configured to represent the joint task-action space among multiple agents. In this approach, each agent only knows what other agents are doing at the level of sub-tasks, and is unaware of lower level (primitive) actions. This hierarchical approach allows agents to learn coordination faster by sharing information at the level of sub-tasks, rather than attempting to learn coordination taking into account primitive joint state-action values. We apply this hierarchical multi-agent reinforcement learning algorithm to a complex AGV scheduling task and compare its performance and speed with other learning approaches, including flat multi-agent, single agent using MAXQ, selfish multiple agents using MAXQ (where each agent acts independently without communicating with the other agents), as well as several well-known AGV heuristics like "first come first serve", "highest queue first" and "nearest station first". We also compare the tradeoffs in learning speed vs. performance of modeling joint action values at multiple levels in the MAXQ hierarchy.
-

@@ -602,7 +602,7 @@ Halpern begins by surveying possible formal systems for representing uncertainty
### Robotic Teams
-
Automatic programming of behavior-based robots using reinforcement learning Sridhar Mahadevan and Jonathan Connell. AAAI-91 Proceedings, 1991. This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error using a performance feedback function as reinforcement. Two algorithms for behavior learning are described that combine Q learning, a well-known scheme for propagating reinforcement values temporally across actions, with statistical clustering and Hamming distance, two ways of propagating reinforcement values spatially across states. A real behavior-based robot called OBELIX is described that learns several component behaviors in an example task involving pushing boxes. A simulator for the box pushing task is also used to gather data on the learning techniques. A detailed experimental study using the real robot and the simulator suggests two conclusions. +
Automatic programming of behavior-based robots using reinforcement learning Sridhar Mahadevan and Jonathan Connell. AAAI-91, 1991. This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error using a performance feedback function as reinforcement. Two algorithms for behavior learning are described that combine Q learning, a well-known scheme for propagating reinforcement values temporally across actions, with statistical clustering and Hamming distance, two ways of propagating reinforcement values spatially across states. A real behavior-based robot called OBELIX is described that learns several component behaviors in an example task involving pushing boxes. A simulator for the box pushing task is also used to gather data on the learning techniques. A detailed experimental study using the real robot and the simulator suggests two conclusions.
1. The learning techniques are able to learn the individual behaviors, sometimes outperforming a handcoded program.
2. Using a behavior-based architecture speeds up reinforcement learning by converting the problem of learning a complex task into that of learning a simpler set of special-purpose reactive subtasks.
-