From 05cb66aa8fd67851df2639e7a1a0c4c59a68fc73 Mon Sep 17 00:00:00 2001 From: Dries Smit Date: Thu, 15 Dec 2022 13:53:56 +0200 Subject: [PATCH 1/6] feat: Add first two papers. --- Research Papers/Shallow learning/README.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/Research Papers/Shallow learning/README.md b/Research Papers/Shallow learning/README.md index b3e9800..5ad8def 100644 --- a/Research Papers/Shallow learning/README.md +++ b/Research Papers/Shallow learning/README.md @@ -246,6 +246,8 @@ We introduce a compact graph-theoretic representation for multi-party game theor
Marginal contribution nets: A compact representation scheme for coalitional games by Ieong S, Shoham Y. In Proceedings of the 6th ACM Conference on Electronic Commerce, 2005. We present a new approach to representing coalitional games based on rules that describe the marginal contributions of the agents. This representation scheme captures characteristics of the interactions among the agents in a natural and concise manner. We also develop efficient algorithms for two of the most important solution concepts, the Shapley value and the core, under this representation. The Shapley value can be computed in time linear in the size of the input. The emptiness of the core can be determined in time exponential only in the treewidth of a graphical interpretation of our representation.
-
+
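The linear-time Shapley computation mentioned above is easiest to see for rules built only from positive literals: the game is then a sum of per-rule games, and each rule splits its value equally among the agents it names. A minimal sketch under that simplification (the paper's full algorithm also handles negative literals):

```python
from collections import defaultdict

def shapley_from_mc_net(rules):
    """Shapley values for an MC-net whose rules contain only positive literals.

    Each rule is (agents, value): a coalition earns `value` whenever it contains
    every agent in `agents`. By additivity of the Shapley value, each rule
    contributes value/len(agents) to each named agent, so the computation is
    linear in the size of the rule set.
    """
    phi = defaultdict(float)
    for agents, value in rules:
        share = value / len(agents)
        for a in agents:
            phi[a] += share
    return dict(phi)

# Example: {a ^ b -> 5, b -> 2} gives phi(a) = 2.5, phi(b) = 4.5.
print(shapley_from_mc_net([({"a", "b"}, 5.0), ({"b"}, 2.0)]))
```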
Tunably decentralized algorithms for cooperative target observation Luke, Sean and Sullivan, Keith and Panait, Liviu and Balan, Gabriel. 4th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2005), 2005. Multi-agent problem domains may require distributed algorithms for a variety of reasons: local sensors, limitations of communication, and availability of distributed computational resources. In the absence of these constraints, centralized algorithms are often more efficient, simply because they are able to take advantage of more information. We introduce a variant of the cooperative target observation domain which is free of such constraints. We propose two algorithms, inspired by K-means clustering and hill-climbing respectively, which are scalable in degree of decentralization. Neither algorithm consistently outperforms the other across all problem domain settings. Surprisingly, we find that hill-climbing is sensitive to degree of decentralization, while K-means is not. We also experiment with a combination of the two algorithms which draws strength from each.
-
+
Multi-attribute coalitional games by Ieong S, Shoham Y. In Proceedings of the 7th ACM Conference on Electronic Commerce, 2006. We study coalitional games where the value of cooperation among the agents is solely determined by the attributes the agents possess, with no assumption as to how these attributes jointly determine this value. This framework allows us to model diverse economic interactions by picking the right attributes. We study the computational complexity of two coalitional solution concepts for these games — the Shapley value and the core. We show how the positive results obtained in this paper imply comparable results for other games studied in the literature.
-
Bayesian Coalitional Games by Ieong S, Shoham Y. AAAI, 2008. We introduce Bayesian Coalitional Games (BCGs), a generalization of classical coalitional games to settings with uncertainties. We define the semantics of BCG using the partition model, and generalize the notion of payoffs to contracts among agents. To analyze these games, we extend the solution concept of the core under three natural interpretations—ex ante, ex interim, and ex post—which coincide with the classical definition of the core when there is no uncertainty. In the special case where agents are risk-neutral, we show that checking for core emptiness under all three interpretations can be simplified to linear feasibility problems similar to that of their classical counterpart.
-
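For the classical, no-uncertainty case that the three core interpretations collapse to, checking core emptiness is a linear feasibility problem: is there a payoff vector that distributes v(N) and gives every coalition at least its value? A small sketch with scipy (illustrative only; the game below is a stand-in, and the paper's ex ante/ex interim/ex post variants add expectations over the partition model):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def core_is_nonempty(n, v):
    """Core non-emptiness of a classical TU coalitional game.

    n: number of players; v: dict mapping frozenset coalitions to values.
    Feasibility: sum_i x_i = v(N) and sum_{i in S} x_i >= v(S) for every S.
    """
    players = range(n)
    A_ub, b_ub = [], []
    for r in range(1, n):
        for S in itertools.combinations(players, r):
            # -sum_{i in S} x_i <= -v(S)
            A_ub.append([-1.0 if i in S else 0.0 for i in players])
            b_ub.append(-v.get(frozenset(S), 0.0))
    res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
                  A_eq=[[1.0] * n], b_eq=[v[frozenset(players)]],
                  bounds=[(None, None)] * n, method="highs")
    return res.success

# Three-player majority game: any two players can secure 1. Its core is empty.
v = {frozenset(s): 1.0 for s in itertools.combinations(range(3), 2)}
v[frozenset(range(3))] = 1.0
print(core_is_nonempty(3, v))  # False
```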
@@ -330,6 +332,8 @@ We introduce a compact graph-theoretic representation for multi-party game theor
Towards a relation between learning agents and evolutionary dynamics by Karl Tuyls, Tom Lenaerts, Katja Verbeeck, Sam Maes. BNAIC, 2002. Modeling learning agents in the context of Multi-agent Systems requires insight in the type and form of interactions with the environment and other agents in the system. Usually, these agents are modeled similar to the different players in a standard game theoretical model. In this paper we examine whether evolutionary game theory, and more specifically the replicator dynamics, is an adequate theoretical model for the study of the dynamics of reinforcement learning agents in a multi-agent system. As a first step in this direction we extend the results of [1, 9] to a more general reinforcement learning framework, i.e. Learning Automata.
-
+
Guaranteeing Coevolutionary Objective Measures Sean Luke, R. Paul Wiegand. FOGA, 2002. The task of understanding the dynamics of coevolutionary algorithms or comparing performance between such algorithms is complicated by the fact the internal fitness measures are subjective. Though several techniques have been proposed to use external or objective measures to help in analysis, there are clearly properties of fitness payoff, like intransitivity, for which these techniques are ineffective. We feel that a principled approach to this problem is to first establish the theoretical bounds to guarantee objective measures in one CEA model; from there one can later examine the effects of deviating from the assumptions made by these bounds. To this end, we present a model of competitive fitness assessment with a single population and non-parametric selection (such as tournament selection), and show minimum conditions and examples under which an objective measure exists, and when the dynamics of the coevolutionary algorithm are identical to those of a traditional EA.
-
+
When Evolving Populations is Better than Coevolving Individuals: The Blind Mice Problem by Thomas Miconi. IJCAI, 2003. This paper is about the evolutionary design of multi-agent systems. An important part of recent research in this domain has been focusing on collaborative coevolutionary methods. We expose possible drawbacks of these methods, and show that for a non-trivial problem called the "blind mice" problem, a classical GA approach in which whole populations are evaluated, selected and crossed together (with a few tweaks) finds an elegant and non-intuitive solution more efficiently than cooperative coevolution. The difference in efficiency grows with the number of agents within the simulation. We propose an explanation for this poorer performance of cooperative coevolution, based on the intrinsic fragility of the evaluation process. This explanation is supported by theoretical and experimental arguments.
-
Exploring the Explorative Advantage of the Cooperative Coevolutionary (1+1) EA by Thomas Jansen, R. Paul Wiegand. GECCO, 2003. Using a well-known cooperative coevolutionary function optimization framework, a very simple cooperative coevolutionary (1+1) EA is defined. This algorithm is investigated in the context of expected optimization time. The focus is on the impact the cooperative coevolutionary approach has and on the possible advantage it may have over more traditional evolutionary approaches. Therefore, a systematic comparison between the expected optimization times of this coevolutionary algorithm and the ordinary (1+1) EA is presented. The main result is that separability of the objective function alone is not sufficient to make the cooperative coevolutionary approach beneficial. By presenting a clearly structured example function and analyzing the algorithms’ performance, it is shown that the cooperative coevolutionary approach comes with new explorative possibilities. This can lead to an immense speed-up of the optimization.
-
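To make the analysed algorithm concrete, here is a hedged sketch of a cooperative coevolutionary (1+1) EA: the bit string is split into components, each component keeps a single parent, and an offspring is accepted only if the assembled solution (offspring plus the other components' current parents) is no worse. OneMax and the parameter choices below are placeholders, not the paper's example function:

```python
import random

def cc_one_plus_one_ea(n_components=4, comp_len=8, generations=2000, seed=0):
    """Sketch of a cooperative coevolutionary (1+1) EA on a toy objective."""
    rng = random.Random(seed)
    p_flip = 1.0 / (n_components * comp_len)
    parents = [[rng.randint(0, 1) for _ in range(comp_len)] for _ in range(n_components)]

    def fitness(components):
        return sum(sum(c) for c in components)   # OneMax on the assembled string

    for t in range(generations):
        k = t % n_components                     # components take turns (round-robin)
        child = [b ^ (rng.random() < p_flip) for b in parents[k]]
        trial = parents[:k] + [child] + parents[k + 1:]
        if fitness(trial) >= fitness(parents):   # accept if the assembled solution is no worse
            parents[k] = child
    return parents, fitness(parents)

print(cc_one_plus_one_ea()[1])   # approaches n_components * comp_len given enough generations
```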
@@ -609,3 +613,4 @@ The papers below were found to be difficult to categorise and therefore are pres + From 7023b034441b107f1b953ccd8e85f7a3946cd6f1 Mon Sep 17 00:00:00 2001 From: Dries Smit Date: Fri, 16 Dec 2022 04:22:33 +0200 Subject: [PATCH 2/6] feat: Add first page of shallow learning papers. --- Research Papers/Shallow learning/README.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/Research Papers/Shallow learning/README.md b/Research Papers/Shallow learning/README.md index 5ad8def..e75c57a 100644 --- a/Research Papers/Shallow learning/README.md +++ b/Research Papers/Shallow learning/README.md @@ -445,6 +445,8 @@ Halpern begins by surveying possible formal systems for representing uncertainty
Learning to Communicate and Act using Hierarchical Reinforcement Learning by Mohammad Ghavamzadeh, Sridhar Mahadevan. AAMAS, 2004. In this paper, we address the issue of rational communication behavior among autonomous agents. The goal is for agents to learn a policy to optimize the communication needed for proper coordination, given the communication cost. We extend our previously reported cooperative hierarchical reinforcement learning (HRL) algorithm to include communication decisions and propose a new multiagent HRL algorithm, called COM-Cooperative HRL. In this algorithm, we define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Those levels of the hierarchy which include cooperative subtasks are called cooperation levels. Coordination skills among agents are learned faster by sharing information at the cooperation levels, rather than the level of primitive actions. We add a communication level to the hierarchical decomposition of the problem below each cooperation level. Before making a decision at a cooperative subtask, agents decide if it is worthwhile to perform a communication action. A communication action has a certain cost and provides each agent at a certain cooperation level with the actions selected by the other agents at the same level. We demonstrate the efficacy of the COM-Cooperative HRL algorithm as well as the relation between the communication cost and the learned communication policy using a multiagent taxi domain.
-
+
Hierarchical Multi-Agent Reinforcement Learning Mohammad Ghavamzadeh, Sridhar Mahadevan and Rajbala Makar. Autonomous Agents and Multi-Agent Systems, 2006. In this paper we investigate the use of hierarchical reinforcement learning to speed up the acquisition of cooperative multi-agent tasks. We extend the MAXQ framework to the multi-agent case. Each agent uses the same MAXQ hierarchy to decompose a task into sub-tasks. Learning is decentralized, with each agent learning three interrelated skills: how to perform subtasks, which order to do them in, and how to coordinate with other agents. Coordination skills among agents are learned by using joint actions at the highest level(s) of the hierarchy. The Q nodes at the highest level(s) of the hierarchy are configured to represent the joint task-action space among multiple agents. In this approach, each agent only knows what other agents are doing at the level of sub-tasks, and is unaware of lower level (primitive) actions. This hierarchical approach allows agents to learn coordination faster by sharing information at the level of sub-tasks, rather than attempting to learn coordination taking into account primitive joint state-action values. We apply this hierarchical multi-agent reinforcement learning algorithm to a complex AGV scheduling task and compare its performance and speed with other learning approaches, including flat multi-agent, single agent using MAXQ, selfish multiple agents using MAXQ (where each agent acts independently without communicating with the other agents), as well as several well-known AGV heuristics like "first come first serve", "highest queue first" and "nearest station first". We also compare the tradeoffs in learning speed vs. performance of modeling joint action values at multiple levels in the MAXQ hierarchy.
-
+
### Incomplete Information Games @@ -511,6 +513,15 @@ Halpern begins by surveying possible formal systems for representing uncertainty ### Robotic Teams +
Automatic programming of behavior-based robots using reinforcement learning Sridhar Mahadevan and Jonathan Connell. AAAI-91 Proceedings, 1991. This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error using a performance feedback function as reinforcement. Two algorithms for behavior learning are described that combine Q learning, a well-known scheme for propagating reinforcement values temporally across actions, with statistical clustering and Hamming distance, two ways of propagating reinforcement values spatially across states. A real behavior-based robot called OBELIX is described that learns several component behaviors in an example task involving pushing boxes. A simulator for the box pushing task is also used to gather data on the learning techniques. A detailed experimental study using the real robot and the simulator suggests two conclusions. +1. The learning techniques are able to learn the individual behaviors, sometimes outperforming a handcoded program. + +2. Using a behavior-based architecture speeds up reinforcement learning by converting the problem of learning a complex task into that of learning a simpler set of special-purpose reactive subtasks.
-
+ +
Reward Functions for Accelerated Learning Maja J Mataric. ICML'94: Proceedings of the Eleventh International Conference on International Conference on Machine Learning, 1994. This paper discusses why traditional reinforcement learning methods, and algorithms applied to those models, result in poor performance in situated domains characterized by multiple goals, noisy state, and inconsistent reinforcement. We propose a methodology for designing reinforcement functions that take advantage of implicit domain knowledge in order to accelerate learning in such domains. The methodology involves the use of heterogeneous reinforcement functions and progress estimators, and applies to learning in domains with a single agent or with multiple agents. The methodology is experimentally validated on a group of mobile robots learning a foraging task.
-
+ +
Learning to Behave Socially Mataric, M. Robotics and Autonomous Systems, 1997. This paper discusses the challenges of learning to behave socially in the dynamic, noisy, situated and embodied mobile multi-robot domain. Using the methodology for synthesizing basis behaviors as a substrate for generating a large repertoire of higher-level group interactions, in this paper we describe how, given the substrate, greedy agents can learn social rules that benefit the group as a whole. We describe three sources of reinforcement and show their effectiveness in learning non-greedy social rules. We then demonstrate the learning approach on a group of four mobile robots learning to yield and share information in a foraging task.
-
+
Learning Roles: Behavioral Diversity in Robot Teams by Tucker Balch. AAAI, 1997. This paper describes research investigating behavioral specialization in learning robot teams. Each agent is provided a common set of skills (motor schema-based behavioral assemblages) from which it builds a task-achieving strategy using reinforcement learning. The agents learn individually to activate particular behavioral assemblages given their current situation and a reward signal. The experiments, conducted in robot soccer simulations, evaluate the agents in terms of performance, policy convergence, and behavioral diversity. The results show that in many cases, robots will automatically diversify by choosing heterogeneous behaviors. The degree of diversification and the performance of the team depend on the reward structure. When the entire team is jointly rewarded or penalized (global reinforcement), teams tend towards heterogeneous behavior. When agents are provided feedback individually (local reinforcement), they converge to identical policies.
-
Reinforcement Learning in the Multi-Robot Domain by MAJA J. MATARIC. Autonomous Robots, 1997. This paper describes a formulation of reinforcement learning that enables learning in noisy, dynamic environments such as in the complex concurrent multi-robot learning domain. The methodology involves minimizing the learning space through the use of behaviors and conditions, and dealing with the credit assignment problem through shaped reinforcement in the form of heterogeneous reinforcement functions and progress estimators. We experimentally validate the approach on a group of four mobile robots learning a foraging task
-
@@ -552,6 +563,9 @@ using Extended Optimal Response by Nobuo Suematsu, Akira Hayashi. AAMAS, 200 ### Theses +
Interaction and intelligent behavior Mataric, Maja J. Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994. This thesis addresses situated, embodied agents interacting in complex domains. It focuses on two problems: 1) synthesis and analysis of intelligent group behavior, and 2) learning in complex group environments. Behaviors are proposed as the appropriate level for control and learning. Basic behaviors are introduced as building blocks for synthesizing and analyzing system behavior. The thesis describes the process of selecting such basic behaviors, formally specifying them, algorithmically implementing them, and empirically evaluating them. All of the proposed ideas are validated with a group of up to 20 mobile robots using a basic behavior set consisting of: avoidance, following, aggregation, dispersion, and homing. The set of basic behaviors acts as a substrate for achieving more complex high-level goals and tasks. Two behavior combination operators are introduced, and verified by combining subsets of the above basic behavior set to implement collective flocking and foraging. A methodology is introduced for automatically constructing higher-level behaviors by learning to select among the basic behavior set. A novel formulation of reinforcement learning is proposed that makes behavior selection learnable in noisy, uncertain multi-agent environments with stochastic dynamics. It consists of using conditions and behaviors for more robust control and a minimised state-space, and a reinforcement shaping methodology that enables principled embedding of domain knowledge with two types of shaping functions: heterogeneous reward functions and progress estimators. The methodology outperforms two alternatives when tested on a collection of robots learning to forage. The proposed formulation enables and accelerates learning in complex multi-robot domains. The generality of the approach makes it compatible with the existing reinforcement learning algorithms, allowing it to accelerate learning in a variety of domains and applications. The presented methodologies and results are aimed at extending our understanding of synthesis, analysis, and learning of group behavior. +
-
+
Layered Learning in Multi-Agent Systems by Peter Stone. PhD thesis, 1998. Multi-agent systems in complex, real-time domains require agents to act effectively both autonomously and as part of a team. This dissertation addresses multi-agent systems consisting of teams of autonomous agents acting in real-time, noisy, collaborative, and adversarial environments. Because of the inherent complexity of this type of multi-agent system, this thesis investigates the use of machine learning within multi-agent systems. The dissertation makes four main contributions to the fields of Machine Learning and Multi-Agent Systems. First, the thesis defines a team member agent architecture within which a flexible team structure is presented, allowing agents to decompose the task space into flexible roles and allowing them to smoothly switch roles while acting. Team organization is achieved by the introduction of a locker-room agreement as a collection of conventions followed by all team members. It defines agent roles, team formations, and pre-compiled multi-agent plans. In addition, the team member agent architecture includes a communication paradigm for domains with single-channel, low-bandwidth, unreliable communication. The communication paradigm facilitates team coordination while being robust to lost messages and active interference from opponents. Second, the thesis introduces layered learning, a general-purpose machine learning paradigm for complex domains in which learning a mapping directly from agents' sensors to their actuators is intractable. Given a hierarchical task decomposition, layered learning allows for learning at each level of the hierarchy, with learning at each level directly affecting learning at the next higher level. Third, the thesis introduces a new multi-agent reinforcement learning algorithm, namely team-partitioned, opaque-transition reinforcement learning (TPOT-RL). TPOT-RL is designed for domains in which agents cannot necessarily observe the state changes when other team members act. It exploits local, action-dependent features to aggressively generalize its input representation for learning and partitions the task among the agents, allowing them to simultaneously learn collaborative policies by observing the long-term effects of their actions. Fourth, the thesis contributes a fully functioning multi-agent system that incorporates learning in a real-time, noisy domain with teammates and adversaries. Detailed algorithmic descriptions of the agents' behaviors as well as their source code are included in the thesis. Empirical results validate all four contributions within the simulated robotic soccer domain. The generality of the contributions is verified by applying them to the real robotic soccer, and network routing domains. Ultimately, this dissertation demonstrates that by learning portions of their cognitive processes, selectively communicating, and coordinating their behaviors via common knowledge, a group of independent agents can work towards a common goal in a complex, real-time, noisy, collaborative, and adversarial environment.
-
Multiagent Learning in the Presence of Agents with Limitations by Michael Bowling. Thesis, 2003. Learning to act in a multiagent environment is a challenging problem. Optimal behavior for one agent depends upon the behavior of the other agents, which are learning as well. Multiagent environments are therefore non-stationary, violating the traditional assumption underlying single-agent learning. In addition, agents in complex tasks may have limitations, such as physical constraints or designer-imposed approximations of the task that make learning tractable. Limitations prevent agents from acting optimally, which complicates the already challenging problem. A learning agent must effectively compensate for its own limitations while exploiting the limitations of the other agents. My thesis research focuses on these two challenges, namely multiagent learning and limitations, and includes four main contributions. First, the thesis introduces the novel concepts of a variable learning rate and the WoLF (Win or Learn Fast) principle to account for other learning agents. The WoLF principle is capable of making rational learning algorithms converge to optimal policies, and by doing so achieves two properties, rationality and convergence, which had not been achieved by previous techniques. The converging effect of WoLF is proven for a class of matrix games, and demonstrated empirically for a wide-range of stochastic games. Second, the thesis contributes an analysis of the effect of limitations on the game-theoretic concept of Nash equilibria. The existence of equilibria is important if multiagent learning techniques, which often depend on the concept, are to be applied to realistic problems where limitations are unavoidable. The thesis introduces a general model for the effect of limitations on agent behavior, which is used to analyze the resulting impact on equilibria. The thesis shows that equilibria do exist for a few restricted classes of games and limitations, but even well-behaved limitations do not preserve the existence of equilibria, in general. Third, the thesis introduces GraWoLF, a general-purpose, scalable, multiagent learning algorithm. GraWoLF combines policy gradient learning techniques with the WoLF variable learning rate. The effectiveness of the learning algorithm is demonstrated in both a card game with an intractably large state space, and an adversarial robot task. These two tasks are complex and agent limitations are prevalent in both. Fourth, the thesis describes the CMDragons robot soccer team strategy for adapting to an unknown opponent. The strategy uses a notion of plays as coordinated team plans. The selection of team plans is the decision point for adapting the team to its current opponent, based on the outcome of previously executed plays. The CMDragons were the first RoboCup robot team to employ online learning to autonomously alter its behavior during the course of a game. These four contributions demonstrate that it is possible to effectively learn to act in the presence of other learning agents in complex domains when agents may have limitations. The introduced learning techniques are proven effective in a class of small games, and demonstrated empirically across a wide range of settings that increase in complexity
-
From 8cce21fd593973d4eb9d3c9d677062a496d08fe6 Mon Sep 17 00:00:00 2001 From: Dries Smit Date: Fri, 16 Dec 2022 07:18:34 +0200 Subject: [PATCH 3/6] feat: Add second page of shallow learning papers. --- Research Papers/Shallow learning/README.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/Research Papers/Shallow learning/README.md b/Research Papers/Shallow learning/README.md index e75c57a..229c1d4 100644 --- a/Research Papers/Shallow learning/README.md +++ b/Research Papers/Shallow learning/README.md @@ -19,6 +19,8 @@ Jeffrey Kephartt, Edmund H. Durfee. ACM-EC, 1999. Auction theory: A guide to the literature by Klemperer P. Journal of economic surveys, 1999. This paper provides an elementary, non-technical, survey of auction theory, by introducing and describing some of the critical papers in the subject. (The most important of these are reproduced in a companion book, The Economic Theory of Auctions, Paul Klemperer (ed.), Edward Elgar (pub.), forthcoming.) We begin with the most fundamental concepts, and then introduce the basic analysis of optimal auctions, the revenue equivalence theorem, and marginal revenues. Subsequent sections address risk-aversion, affiliation, asymmetries, entry, collusion, multi-unit auctions, double auctions, royalties, incentive contracts, and other topics. Appendices contain technical details, some simple worked examples, and bibliographies.
- +
Evolving agent societies that avoid social dilemma Manisha Mundhe and Sandip Sen. GECCO'00: Proceedings of the 2nd Annual Conference on Genetic and Evolutionary Computation, 2000. The social sciences literature abounds in problems of providing and maintaining a public good in a society composed of self-interested individuals [8]. Public goods are social benefits that can be accessed by individuals irrespective of their personal contributions. In our previous work we have demonstrated the use of genetic algorithms (GAs) for generating an optimized agent society that can circumvent a particularly problematic social dilemma. In that approach, each chromosome represented the entire agent society and the GA found the best co-adapted society. Though encouraging, this result is less exciting than the possibility of evolving a set of co-adapted chromosomes where each chromosome represents an agent, and hence the population represents the society. In this paper, we describe such an adaptive systems approach to using GAs for evolving agent societies. We present experimental results from several domains including the classic problem of the Tragedy of the Commons [17].
-
+
Dynamic pricing by software agents by Jeffrey O. Kephart, James E. Hanson, Amy R. Greenwald. Computer Networks, 2000. We envision a future in which the global economy and the Internet will merge, evolving into an information economy bustling with billions of economically motivated software agents that exchange information goods and services with humans and other agents. Economic software agents will differ in important ways from their human counterparts, and these differences may have significant beneficial or harmful effects upon the global economy. It is therefore important to consider the economic incentives and behaviors of economic software agents, and to use every available means to anticipate their collective interactions. We survey research conducted by the Information Economies group at IBM Research aimed at understanding collective interactions among agents that dynamically price information goods or services. In particular, we study the potential impact of widespread shopbot usage on prices, the price dynamics that may ensue from various mixtures of automated pricing agents (or _pricebots_), the potential use of machine-learning algorithms to improve profits, and more generally the interplay among learning, optimization, and dynamics in agent-based information economies. These studies illustrate both beneficial and harmful collective behaviors that can arise in such systems, suggest possible cures for some of the undesired phenomena, and raise fundamental theoretical issues, particularly in the realms of multi-agent learning and dynamic optimization.
-
Pricing information bundles in a dynamic environment by J. Kephart, C. Brooks, and R. Das. ACMEC, 2001. We explore a scenario in which a monopolist producer of information goods seeks to maximize its profits in a market where consumer demand shifts frequently and unpredictably. The producer may set an arbitrarily complex price schedule---a function that maps the set of purchased items to a price. However, lacking direct knowledge of consumer demand, it cannotcompute the optimal schedule. Instead, it attempts to optimize profits via trial and error. By means of a simple model of consumer demand and a modified version of a simple nonlinear optimization routine, we study a variety of parametrizations of the price schedule and quantify some of the relationships among learnability, complexity, and profitability. In particular, we show that fixed pricing or simple two-parameter dynamic pricing schedules are preferred when demand shifts frequently, but that dynamic pricing based on more complex schedules tends to be most profitable when demand shifts very infrequently.
-
@@ -170,6 +172,9 @@ Using Utility Graphs by Valentin Robu, D.J.A. Somefun, J.A. La Poutre. AAMAS
A simple adaptive procedure leading to correlated equilibrium by Hart S, Mas‐Colell A. Econometrica, 2000. We propose a new and simple adaptive procedure for playing a game: ‘‘regret-matching.’’ In this procedure, players may depart from their current play with probabilities that are proportional to measures of regret for not having used other strategies in the past. It is shown that our adaptive procedure guarantees that, with probability one, the empirical distributions of play converge to the set of correlated equilibria of the game.
-
+
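The regret-matching rule itself is only a few lines. A hedged sketch for one player of a matrix game, using the common unconditional simplification (play proportional to positive cumulative regrets; the paper's procedure additionally conditions regrets on the previously played action and keeps some probability of repeating it):

```python
import random

def regret_matching(payoff, n_actions, rounds=10000, seed=0):
    """Unconditional regret matching for one player of a matrix game.

    payoff(a, b) is this player's payoff for own action a against opponent
    action b. Each round the player draws an action with probability
    proportional to its positive cumulative regret, then updates the regret
    of every action against the opponent's observed action.
    """
    rng = random.Random(seed)
    regret = [0.0] * n_actions
    counts = [0] * n_actions

    def sample_action():
        positive = [max(r, 0.0) for r in regret]
        if sum(positive) <= 0:
            return rng.randrange(n_actions)          # no positive regret yet: play uniformly
        return rng.choices(range(n_actions), weights=positive)[0]

    for _ in range(rounds):
        a = sample_action()
        b = rng.randrange(n_actions)                 # stand-in opponent: uniform play
        counts[a] += 1
        got = payoff(a, b)
        for alt in range(n_actions):                 # regret for not having played alt
            regret[alt] += payoff(alt, b) - got
    return [c / rounds for c in counts]

# Rock-paper-scissors payoff for the learner; against a uniform opponent the
# empirical play stays close to uniform, as no action accumulates much regret.
rps = lambda a, b: [[0, -1, 1], [1, 0, -1], [-1, 1, 0]][a][b]
print(regret_matching(rps, 3))
```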
Evaluating concurrent reinforcement learners Manisha Mundhe and Sandip Sen. MultiAgent Systems, 2000. +Assumptions underlying the convergence proofs of reinforcement learning (RL) algorithms like Q-learning are violated when multiple interacting agents adapt their strategies online as a result of learning. Empirical investigations in several domains, however, have produced encouraging results. We evaluate the convergence behavior of concurrent reinforcement learning agents using game matrices as studied by Claus and Boutilier (1998). Variants of simple RL algorithms are evaluated for convergence under increasing number of agents per group, scale up of game matrix size, delayed feedback and game matrix characteristics. Our results show surprising departures from those observed by Claus and Boutilier, particularly for larger problem sizes.
-
+
Rational and Convergent Learning in Stochastic Games by Michael Bowling, Manuela Veloso. 2001. This paper investigates the problem of policy learning in multiagent environments using the stochastic game framework, which we briefly overview. We introduce two properties as desirable for a learning agent when in the presence of other learning agents, namely rationality and convergence. We examine existing reinforcement learning algorithms according to these two properties and notice that they fail to simultaneously meet both criteria. We then contribute a new learning algorithm, WoLF policy hillclimbing, that is based on a simple principle: “learn quickly while losing, slowly while winning.” The algorithm is proven to be rational and we present empirical results for a number of stochastic games showing the algorithm converges.
-
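A hedged sketch of the WoLF idea for a single-state (matrix) game: policy hill-climbing on a mixed strategy, with the step size switched between a small delta_win and a larger delta_lose by comparing the current policy's expected value against that of the running average policy. The parameters and the uniform opponent are illustrative choices, not the paper's experimental setup:

```python
import random

def wolf_phc(payoff, n_actions, rounds=50000, alpha=0.1,
             delta_win=0.01, delta_lose=0.04, seed=0):
    """WoLF policy hill-climbing sketch for a stateless (matrix) game."""
    rng = random.Random(seed)
    q = [0.0] * n_actions
    pi = [1.0 / n_actions] * n_actions
    avg_pi = [1.0 / n_actions] * n_actions
    count = 0

    for _ in range(rounds):
        a = rng.choices(range(n_actions), weights=pi)[0]
        b = rng.randrange(n_actions)                  # stand-in opponent: uniform play
        q[a] += alpha * (payoff(a, b) - q[a])         # stateless Q update

        count += 1
        for i in range(n_actions):                    # running average policy
            avg_pi[i] += (pi[i] - avg_pi[i]) / count

        # "Win or learn fast": small step when the current policy beats the average policy.
        winning = sum(p * v for p, v in zip(pi, q)) > sum(p * v for p, v in zip(avg_pi, q))
        delta = delta_win if winning else delta_lose
        best = max(range(n_actions), key=lambda i: q[i])
        for i in range(n_actions):                    # move probability mass toward best action
            step = delta if i == best else -delta / (n_actions - 1)
            pi[i] = min(1.0, max(0.0, pi[i] + step))
        total = sum(pi)
        pi = [p / total for p in pi]                  # renormalise
    return pi

rps = lambda a, b: [[0, -1, 1], [1, 0, -1], [-1, 1, 0]][a][b]
print(wolf_phc(rps, 3))
```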
Playing is believing: the role of beliefs in multi-agent learning by Yu-Han Chang, Leslie Pack Kaelbling. NeurIPS, 2001. We propose a new classification for multi-agent learning algorithms, with each league of players characterized by both their possible strategies and possible beliefs. Using this classification, we review the optimality of existing algorithms and discuss some insights that can be gained. We propose an incremental improvement to the existing algorithms that seems to achieve average payoffs that are at least the Nash equilibrium payoffs in the long-run against fair opponents.
-
@@ -302,6 +307,8 @@ We introduce a compact graph-theoretic representation for multi-party game theor
Nash Equilibrium and Evolution by Imitation by Bjornerstedt J., and Weibull, J. The Rational Foundations of Economic Behavior, 1995. Nash's "mass action" interpretation of his equilibrium concept does not presume that the players know the game or are capable of sophisticated calculations. Instead, players are repeatedly and randomly drawn from large populations to play the game, one population for each player position, and base their strategy choice on observed payoffs. The present paper examines in some detail such an interpretation in a class of population dynamics based on adaptation by way of imitation of successful behaviors. Drawing from results in evolutionary game theory, implications of dynamic stability for aggregate Nash equilibrium play are discussed.
-
+
Genetic Algorithms + Data Structures = Evolution Programs Zbigniew Michalewicz. 3rd edition, 1996. Classic introduction to the evolution programming techniques.
-
+
Evolutionary computing in multi-agent environments by Lawrence Bull, Terence C Fogarty. AAAI, 1996. The fields of Artificial Intelligence and Artificial Life have both focused on complex systems in which agents must cooperate to achieve certain goals. In our work we examine the performance of the genetic algorithm when applied to systems of this type. That is, we examine the use of population-based evolutionary computing techniques within cooperative multi-agent environments. In extending the genetic algorithm to such environments we introduce three macro-level operators to reduce the amount of knowledge required a priori; the joining of agents (symbiogenesis), the transfer of genetic material between agents and the speciation of initially homogeneous agents. These operators are used in conjunction with a generic rule-based framework, a simplified version of Pittsburgh-style classifier systems, which we alter to allow for direct systemic communication to evolve between the thus represented agents. In this paper we use a simulated trail following task to demonstrate these techniques, finding that they can give improved performance
-
Discovery by Genetic Programming of a Cellular Automata Rule that is Better than any Known Rule for the Majority Classification Problem by David Andre, Forrest H Bennett, John R. Koza. Genetic programming, 1996. It is difficult to program cellular automata. This is especially true when the desired computation requires global communication and global integration of information across great distances in the cellular space. Various human-written algorithms have appeared in the past two decades for the vexatious majority classification task for one-dimensional two-state cellular automata. This paper describes how genetic programming with automatically defined functions evolved a rule for this task with an accuracy of 82.326%. This level of accuracy exceeds that of the original 1978 Gacs-Kurdyumov-Levin (GKL) rule, all other known human-written rules, and all other known rules produced by automated methods. The rule evolved by genetic programming is qualitatively different from all previous rules in that it employs a larger and more intricate repertoire of domains and particles to represent and communicate information across the cellular space.
-
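For readers unfamiliar with the task being evolved for: a radius-3 binary CA rule is a 128-entry lookup table over 7-cell neighbourhoods, and a candidate rule is scored by how often it drives random initial configurations of a 149-cell ring to all-1s when 1s are initially in the majority (all-0s otherwise). A small evaluation-harness sketch with a random placeholder rule (not the evolved 82.326% rule):

```python
import random

RADIUS, WIDTH, STEPS = 3, 149, 300       # standard settings for the majority task
rng = random.Random(0)

def step(cells, rule):
    """One synchronous update of a circular 1-D binary CA under a radius-3 rule table."""
    n = len(cells)
    out = []
    for i in range(n):
        idx = 0
        for off in range(-RADIUS, RADIUS + 1):        # pack the 7-cell neighbourhood
            idx = (idx << 1) | cells[(i + off) % n]   # into a 7-bit table index
        out.append(rule[idx])
    return out

def score(rule, trials=20):
    """Fraction of random initial configurations the rule classifies correctly."""
    correct = 0
    for _ in range(trials):
        cells = [rng.randint(0, 1) for _ in range(WIDTH)]
        majority = int(sum(cells) * 2 > WIDTH)        # WIDTH is odd, so no ties
        for _ in range(STEPS):
            cells = step(cells, rule)
        correct += all(c == majority for c in cells)
    return correct / trials

random_rule = [rng.randint(0, 1) for _ in range(2 ** (2 * RADIUS + 1))]
print(score(random_rule))    # a random rule does poorly; GP-evolved rules exceed 0.82
```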
@@ -330,6 +337,9 @@ We introduce a compact graph-theoretic representation for multi-party game theor
Evolution of biological information by Thomas D. Schneider. Nucleic Acids Research, 2000. How do genetic systems gain information by evolutionary processes? Answering this question precisely requires a robust, quantitative measure of information. Fortunately, 50 years ago Claude Shannon defined information as a decrease in the uncertainty of a receiver. For molecular systems, uncertainty is closely related to entropy and hence has clear connections to the Second Law of Thermodynamics. These aspects of information theory have allowed the development of a straightforward and practical method of measuring information in genetic control systems. Here this method is used to observe information gain in the binding sites for an artificial ‘protein’ in a computer simulation of evolution. The simulation begins with zero information and, as in naturally occurring genetic systems, the information measured in the fully evolved binding sites is close to that needed to locate the sites in the genome. The transition is rapid, demonstrating that information gain can occur by punctuated equilibrium.
-
+
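The information measure used in this line of work is the per-position reduction in uncertainty: for DNA-like sequences, R_sequence = sum over aligned positions of (2 - H_l) bits, where H_l is the Shannon entropy of the base frequencies at position l. A minimal sketch of that calculation on a toy alignment (not Schneider's ev simulation, and without small-sample corrections):

```python
import math
from collections import Counter

def information_content(sites):
    """R_sequence in bits for a list of aligned, equal-length DNA sites.

    Each position contributes 2 - H_l bits, where H_l is the Shannon entropy of
    the observed base frequencies at that position (2 bits is the prior
    uncertainty over {A, C, G, T}).
    """
    total = 0.0
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        n = sum(counts.values())
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - h
    return total

sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]   # toy binding-site alignment
print(round(information_content(sites), 3))
```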
A Collective Genetic Algorithm Thomas Miconi. ARTIFICIAL LIFE, ADAPTIVE BEHAVIOR, AND AGENTS, 2001. We take a look at the problem of collective evolution, and set the following goal: designing an algorithm that could allow a given population of agents to evolve incrementally, while they are performing their (possibly collaborative) task, with nothing more than a global fitness function +to guide this evolution. We propose a simple algorithm that does just that, and apply it to a simple test problem (aggregation among animats controlled by feed-forward neural networks). We then show that under this form, this algorithm can only generate homogeneous systems. Seeing this as an unacceptable limitation, we modify our system in order to allow it to generate heterogeneous populations, in which semi-homogeneous sub-populations (i.e. sub-species) emerge and grow (or regress) naturally until a stable state is reached. We successfully apply this modified algorithm to a very simple toy problem of simulated chemistry.
-
+
Towards a relation between learning agents and evolutionary dynamics by Karl Tuyls, Tom Lenaerts, Katja Verbeeck, Sam Maes. BNAIC, 2002. Modeling learning agents in the context of Multi-agent Systems requires insight in the type and form of interactions with the environment and other agents in the system. Usually, these agents are modeled similar to the different players in a standard game theoretical model. In this paper we examine whether evolutionary game theory, and more specifically the replicator dynamics, is an adequate theoretical model for the study of the dynamics of reinforcement learning agents in a multi-agent system. As a first step in this direction we extend the results of [1, 9] to a more general reinforcement learning framework, i.e. Learning Automata.
-
Guaranteeing Coevolutionary Objective Measures Sean Luke, R. Paul Wiegand. FOGA, 2002. The task of understanding the dynamics of coevolutionary algorithms or comparing performance between such algorithms is complicated by the fact the internal fitness measures are subjective. Though several techniques have been proposed to use external or objective measures to help in analysis, there are clearly properties of fitness payoff, like intransitivity, for which these techniques are ineffective. We feel that a principled approach to this problem is to first establish the theoretical bounds to guarantee objective measures in one CEA model; from there one can later examine the effects of deviating from the assumptions made by these bounds. To this end, we present a model of competitive fitness assessment with a single population and non-parametric selection (such as tournament selection), and show minimum conditions and examples under which an objective measure exists, and when the dynamics of the coevolutionary algorithm are identical to those of a traditional EA.
-
@@ -374,6 +384,8 @@ We introduce a compact graph-theoretic representation for multi-party game theor
Learning to play games in extensive form by valuation by Phillipe Jehiel, Dov Samet. NAJ Economics, 2001. Game theoretic models of learning which are based on the strategic form of the game cannot explain learning in games with large extensive form. We study learning in such games by using valuation of moves. A valuation for a player is a numeric assessment of her moves that purports to reflect their desirability. We consider a myopic player, who chooses moves with the highest valuation. Each time the game is played, the player revises her valuation by assigning the payoff obtained in the play to each of the moves she has made. We show for a repeated win–lose game that if the player has a winning strategy in the stage game, there is almost surely a time after which she always wins. When a player has more than two payoffs, a more elaborate learning procedure is required. We consider one that associates with each move the average payoff in the rounds in which this move was made. When all players adopt this learning procedure, with some perturbations, then, with probability 1 there is a time after which strategies that are close to subgame perfect equilibrium are played. A single player who adopts this procedure can guarantee only her individually rational payoff
-
+
Stochastic Direct Reinforcement: Application to Simple Games with Recurrence John Moody, Yufeng Liu, Matthew Saffell, and Kyoungju Youn. In Proceedings of Artificial Multiagent Learning, 2004. We investigate repeated matrix games with stochastic players as a microcosm for studying dynamic, multi-agent interactions using the Stochastic Direct Reinforcement (SDR) policy gradient algorithm. SDR is a generalization of Recurrent Reinforcement Learning (RRL) that supports stochastic policies. Unlike other RL algorithms, SDR and RRL use recurrent policy gradients to properly address temporal credit assignment resulting from recurrent structure. Our main goals in this paper are to (1) distinguish recurrent memory from standard, non-recurrent memory for policy gradient RL, (2) compare SDR with Q-type learning methods for simple games, (3) distinguish reactive from endogenous dynamical agent behavior and (4) explore the use of recurrent learning for interacting, dynamic agents. We find that SDR players learn much faster and hence outperform recently-proposed Q-type learners for the simple game Rock, Paper, Scissors (RPS). With more complex, dynamic SDR players and opponents, we demonstrate that recurrent representations and SDR's recurrent policy gradients yield better performance than non-recurrent players. For the Iterated Prisoner's Dilemma, we show that non-recurrent SDR agents learn only to defect (Nash equilibrium), while SDR agents with recurrent gradients can learn a variety of interesting behaviors, including cooperation.
-
+
### Fictitious Play @@ -513,6 +525,8 @@ Halpern begins by surveying possible formal systems for representing uncertainty ### Robotic Teams +
An Intelligent System for Document Retrieval in Distributed Office Environments * Uttam Mukhopadhyay, Larry M. Stephens, Michael N. Huhns, and Ronald D. Bonnell. Readings in Distributed Artificial Intelligence, 1986. MINDS (Multiple Intelligent Node Document Servers) is a distributed system of knowledge-based query engines for efficiently retrieving multimedia documents in an office environment of distributed workstations. By learning document distribution patterns, as well as user interests and preferences during system usage, it customizes document retrievals for each user. A two-layer learning system has been implemented for MINDS. The knowledge base used by the query engine is learned at the lower level with the help of heuristics for assigning credit and recommending adjustments; these heuristics are incrementally refined at the upper level.
-
+
Automatic programming of behavior-based robots using reinforcement learning Sridhar Mahadevan and Jonathan Connell. AAAI-91 Proceedings, 1991. This paper describes a general approach for automatically programming a behavior-based robot. New behaviors are learned by trial and error using a performance feedback function as reinforcement. Two algorithms for behavior learning are described that combine Q learning, a well-known scheme for propagating reinforcement values temporally across actions, with statistical clustering and Hamming distance, two ways of propagating reinforcement values spatially across states. A real behavior-based robot called OBELIX is described that learns several component behaviors in an example task involving pushing boxes. A simulator for the box pushing task is also used to gather data on the learning techniques. A detailed experimental study using the real robot and the simulator suggests two conclusions. 1. The learning techniques are able to learn the individual behaviors, sometimes outperforming a handcoded program. 2. Using a behavior-based architecture speeds up reinforcement learning by converting the problem of learning a complex task into that of learning a simpler set of special-purpose reactive subtasks.
Reward Functions for Accelerated Learning Maja J Mataric. ICML'94: Proceedings of the Eleventh International Conference on International Conference on Machine Learning, 1994. This paper discusses why traditional reinforcement learning methods, and algorithms applied to those models, result in poor performance in situated domains characterized by multiple goals, noisy state, and inconsistent reinforcement. We propose a methodology for designing reinforcement functions that take advantage of implicit domain knowledge in order to accelerate learning in such domains. The methodology involves the use of heterogeneous reinforcement functions and progress estimators, and applies to learning in domains with a single agent or with multiple agents. The methodology is experimentally validated on a group of mobile robots learning a foraging task.
-
+
Cooperative multi-robot box-pushing M.J. Mataric, M. Nilsson and K.T. Simsarin. Proceedings 1995 IEEE/RSJ International Conference on Intelligent Robots and Systems. Human Robot Interaction and Cooperative Robots, 1995. abstract
-
+
Learning to Behave Socially Mataric, M. Robotics and Autonomous Systems, 1997. This paper discusses the challenges of learning to behave socially in the dynamic, noisy, situated and embodied mobile multi-robot domain. Using the methodology for synthesizing basis behaviors as a substrate for generating a large repertoire of higher-level group interactions, in this paper we describe how, given the substrate, greedy agents can learn social rules that benefit the group as a whole. We describe three sources of reinforcement and show their effectiveness in learning non-greedy social rules. We then demonstrate the learning approach on a group of four mobile robots learning to yield and share information in a foraging task.
-
Learning Roles: Behavioral Diversity in Robot Teams by Tucker Balch. AAAI, 1997. This paper describes research investigating behavioral specialization in learning robot teams. Each agent is provided a common set of skills (motor schema-based behavioral assemblages) from which it builds a task-achieving strategy using reinforcement learning. The agents learn individually to activate particular behavioral assemblages given their current situation and a reward signal. The experiments, conducted in robot soccer simulations, evaluate the agents in terms of performance, policy convergence, and behavioral diversity. The results show that in many cases, robots will automatically diversify by choosing heterogeneous behaviors. The degree of diversification and the performance of the team depend on the reward structure. When the entire team is jointly rewarded or penalized (global reinforcement), teams tend towards heterogeneous behavior. When agents are provided feedback individually (local reinforcement), they converge to identical policies.
-
From fa137885e2143f22e3ce500efb45296b001e1e23 Mon Sep 17 00:00:00 2001 From: Dries Smit Date: Fri, 16 Dec 2022 08:07:45 +0200 Subject: [PATCH 4/6] feat: Add third page of shallow learning papers. --- Research Papers/Shallow learning/README.md | 23 ++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/Research Papers/Shallow learning/README.md b/Research Papers/Shallow learning/README.md index 229c1d4..8b52fd6 100644 --- a/Research Papers/Shallow learning/README.md +++ b/Research Papers/Shallow learning/README.md @@ -221,6 +221,8 @@ We introduce a compact graph-theoretic representation for multi-party game theor
Individual learning of coordination knowledge by Sandip Sen, Mahendra Sekaran. Journal of Experimental and Theoretical Artificial Intelligence, 1998. Social agents, both human and computational, inhabiting a world containing multiple active agents, need to coordinate their activities. This is because agents share resources, and without proper coordination or “rules of the road”, everybody will be interfering with the plans of others. As such, we need coordination schemes that allow agents to effectively achieve local goals without adversely affecting the problem-solving capabilities of other agents. Researchers in the field of Distributed Artificial Intelligence (DAI) have developed a variety of coordination schemes under different assumptions about agent capabilities and relationships. Whereas some of these research have been motivated by human cognitive biases, others have approached it as an engineering problem of designing the most effective coordination architecture or protocol. We evaluate individual and concurrent learning by multiple, autonomous agents as a means for acquiring coordination knowledge. We show that a uniform reinforcement learning algorithm suffices as a coordination mechanism in both cooperative and adversarial situations. Using a number of multiagent learning scenarios with both tight and loose coupling between agents and with immediate as well as delayed feedback, we demonstrate that agents can consistently develop effective policies to coordinate their actions without explicit information sharing. We demonstrate the viability of using both the Q-learning algorithm and genetic algorithm based classifier systems with different payoff schemes, namely the bucket brigade algorithm (BBA) and the profit sharing plan (PSP), for developing agent coordination on two different multi-agent domains. In addition, we show that a semi-random scheme for action selection is preferable to the more traditional fitness proportionate selection scheme used in classifier systems.
-
+
Learning Agents in a Homo Egualis Society Ann Nowe, Katja Verbeeck and Tom Lenaerts. Technical report, 2001. Coordination is an important issue in multi-agent systems. A possible approach to tackle coordination, that recently received quite a lot of attention, is to learn the effects of interaction in the joint action space. However joint action spaces violate generally accepted requirements of multi-agent systems, such as: distributed control, asynchronous actions, incomplete information, cost of communication. Moreover we argue that on top of the drawbacks of joint action spaces, basic problems still remain unsolved. In this paper we propose an approach based on human sociality. In particular, we investigate a Homo egualis setting inspired by anthropology. As shown in the paper, Homo egualis results in a periodical policy that is fair to all agents involved and that results in an optimal solution with respect to the global goal.
-
+
Coordinated Reinforcement Learning by Carlos Guestrin, Michail Lagoudakis, Ronald Parr. AAAI, 2002. We present several new algorithms for multiagent reinforcement learning. A common feature of these algorithms is a parameterized, structured representation of a policy or value function. This structure is leveraged in an approach we call coordinated reinforcement learning, by which agents coordinate both their action selection activities and their parameter updates. Within the limits of our parametric representations, the agents will determine a jointly optimal action without explicitly considering every possible action in their exponentially large joint action space. Our methods differ from many previous reinforcement learning approaches to multiagent coordination in that structured communication and coordination between agents appears at the core of both the learning algorithm and the execution architecture. Our experimental results, comparing our approach to other RL methods, illustrate both the quality of the policies obtained and the additional benefits of coordination
-
Reinforcement Learning of Coordination in Cooperative Multiagent Systems by Kapetanakis S. and Kudenko D. AAAI, 2002. We report on an investigation of reinforcement learning techniques for the learning of coordination in cooperative multiagent systems. Specifically, we focus on a novel action selection strategy for Q-learning (Watkins 1989). The new technique is applicable to scenarios where mutual observation of actions is not possible. To date, reinforcement learning approaches for such independent agents did not guarantee convergence to the optimal joint action in scenarios with high miscoordination costs. We improve on previous results (Claus & Boutilier 1998) by demonstrating empirically that our extension causes the agents to converge almost always to the optimal joint action even in these difficult cases.
-
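For context, the baseline this paper improves on is a pair of independent Q-learners that observe only their own action and the shared reward. A hedged sketch on an illustrative coordination game with a miscoordination penalty (the payoff matrix, Boltzmann selection and parameters are stand-ins; the paper's contribution is a smarter action-selection heuristic on top of this setup):

```python
import math
import random

# Illustrative two-agent coordination game: both must pick action 0 for the high
# reward, but picking 0 alone is heavily penalised (miscoordination cost).
PAYOFF = [[10, -10], [-10, 5]]

def boltzmann(q, temp, rng):
    """Sample an action from a Boltzmann (softmax) distribution over Q-values."""
    weights = [math.exp(v / temp) for v in q]
    return rng.choices(range(len(q)), weights=weights)[0]

def independent_q_learners(episodes=5000, alpha=0.1, temp=0.5, seed=0):
    """Two independent Q-learners sharing a reward but not observing each other.

    With plain Boltzmann selection they often settle on the safe joint action
    (1, 1) rather than the optimal (0, 0), which is the failure mode the
    paper's action-selection strategy addresses.
    """
    rng = random.Random(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]
    for _ in range(episodes):
        a1, a2 = boltzmann(q1, temp, rng), boltzmann(q2, temp, rng)
        r = PAYOFF[a1][a2]                      # both agents receive the same reward
        q1[a1] += alpha * (r - q1[a1])
        q2[a2] += alpha * (r - q2[a2])
    return q1, q2

print(independent_q_learners())
```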
@@ -239,6 +241,8 @@ We introduce a compact graph-theoretic representation for multi-party game theor
Learning team strategies with multiple policy-sharing agents: A soccer case study by R. Salustowicz, M. Wiering, and J. Schmidhuber. Technical report, 1997. We use simulated soccer to study multiagent learning. Each team's players (agents) share action set and policy, but may behave differently due to position-dependent inputs. All agents making up a team are rewarded or punished collectively in case of goals. We conduct simulations with varying team sizes, and compare several learning algorithms: TD-Q learning with linear neural networks (TD-Q), Probabilistic Incremental Program Evolution (PIPE), and a PIPE version that learns by coevolution (CO-PIPE). TD-Q is based on learning evaluation functions (EFs) mapping input/action pairs to expected reward. PIPE and CO-PIPE search policy space directly. They use adaptive probability distributions to synthesize programs that calculate action probabilities from current inputs. Our results show that linear TD-Q encounters several difficulties in learning appropriate shared EFs. PIPE and CO-PIPE, however, do not depend on EFs and find good policies faster and more reliably. This suggests that in some multiagent learning scenarios direct search in policy space can offer advantages over EF-based approaches.
-
+
Evolution of indirect reciprocity by image scoring Martin A. Nowak and Karl Sigmund. Nature, 1998. Darwinian evolution has to provide an explanation for cooperative behaviour. Theories of cooperation are based on kin selection (dependent on genetic relatedness), group selection and reciprocal altruism. The idea of reciprocal altruism usually involves direct reciprocity: repeated encounters between the same individuals allow for the return of an altruistic act by the recipient. Here we present a new theoretical framework, which is based on indirect reciprocity and does not require the same two individuals ever to meet again. Individual selection can nevertheless favour cooperative strategies directed towards recipients that have helped others in the past. Cooperation pays because it confers the image of a valuable community member to the cooperating individual. We present computer simulations and analytic models that specify the conditions required for evolutionary stability of indirect reciprocity. We show that the probability of knowing the 'image' of the recipient must exceed the cost-to-benefit ratio of the altruistic act. We propose that the emergence of indirect reciprocity was a decisive step for the evolution of human societies.
-
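The one quantitative claim in the abstract above is the stability threshold: image scoring can sustain cooperation only when the probability of knowing the recipient's image exceeds the cost-to-benefit ratio of the altruistic act. A trivial check of that condition (function name and example numbers are mine):

```python
# Tiny illustration of the stability condition stated in the abstract:
# indirect reciprocity via image scoring can pay only when the probability q
# of knowing the recipient's image exceeds the cost-to-benefit ratio c / b.
def image_scoring_can_be_stable(q: float, c: float, b: float) -> bool:
    """q: prob. the donor knows the recipient's score; c: cost of helping; b: benefit."""
    return q > c / b

print(image_scoring_can_be_stable(q=0.6, c=0.1, b=1.0))   # True: 0.6 > 0.1
print(image_scoring_can_be_stable(q=0.05, c=0.1, b=1.0))  # False: 0.05 < 0.1
```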
+
Evolving Behaviors for Cooperating Agents by Jeffrey K. Bassett, Kenneth A. De Jong. Symposium on Methodologies for Intelligent Systems, 2000. A good deal of progress has been made in the past few years in the design and implementation of control programs for autonomous agents. A natural extension of this work is to consider solving difficult tasks with teams of cooperating agents. Our interest in this area is motivated in part by our involvement in a Navy-sponsored micro air vehicle (MAV) project in which the goal is to solve difficult surveillance tasks using a large team of small inexpensive autonomous air vehicles rather than a few expensive piloted vehicles. Our approach to developing control programs for these MAVs is to use evolutionary computation techniques to evolve behavioral rule sets. In this paper we describe our architecture for achieving this, and we present some of our initial results.
-
Advantages of Cooperation Between Reinforcement Learning Agents in Difficult Stochastic Problems by Hamid R. Berenji, David Vengerov. International Conference on Fuzzy Systems, 2000. This paper presents the first results in understanding the reasons for cooperative advantage between reinforcement learning agents. We consider a cooperation method which consists of using and updating a common policy. We tested this method on a complex fuzzy reinforcement learning problem and found that cooperation brings larger than expected benefits. More precisely, we found that K cooperative agents each learning for N time steps outperform K independent agents each learning in a separate world for K*N time steps. In this paper, we explain the observed phenomenon and determine the necessary conditions for its presence in a wide class of reinforcement learning problems.
-
@@ -249,6 +253,12 @@ We introduce a compact graph-theoretic representation for multi-party game theor
All learning is local: Multi-agent learning in global reward games by Yu-Han Chang, Tracey Ho, Leslie Pack Kaelbling. NeurIPS, 2003. In large multiagent games, partial observability, coordination, and credit assignment persistently plague attempts to design good learning algorithms. We provide a simple and efficient algorithm that in part uses a linear system to model the world from a single agent’s limited perspective, and takes advantage of Kalman filtering to allow an agent to construct a good training signal and learn an effective policy
-
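The abstract above gives enough to sketch the core trick, hedged as my own reading rather than the authors' code: treat the observed global reward as the agent's own contribution plus a slowly drifting term contributed by everyone else, and use a scalar Kalman filter to subtract that drift from the training signal. All variable names and noise settings below are assumptions.

```python
# Minimal sketch: model the global reward as  g_t = r_t + b_t, where r_t is the
# agent's own contribution and b_t (everyone else's effect) drifts like a random
# walk.  A scalar Kalman filter tracks b_t so the agent can train on g_t - b_hat.
import numpy as np

def filter_local_rewards(global_rewards, personal_estimates,
                         drift_var=0.1, obs_var=1.0):
    b_hat, p = 0.0, 1.0          # posterior mean / variance of the "background" term
    local = []
    for g, r_guess in zip(global_rewards, personal_estimates):
        p += drift_var                       # predict: background drifts
        k = p / (p + obs_var)                # Kalman gain
        b_hat += k * ((g - r_guess) - b_hat) # update with the noisy observation of b
        p *= (1.0 - k)
        local.append(g - b_hat)              # training signal credited to this agent
    return np.array(local)

rng = np.random.default_rng(1)
true_local = rng.normal(1.0, 0.2, size=200)
background = np.cumsum(rng.normal(0.0, 0.3, size=200))   # other agents' drifting effect
g = true_local + background
print(filter_local_rewards(g, personal_estimates=np.ones(200))[:5])
```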
+
Ant Foraging Revisited Liviu A. Panait and Sean Luke. In Proceedings of the Ninth International Conference on the Simulation and Synthesis of Living Systems (ALIFE9), 2004. Most previous artificial ant foraging algorithms have to date relied to some degree on a priori knowledge of the environment, in the form of explicit gradients generated by the nest, by hard-coding the nest location in an easily-discoverable place, or by imbuing the artificial ants with the knowledge of the nest direction. In contrast, the work presented solves ant foraging problems using two pheromones, one applied when searching for food and the other when returning food items to the nest. This replaces the need to use complicated nest-discovery devices with simpler mechanisms based on pheromone information, which in turn reduces the ant system complexity. The resulting algorithm is orthogonal and simple, yet ants are able to establish increasingly efficient trails from the nest to the food in the presence of obstacles. The algorithm replaces the blind addition of new amounts of pheromones with an adjustment mechanism that resembles dynamic programming.
-
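A rough sketch of the dynamic-programming-like pheromone adjustment described above; the grid, step cost, and cap are my own simplifications rather than the paper's parameters.

```python
# Sketch: instead of blindly adding pheromone, a cell's value is set from the
# best neighbouring value minus a step cost, so pheromone decays with distance
# from its source (the nest for one map, food for the other).
import numpy as np

def deposit(pheromone, pos, is_source, step_cost=1.0, max_value=100.0):
    """Update one pheromone map at `pos` the way an ant standing there would."""
    r, c = pos
    if is_source:                       # standing on the nest (or food, for the other map)
        pheromone[r, c] = max_value
        return
    rows, cols = pheromone.shape
    neighbours = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                  if 0 <= r + dr < rows and 0 <= c + dc < cols]
    best = max(pheromone[n] for n in neighbours)
    pheromone[r, c] = max(pheromone[r, c], best - step_cost)

# Ants searching for food climb the food-pheromone map and deposit on the
# nest-pheromone map; ants carrying food do the opposite.
nest_pheromone = np.zeros((5, 5))
deposit(nest_pheromone, (2, 2), is_source=True)    # nest location
deposit(nest_pheromone, (2, 3), is_source=False)
print(nest_pheromone)
```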
+ +
A Pheromone-Based Utility Model for Collaborative Foraging L. Panait, S. Luke. AAMAS-2004 — Proceedings of the Third International Joint Conference on Autonomous Agents and Multi Agent Systems, 2004. Multi-agent research often borrows from biology, where remarkable examples of collective intelligence may be found. One interesting example is ant colonies’ use of pheromones as a joint communication mechanism. In this paper we propose two pheromone-based algorithms for artificial agent foraging, trail-creation, and other tasks. Whereas practically all previous work in this area has focused on biologically-plausible but ad-hoc single pheromone models, we have developed a formalism which uses multiple pheromones to guide cooperative tasks. This model bears some similarity to reinforcement learning. However, our model takes advantage of symmetries common to foraging environments which enables it to achieve much faster reward propagation than reinforcement learning does. Using this approach we demonstrate cooperative behaviors well beyond the previous ant-foraging work, including the ability to create optimal foraging paths in the presence of obstacles, to cope with dynamic environments, and to follow tours with multiple waypoints. We believe that this model may be used for more complex problems still.
-
+ +
A Sensitivity Analysis of a Cooperative Coevolutionary Algorithm Biased for Optimization Liviu Panait, R. Paul Wiegand, and Sean Luke. In Genetic and Evolutionary Computation Conference — GECCO-2004. Springer, 2004. Recent theoretical work helped explain certain optimization-related pathologies in cooperative coevolutionary algorithms (CCEAs). Such explanations have led to adopting specific and constructive strategies for improving CCEA optimization performance by biasing the algorithm toward ideal collaboration. This paper investigates how sensitivity to the degree of bias (set in advance) is affected by certain algorithmic and problem properties. We discover that the previous static biasing approach is quite sensitive to a number of problem properties, and we propose a stochastic alternative which alleviates this problem. We believe that finding appropriate biasing rates is more feasible with this new biasing technique.
-
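The two biasing schemes the abstract contrasts can be stated in a few lines. This is a hedged sketch under my own naming: the static bias mixes the payoff obtained with actual collaborators and an estimate of the ideal-collaboration payoff with a fixed weight, while the stochastic alternative picks one or the other at random.

```python
# Hedged sketch of static vs. stochastic fitness biasing in a CCEA
# (parameter names are mine, not the paper's).
import random

def static_biased_fitness(collab_payoff, ideal_payoff, delta):
    """Deterministic mix controlled by the bias rate delta set in advance."""
    return (1.0 - delta) * collab_payoff + delta * ideal_payoff

def stochastic_biased_fitness(collab_payoff, ideal_payoff, delta, rng=random):
    """With probability delta, score against the ideal collaborator instead."""
    return ideal_payoff if rng.random() < delta else collab_payoff

print(static_biased_fitness(collab_payoff=3.0, ideal_payoff=8.0, delta=0.25))
print(stochastic_biased_fitness(collab_payoff=3.0, ideal_payoff=8.0, delta=0.25))
```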
+
Marginal contribution nets: A compact representation scheme for coalitional games by Ieong S, Shoham Y. In Proceedings of the 6th ACM Conference on Electronic Commerce, 2005. We present a new approach to representing coalitional games based on rules that describe the marginal contributions of the agents. This representation scheme captures characteristics of the interactions among the agents in a natural and concise manner. We also develop efficient algorithms for two of the most important solution concepts, the Shapley value and the core, under this representation. The Shapley value can be computed in time linear in the size of the input. The emptiness of the core can be determined in time exponential only in the treewidth of a graphical interpretation of our representation.
-
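For intuition about the linear-time Shapley computation mentioned above, here is a hedged sketch covering only rules whose patterns are conjunctions of positive literals: each such rule contributes its value split equally among the agents it names. Rules with negative literals need an extra correction that this sketch omits.

```python
# Hedged sketch: Shapley values from marginal-contribution-net rules restricted
# to positive-literal patterns.  A rule "a1 AND ... AND ak -> v" adds v / k to
# every agent it mentions; the total runs in time linear in the rule set size.
from collections import defaultdict

def shapley_from_rules(rules):
    """rules: list of (set_of_agents, value) pairs with positive literals only."""
    phi = defaultdict(float)
    for agents, value in rules:
        share = value / len(agents)
        for a in agents:
            phi[a] += share
    return dict(phi)

print(shapley_from_rules([({"a", "b"}, 5.0), ({"b"}, 2.0)]))
# {'a': 2.5, 'b': 4.5}
```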
Tunably decentralized algorithms for cooperative target observation Luke, Sean and Sullivan, Keith and Panait, Liviu and Balan, Gabriel. 4th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2005), 2005. Multi-agent problem domains may require distributed algorithms for a variety of reasons: local sensors, limitations of communication, and availability of distributed computational resources. In the absence of these constraints, centralized algorithms are often more efficient, simply because they are able to take advantage of more information. We introduce a variant of the cooperative target observation domain which is free of such constraints. We propose two algorithms, inspired by K-means clustering and hill-climbing respectively, which are scalable in degree of decentralization. Neither algorithm consistently outperforms the other across all problem domain settings. Surprisingly, we find that hill-climbing is sensitive to degree of decentralization, while K-means is not. We also experiment with a combination of the two algorithms which draws strength from each.
-
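A minimal sketch (mine, not the paper's implementation) of the K-means-flavoured idea above: cluster the currently observed target positions and send each robot toward the centroid of one cluster.

```python
# Sketch: assign robots to clusters of target positions via a few K-means steps,
# then greedily match each robot to the nearest remaining centroid.
import numpy as np

def assign_robots_to_targets(robot_xy, target_xy, iters=10):
    k = len(robot_xy)
    centroids = np.array(target_xy[:k], dtype=float)       # crude initialisation
    targets = np.asarray(target_xy, dtype=float)
    for _ in range(iters):
        labels = np.argmin(((targets[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = targets[labels == j].mean(axis=0)
    goals = {}
    free = list(range(k))
    for i, r in enumerate(np.asarray(robot_xy, dtype=float)):
        j = min(free, key=lambda j: np.linalg.norm(centroids[j] - r))
        goals[i] = centroids[j]          # robot i heads for this cluster centre
        free.remove(j)
    return goals

print(assign_robots_to_targets([(0, 0), (5, 5)], [(1, 1), (1, 2), (6, 5), (4, 6)]))
```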
@@ -273,6 +283,8 @@ We introduce a compact graph-theoretic representation for multi-party game theor
Decision-Theoretic Bidding Based on Learned Density Models in Simultaneous, Interacting Auctions by P. Stone, R. S. P., M. L. Littman, J. A. Csirik, and D. McAllester. JAIR, 2003. Auctions are becoming an increasingly popular method for transacting business, especially over the Internet. This article presents a general approach to building autonomous bidding agents to bid in multiple simultaneous auctions for interacting goods. A core component of our approach learns a model of the empirical price dynamics based on past data and uses the model to analytically calculate, to the greatest extent possible, optimal bids. We introduce a new and general boosting-based algorithm for conditional density estimation problems of this kind, i.e., supervised learning problems in which the goal is to estimate the entire conditional distribution of the real-valued label. This approach is fully implemented as ATTac-2001, a top-scoring agent in the second Trading Agent Competition (TAC-01). We present experiments demonstrating the effectiveness of our boosting-based price predictor relative to several reasonable alternatives.
-
+
Learning from Multiple Sources Koby Crammer, Michael Kearns and Jennifer Wortman. Proceedings of the Third International Joint Conference on Autonomous Agents and Multi Agent Systems, 2004. We consider the problem of learning accurate models from multiple sources of “nearby” data. Given distinct samples from multiple data sources and estimates of the dissimilarities between these sources, we provide a general theory of which samples should be used to learn models for each source. This theory is applicable in a broad decision-theoretic learning framework, and yields general results for classification and regression. A key component of our approach is the development of approximate triangle inequalities for expected loss, which may be of independent interest. We discuss the related problem of learning parameters of a distribution from multiple data sources. Finally, we illustrate our theory through a series of synthetic simulations.
-
+
### Dispersion Games @@ -340,6 +352,8 @@ We introduce a compact graph-theoretic representation for multi-party game theor
A Collective Genetic Algorithm Thomas Miconi. Artificial Life, Adaptive Behavior, and Agents, 2001. We take a look at the problem of collective evolution, and set the following goal: designing an algorithm that could allow a given population of agents to evolve incrementally, while they are performing their (possibly collaborative) task, with nothing more than a global fitness function to guide this evolution. We propose a simple algorithm that does just that, and apply it to a simple test problem (aggregation among animats controlled by feed-forward neural networks). We then show that under this form, this algorithm can only generate homogeneous systems. Seeing this as an unacceptable limitation, we modify our system in order to allow it to generate heterogeneous populations, in which semi-homogeneous sub-populations (i.e. sub-species) emerge and grow (or regress) naturally until a stable state is reached. We successfully apply this modified algorithm to a very simple toy problem of simulated chemistry.
-
+
A Comparison of Evolutionary and Coevolutionary Search Ludo Pagie and Melanie Mitchell. International Journal of Computational Intelligence and Applications, 2001. We present a comparative study of an evolutionary and a coevolutionary search model. In the latter, strategies for solving a problem coevolve with training cases. We find that the coevolutionary model has a relatively large efficacy: 41 out of 50 (82%) of the simulations produce high quality strategies. In contrast, the evolutionary model has a very low efficacy: 1 out of 50 runs (2%) produce high quality strategies. We show that the increased efficacy in the coevolutionary model results from the direct exploitation of low-quality strategies by the population of training cases. We also present evidence that the generality of the high-quality strategies can suffer as a result of this same exploitation.
-
+
Towards a relation between learning agents and evolutionary dynamics by Karl Tuyls, Tom Lenaerts, Katja Verbeeck, Sam Maes. BNAIC, 2002. Modeling learning agents in the context of Multi-agent Systems requires insight in the type and form of interactions with the environment and other agents in the system. Usually, these agents are modeled similar to the different players in a standard game theoretical model. In this paper we examine whether evolutionary game theory, and more specifically the replicator dynamics, is an adequate theoretical model for the study of the dynamics of reinforcement learning agents in a multi-agent system. As a first step in this direction we extend the results of [1, 9] to a more general reinforcement learning framework, i.e. Learning Automata.
-
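The replicator dynamics referred to above are easy to state concretely. The following sketch (payoff matrix and step size are mine) iterates the standard discrete-time approximation: a strategy's population share grows in proportion to how much its payoff exceeds the population average.

```python
# Sketch of single-population replicator dynamics for a symmetric game with
# payoff matrix A: dx_i/dt = x_i * ((A x)_i - x' A x), stepped with Euler updates.
import numpy as np

def replicator_step(x, A, dt=0.01):
    fitness = A @ x                     # expected payoff of each pure strategy
    avg = x @ fitness                   # population-average payoff
    return x + dt * x * (fitness - avg)

A = np.array([[3.0, 0.0],               # a simple coordination game
              [0.0, 1.0]])
x = np.array([0.4, 0.6])
for _ in range(2000):
    x = replicator_step(x, A)
print(x)                                # drifts toward one of the pure equilibria
```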
Guaranteeing Coevolutionary Objective Measures Sean Luke, R. Paul Wiegand. FOGA, 2002. The task of understanding the dynamics of coevolutionary algorithms or comparing performance between such algorithms is complicated by the fact that the internal fitness measures are subjective. Though several techniques have been proposed to use external or objective measures to help in analysis, there are clearly properties of fitness payoff, like intransitivity, for which these techniques are ineffective. We feel that a principled approach to this problem is to first establish the theoretical bounds to guarantee objective measures in one CEA model; from there one can later examine the effects of deviating from the assumptions made by these bounds. To this end, we present a model of competitive fitness assessment with a single population and non-parametric selection (such as tournament selection), and show minimum conditions and examples under which an objective measure exists, and when the dynamics of the coevolutionary algorithm are identical to those of a traditional EA.
-
@@ -548,8 +562,13 @@ Halpern begins by surveying possible formal systems for representing uncertainty
Heterogeneity in the Coevolved Behaviors of Mobile Robots: The Emergence of Specialists by Mitchell A. Potter, Lisa A. Meeden, Alan C. Schult. IJCAI, 2001. Many mobile robot tasks can be most efficiently solved when a group of robots is utilized. The type of organization, and the level of coordination and communication within a team of robots affects the type of tasks that can be solved. This paper examines the tradeoff of homogeneity versus heterogeneity in the control systems by allowing a team of robots to coevolve their high-level controllers given different levels of difficulty of the task. Our hypothesis is that simply increasing the difficulty of a task is not enough to induce a team of robots to create specialists. The key factor is not difficulty per se, but the number of skill sets necessary to successfully solve the task. As the number of skills needed increases, the more beneficial and necessary heterogeneity becomes. We demonstrate this in the task domain of herding, where one or more robots must herd another robot into a confined space.
-
+ +
Emergent bucket brigading: a simple mechanism for improving performance in multi-robot constrained-space foraging tasks Esben H. Ostergaard, Gaurav S. Sukhatme, Maja J. Mataric. Proceedings of the Fifth International Conference on Autonomous Agents, 2001. This paper is concerned with the multi-robot foraging problem, which has been described as one of the canonical problems for cooperative robotics. In particular, the multi-robot transportation task in constrained space environments is considered. We describe a simple algorithm which produces bucket brigade-like behavior, where the robots “hand off” resources to each other, using only local sensing. The algorithm is tested in five experimental conditions, to empirically evaluate the strengths/weaknesses of the approach, and is found to perform significantly better in constrained space environments than a homogeneous foraging approach. The key results of the described work are: 1) a framework for foraging tasks in general and 2) a simple mechanism that acts locally to produce bucket brigading behavior globally.
-
+
Co-Evolving Team Capture Strategies for Dissimilar Robots by H. Joseph Blumenthal, Gary B. Parker. AAAI, 2004. Evolving team members to act cohesively is a complex and challenging problem. To allow the greatest range of solutions in team problem solving, heterogeneous agents are desirable. To produce highly specialized agents, team members should be evolved in separate populations. Co-evolution in separate populations requires a system for selecting suitable partners for evaluation at trial time. Selecting too many partners for evaluation drives computation time to unreasonable levels, while selecting too few partners blinds the GA from recognizing highly fit individuals. In previous work, we employed a method based on punctuated anytime learning which periodically tests a number of partner combinations to select a single individual from each population to be used at trial time. We began testing our method in simulation using a two-agent box pushing task. We then expanded our research by simulating a predator-prey scenario in which all the agents had the exact same capabilities. In this paper, we report the expansion of our work by applying this method of team learning to five dissimilar robots.
-
+
Current state of the art in distributed autonomous mobile robotics L. Parker, G. Bekey, and J. Barhen. Distributed Autonomous Robotic Systems 4, pages 3–12. Springer-Verlag, 2000. As research progresses in distributed robotic systems, more and more aspects of multi-robot systems are being explored. This article surveys the current state of the art in distributed mobile robot systems. Our focus is principally on research that has been demonstrated in physical robot implementations. We have identified eight primary research topics within multi-robot systems -- biological inspirations, communication, architectures, localization/mapping/exploration, object transport and manipulation, motion coordination, reconfigurable robots, and learning -- and discuss the current state of research in these areas. As we describe each research area, we identify some key open issues in multi-robot team research. We conclude by identifying several additional open research issues in distributed mobile robotic systems.
-
+
Efficient Reward Functions for Adaptive Multi-rover Systems by Kagan Tumer, Adrian Agogino. LAMAS, 2005. This chapter focuses on deriving reward functions that allow multiple agents to co-evolve efficient control policies that maximize a system level reward in noisy and dynamic environments. The solution we present is based on agent rewards satisfying two crucial properties. First, the agent reward function and global reward function have to be aligned, that is, an agent maximizing its agent-specific reward should also maximize the global reward. Second, the agent has to receive sufficient “signal” from its reward, that is, an agent’s action should have a large influence over its agent-specific reward. Agents using rewards with these two properties will evolve the correct policies quickly. This hypothesis is tested in episodic and non-episodic, continuous-space multi-rover environments where rovers evolve to maximize a global reward function over all rovers. The environments are dynamic (i.e., change over time), noisy, and have restrictions on communication between agents. We show that a control policy evolved using agent-specific rewards satisfying the above properties outperforms policies evolved using global rewards by up to 400%. More notably, in the presence of a larger number of rovers or rovers with noisy and communication-limited sensors, the proposed method outperforms global reward by a higher percentage than in noise-free conditions with a small number of rovers.
-
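The reward properties described above (alignment with the global reward plus a strong per-agent signal) are commonly met by difference rewards of the form D_i = G(z) - G(z with agent i removed). The sketch below uses a toy observation-value function of my own to illustrate the bookkeeping; it is not the chapter's rover simulator.

```python
# Hedged sketch of a difference reward: each agent is credited with how much the
# global reward G drops when its own contribution is removed.
observations_values = {"poi_a": 5.0, "poi_b": 3.0, "poi_c": 1.0}

def global_reward(observations):
    # Toy G: total value of the points of interest observed by at least one rover.
    return sum(value for poi, value in observations_values.items()
               if any(poi in obs for obs in observations))

def difference_reward(observations, i):
    without_i = observations[:i] + observations[i + 1:]
    return global_reward(observations) - global_reward(without_i)

rovers = [{"poi_a"}, {"poi_a", "poi_b"}, set()]        # what each rover observed
print([difference_reward(rovers, i) for i in range(len(rovers))])
# [0.0, 3.0, 0.0]: only rover 1 adds anything not already covered by the others
```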

@@ -644,3 +663,7 @@ The papers below were found to be difficult to categorise and therefore are pres + + + + From 6f16227d1a57418c490740e10eee4d2342be97ba Mon Sep 17 00:00:00 2001 From: Dries Smit Date: Fri, 16 Dec 2022 08:51:20 +0200 Subject: [PATCH 5/6] feat: Add forth page of shallow learning papers. --- Research Papers/Shallow learning/README.md | 37 +++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-) diff --git a/Research Papers/Shallow learning/README.md b/Research Papers/Shallow learning/README.md index 8b52fd6..1ae1060 100644 --- a/Research Papers/Shallow learning/README.md +++ b/Research Papers/Shallow learning/README.md @@ -249,14 +249,26 @@ We introduce a compact graph-theoretic representation for multi-party game theor
Advantages of Cooperation Between Reinforcement Learning Agents in Difficult Stochastic Problems by Hamid R. Berenji, David Vengerov. Fuzzy Systems, 2000. This paper presents the first results in understanding the reasons for cooperative advantage between reinforcement learning agents. We consider a cooperation method which consists of using and updating a common policy. We tested this method on a complex fuzzy reinforcement learning problem and found that cooperation brings larger than expected benefits. More precisely, we found that K cooperative agents each learning for N time steps outperform K independent agents each learning in a separate world for K*N time steps. In this paper we explain the observed phenomenon and determine the necessary conditions for its presence in a wide class of reinforcement learning problems.
-
+
Multi-robot learning in a cooperative observation task Lynne Parker and Claude F. Touzet. In Proceedings of Fifth International Symposium on Distributed Autonomous Robotic Systems (DARS 2000), 2000. An important need in multi-robot systems is the development of mechanisms that enable robot teams to autonomously generate cooperative behaviors. This paper first briefly presents the Cooperative Multi-robot Observation of Multiple Moving Targets (CMOMMT) application as a rich domain for studying the issues of multi-robot learning of new behaviors. We discuss the results of our hand-generated algorithm for CMOMMT, and then describe our research in generating multi-robot learning techniques for the CMOMMT application, comparing the results to the hand-generated solutions. Our results show that, while the learning approach performs better than random, naive approaches, much room still remains to match the results obtained from the hand-generated approach. The ultimate goal of this research is to develop techniques for multi-robot learning and adaptation that will generalize to cooperative robot applications in many domains, thus facilitating the practical use of multi-robot teams in a wide variety of real-world applications.
-
+
Learning to Cooperate via Policy Search by Leonid Peshkin, Kee-Eung Kim, Nicolas Meuleau, Leslie Pack Kaelbling. UAI, 2000. Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain.
-
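A minimal sketch of gradient-based distributed policy search in a common-payoff game, in the spirit of the abstract above: each agent independently updates the logits of its own stochastic policy with a REINFORCE-style estimator of the shared payoff. The game, constants, and baseline are my own choices.

```python
# Sketch: two independent REINFORCE learners sharing one payoff matrix.
import numpy as np

payoff = np.array([[10.0, 0.0],     # shared payoff indexed by (agent 1 action, agent 2 action)
                   [0.0,  2.0]])

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
logits = [np.zeros(2), np.zeros(2)]
lr, baseline = 0.05, 0.0
for t in range(5000):
    probs = [softmax(l) for l in logits]
    acts = [rng.choice(2, p=p) for p in probs]
    r = payoff[acts[0], acts[1]]
    baseline += 0.01 * (r - baseline)
    for l, p, a in zip(logits, probs, acts):
        grad = -p
        grad[a] += 1.0                      # d log pi(a) / d logits
        l += lr * (r - baseline) * grad     # independent REINFORCE update
print([softmax(l).round(2) for l in logits])   # typically both come to prefer action 0
```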
+
Techniques for Learning in Multi-Robot Teams Lynne Parker and Claude F. Touzet. Robot Teams: From Diversity to Polymorphism. AK Peters, 2001. Before multi-robot teams will ever become widely used in practice, we believe that advances must be made in the development of mechanisms that enable the robot teams to autonomously generate and adapt cooperative behaviors.
-
+ +
Distributed algorithms for multi-robot observation of multiple moving targets Lynne E. Parker. Autonomous Robots, 12(3), 2002. An important issue that arises in the automation of many security, surveillance, and reconnaissance tasks is that of observing the movements of targets navigating in a bounded area of interest. A key research issue in these problems is that of sensor placement (determining where sensors should be located to maintain the targets in view). In complex applications involving limited-range sensors, the use of multiple sensors dynamically moving over time is required. In this paper, we investigate the use of a cooperative team of autonomous sensor-based robots for the observation of multiple moving targets. In other research, analytical techniques have been developed for solving this problem in complex geometrical environments. However, these previous approaches are very computationally expensive (at least exponential in the number of robots) and cannot be implemented on robots operating in real-time. Thus, this paper reports on our studies of a simpler problem involving uncluttered environments: those with either no obstacles or with randomly distributed simple convex obstacles. We focus primarily on developing the on-line distributed control strategies that allow the robot team to attempt to minimize the total time in which targets escape observation by some robot team member in the area of interest. This paper first formalizes the problem (which we term CMOMMT for Cooperative Multi-Robot Observation of Multiple Moving Targets) and discusses related work. We then present a distributed heuristic approach (which we call A-CMOMMT) for solving the CMOMMT problem that uses weighted local force vector control. We analyze the effectiveness of the resulting weighted force vector approach by comparing it to three other approaches. We present the results of our experiments in both simulation and on physical robots that demonstrate the superiority of the A-CMOMMT approach for situations in which the ratio of targets to robots is greater than 1/2. Finally, we conclude by proposing that the CMOMMT problem makes an excellent domain for studying multi-robot learning in inherently cooperative tasks. This approach is the first of its kind for solving the on-line cooperative observation problem and implementing it on a physical robot team.
-
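A rough reading of the weighted local force-vector control mentioned above (the weights and ranges here are mine, not the paper's tuned values): each robot sums unit attraction vectors toward targets it can sense and unit repulsion vectors away from close teammates, then moves along the resulting vector.

```python
# Sketch: local force-vector control for cooperative target observation.
import numpy as np

def force_vector(robot, targets, teammates, sense_range=5.0, repel_range=2.0):
    robot = np.asarray(robot, dtype=float)
    f = np.zeros(2)
    for t in np.asarray(targets, dtype=float):
        d = np.linalg.norm(t - robot)
        if 0 < d <= sense_range:
            f += (t - robot) / d            # unit attraction toward each visible target
    for m in np.asarray(teammates, dtype=float):
        d = np.linalg.norm(m - robot)
        if 0 < d <= repel_range:
            f -= (m - robot) / d            # unit repulsion from close teammates
    return f

print(force_vector(robot=(0, 0), targets=[(1, 0), (0, 3)], teammates=[(0.5, 0.5)]))
```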
+
All learning is local: Multi-agent learning in global reward games by Yu-Han Chang, Tracey Ho, Leslie Pack Kaelbling. NeurIPS, 2003. In large multiagent games, partial observability, coordination, and credit assignment persistently plague attempts to design good learning algorithms. We provide a simple and efficient algorithm that in part uses a linear system to model the world from a single agent’s limited perspective, and takes advantage of Kalman filtering to allow an agent to construct a good training signal and learn an effective policy
-
Ant Foraging Revisited Liviu A. Panait and Sean Luke. In Proceedings of the Ninth International Conference on the Simulation and Synthesis of Living Systems (ALIFE9), 2004. Most previous artificial ant foraging algorithms have to date relied to some degree on a priori knowledge of the environment, in the form of explicit gradients generated by the nest, by hard-coding the nest location in an easily-discoverable place, or by imbuing the artificial ants with the knowledge of the nest direction. In contrast, the work presented solves ant foraging problems using two pheromones, one applied when searching for food and the other when returning food items to the nest. This replaces the need to use complicated nest-discovery devices with simpler mechanisms based on pheromone information, which in turn reduces the ant system complexity. The resulting algorithm is orthogonal and simple, yet ants are able to establish increasingly efficient trails from the nest to the food in the presence of obstacles. The algorithm replaces the blind addition of new amounts of pheromones with an adjustment mechanism that resembles dynamic programming.
-
A Pheromone-Based Utility Model for Collaborative Foraging L. Panait, S. Luke. AAMAS-2004 — Proceedings of the Third International Joint Conference on Autonomous Agents and Multi Agent Systems, 2004. Multi-agent research often borrows from biology, where remarkable examples of collective intelligence may be found. One interesting example is ant colonies’ use of pheromones as a joint communication mechanism. In this paper we propose two pheromone-based algorithms for artificial agent foraging, trail-creation, and other tasks. Whereas practically all previous work in this area has focused on biologically-plausible but ad-hoc single pheromone models, we have developed a formalism which uses multiple pheromones to guide cooperative tasks. This model bears some similarity to reinforcement learning. However, our model takes advantage of symmetries common to foraging environments which enables it to achieve much faster reward propagation than reinforcement learning does. Using this approach we demonstrate cooperative behaviors well beyond the previous ant-foraging work, including the ability to create optimal foraging paths in the presence of obstacles, to cope with dynamic environments, and to follow tours with multiple waypoints. We believe that this model may be used for more complex problems still.
-
+
Multi-agent learning in conflicting multi-level games with incomplete information. Maarten Peeters, Katja Verbeeck and Ann Nowe. In Proceedings of Artificial Multiagent Learning. Papers from the 2004 AAAI Fall Symposium. Technical Report FS-04-02, 2004. Coordination to some equilibrium point is an interesting problem in multi-agent reinforcement learning. In common interest single stage settings this problem has been studied profoundly and efficient solution techniques have been found. Also for particular multi-stage games some experiments show good results. However, for a large scale of problems the agents do not share a common pay-off function. Again, for single stage problems, a solution technique exists that finds a fair solution for all agents. In this paper we report on a technique that is based on learning automata theory and periodical policies. Letting pseudo-independent agents play periodical policies enables them to behave socially in pure conflicting multi-stage games as defined by E. Billard (Billard & Lakshmivarahan 1999; Zhou, Billard, & Lakshmivarahan 1999). We experimented with this technique on games where simple learning automata have the tendency not to cooperate or to show oscillating behavior resulting in a suboptimal pay-off. Simulation results illustrate that our technique overcomes these problems and our agents find a fair solution for both agents.
-
+
A Sensitivity Analysis of a Cooperative Coevolutionary Algorithm Biased for Optimization Liviu Panait, R. Paul Wiegand, and Sean Luke. In Genetic and Evolutionary Computation Conference — GECCO-2004. Springer, 2004. Recent theoretical work helped explain certain optimization-related pathologies in cooperative coevolutionary algorithms (CCEAs). Such explanations have led to adopting specific and constructive strategies for improving CCEA optimization performance by biasing the algorithm toward ideal collaboration. This paper investigates how sensitivity to the degree of bias (set in advance) is affected by certain algorithmic and problem properties. We discover that the previous static biasing approach is quite sensitive to a number of problem properties, and we propose a stochastic alternative which alleviates this problem. We believe that finding appropriate biasing rates is more feasible with this new biasing technique.
-
Marginal contribution nets: A compact representation scheme for coalitional games by Ieong S, Shoham Y. In Proceedings of the 6th ACM Conference on Electronic Commerce, 2005. We present a new approach to representing coalitional games based on rules that describe the marginal contributions of the agents. This representation scheme captures characteristics of the interactions among the agents in a natural and concise manner. We also develop efficient algorithms for two of the most important solution concepts, the Shapley value and the core, under this representation. The Shapley value can be computed in time linear in the size of the input. The emptiness of the core can be determined in time exponential only in the treewidth of a graphical interpretation of our representation.
-
@@ -317,6 +329,8 @@ We introduce a compact graph-theoretic representation for multi-party game theor
Competitive Environments Evolve Better Solutions for Complex Tasks by Peter J. Angeline, Jordan B. Pollack. ICGA, 1993. In the typical genetic algorithm experiment, the fitness function is constructed to be independent of the contents of the population to provide a consistent objective measure. Such objectivity entails significant knowledge about the environment which suggests either the problem has previously been solved or other non-evolutionary techniques may be more efficient. Furthermore, for many complex tasks an independent fitness function is either impractical or impossible to provide. In this paper, we demonstrate that competitive fitness functions, i.e. fitness functions that are dependent on the constituents of the population, can provide a more robust training environment than independent fitness functions. We describe three differing methods for competitive fitness, and discuss their respective advantages
-
+
A cooperative coevolutionary approach to function optimization Mitchell A. Potter and Kenneth A. De Jong. Proceedings of the Third International Conference on Parallel Problem Solving from Nature (PPSN III), 1994. A general model for the coevolution of cooperating species is presented. This model is instantiated and tested in the domain of function optimization, and compared with a traditional GA-based function optimizer. The results are encouraging in two respects. They suggest ways in which the performance of GA and other EA-based optimizers can be improved, and they suggest a new approach to evolving complex structures such as neural networks and rule sets.
-
+
Nash Equilibrium and Evolution by Imitation by Bjornerstedt J., and Weibull, J. The Rational Foundations of Economic Behavior, 1995. Nash's "mass action" interpretation of his equilibrium concept does not presume that the players know the game or are capable of sophisticated calculations. Instead, players are repeatedly and randomly drawn from large populations to play the game, one population for each player position, and base their strategy choice on observed payoffs. The present paper examines in some detail such an interpretation in a class of population dynamics based on adaptation by way of imitation of successful behaviors. Drawing from results in evolutionary game theory, implications of dynamic stability for aggregate Nash equilibrium play are discussed.
-
Genetic Algorithms + Data Structures = Evolution Programs Zbigniew Michalewicz. 3rd edition, 1996. Classic introduction to the evolution programming techniques.
-
@@ -329,6 +343,8 @@ We introduce a compact graph-theoretic representation for multi-party game theor
Methods for Competitive and Cooperative Co-evolution by John Grefenstette, Robert Daley. AAAI, 1996. We have been investigating evolutionary methods to design behavioral strategies for intelligent robots in multi-agent environments. Such environments resemble an ecological system in which species evolve and adapt in a complex interaction with other evolving and adapting species. This paper will report on our investigations of alternative co-evolutionary approaches in the context of a simulated multi-agent environment.
-
+
Coevolution of a Backgammon Player Jordan Pollack, Alan Blair and Mark Land. Artificial Life V: Proc. of the Fifth Int. Workshop on the Synthesis and Simulation of Living Systems, 1997. One of the persistent themes in Artificial Life research is the use of co-evolutionary arms races in the development of specific and complex behaviors. However, other than Sims’s work on artificial robots, most of the work has attacked very simple games of prisoners dilemma or predator and prey. Following Tesauro’s work on TD-Gammon, we used a 4000 parameter feed-forward neural network to develop a competitive backgammon evaluation function. Play proceeds by a roll of the dice, application of the network to all legal moves, and choosing the move with the highest evaluation. However, no back-propagation, reinforcement or temporal difference learning methods were employed. Instead we apply simple hillclimbing in a relative fitness environment. We start with an initial champion of all zero weights and proceed simply by playing the current champion network against a slightly mutated challenger, changing weights when the challenger wins. Our results show co-evolution to be a powerful machine learning method, even when coupled with simple hillclimbing, and suggest that the surprising success of Tesauro’s program had more to do with the co-evolutionary structure of the learning task and the dynamics of the backgammon game itself, than to sophistication in the learning techniques.
-
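The learning loop described above is simple enough to sketch in a few lines; the "game" below is a stand-in fitness comparison rather than backgammon, so this only illustrates the champion-versus-mutated-challenger structure.

```python
# Sketch of hillclimbing in a relative-fitness environment: keep a champion
# weight vector, mutate it, and adopt the challenger whenever it wins.
import numpy as np

rng = np.random.default_rng(0)

def plays_better(challenger, champion):
    # Stand-in for "challenger beats champion over some games": here, being
    # closer to a hidden target weight vector wins (purely illustrative).
    target = np.ones_like(champion)
    return np.linalg.norm(challenger - target) < np.linalg.norm(champion - target)

champion = np.zeros(10)                    # start from all-zero weights, as in the paper
for generation in range(2000):
    challenger = champion + rng.normal(0.0, 0.05, size=champion.shape)
    if plays_better(challenger, champion):
        champion = challenger              # adopt the mutated weights
print(np.round(champion[:5], 2))
```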
+
Co-adaptation in a Team by Thomas D. Haynes, Sandip Sen. IJCIO, 1997. We introduce a cooperative co-evolutionary system to facilitate the development of teams of heterogeneous agents. We believe that k different behavioral strategies for controlling the actions of a group of k agents can combine to form a cooperation strategy which efficiently achieves global goals. We both examine the on-line adaption of behavioral strategies utilizing genetic programming and demonstrate the successful co-evolution of cooperative individuals. We present a new crossover mechanism for genetic programming systems in order to facilitate the evolution of more than one member in the team during each crossover operation. Our goal is to reduce the time needed to evolve an effective team.
-
What have we learned from Evolutionary Game Theory so far? by Weibull, Jörgen W. IFN, 1997. Evolutionary theorizing has a long tradition in economics. Only recently has this approach been brought into the framework of noncooperative game theory. Evolutionary game theory studies the robustness of strategic behavior with respect to evolutionary forces in the context of games played many times in large populations of boundedly rational agents. This new strand in economic theory has led to new predictions and opened up doors to other social sciences. The discussion will be focused on the following questions: What distinguishes the evolutionary approach from the rationalistic? What are the most important findings in evolutionary game theory so far? What are the next challenges for evolutionary game theory in economics?
-
@@ -337,6 +353,8 @@ We introduce a compact graph-theoretic representation for multi-party game theor
Evolutionary learning of communicating agents by Hitoshi Iba. Journal of Information Sciences, 1998. This paper presents the emergence of the cooperative behavior for communicating agents by means of Genetic Programming (GP). Our experimental domains are the pursuit game and the robot navigation task. We conduct experiments with the evolution of the communicating agents and show the effectiveness of the emergent communication in terms of the robustness of generated GP programs. The performance of GP-based multi-agent learning is discussed with comparative experiments by using different breeding strategies, i.e., homogenous breeding and heterogeneous breeding
-
+
Co-Evolution in the Successful Learning of Backgammon Strategy Jordan Pollack and Alan Blair. Machine Learning, 1998. Following Tesauro's work on TD-Gammon, we used a 4,000 parameter feedforward neural network to develop a competitive backgammon evaluation function. Play proceeds by a roll of the dice, application of the network to all legal moves, and selection of the position with the highest evaluation. However, no backpropagation, reinforcement or temporal difference learning methods were employed. Instead we apply simple hillclimbing in a relative fitness environment. We start with an initial champion of all zero weights and proceed simply by playing the current champion network against a slightly mutated challenger and changing weights if the challenger wins. Surprisingly, this worked rather well. We investigate how the peculiar dynamics of this domain enabled a previously discarded weak method to succeed, by preventing suboptimal equilibria in a “meta-game” of self-learning.
-
+
Evolutionary Games and Population Dynamics by Hofbauer, J., Sigmund, K. Cambridge University Press, 1998. Every form of behaviour is shaped by trial and error. Such stepwise adaptation can occur through individual learning or through natural selection, the basis of evolution. Since the work of Maynard Smith and others, it has been realised how game theory can model this process. Evolutionary game theory replaces the static solutions of classical game theory by a dynamical approach centred not on the concept of rational players but on the population dynamics of behavioural programmes. In this book the authors investigate the nonlinear dynamics of the self-regulation of social and economic behaviour, and of the closely related interactions between species in ecological communities. Replicator equations describe how successful strategies spread and thereby create new conditions which can alter the basis of their success, i.e. to enable us to understand the strategic and genetic foundations of the endless chronicle of invasions and extinctions which punctuate evolution. In short, evolutionary game theory describes when to escalate a conflict, how to elicit cooperation, why to expect a balance of the sexes, and how to understand natural selection in mathematical terms.
-
Evolving Multiple Agents by Genetic Programming by Hitoshi Iba. MIT press, 1999. On the emergence of the cooperative behaviour for multiple agents by means of Genetic Programming (GP). Our experimental domains are multi-agent test beds, i.e., the robot navigation task and the Tile World. The world consists of a simulated robot agent and a simulated environment which is both dynamic and unpredictable. In our previous paper, we proposed three types of strategies, i.e, homogeneous breeding, heterogeneous breeding, and co-evolutionary breeding, for the purpose of evolving the cooperative behavior. We use the heterogeneous breeding in this paper. The previous Q-learning approach commonly used for the multi-agent task has the difficulty with the combinatorial explosion for many agents. This is because the state space for Q-table is so huge for the practical computer resources. We show how successfully GP-based multi-agent learning is applied to multi-agent tasks and compare the performance with Q-learning by experiments. Thereafter, we conduct experiments with the evolution of the communicating agents. The communication is an essential factor for the emergence of cooperation. This is because a collaborative agent must be able to handle situations in which conflicts arise and must be capable of negotiating with other agents to reach an agreement. The effectiveness of the emergent communication is empirically shown in terms of the robustness of generated GP programs.
-
@@ -345,6 +363,8 @@ We introduce a compact graph-theoretic representation for multi-party game theor
Talking Helps: Evolving Communicating Agents for the Predator-Prey Pursuit Problem by Kam-Chuen Jim, C. Lee Giles. Artificial Life, 2000. We analyze a general model of multi-agent communication in which all agents communicate simultaneously to a message board. A genetic algorithm is used to evolve multi-agent languages for the predator agents in a version of the predator-prey pursuit problem. We show that the resulting behavior of the communicating multi-agent system is equivalent to that of a Mealy finite state machine whose states are determined by the agents’ usage of the evolved language. Simulations show that the evolution of a communication language improves the performance of the predators. Increasing the language size (and thus increasing the number of possible states in the Mealy machine) improves the performance even further. Furthermore, the evolved communicating predators perform significantly better than all previous work on similar preys. We introduce a method for incrementally increasing the language size which results in an effective coarse-to-fine search that significantly reduces the evolution time required to find a solution. We present some observations on the effects of language size, experimental setup, and prey difficulty on the evolved Mealy machines. In particular, we observe that the start state is often revisited, and incrementally increasing the language size results in smaller Mealy machines. Finally, a simple rule is derived that provides a pessimistic estimate on the minimum language size that should be used for any multi-agent problem.
-
+
An architecture for evolving coadapted subcomponents M. A. Potter and K. D. Jong. Evolutionary Computation, 2000. To successfully apply evolutionary algorithms to the solution of increasingly complex problems, we must develop effective techniques for evolving solutions in the form of interacting coadapted subcomponents. One of the major difficulties is finding computational extensions to our current evolutionary paradigms that will enable such subcomponents to "emerge" rather than being hand designed. In this paper, we describe an architecture for evolving such subcomponents as a collection of cooperating species. Given a simple string-matching task, we show that evolutionary pressure to increase the overall fitness of the ecosystem can provide the needed stimulus for the emergence of an appropriate number of interdependent subcomponents that cover multiple niches, evolve to an appropriate level of generality, and adapt as the number and roles of their fellow subcomponents change over time. We then explore these issues within the context of a more complicated domain through a case study involving the evolution of artificial neural networks.
-
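A compact sketch (my own toy instantiation, not the paper's code) of the cooperative-coevolution loop described above, applied to separable function minimisation: each species evolves one coordinate, and an individual is scored by plugging it into a full solution assembled from the current representatives of the other species.

```python
# Sketch of cooperative coevolution: one species per coordinate, fitness via
# collaboration with the other species' current representatives.
import numpy as np

def objective(x):
    return float(np.sum((x - 3.0) ** 2))     # toy problem: minimise distance to (3, ..., 3)

rng = np.random.default_rng(0)
n_species, pop_size = 4, 20
pops = [rng.normal(0.0, 2.0, size=pop_size) for _ in range(n_species)]
reps = [p[0] for p in pops]                  # one representative per species

for generation in range(200):
    for s in range(n_species):
        def score(v):
            full = np.array(reps)
            full[s] = v                      # collaborate with the other species' reps
            return objective(full)
        order = np.argsort([score(v) for v in pops[s]])
        parents = pops[s][order[: pop_size // 2]]
        children = parents + rng.normal(0.0, 0.1, size=parents.shape)
        pops[s] = np.concatenate([parents, children])
        reps[s] = parents[0]                 # best collaborator becomes the representative
print(np.round(reps, 2))                     # approaches [3, 3, 3, 3]
```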
+
A Game-Theoretic Approach to the Simple Coevolutionary Algorithm by Sevan G. Ficici, Jordan B. Pollack. LNCS, 2000. The fundamental distinction between ordinary evolutionary algorithms (EA) and co-evolutionary algorithms lies in the interaction between coevolving entities. We believe that this property is essentially game-theoretic in nature. Using game theory, we describe extensions that allow familiar mixing-matrix and Markov-chain models of EAs to address coevolutionary algorithm dynamics. We then employ concepts from evolutionary game theory to examine design aspects of conventional coevolutionary algorithms that are poorly understood.
-
Evolution of biological information by Thomas D. Schneider. Nucleic Acids Research, 2000. How do genetic systems gain information by evolutionary processes? Answering this question precisely requires a robust, quantitative measure of information. Fortunately, 50 years ago Claude Shannon defined information as a decrease in the uncertainty of a receiver. For molecular systems, uncertainty is closely related to entropy and hence has clear connections to the Second Law of Thermodynamics. These aspects of information theory have allowed the development of a straightforward and practical method of measuring information in genetic control systems. Here this method is used to observe information gain in the binding sites for an artificial ‘protein’ in a computer simulation of evolution. The simulation begins with zero information and, as in naturally occurring genetic systems, the information measured in the fully evolved binding sites is close to that needed to locate the sites in the genome. The transition is rapid, demonstrating that information gain can occur by punctuated equilibrium.
-
@@ -374,6 +394,8 @@ to guide this evolution. We propose a simple algorithm that does just that, and
A Visual Demonstration of Convergence Properties of Cooperative Coevolution by Liviu Panait, R. Paul Wiegand, Sean Luke. PPSN, 2004. We introduce a model for cooperative coevolutionary algorithms (CCEAs) using partial mixing, which allows us to compute the expected long-run convergence of such algorithms when individuals’ fitness is based on the maximum payoff of some N evaluations with partners chosen at random from the other population. Using this model, we devise novel visualization mechanisms to attempt to qualitatively explain a difficult-to-conceptualize pathology in CCEAs: the tendency for them to converge to suboptimal Nash equilibria. We further demonstrate visually how increasing the size of N, or biasing the fitness to include an ideal-collaboration factor, both improve the likelihood of optimal convergence, and under which initial population configurations they are not much help.
-
+
Understanding competitive co-evolutionary dynamics via fitness landscapes Elena Popovici and Kenneth De Jong. Part of the 2004 AAAI Fall Symposium on Artificial Intelligence, 2004. Cooperative co-evolution is often used to solve difficult optimization problems by means of problem decomposition. Its performance for such tasks can vary widely from good to disappointing. One of the reasons for this is that attempts to improve co-evolutionary performance using traditional EC analysis techniques often fail to provide the necessary insights into the dynamics of co-evolutionary systems, a key factor affecting performance. In this paper we use two simple fitness landscapes to illustrate the importance of taking a dynamical systems approach to analyzing co-evolutionary algorithms in order to understand them better and to improve their problem solving performance.
-
+
Analyzing Multi-agent Reinforcement Learning Using Evolutionary Dynamics by ’t Hoen, P.J., Tuyls, K. ECML, 2004. In this paper, we show how the dynamics of Q-learning can be visualized and analyzed from a perspective of Evolutionary Dynamics (ED). More specifically, we show how ED can be used as a model for Q-learning in stochastic games. Analysis of the evolutionary stable strategies and attractors of the derived ED from the Reinforcement Learning (RL) application then predict the desired parameters for RL in Multi-Agent Systems (MASs) to achieve Nash equilibria with high utility. Secondly, we show how the derived fine tuning of parameter settings from the ED can support application of the COllective INtelligence (COIN) framework. COIN is a proven engineering approach for learning of cooperative tasks in MASs. We show that the derived link between ED and RL predicts performance of the COIN framework and visualizes the incentives provided in COIN toward cooperative behavior.
-
Selection in Coevolutionary Algorithms and the Inverse Problem by Sevan Ficici, Ofer Melnik, Jordan Pollack. Springer, 2004. The inverse problem in the collective intelligence framework concerns how the private utility functions of agents can be engineered so that their selfish behaviors collectively give rise to a desired world state. In this chapter we examine several selection and fitness-sharing methods used in coevolution and consider their operation with respect to the inverse problem. The methods we test are truncation and linear-rank selection and competitive and similarity-based fitness sharing. Using evolutionary game theory to establish the desired world state, our analyses show that variable-sum games with polymorphic Nash are problematic for these methods. Rather than converge to polymorphic Nash, the methods we test produce cyclic behavior, chaos, or attractors that lack game-theoretic justification and therefore fail to solve the inverse problem. The private utilities of the evolving agents may thus be viewed as poorly factored—improved private utility does not correspond to improved world utility.
-
@@ -601,6 +623,8 @@ using Extended Optimal Response by Nobuo Suematsu, Akira Hayashi. AAMAS, 200
Interaction and intelligent behavior Mataric, Maja J. Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994. This thesis addresses situated, embodied agents interacting in complex domains. It focuses on two problems: 1) synthesis and analysis of intelligent group behavior, and 2) learning in complex group environments. Behaviors are proposed as the appropriate level for control and learning. Basic behaviors are introduced as building blocks for synthesizing and analyzing system behavior. The thesis describes the process of selecting such basic behaviors, formally specifying them, algorithmically implementing them, and empirically evaluating them. All of the proposed ideas are validated with a group of up to 20 mobile robots using a basic behavior set consisting of: avoidance, following, aggregation, dispersion, and homing. The set of basic behaviors acts as a substrate for achieving more complex high-level goals and tasks. Two behavior combination operators are introduced, and verified by combining subsets of the above basic behavior set to implement collective flocking and foraging. A methodology is introduced for automatically constructing higher-level behaviors by learning to select among the basic behavior set. A novel formulation of reinforcement learning is proposed that makes behavior selection learnable in noisy, uncertain multi-agent environments with stochastic dynamics. It consists of using conditions and behaviors for more robust control and a minimized state space, and a reinforcement shaping methodology that enables principled embedding of domain knowledge with two types of shaping functions: heterogeneous reward functions and progress estimators. The methodology outperforms two alternatives when tested on a collection of robots learning to forage. The proposed formulation enables and accelerates learning in complex multi-robot domains. The generality of the approach makes it compatible with the existing reinforcement learning algorithms, allowing it to accelerate learning in a variety of domains and applications. The presented methodologies and results are aimed at extending our understanding of synthesis, analysis, and learning of group behavior.
-
+
The Design and Analysis of a Computational Model of Cooperative Coevolution. Mitchell A. Potter. PhD thesis, 1997. As evolutionary algorithms are applied to the solution of increasingly complex systems, explicit notions of modularity must be introduced to provide reasonable opportunities for solutions to evolve in the form of interacting coadapted subcomponents. The difficulty comes in finding computational extensions to our current evolutionary paradigms in which such subcomponents “emerge” rather than being hand designed. At issue is how to identify and represent such subcomponents, provide an environment in which they can interact and coadapt, and apportion credit to them for their contributions to the problem-solving activity such that their evolution proceeds without human involvement. We begin by describing a computational model of cooperative coevolution that includes the explicit notion of modularity needed to provide reasonable opportunities for solutions to evolve in the form of interacting coadapted subcomponents. In this novel approach, subcomponents are represented as genetically isolated species and evolved in parallel. Individuals from each species temporarily enter into collaborations with members of the other species and are rewarded based on the success of the collaborations in solving objective functions. Next, we perform a sensitivity analysis on a number of characteristics of decomposable problems likely to have an impact on the effectiveness of the coevolutionary model. Through focused experimentation using tunable test problems chosen specifically to measure the effect of these characteristics, we provide insight into their influence and how any exposed difficulties may be overcome. This is followed by a study of the basic problem-decomposition capability of the model. We show, within the context of a relatively simple environment, that evolutionary pressure can provide the needed stimulus for the emergence of an appropriate number of subcomponents that cover multiple niches, are evolved to an appropriate level of generality, and can adapt to a changing environment. We also perform two case studies in emergent decomposition on complex problems from the domains of artificial neural networks and concept learning. These case studies validate the ability of the model to handle problems only decomposable into subtasks with complex and difficult to understand interdependencies.
-
+
Layered Learning in Multi-Agent Systems by Peter Stone. PhD thesis, 1998. Multi-agent systems in complex, real-time domains require agents to act effectively both autonomously and as part of a team. This dissertation addresses multi-agent systems consisting of teams of autonomous agents acting in real-time, noisy, collaborative, and adversarial environments. Because of the inherent complexity of this type of multi-agent system, this thesis investigates the use of machine learning within multi-agent systems. The dissertation makes four main contributions to the fields of Machine Learning and Multi-Agent Systems. First, the thesis defines a team member agent architecture within which a flexible team structure is presented, allowing agents to decompose the task space into flexible roles and allowing them to smoothly switch roles while acting. Team organization is achieved by the introduction of a locker-room agreement as a collection of conventions followed by all team members. It defines agent roles, team formations, and pre-compiled multi-agent plans. In addition, the team member agent architecture includes a communication paradigm for domains with single-channel, low-bandwidth, unreliable communication. The communication paradigm facilitates team coordination while being robust to lost messages and active interference from opponents. Second, the thesis introduces layered learning, a general-purpose machine learning paradigm for complex domains in which learning a mapping directly from agents' sensors to their actuators is intractable. Given a hierarchical task decomposition, layered learning allows for learning at each level of the hierarchy, with learning at each level directly affecting learning at the next higher level. Third, the thesis introduces a new multi-agent reinforcement learning algorithm, namely team-partitioned, opaque-transition reinforcement learning (TPOT-RL). TPOT-RL is designed for domains in which agents cannot necessarily observe the state changes when other team members act. It exploits local, action-dependent features to aggressively generalize its input representation for learning and partitions the task among the agents, allowing them to simultaneously learn collaborative policies by observing the long-term effects of their actions. Fourth, the thesis contributes a fully functioning multi-agent system that incorporates learning in a real-time, noisy domain with teammates and adversaries. Detailed algorithmic descriptions of the agents' behaviors as well as their source code are included in the thesis. Empirical results validate all four contributions within the simulated robotic soccer domain. The generality of the contributions is verified by applying them to the real robotic soccer, and network routing domains. Ultimately, this dissertation demonstrates that by learning portions of their cognitive processes, selectively communicating, and coordinating their behaviors via common knowledge, a group of independent agents can work towards a common goal in a complex, real-time, noisy, collaborative, and adversarial environment.
-
Multiagent Learning in the Presence of Agents with Limitations by Michael Bowling. Thesis, 2003. Learning to act in a multiagent environment is a challenging problem. Optimal behavior for one agent depends upon the behavior of the other agents, which are learning as well. Multiagent environments are therefore non-stationary, violating the traditional assumption underlying single-agent learning. In addition, agents in complex tasks may have limitations, such as physical constraints or designer-imposed approximations of the task that make learning tractable. Limitations prevent agents from acting optimally, which complicates the already challenging problem. A learning agent must effectively compensate for its own limitations while exploiting the limitations of the other agents. My thesis research focuses on these two challenges, namely multiagent learning and limitations, and includes four main contributions. First, the thesis introduces the novel concepts of a variable learning rate and the WoLF (Win or Learn Fast) principle to account for other learning agents. The WoLF principle is capable of making rational learning algorithms converge to optimal policies, and by doing so achieves two properties, rationality and convergence, which had not been achieved by previous techniques. The converging effect of WoLF is proven for a class of matrix games, and demonstrated empirically for a wide range of stochastic games. Second, the thesis contributes an analysis of the effect of limitations on the game-theoretic concept of Nash equilibria. The existence of equilibria is important if multiagent learning techniques, which often depend on the concept, are to be applied to realistic problems where limitations are unavoidable. The thesis introduces a general model for the effect of limitations on agent behavior, which is used to analyze the resulting impact on equilibria. The thesis shows that equilibria do exist for a few restricted classes of games and limitations, but even well-behaved limitations do not preserve the existence of equilibria, in general. Third, the thesis introduces GraWoLF, a general-purpose, scalable, multiagent learning algorithm. GraWoLF combines policy gradient learning techniques with the WoLF variable learning rate. The effectiveness of the learning algorithm is demonstrated in both a card game with an intractably large state space, and an adversarial robot task. These two tasks are complex and agent limitations are prevalent in both. Fourth, the thesis describes the CMDragons robot soccer team strategy for adapting to an unknown opponent. The strategy uses a notion of plays as coordinated team plans. The selection of team plans is the decision point for adapting the team to its current opponent, based on the outcome of previously executed plays. The CMDragons were the first RoboCup robot team to employ online learning to autonomously alter its behavior during the course of a game. These four contributions demonstrate that it is possible to effectively learn to act in the presence of other learning agents in complex domains when agents may have limitations. The introduced learning techniques are proven effective in a class of small games, and demonstrated empirically across a wide range of settings that increase in complexity.
-
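To make the WoLF (Win or Learn Fast) idea from the thesis above concrete, the sketch below applies a WoLF-style variable learning rate inside simple policy hill-climbing for a single-state matrix game. The game, hyper-parameters and the uniform stand-in opponent are assumptions made for illustration; they are not taken from the thesis.

```python
import numpy as np

def wolf_phc(payoff, episodes=20000, alpha=0.1, d_win=0.01, d_lose=0.04, rng=None):
    # Simplified WoLF policy hill-climbing for a single-state matrix game.
    rng = rng if rng is not None else np.random.default_rng(0)
    n_actions = payoff.shape[0]
    Q = np.zeros(n_actions)
    pi = np.full(n_actions, 1.0 / n_actions)       # current mixed policy
    avg_pi = np.full(n_actions, 1.0 / n_actions)   # running average policy
    count = 0
    for _ in range(episodes):
        a = rng.choice(n_actions, p=pi)
        opp = rng.integers(n_actions)              # stand-in opponent: uniform play
        r = payoff[a, opp]
        Q[a] += alpha * (r - Q[a])
        count += 1
        avg_pi += (pi - avg_pi) / count
        # "Win or learn fast": small step while winning, large step while losing.
        delta = d_win if pi @ Q > avg_pi @ Q else d_lose
        best = int(np.argmax(Q))
        for b in range(n_actions):
            step = delta if b == best else -delta / (n_actions - 1)
            pi[b] = np.clip(pi[b] + step, 0.0, 1.0)
        pi /= pi.sum()                             # keep pi a probability distribution
    return pi

# Example: rock-paper-scissors payoffs for the row player.
rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
print(wolf_phc(rps))
```

The only WoLF-specific ingredient is the choice between the two step sizes: the policy moves cautiously while it is doing better than its historical average policy and quickly while it is doing worse.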
@@ -655,7 +679,7 @@ This thesis builds a formal framework and approximate planning algorithms that e The papers below were found to be difficult to categorise and therefore are presented under the general "catch-all" category *miscellaneous theory and approaches*. -
Learning to Weigh Basic Behaviors in Scalable Agents by Olivier Buffet, Alain Dutech Francois, Charpillet. AAMAS, 2002. We are working on the use of Reinforcement Learning [RL](3) algorithms to design automatically reactive situated agents limited to only local perceptions. Unfortunately, as good RL algorithms suffer from combinatorial explosion, their use is generally limited to simple problems. As shown on the tile-world example of figure 1, we propose to overcome these difficulties by making the hypothesis, as in Brook’s subsumption architecture [1], that a complex problem can be efficiently dealt with if considered as a combination of simple problems. This short presentation gives basic ideas on RL algorithms (section 2). Then the three steps of our method are presented: how basic behaviors are learned for each basic motivation (sec. 3), how the scene is decomposed in key figures to find the basic behaviors currently involved (sec. 4), and how to combine them into a complex global behavior using learned weights (sec. 5). A few words are given on the experiments conducted on the tile-world problem (sec. 6) and precede a conclusion
-
+
Learning to Weigh Basic Behaviors in Scalable Agents Olivier Buffet, Alain Dutech, François Charpillet. AAMAS, 2002. We are working on the use of Reinforcement Learning (RL) [3] algorithms to automatically design reactive situated agents limited to only local perceptions. Unfortunately, as good RL algorithms suffer from combinatorial explosion, their use is generally limited to simple problems. As shown on the tile-world example of figure 1, we propose to overcome these difficulties by making the hypothesis, as in Brooks' subsumption architecture [1], that a complex problem can be efficiently dealt with if considered as a combination of simple problems. This short presentation gives basic ideas on RL algorithms (section 2). Then the three steps of our method are presented: how basic behaviors are learned for each basic motivation (sec. 3), how the scene is decomposed in key figures to find the basic behaviors currently involved (sec. 4), and how to combine them into a complex global behavior using learned weights (sec. 5). A few words are given on the experiments conducted on the tile-world problem (sec. 6) and precede a conclusion.
-
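As a rough illustration of the decomposition described above, the snippet below combines the action preferences of several independently learned basic behaviours through a weighted sum and acts greedily on the result. The two toy behaviours, the observation encoding and the weights are invented placeholders, not the paper's learned components.

```python
import numpy as np

def combined_action(local_obs, basic_qs, weights):
    """basic_qs: list of functions mapping an observation to a vector of action values."""
    scores = sum(w * q(local_obs) for q, w in zip(basic_qs, weights))
    return int(np.argmax(scores))

# Two toy behaviours over 4 actions: "reach tile" and "avoid hole".
reach_tile = lambda obs: np.array([1.0, 0.2, 0.0, 0.0]) * obs["tile_near"]
avoid_hole = lambda obs: np.array([0.0, 0.0, 0.5, 1.0]) * obs["hole_near"]

obs = {"tile_near": 1.0, "hole_near": 0.8}
print(combined_action(obs, [reach_tile, avoid_hole], weights=[0.6, 0.4]))
```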

@@ -667,3 +691,14 @@ The papers below were found to be difficult to categorise and therefore are pres + + + + + + + + + + + From b36e069037536d1af8bffed63294774001564bed Mon Sep 17 00:00:00 2001 From: Dries Date: Fri, 20 Jan 2023 08:53:43 +0200 Subject: [PATCH 6/6] feat: Update conference names. --- Research Papers/Shallow learning/README.md | 26 +++++++++++----------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/Research Papers/Shallow learning/README.md b/Research Papers/Shallow learning/README.md index 1ae1060..2b0585d 100644 --- a/Research Papers/Shallow learning/README.md +++ b/Research Papers/Shallow learning/README.md @@ -144,7 +144,7 @@ Using Utility Graphs by Valentin Robu, D.J.A. Somefun, J.A. La Poutre. AAMAS
Evolving Communication without Dedicated Communication Channels by Matt Quinn. ECAL, 2001. Artificial Life models have consistently implemented communication as an exchange of signals over dedicated and functionally isolated channels. I argue that such a feature prevents models from providing a satisfactory account of the origins of communication and present a model in which there are no dedicated channels. Agents controlled by neural networks and equipped with proximity sensors and wheels are presented with a co-ordinated movement task. It is observed that functional, but non-communicative, behaviours which evolve in the early stages of the simulation both make possible, and form the basis of, the communicative behaviour which subsequently evolves.
-
-
Communication requirements of VCG-like mechanisms in convex environments by Johari R, Tsitsiklis JN. In Proceedings of Allerton Conference, 2005. We develop VCG-like mechanisms for resource allocation environments where the players have concave utility functions, and the resource constraints can be represented through convex inequality constraints; multicommodity flow problems are a prime example. Unlike VCG mechanisms that require each player to communicate an entire utility function, our mechanisms only require each player to communicate a single scalar quantity. Despite the limited communication, we establish the existence of an efficient Nash equilibrium. Under some further assumptions, we also establish that all Nash equilibria are efficient. Our approach defines an entire family of resource allocation mechanisms; as a special case, we recover the class of mechanisms recently introduced by Yang and Hajek for a single resource.
-
+
Communication requirements of VCG-like mechanisms in convex environments by Johari R, Tsitsiklis JN. Allerton Conference, 2005. We develop VCG-like mechanisms for resource allocation environments where the players have concave utility functions, and the resource constraints can be represented through convex inequality constraints; multicommodity flow problems are a prime example. Unlike VCG mechanisms that require each player to communicate an entire utility function, our mechanisms only require each player to communicate a single scalar quantity. Despite the limited communication, we establish the existence of an efficient Nash equilibrium. Under some further assumptions, we also establish that all Nash equilibria are efficient. Our approach defines an entire family of resource allocation mechanisms; as a special case, we recover the class of mechanisms recently introduced by Yang and Hajek for a single resource.
-

@@ -196,7 +196,7 @@ Assumptions underlying the convergence proofs of reinforcement learning (RL) alg
Best-Response Multiagent Learning in Non-Stationary Environments by Michael Weinberg, Jeffrey S. Rosenschein. AAMAS, 2004. This paper investigates a relatively new direction in Multiagent Reinforcement Learning. Most multiagent learning techniques focus on Nash equilibria as elements of both the learning algorithm and its evaluation criteria. In contrast, we propose a multiagent learning algorithm that is optimal in the sense of finding a best-response policy, rather than in reaching an equilibrium. We present the first learning algorithm that is provably optimal against restricted classes of non-stationary opponents. The algorithm infers an accurate model of the opponent’s non-stationary strategy, and simultaneously creates a best-response policy against that strategy. Our learning algorithm works within the very general framework of N-player, general-sum stochastic games, and learns both the game structure and its associated optimal policy
-
-
A polynomial-time algorithm for Action-Graph Games by Jiang AX, Leyton-Brown K. In PROCEEDINGS OF THE NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2006. Action-Graph Games (AGGs) (Bhat & Leyton-Brown 2004) are a fully expressive game representation which can compactly express strict and context-specific independence and anonymity structure in players’ utility functions. We present an efficient algorithm for computing expected payoffs under mixed strategy profiles. This algorithm runs in time polynomial in the size of the AGG representation (which is itself polynomial in the number of players when the in-degree of the action graph is bounded). We also present an extension to the AGG representation which allows us to compactly represent a wider variety of structured utility functions.
-
+
A polynomial-time algorithm for Action-Graph Games by Jiang AX, Leyton-Brown K. THE NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2006. Action-Graph Games (AGGs) (Bhat & Leyton-Brown 2004) are a fully expressive game representation which can compactly express strict and context-specific independence and anonymity structure in players’ utility functions. We present an efficient algorithm for computing expected payoffs under mixed strategy profiles. This algorithm runs in time polynomial in the size of the AGG representation (which is itself polynomial in the number of players when the in-degree of the action graph is bounded). We also present an extension to the AGG representation which allows us to compactly represent a wider variety of structured utility functions.
-
AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents by Vincent Conitzer, Tuomas Sandholm. Machine Learning, 2007. Two minimal requirements for a satisfactory multiagent learning algorithm are that it 1. learns to play optimally against stationary opponents and 2. converges to a Nash equilibrium in self-play. The previous algorithm that has come closest, WoLF-IGA, has been proven to have these two properties in 2-player 2-action (repeated) games—assuming that the opponent’s mixed strategy is observable. Another algorithm, ReDVaLeR (which was introduced after the algorithm described in this paper), achieves the two properties in games with arbitrary numbers of actions and players, but still requires that the opponents’ mixed strategies are observable. In this paper we present AWESOME, the first algorithm that is guaranteed to have the two properties in games with arbitrary numbers of actions and players. It is still the only algorithm that does so while only relying on observing the other players’ actual actions (not their mixed strategies). It also learns to play optimally against opponents that eventually become stationary. The basic idea behind AWESOME (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium) is to try to adapt to the others’ strategies when they appear stationary, but otherwise to retreat to a precomputed equilibrium strategy. We provide experimental results that suggest that AWESOME converges fast in practice. The techniques used to prove the properties of AWESOME are fundamentally different from those used for previous algorithms, and may help in analyzing future multiagent learning algorithms as well.
-
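The sketch below captures the basic control flow of AWESOME as summarised above: keep playing a precomputed equilibrium strategy, but switch to a best response whenever the opponent's empirical play looks stationary from one epoch to the next. The epoch length, the stationarity test and the rock-paper-scissors example are simplifications chosen for illustration, not the thresholds used in the paper.

```python
import numpy as np

def awesome_like(payoff, equilibrium, opponent, epochs=50, epoch_len=200, tol=0.05):
    # payoff[i, j]: my reward when I play i and the opponent plays j.
    rng = np.random.default_rng(0)
    n_opp = payoff.shape[1]
    prev_freq = None
    my_strategy = equilibrium
    for _ in range(epochs):
        counts = np.zeros(n_opp)
        for _ in range(epoch_len):
            my_a = rng.choice(len(equilibrium), p=my_strategy)
            counts[opponent(my_a)] += 1
        freq = counts / epoch_len
        if prev_freq is not None and np.abs(freq - prev_freq).max() < tol:
            # Opponent looks stationary: best-respond to its empirical mixture.
            br = int(np.argmax(payoff @ freq))
            my_strategy = np.eye(len(equilibrium))[br]
        else:
            # Otherwise retreat to the precomputed equilibrium strategy.
            my_strategy = equilibrium
        prev_freq = freq
    return my_strategy

rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
uniform = np.full(3, 1 / 3)
always_rock = lambda my_a: 0          # a stationary opponent
print(awesome_like(rps, uniform, always_rock))   # converges on "paper"
```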
@@ -249,7 +249,7 @@ We introduce a compact graph-theoretic representation for multi-party game theor
Advantages of Cooperation Between Reinforcement Learning Agents in Difficult Stochastic Problems by Hamid R. Berenji, David Vengerov. Fuzzy Systems, 2000. This paper presents the first results in understanding the reasons for cooperative advantage between reinforcement learning agents. We consider a cooperation method which consists of using and updating a common policy. We tested this method on a complex fuzzy reinforcement learning problem and found that cooperation brings larger than expected benefits. More precisely, we found that K cooperative agents each learning for N time steps outperform K independent agents each learning in a separate world for K*N time steps. In this paper we explain the observed phenomenon and determine the necessary conditions for its presence in a wide class of reinforcement learning problems.
-
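A minimal way to see the cooperation scheme discussed above is to let K agents act in independent copies of a task while reading and updating a single shared value table, so that every agent benefits from every update. The bandit-style task and parameters below are toy stand-ins, not the fuzzy control problem from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_actions, steps, alpha, eps = 4, 5, 2000, 0.1, 0.1
true_means = rng.normal(size=n_actions)
shared_q = np.zeros(n_actions)          # one value table shared by all K agents

for _ in range(steps):
    for _ in range(K):                  # every agent updates the common table
        if rng.random() < eps:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(shared_q))
        r = rng.normal(true_means[a])
        shared_q[a] += alpha * (r - shared_q[a])

print(int(np.argmax(shared_q)), int(np.argmax(true_means)))
```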
-
Multi-robot learning in a cooperative observation task Lynne Parker and Claude F. Touzet. In Proceedings of Fifth International Symposium on Distributed Autonomous Robotic Systems (DARS 2000), 2000. An important need in multi-robot systems is the development of mechanisms that enable robot teams to autonomously generate cooperative behaviors. This paper first briefly presents the Cooperative Multi-robot Observation of Multiple Moving Targets (CMOMMT) application as a rich domain for studying the issues of multi-robot learning of new behaviors. We discuss the results of our hand-generated algorithm for CMOMMT, and then describe our research in generating multi-robot learning techniques for the CMOMMT application, comparing the results to the hand-generated solutions. Our results show that, while the learning approach performs better than random, naive approaches, much room still remains to match the results obtained from the hand-generated approach. The ultimate goal of this research is to develop techniques for multi-robot learning and adaptation that will generalize to cooperative robot applications in many domains, thus facilitating the practical use of multi-robot teams in a wide variety of real-world applications.
-
+
Multi-robot learning in a cooperative observation task Lynne Parker and Claude F. Touzet. Fifth International Symposium on Distributed Autonomous Robotic Systems (DARS 2000), 2000. An important need in multi-robot systems is the development of mechanisms that enable robot teams to autonomously generate cooperative behaviors. This paper first briefly presents the Cooperative Multi-robot Observation of Multiple Moving Targets (CMOMMT) application as a rich domain for studying the issues of multi-robot learning of new behaviors. We discuss the results of our hand-generated algorithm for CMOMMT, and then describe our research in generating multi-robot learning techniques for the CMOMMT application, comparing the results to the hand-generated solutions. Our results show that, while the learning approach performs better than random, naive approaches, much room still remains to match the results obtained from the hand-generated approach. The ultimate goal of this research is to develop techniques for multi-robot learning and adaptation that will generalize to cooperative robot applications in many domains, thus facilitating the practical use of multi-robot teams in a wide variety of real-world applications.
-
Learning to Cooperate via Policy Search by Leonid Peshkin, Kee-Eung Kim, Nicolas Meuleau, Leslie Pack Kaelbling. UAI, 2000. Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain.
-
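The following sketch shows the flavour of gradient-based distributed policy search in a cooperative game: each agent runs an independent REINFORCE update on its own policy parameters using only the shared team payoff. The two-action coordination game and the step size are illustrative assumptions, not the soccer domain used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, lr, episodes = 2, 0.1, 5000
theta = [np.zeros(n_actions), np.zeros(n_actions)]   # per-agent policy logits

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(episodes):
    probs = [softmax(t) for t in theta]
    acts = [rng.choice(n_actions, p=p) for p in probs]
    reward = 1.0 if acts[0] == acts[1] else 0.0       # shared team payoff
    for i in range(2):                                # independent REINFORCE updates
        grad_log = -probs[i]
        grad_log[acts[i]] += 1.0
        theta[i] += lr * reward * grad_log

print([softmax(t).round(2) for t in theta])           # both agents settle on one action
```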
@@ -261,21 +261,21 @@ it on a physical robot team.
-
All learning is local: Multi-agent learning in global reward games by Yu-Han Chang, Tracey Ho, Leslie Pack Kaelbling. NeurIPS, 2003. In large multiagent games, partial observability, coordination, and credit assignment persistently plague attempts to design good learning algorithms. We provide a simple and efficient algorithm that in part uses a linear system to model the world from a single agent’s limited perspective, and takes advantage of Kalman filtering to allow an agent to construct a good training signal and learn an effective policy
-
-
Ant Foraging Revisited Liviu A. Panait and Sean Luke. In Proceedings of the Ninth International Conference on the Simulation and Synthesis of Living Systems (ALIFE9), 2004. Most previous artificial ant foraging algorithms have to date relied to some degree on a priori knowledge of the environment, in the form of explicit gradients generated by the nest, by hard-coding the nest location in an easily-discoverable place, or by imbuing the artificial ants with the knowledge of the nest direction. In contrast, the work presented solve ant foraging problems using two pheromones, one applied when searching for food and the other when returning food items to the nest. This replaces the need to use complicated nest-discovery devices with simpler mechanisms based on pheromone information, which in turn reduces the ant system complexity. The resulting algorithm is orthogonal and simple, yet ants are able to establish increasingly efficient trails from the nest to the food in the presence of obstacles. The algorithm replaces the blind addition of new amounts of pheromones with an adjustment mechanism that resembles dynamic programming.
-
+
Ant Foraging Revisited Liviu A. Panait and Sean Luke. The Ninth International Conference on the Simulation and Synthesis of Living Systems (ALIFE9), 2004. Most previous artificial ant foraging algorithms have to date relied to some degree on a priori knowledge of the environment, in the form of explicit gradients generated by the nest, by hard-coding the nest location in an easily-discoverable place, or by imbuing the artificial ants with the knowledge of the nest direction. In contrast, the work presented solves ant foraging problems using two pheromones, one applied when searching for food and the other when returning food items to the nest. This replaces the need to use complicated nest-discovery devices with simpler mechanisms based on pheromone information, which in turn reduces the ant system complexity. The resulting algorithm is orthogonal and simple, yet ants are able to establish increasingly efficient trails from the nest to the food in the presence of obstacles. The algorithm replaces the blind addition of new amounts of pheromones with an adjustment mechanism that resembles dynamic programming.
-
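The condensed sketch below mimics the two-pheromone idea described above: an ant searching for food deposits one pheromone and climbs the other, and switches roles once it picks up food. Grid size, deposit amounts and evaporation are invented numbers, and the simple additive deposition here stands in for the paper's value-adjustment rule.

```python
import numpy as np

rng = np.random.default_rng(0)
size, nest, food = 20, (0, 0), (15, 15)
to_nest = np.zeros((size, size))        # laid while searching, followed when returning
to_food = np.zeros((size, size))        # laid while returning, followed when searching

def step_towards(pos, field):
    # Move to the neighbouring cell with the strongest pheromone, else wander.
    x, y = pos
    neighbours = [(min(max(x + dx, 0), size - 1), min(max(y + dy, 0), size - 1))
                  for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    scores = np.array([field[n] for n in neighbours])
    if scores.max() == 0:
        return neighbours[int(rng.integers(len(neighbours)))]
    return neighbours[int(np.argmax(scores))]

pos, carrying = nest, False
for _ in range(5000):
    if carrying:
        to_food[pos] += 1.0
        pos = step_towards(pos, to_nest)
        if pos == nest:
            carrying = False
    else:
        to_nest[pos] += 1.0
        pos = step_towards(pos, to_food)
        if pos == food:
            carrying = True
    to_nest *= 0.999
    to_food *= 0.999                    # slow evaporation

# to_food stays zero until the ant first reaches the food source.
print(float(to_nest.sum()), float(to_food.sum()))
```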
-
A Pheromone-Based Utility Model for Collaborative Foraging L. Panait, S. Luke. AAMAS-2004 — Proceedings of the Third International Joint Conference on Autonomous Agents and Multi Agent Systems, 2004. Multi-agent research often borrows from biology, where remarkable examples of collective intelligence may be found. One interesting example is ant colonies’ use of pheromones as a joint communication mechanism. In this paper we propose two pheromone-based algorithms for artificial agent foraging, trail-creation, and other tasks. Whereas practically all previous work in this area has focused on biologically-plausible but ad-hoc single pheromone models, we have developed a formalism which uses multiple pheromones to guide cooperative tasks. This model bears some similarity to reinforcement learning. However, our model takes advantage of symmetries common to foraging environments which enables it to achieve much faster reward propagation than reinforcement learning does. Using this approach we demonstrate cooperative behaviors well beyond the previous ant-foraging work, including the ability to create optimal foraging paths in the presence of obstacles, to cope with dynamic environments, and to follow tours with multiple waypoints.We believe that this model may be used for more complex problems still.
-
+
A Pheromone-Based Utility Model for Collaborative Foraging L. Panait, S. Luke. AAMAS, 2004. Multi-agent research often borrows from biology, where remarkable examples of collective intelligence may be found. One interesting example is ant colonies’ use of pheromones as a joint communication mechanism. In this paper we propose two pheromone-based algorithms for artificial agent foraging, trail-creation, and other tasks. Whereas practically all previous work in this area has focused on biologically-plausible but ad-hoc single pheromone models, we have developed a formalism which uses multiple pheromones to guide cooperative tasks. This model bears some similarity to reinforcement learning. However, our model takes advantage of symmetries common to foraging environments which enables it to achieve much faster reward propagation than reinforcement learning does. Using this approach we demonstrate cooperative behaviors well beyond the previous ant-foraging work, including the ability to create optimal foraging paths in the presence of obstacles, to cope with dynamic environments, and to follow tours with multiple waypoints. We believe that this model may be used for more complex problems still.
-
-
Multi-agent learning in conflicting multi-level games with incomplete information. Maarten Peeters, Katja Verbeeck and Ann Nowe. In Proceedings of Artificial Multiagent Learning. Papers from the 2004 AAAI Fall Symposium. Technical Report FS-04-02, 2004. Coordination to some equilibrium point is an interesting problem in multi-agent reinforcement learning. In common interest single stage settings this problem has been studied profoundly and efficient solution techniques have been found. Also for particular multi-stage games some experiments show good results. However, for a large scale of problems the agents do not share a common pay-off function. Again, for single stage problems, a solution technique exists that finds +
Multi-agent learning in conflicting multi-level games with incomplete information. Maarten Peeters, Katja Verbeeck and Ann Nowe. Artificial Multiagent Learning. Papers from the 2004 AAAI Fall Symposium. Technical Report FS-04-02, 2004. Coordination to some equilibrium point is an interesting problem in multi-agent reinforcement learning. In common interest single stage settings this problem has been studied profoundly and efficient solution techniques have been found. Also for particular multi-stage games some experiments show good results. However, for a large scale of problems the agents do not share a common pay-off function. Again, for single stage problems, a solution technique exists that finds a fair solution for all agents. In this paper we report on a technique that is based on learning automata theory and periodical policies. Letting pseudo-independent agents play periodical policies enables them to behave socially in pure conflicting multi-stage games as defined by E. Billard (Billard & Lakshmivarahan 1999; Zhou, Billard, & Lakshmivarahan 1999). We experimented with this technique on games where simple learning automata have the tendency not to cooperate or to show oscillating behavior resulting in a suboptimal pay-off. Simulation results illustrate that our technique overcomes these problems and our agents find a fair solution for both agents.
-
A Sensitivity Analysis of a Cooperative Coevolutionary Algorithm Biased for Optimization Liviu Panait, R. Paul Wiegand, and Sean Luke. In Genetic and Evolutionary Computation Conference — GECCO-2004. Springer, 2004. Recent theoretical work helped explain certain optimization-related pathologies in cooperative coevolutionary algorithms (CCEAs). Such explanations have led to adopting specific and constructive strategies for improving CCEA optimization performance by biasing the algorithm toward ideal collaboration. This paper investigates how sensitivity to the degree of bias (set in advance) is affected by certain algorithmic and problem properties. We discover that the previous static biasing approach is quite sensitive to a number of problem properties, and we propose a stochastic alternative which alleviates this problem. We believe that finding appropriate biasing rates is more feasible with this new biasing technique.
-
-
Marginal contribution nets: A compact representation scheme for coalitional games by Ieong S, Shoham Y. In Proceedings of the 6th ACM Conference on Electronic Commerce, 2005. We present a new approach to representing coalitional games based on rules that describe the marginal contributions of the agents. This representation scheme captures characteristics of the interactions among the agents in a natural and concise manner. We also develop efficient algorithms for two of the most important solution concepts, the Shapley value and the core, under this representation. The Shapley value can be computed in time linear in the size of the input. The emptiness of the core can be determined in time exponential only in the treewidth of a graphical interpretation of our representation.
-
+
Marginal contribution nets: A compact representation scheme for coalitional games by Ieong S, Shoham Y. The 6th ACM Conference on Electronic Commerce, 2005. We present a new approach to representing coalitional games based on rules that describe the marginal contributions of the agents. This representation scheme captures characteristics of the interactions among the agents in a natural and concise manner. We also develop efficient algorithms for two of the most important solution concepts, the Shapley value and the core, under this representation. The Shapley value can be computed in time linear in the size of the input. The emptiness of the core can be determined in time exponential only in the treewidth of a graphical interpretation of our representation.
-
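For the special case of marginal-contribution-net rules whose patterns are conjunctions of positive literals, the Shapley value can be read off directly: each rule's value is split equally among the agents in its pattern, and shares add up across rules, which is why the computation is linear in the size of the representation. The example rules below are made up.

```python
def shapley_mc_nets(rules, agents):
    """rules: list of (pattern, value), where pattern is a set of agents (positive literals only)."""
    phi = {a: 0.0 for a in agents}
    for pattern, value in rules:
        share = value / len(pattern)    # symmetry: every agent in the pattern is interchangeable
        for a in pattern:
            phi[a] += share             # additivity: contributions add up across rules
    return phi

rules = [({"a", "b"}, 5.0), ({"b"}, 2.0), ({"b", "c", "d"}, 3.0)]
print(shapley_mc_nets(rules, {"a", "b", "c", "d"}))
```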
-
Tunably decentralized algorithms for cooperative target observation Luke, Sean and Sullivan, Keith and Panait, Liviu and Balan, Gabriel. 4th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2005), 2005. Multi-agent problem domains may require distributed algorithms for a variety of reasons: local sensors, limitations of communication, and availability of distributed computational resources. In the absence of these constraints, centralized algorithms are often more efficient, simply because they are able to take advantage of more information. We introduce a variant of the cooperative target observation domain which is free of such constraints. We propose two algorithms, inspired by K-means clustering and hill-climbing respectively, which are scalable in degree of decentralization. Neither algorithm consistently outperforms the other across over all problem domain settings. Surprisingly, we find that hill-climbing is sensitive to degree of decentralization, while K-means is not. We also experiment with a combination of the two algorithms which draws strength from each.
-
+
Tunably decentralized algorithms for cooperative target observation Luke, Sean and Sullivan, Keith and Panait, Liviu and Balan, Gabriel. AAMAS, 2005. Multi-agent problem domains may require distributed algorithms for a variety of reasons: local sensors, limitations of communication, and availability of distributed computational resources. In the absence of these constraints, centralized algorithms are often more efficient, simply because they are able to take advantage of more information. We introduce a variant of the cooperative target observation domain which is free of such constraints. We propose two algorithms, inspired by K-means clustering and hill-climbing respectively, which are scalable in degree of decentralization. Neither algorithm consistently outperforms the other across all problem domain settings. Surprisingly, we find that hill-climbing is sensitive to degree of decentralization, while K-means is not. We also experiment with a combination of the two algorithms which draws strength from each.
-
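A toy version of the K-means-inspired algorithm mentioned above: cluster the currently observed target positions with plain Lloyd iterations (one cluster per robot) and send each robot toward a cluster centre. The random target layout and the nearest-centre assignment are simplifications for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
targets = rng.uniform(0, 100, size=(12, 2))     # observed target positions
robots = rng.uniform(0, 100, size=(3, 2))       # robot positions double as initial centres

centres = robots.copy()
for _ in range(10):                              # plain Lloyd (K-means) iterations
    assign = np.argmin(((targets[:, None] - centres[None]) ** 2).sum(-1), axis=1)
    for k in range(len(centres)):
        if (assign == k).any():
            centres[k] = targets[assign == k].mean(axis=0)

# Each robot heads for its nearest cluster centre.
order = np.argmin(((robots[:, None] - centres[None]) ** 2).sum(-1), axis=1)
print(centres.round(1), order)
```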
-
Multi-attribute coalitional games by Ieong S, Shoham Y. In Proceedings of the 7th ACM Conference on Electronic Commerce, 2006. We study coalitional games where the value of cooperation among the agents are solely determined by the attributes the agents possess, with no assumption as to how these attributes jointly determine this value. This framework allows us to model diverse economic interactions by picking the right attributes. We study the computational complexity of two coalitional solution concepts for these games — the Shapley value and the core. We show how the positive results obtained in this paper imply comparable results for other games studied in the literature.
-
+
Multi-attribute coalitional games by Ieong S, Shoham Y. The 7th ACM Conference on Electronic Commerce, 2006. We study coalitional games where the value of cooperation among the agents is solely determined by the attributes the agents possess, with no assumption as to how these attributes jointly determine this value. This framework allows us to model diverse economic interactions by picking the right attributes. We study the computational complexity of two coalitional solution concepts for these games — the Shapley value and the core. We show how the positive results obtained in this paper imply comparable results for other games studied in the literature.
-
Bayesian Coalitional Games by Ieong S, Shoham Y. AAAI, 2008. We introduce Bayesian Coalitional Games (BCGs), a generalization of classical coalitional games to settings with uncertainties. We define the semantics of BCG using the partition model, and generalize the notion of payoffs to contracts among agents. To analyze these games, we extend the solution concept of the core under three natural interpretations—ex ante, ex interim, and ex post—which coincide with the classical definition of the core when there is no uncertainty. In the special case where agents are risk-neutral, we show that checking for core emptiness under all three interpretations can be simplified to linear feasibility problems similar to that of their classical counterpart.
-
@@ -295,7 +295,7 @@ a fair solution for all agents. In this paper we report on a technique that is b
Decision-Theoretic Bidding Based on Learned Density Models in Simultaneous, Interacting Auctions by P. Stone, R. E. Schapire, M. L. Littman, J. A. Csirik, and D. McAllester. JAIR, 2003. Auctions are becoming an increasingly popular method for transacting business, especially over the Internet. This article presents a general approach to building autonomous bidding agents to bid in multiple simultaneous auctions for interacting goods. A core component of our approach learns a model of the empirical price dynamics based on past data and uses the model to analytically calculate, to the greatest extent possible, optimal bids. We introduce a new and general boosting-based algorithm for conditional density estimation problems of this kind, i.e., supervised learning problems in which the goal is to estimate the entire conditional distribution of the real-valued label. This approach is fully implemented as ATTac-2001, a top-scoring agent in the second Trading Agent Competition (TAC-01). We present experiments demonstrating the effectiveness of our boosting-based price predictor relative to several reasonable alternatives.
-
-
Learning from Multiple Sources∗ Koby Crammer, Michael Kearns and Jennifer Wortman. Proceedings of the Third International Joint Conference on Autonomous Agents and Multi Agent Systems, 2004. We consider the problem of learning accurate models from multiple sources of “nearby” data. Given distinct samples from multiple data sources and estimates of the dissimilarities between these sources, we provide a general theory of which samples should be used to learn models for each source. This theory is applicable in a broad decision-theoretic learning framework, and yields general results for classification and regression. A key component of our approach is the development of approximate triangle inequalities for expected loss, which may be of independent interest. We discuss the related problem of learning parameters of a distribution from multiple data sources. Finally, we illustrate our theory through a series of synthetic simulations.
-
+
Learning from Multiple Sources∗ Koby Crammer, Michael Kearns and Jennifer Wortman. AAMAS, 2004. We consider the problem of learning accurate models from multiple sources of “nearby” data. Given distinct samples from multiple data sources and estimates of the dissimilarities between these sources, we provide a general theory of which samples should be used to learn models for each source. This theory is applicable in a broad decision-theoretic learning framework, and yields general results for classification and regression. A key component of our approach is the development of approximate triangle inequalities for expected loss, which may be of independent interest. We discuss the related problem of learning parameters of a distribution from multiple data sources. Finally, we illustrate our theory through a series of synthetic simulations.
-

@@ -420,7 +420,7 @@ to guide this evolution. We propose a simple algorithm that does just that, and
Learning to play games in extensive form by valuation by Phillipe Jehiel, Dov Samet. NAJ Economics, 2001. Game theoretic models of learning which are based on the strategic form of the game cannot explain learning in games with large extensive form. We study learning in such games by using valuation of moves. A valuation for a player is a numeric assessment of her moves that purports to reflect their desirability. We consider a myopic player, who chooses moves with the highest valuation. Each time the game is played, the player revises her valuation by assigning the payoff obtained in the play to each of the moves she has made. We show for a repeated win–lose game that if the player has a winning strategy in the stage game, there is almost surely a time after which she always wins. When a player has more than two payoffs, a more elaborate learning procedure is required. We consider one that associates with each move the average payoff in the rounds in which this move was made. When all players adopt this learning procedure, with some perturbations, then, with probability 1 there is a time after which strategies that are close to subgame perfect equilibrium are played. A single player who adopts this procedure can guarantee only her individually rational payoff
-
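The snippet below is a bare-bones version of learning by valuation as described above: keep a numeric valuation for each move, play highest-valuation moves with random tie-breaking, and after each play assign the realised payoff to every move that was made. The tiny two-move win/lose game is an invented example.

```python
import random

random.seed(0)
valuation = {}                       # (position, move) -> assessed value

def choose(position, moves):
    vals = [valuation.get((position, m), 0.0) for m in moves]
    best = max(vals)
    return random.choice([m for m, v in zip(moves, vals) if v == best])

def play_episode():
    # Toy extensive-form game: pick two bits; the player wins iff they differ.
    history, position = [], "start"
    for _ in range(2):
        m = choose(position, [0, 1])
        history.append((position, m))
        position = f"{position}->{m}"
    payoff = 1.0 if history[0][1] != history[1][1] else 0.0
    for pos_move in history:         # assign the realised payoff to each move made
        valuation[pos_move] = payoff
    return payoff

print(sum(play_episode() for _ in range(200)) / 200)   # win rate climbs toward 1
```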
-
Stochastic Direct Reinforcement: Application to Simple Games with Recurrence John Moody, Yufeng Liu, Matthew Saffell, and Kyoungju Youn. In Proceedings of Artificial Multiagent Learning, 2004. We investigate repeated matrix games with stochastic players as a microcosm for studying dynamic, multi-agent interactions using the Stochastic Direct Reinforcement (SDR) policy gradient algorithm. SDR is a generalization of Recurrent Reinforcement Learning (RRL) that supports stochastic policies. Unlike other RL algorithms, SDR and RRL use recurrent policy gradients to properly address temporal credit assignment resulting from recurrent structure. Our main goals in this paper are to (1) distinguish recurrent memory from standard, non-recurrent memory for policy gradient RL, (2) compare SDR with Q-type learning methods for simple games, (3) distinguish reactive from endogenous dynamical agent behavior and (4) explore the use of recurrent learning for interacting, dynamic agents. We find that SDR players learn much faster and hence outperform recently-proposed Q-type learners for the simple game Rock, Paper, Scissors (RPS). With more complex, dynamic SDR players and opponents, we demonstrate that recurrent representations and SDR's recurrent policy gradients yield better performance than non-recurrent players. For the Itterated Prisoners Dilemma, we show that non-recurrent SDR agents learn only to defect (Nash equilibrium), while SDR agents with recurrent gradients can learn a variety of interesting behaviors, including cooperation.
-
+
Stochastic Direct Reinforcement: Application to Simple Games with Recurrence John Moody, Yufeng Liu, Matthew Saffell, and Kyoungju Youn. Artificial Multiagent Learning, 2004. We investigate repeated matrix games with stochastic players as a microcosm for studying dynamic, multi-agent interactions using the Stochastic Direct Reinforcement (SDR) policy gradient algorithm. SDR is a generalization of Recurrent Reinforcement Learning (RRL) that supports stochastic policies. Unlike other RL algorithms, SDR and RRL use recurrent policy gradients to properly address temporal credit assignment resulting from recurrent structure. Our main goals in this paper are to (1) distinguish recurrent memory from standard, non-recurrent memory for policy gradient RL, (2) compare SDR with Q-type learning methods for simple games, (3) distinguish reactive from endogenous dynamical agent behavior and (4) explore the use of recurrent learning for interacting, dynamic agents. We find that SDR players learn much faster and hence outperform recently-proposed Q-type learners for the simple game Rock, Paper, Scissors (RPS). With more complex, dynamic SDR players and opponents, we demonstrate that recurrent representations and SDR's recurrent policy gradients yield better performance than non-recurrent players. For the Iterated Prisoner's Dilemma, we show that non-recurrent SDR agents learn only to defect (Nash equilibrium), while SDR agents with recurrent gradients can learn a variety of interesting behaviors, including cooperation.
-

@@ -493,7 +493,7 @@ Halpern begins by surveying possible formal systems for representing uncertainty
Learning to Communicate and Act using Hierarchical Reinforcement Learning by Mohammad Ghavamzadeh, Sridhar Mahadevan. AAMAS, 2004. In this paper, we address the issue of rational communication behavior among autonomous agents. The goal is for agents to learn a policy to optimize the communication needed for proper coordination, given the communication cost. We extend our previously reported cooperative hierarchical reinforcement learning (HRL) algorithm to include communication decisions and propose a new multiagent HRL algorithm, called COM-Cooperative HRL. In this algorithm, we define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. Those levels of the hierarchy which include cooperative subtasks are called cooperation levels. Coordination skills among agents are learned faster by sharing information at the cooperation levels, rather than the level of primitive actions. We add a communication level to the hierarchical decomposition of the problem below each cooperation level. Before making a decision at a cooperative subtask, agents decide if it is worthwhile to perform a communication action. A communication action has a certain cost and provides each agent at a certain cooperation level with the actions selected by the other agents at the same level. We demonstrate the efficacy of the COM-Cooperative HRL algorithm as well as the relation between the communication cost and the learned communication policy using a multiagent taxi domain.
-
-
Hierarchical Multi-Agent Reinforcement Learning Mohammad Ghavamzadeh, Sridhar Mahadevan and Rajbala Makar. Autonomous Agents and Multi-Agent Systems, 2006. n this paper we investigate the use of hierarchical reinforcement learning to speed up the acquisition of cooperative multi-agent tasks. We extend the MAXQ framework to the multi-agent case. Each agent uses the same MAXQ hierarchy to decompose a task into sub-tasks. Learning is decentralized, with each agent learning three interrelated skills: how to perform subtasks, which order to do them in, and how to coordinate with other agents. Coordination skills among agents are learned by using joint actions at the highest level(s) of the hierarchy. The Q nodes at the highest level(s) of the hierarchy are configured to represent the joint task-action space among multiple agents. In this approach, each agent only knows what other agents are doing at the level of sub-tasks, and is unaware of lower level (primitive) actions. This hierarchical approach allows agents to learn coordination faster by sharing information at the level of sub-tasks, rather than attempting to learn coordination taking into account primit ive joint state-action values. We apply this hierarchical multi-agent reinforcement learning algorithm to a complex AGV scheduling task and compare its performance and speed with other learning approaches, including flat multi-agent, single agent using MAXQ, selfish multiple agents using MAXQ (where each agent acts independently without communicating with the other agents), as well as several well-known AGV heuristics like "first come first serve", "highest queue first" and "nearest station first". We also compare the tradeoffs in learning speed vs. performance of modeling joint action values at multiple levels in the MAXQ hierarchy.
-
+
Hierarchical Multi-Agent Reinforcement Learning Mohammad Ghavamzadeh, Sridhar Mahadevan and Rajbala Makar. AAMAS, 2006. In this paper we investigate the use of hierarchical reinforcement learning to speed up the acquisition of cooperative multi-agent tasks. We extend the MAXQ framework to the multi-agent case. Each agent uses the same MAXQ hierarchy to decompose a task into sub-tasks. Learning is decentralized, with each agent learning three interrelated skills: how to perform subtasks, which order to do them in, and how to coordinate with other agents. Coordination skills among agents are learned by using joint actions at the highest level(s) of the hierarchy. The Q nodes at the highest level(s) of the hierarchy are configured to represent the joint task-action space among multiple agents. In this approach, each agent only knows what other agents are doing at the level of sub-tasks, and is unaware of lower level (primitive) actions. This hierarchical approach allows agents to learn coordination faster by sharing information at the level of sub-tasks, rather than attempting to learn coordination taking into account primitive joint state-action values. We apply this hierarchical multi-agent reinforcement learning algorithm to a complex AGV scheduling task and compare its performance and speed with other learning approaches, including flat multi-agent, single agent using MAXQ, selfish multiple agents using MAXQ (where each agent acts independently without communicating with the other agents), as well as several well-known AGV heuristics like "first come first serve", "highest queue first" and "nearest station first". We also compare the tradeoffs in learning speed vs. performance of modeling joint action values at multiple levels in the MAXQ hierarchy.
-

@@ -669,7 +669,7 @@ This thesis builds a formal framework and approximate planning algorithms that e
Efficiency loss in a network resource allocation game by Johari R, Tsitsiklis JN. Mathematics of Operations Research, 2004. We explore the properties of a congestion game in which users of a congested resource anticipate the effect of their actions on the price of the resource. When users are sharing a single resource, we establish that the aggregate utility received by the users is at least 3/4 of the maximum possible aggregate utility. We also consider extensions to a network context, where users submit individual payments for each link in the network they may wish to use. In this network model, we again show that the selfish behavior of the users leads to an aggregate utility that is no worse than 3/4 of the maximum possible aggregate utility. We also show that the same analysis extends to a wide class of resource allocation systems where end users simultaneously require multiple scarce resources. These results form part of a growing literature on the “price of anarchy,” i.e., the extent to which selfish behavior affects system efficiency.
-
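For context on the mechanism analysed above, the snippet below computes the proportional-allocation outcome for a single divisible resource: each user submits a payment, the market-clearing price is the total payment divided by capacity, and each user receives their payment divided by that price. The bids are arbitrary example numbers; the 3/4 bound concerns Nash equilibrium bids, which this snippet does not compute.

```python
def proportional_allocation(bids, capacity=1.0):
    # Each user i submits a payment w_i; the single price clears the market.
    total = sum(bids)
    if total == 0:
        return [0.0] * len(bids), 0.0
    price = total / capacity
    return [w / price for w in bids], price

alloc, price = proportional_allocation([4.0, 3.0, 1.0], capacity=1.0)
print(price, alloc)    # price 8.0, allocations 0.5, 0.375, 0.125
```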
-
Worst-case optimal redistribution of VCG payments by Guo M, Conitzer V. In Proceedings of the 8th ACM conference on Electronic commerce, 2007. For allocation problems with one or more items, the wellknown Vickrey-Clarke-Groves (VCG) mechanism is efficient, strategy-proof, individually rational, and does not incur a deficit. However, the VCG mechanism is not (strongly) budget balanced: generally, the agents’ payments will sum to more than 0. If there is an auctioneer who is selling the items, this may be desirable, because the surplus payment corresponds to revenue for the auctioneer. However, if the items do not have an owner and the agents are merely interested in allocating the items efficiently among themselves, any surplus payment is undesirable, because it will have to flow out of the system of agents. In 2006, Cavallo proposed a mechanism that redistributes some of the VCG payment back to the agents, while maintaining efficiency, strategy-proofness, individual rationality, and the non-deficit property. In this paper, we extend this result in a restricted setting. We study allocation settings where there are multiple indistinguishable units of a single good, and agents have unit demand. (For this specific setting, Cavallo’s mechanism coincides with a mechanism proposed by Bailey in 1997.) Here we propose a family of mechanisms that redistribute some of the VCG payment back to the agents. All mechanisms in the family are efficient, strategyproof, individually rational, and never incur a deficit. The family includes the Bailey-Cavallo mechanism as a special case. We then provide an optimization model for finding the optimal mechanism — that is, the mechanism that maximizes redistribution in the worst case — inside the family, and show how to cast this model as a linear program. We give both numerical and analytical solutions of this linear program, and the (unique) resulting mechanism shows significant improvement over the Bailey-Cavallo mechanism (in the worst case). Finally, we prove that the obtained mechanism is optimal among all anonymous deterministic mechanisms that satisfy the above properties.
-
+
Worst-case optimal redistribution of VCG payments by Guo M, Conitzer V. ACM conference on Electronic commerce, 2007. For allocation problems with one or more items, the well-known Vickrey-Clarke-Groves (VCG) mechanism is efficient, strategy-proof, individually rational, and does not incur a deficit. However, the VCG mechanism is not (strongly) budget balanced: generally, the agents’ payments will sum to more than 0. If there is an auctioneer who is selling the items, this may be desirable, because the surplus payment corresponds to revenue for the auctioneer. However, if the items do not have an owner and the agents are merely interested in allocating the items efficiently among themselves, any surplus payment is undesirable, because it will have to flow out of the system of agents. In 2006, Cavallo proposed a mechanism that redistributes some of the VCG payment back to the agents, while maintaining efficiency, strategy-proofness, individual rationality, and the non-deficit property. In this paper, we extend this result in a restricted setting. We study allocation settings where there are multiple indistinguishable units of a single good, and agents have unit demand. (For this specific setting, Cavallo’s mechanism coincides with a mechanism proposed by Bailey in 1997.) Here we propose a family of mechanisms that redistribute some of the VCG payment back to the agents. All mechanisms in the family are efficient, strategy-proof, individually rational, and never incur a deficit. The family includes the Bailey-Cavallo mechanism as a special case. We then provide an optimization model for finding the optimal mechanism — that is, the mechanism that maximizes redistribution in the worst case — inside the family, and show how to cast this model as a linear program. We give both numerical and analytical solutions of this linear program, and the (unique) resulting mechanism shows significant improvement over the Bailey-Cavallo mechanism (in the worst case). Finally, we prove that the obtained mechanism is optimal among all anonymous deterministic mechanisms that satisfy the above properties.
-
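The paper above takes the Bailey-Cavallo mechanism as its starting point. One common description of that mechanism for m identical units and unit-demand bidders is sketched below: VCG sells to the m highest bidders at the (m+1)-st highest bid, and every agent is rebated m/n times the (m+1)-st highest of the other agents' bids. The bids here are example numbers, and the paper's worst-case optimal mechanism replaces this rebate with one found by linear programming, so treat the sketch as a summary under those assumptions rather than a quotation from the paper.

```python
def vcg_with_cavallo_rebates(bids, m):
    # Assumes at least m + 2 bidders so the rebate formula is well defined.
    n = len(bids)
    order = sorted(range(n), key=lambda i: -bids[i])
    winners = set(order[:m])
    price = sorted(bids, reverse=True)[m]               # (m+1)-st highest bid
    rebates = []
    for i in range(n):
        others = sorted((b for j, b in enumerate(bids) if j != i), reverse=True)
        rebates.append(m / n * others[m])                # (m+1)-st highest of the others
    payments = [(price if i in winners else 0.0) - rebates[i] for i in range(n)]
    return winners, payments

print(vcg_with_cavallo_rebates([10, 8, 6, 4, 2], m=2))   # payments still sum to >= 0
```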
The Price of Anarchy and the Design of Scalable Resource Allocation Mechanisms by Johari R. Algorithmic Game Theory, 2007. In this chapter, we study the allocation of a single infinitely divisible resource among multiple competing users. While we aim for efficient allocation of the resource, the task is complicated by the fact that users’ utility functions are typically unknown to the resource manager. We study the design of resource allocation mechanisms that are approximately efficient (i.e., have a low price of anarchy), with low communication requirements (i.e., the strategy spaces of users are low dimensional). Our main results concern the proportional allocation mechanism, for which a tight bound on the price of anarchy can be provided. We also show that in a wide range of market mechanisms that use a single market-clearing price, the proportional allocation mechanism minimizes the price of anarchy. Finally, we relax the assumption of a single market-clearing price, and show that by extending the class of Vickrey–Clarke–Groves mechanisms all Nash equilibria can be guaranteed to be fully efficient.
-