Question about the implementation of Modules w.r.t. different Multi-agent algorithms #151

Open
sputnik524 opened this issue Apr 14, 2023 · 0 comments

Comments

@sputnik524

Hi, contributors of epymarl!

I am currently working on a project in which we are trying to adapt a single-agent RL framework to a multi-agent one, and we hope to compare different MA algorithms on our specific problem. After reading both papers and the corresponding implementations (mainly QMIX and COMA), I am having some trouble understanding the implementation of the modules part, which contains agents, critics and mixers.

My first concern is the RNN agents. In COMA and QMIX, agents play different roles in the algorithm. In QMIX, agents are just local Q-functions, which take the obs and actions as input and output the corresponding Q-values (state-action value function) to the mixer, where we take the argmax to obtain the optimal policy (this seems closest to the behaviour of the implemented RNN agents, which output q). In COMA, however, agents are defined in an actor-critic way and just parameterize a policy, which means they obviously output an action (perhaps as logits). How can QMIX and COMA both use the same RNN agent (both algorithms init agents in the controller to interact with the env)? Am I misunderstanding something?
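To make the question concrete, here is a minimal sketch of one possible reading. It is an assumption for illustration, not a description of the epymarl code. An agent network that emits one number per action could be consumed in two ways: as Q-values with a greedy argmax (QMIX-style), or as logits passed through a softmax and sampled (COMA-style). The helper names `greedy_action`, `softmax` and `sample_action` are hypothetical.

```python
import math
import random


def greedy_action(q_values):
    """QMIX-style reading: treat the outputs as Q-values and take the argmax."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """COMA-style reading: treat the outputs as logits and sample from the policy."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# The same hypothetical per-action output vector, read in both ways.
agent_out = [0.1, 2.0, -1.0]
q_action = greedy_action(agent_out)                      # value-based reading
pi = softmax(agent_out)                                  # policy reading
pi_action = sample_action(agent_out, random.Random(0))   # stochastic action
```

Under this reading the network architecture would be identical in both algorithms, and only the interpretation of its output vector (and the action selector applied to it) would differ.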

My second point of confusion is about the non-shared COMA (coma_ns.py in the module dir). In COMA, the critic is defined as a centralized critic Q(U,s). How can this critic be defined in a decentralized way? From my perspective, only the agents should be non-shared modules, not something defined to be centralized. In COMA_learner.py, a single centralized critic would make more sense to me.
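For reference, a small sketch of the distinction I have in mind. It assumes (and this is only my guess, not something confirmed from the code) that "non-shared" could mean one set of critic parameters per agent, with every critic still receiving the same centralized input. The functions `make_linear_critic` and `counterfactual_advantage` are hypothetical toy stand-ins. The advantage follows the COMA counterfactual form: the Q-value of the taken action minus the policy-weighted average over the agent's alternative actions, with the other agents' actions held fixed.

```python
def make_linear_critic(weights):
    """Hypothetical toy critic: maps a centralized input vector to one Q-value
    per action of a single agent (other agents' actions are part of the input)."""
    def critic(central_input):
        return [sum(w * x for w, x in zip(row, central_input)) for row in weights]
    return critic


def counterfactual_advantage(q_row, policy, taken_action):
    """COMA-style advantage: Q of the taken action minus the policy-weighted
    average of Q over this agent's alternative actions."""
    baseline = sum(p * q for p, q in zip(policy, q_row))
    return q_row[taken_action] - baseline


# Assumed stand-in for the centralized input (global state and joint action).
central_input = [1.0, 0.5]

# "Non-shared" under this assumption: separate parameters per agent,
# but both critics see the same centralized input.
critics = [
    make_linear_critic([[1.0, 0.0], [0.0, 2.0]]),  # critic for agent 0
    make_linear_critic([[2.0, 0.0], [0.0, 0.0]]),  # critic for agent 1
]

policy = [0.5, 0.5]  # each agent's policy over its two actions (toy values)
advantages = [
    counterfactual_advantage(critic(central_input), policy, taken_action=0)
    for critic in critics
]
```

If this is what coma_ns.py does, the critics would still be centralized in their inputs and only decentralized in their parameters, but I would appreciate confirmation.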
