
using heuristic with MARL #120

Closed
majid5776 opened this issue Jun 27, 2024 · 5 comments


majid5776 commented Jun 27, 2024

Hello.
How can we use a heuristic with multi-agent RL? For example, can I use rllib.py and run_heuristic.py from the examples directory at the same time? That is, I want to use MAPPO together with a heuristic function. Does rllib.py use the heuristic automatically?
Thanks.

@matteobettini (Member)

Hello!

I don't quite understand the question:

The heuristic is an alternative to RL. It is there so that you can compare the performance of your RL agents against the heuristic's.

Either your agents are controlled by an RL policy or by the heuristic policy.

What do you mean when you say you want to mix the two?

@majid5776
Copy link
Author

majid5776 commented Jun 27, 2024

OK, thank you for your answer.
You are right. But as you know, solving problems with deep reinforcement learning algorithms is very time-consuming. My idea was to use search methods in some envs, like discovery or flocking, to compute an action and take it instead of using epsilon-greedy exploration. In my opinion this could reduce training time.
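
For illustration (not part of the original thread), a minimal Python sketch of this idea; `policy` and `heuristic` here are hypothetical callables, not VMAS APIs:

```python
import random

def explore_with_heuristic(obs, policy, heuristic, epsilon=0.1):
    """Epsilon-greedy exploration where the exploratory action comes
    from a heuristic instead of being sampled uniformly at random.

    `policy` and `heuristic` are hypothetical callables mapping an
    observation to an action.
    """
    if random.random() < epsilon:
        return heuristic(obs)  # explore using the heuristic
    return policy(obs)         # otherwise exploit the learned policy
```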

@matteobettini (Member)

Oh, OK, got it! You would like to use the heuristic to bootstrap exploration.

Yes, this is a really good idea!

Unfortunately, I do not know if there is a default way to do this in RLlib; I think you might need to code something custom.

The way I would do it in TorchRL and BenchMARL is by writing a custom callback that fills the replay buffer with data collected by the heuristic policy.

@Zartris (Contributor) commented Jul 1, 2024

Hey @matteobettini,
Do you have an example of the TorchRL callback you mention here? I would love to see one.

@matteobettini (Member)

I usually write all my custom code here: https://github.com/facebookresearch/BenchMARL/blob/main/benchmarl/experiment/callback.py

Examples can be found in my recent project: https://github.com/proroklab/ControllingBehavioralDiversity/blob/main/het_control/callback.py

I don't have an example for this specific case, but it should be easy to do rollouts in the env with a given policy upon setup and store those in the buffer.
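
A minimal TorchRL sketch of the pre-fill idea (not from the thread): the scenario name, rollout length, and the zero-action placeholder heuristic are assumptions — substitute the scenario heuristic from VMAS's examples/run_heuristic.py. In BenchMARL, something like this could live in a custom Callback, e.g. in its on_setup hook:

```python
import torch
from torchrl.data import LazyTensorStorage, ReplayBuffer
from torchrl.envs.libs.vmas import VmasEnv

# "flocking" and the sizes below are illustrative assumptions.
env = VmasEnv(scenario="flocking", num_envs=32)

def heuristic_policy(td):
    # Read observations from `td`, compute heuristic actions, and write
    # them back. Placeholder: a zeroed action matching the action spec;
    # replace with the real scenario heuristic.
    td.set(env.action_key, env.action_spec.zero())
    return td

buffer = ReplayBuffer(storage=LazyTensorStorage(100_000))

# Collect heuristic rollouts once, before RL training starts, and
# pre-fill the replay buffer with them.
with torch.no_grad():
    rollout = env.rollout(max_steps=100, policy=heuristic_policy)
buffer.extend(rollout.reshape(-1))
```

From there, training proceeds unchanged: the algorithm samples from the buffer as usual, and the heuristic data simply seeds the early updates.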
