
using heuristic with MARL #120

Closed
majid5776 opened this issue Jun 27, 2024 · 5 comments


majid5776 commented Jun 27, 2024

Hello.
How can we use a heuristic with multi-agent RL? For example, can I use rllib.py and run_heuristic.py from the examples directory at the same time? That is, I want to use MAPPO together with a heuristic function. Does rllib.py use the heuristic automatically?
Thanks.

@matteobettini (Member)

Hello!

I don't quite understand the question:

The heuristic is an alternative to RL. It is there so that you can compare the performance of your RL agents against the heuristic's.

Either your agents are controlled by an RL policy or by the heuristic policy.

What do you mean when you say you want to mix the two?

@majid5776
Copy link
Author

majid5776 commented Jun 27, 2024

OK, thank you for your answer.
You are right. But as you know, solving problems with deep reinforcement learning algorithms is very time-consuming. My idea was to use search methods in some envs, like discovery or flocking, to compute an action and take it instead of using epsilon-greedy exploration. In my opinion this could reduce training time.
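
For illustration (not part of the original thread), a minimal Python sketch of this idea; `policy` and `heuristic` here are hypothetical callables, not VMAS APIs:

```python
import random

def explore_with_heuristic(obs, policy, heuristic, epsilon=0.1):
    """Epsilon-greedy exploration where the exploratory action comes
    from a heuristic instead of being sampled uniformly at random.

    `policy` and `heuristic` are hypothetical callables mapping an
    observation to an action.
    """
    if random.random() < epsilon:
        return heuristic(obs)  # explore using the heuristic
    return policy(obs)         # otherwise exploit the learned policy
```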

@matteobettini (Member)

Oh, OK, got it! You would like to use the heuristic to bootstrap exploration.

Yes, this is a really good idea!

Unfortunately, I do not know if there is a default way to do this in RLlib; I think you might need to code something custom.

The way I would do it in TorchRL and BenchMARL is by writing a custom callback that fills the replay buffer with data collected by the heuristic policy.

@Zartris (Contributor) commented Jul 1, 2024

Hey @matteobettini,
Do you have an example of the TorchRL callback you mention here? I would love to see one.

@matteobettini (Member)

I usually write all my custom code here: https://github.com/facebookresearch/BenchMARL/blob/main/benchmarl/experiment/callback.py

Examples can be found in my recent project: https://github.com/proroklab/ControllingBehavioralDiversity/blob/main/het_control/callback.py

I don't have an example for this specific case, but it should be easy to do rollouts in the env with a given policy upon setup and store those in the buffer.
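
A minimal TorchRL sketch of the pre-fill idea (not from the thread): the scenario name, rollout length, and the zero-action placeholder heuristic are assumptions — substitute the scenario heuristic from VMAS's examples/run_heuristic.py. In BenchMARL, something like this could live in a custom Callback, e.g. in its on_setup hook:

```python
import torch
from torchrl.data import LazyTensorStorage, ReplayBuffer
from torchrl.envs.libs.vmas import VmasEnv

# "flocking" and the sizes below are illustrative assumptions.
env = VmasEnv(scenario="flocking", num_envs=32)

def heuristic_policy(td):
    # Read observations from `td`, compute heuristic actions, and write
    # them back. Placeholder: a zeroed action matching the action spec;
    # replace with the real scenario heuristic.
    td.set(env.action_key, env.action_spec.zero())
    return td

buffer = ReplayBuffer(storage=LazyTensorStorage(100_000))

# Collect heuristic rollouts once, before RL training starts, and
# pre-fill the replay buffer with them.
with torch.no_grad():
    rollout = env.rollout(max_steps=100, policy=heuristic_policy)
buffer.extend(rollout.reshape(-1))
```

From there, training proceeds unchanged: the algorithm samples from the buffer as usual, and the heuristic data simply seeds the early updates.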
