Usage
Let's begin by importing the basic packages
>>> import gym
>>> import ma_gym
Importing ma_gym registers all the new multi-agent environments, so they can be created with gym.make:
>>> env = gym.make('Switch2-v0')
How many agents does this environment have?
>>> env.n_agents
2
What's the action space of each agent?
>>> env.action_space
[Discrete(5), Discrete(5)]
What do these actions mean?
>>> env.get_action_meanings() # action meaning of each agent
[['DOWN', 'LEFT', 'UP', 'RIGHT', 'NOOP'], ['DOWN', 'LEFT', 'UP', 'RIGHT', 'NOOP']]
>>> env.get_action_meanings(0) # action meaning of agent '0'
['DOWN', 'LEFT', 'UP', 'RIGHT', 'NOOP']
How do we sample an action for each agent? (Much like OpenAI Gym.)
>>> env.action_space.sample()
[0, 2]
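To make a sampled joint action easier to read, the indices can be mapped back to their names. The following is a minimal sketch, not part of ma_gym: describe_actions is a hypothetical helper, and the meanings list is copied from the env.get_action_meanings() output shown earlier.

```python
# Hypothetical helper (not part of ma_gym): translate a joint action,
# given as one action index per agent, into human-readable names.
def describe_actions(actions, meanings):
    """Return the name of the action chosen by each agent."""
    return [agent_meanings[a] for a, agent_meanings in zip(actions, meanings)]


# Meanings as printed by env.get_action_meanings() for Switch2-v0.
meanings = [['DOWN', 'LEFT', 'UP', 'RIGHT', 'NOOP'],
            ['DOWN', 'LEFT', 'UP', 'RIGHT', 'NOOP']]

print(describe_actions([0, 2], meanings))  # ['DOWN', 'UP']
```

With a live environment, the same helper could presumably be called as describe_actions(env.action_space.sample(), env.get_action_meanings()).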
>>> env.reset()
[[0.0, 0.17], [0.0, 0.83]]
Let's step into the environment with a random action
>>> obs_n, reward_n, done_n, info = env.step(env.action_space.sample())
Upon step, we get a list of local observations, one for each agent
>>> obs_n
[[0.0, 0.17], [0.0, 0.83]]
Upon step, we also get a reward for each agent
>>> reward_n
[-0.1, -0.1]
Also, an episode is considered done when all agents die.
>>> done_n
[False, False]
>>> episode_terminate = all(done_n)
The team reward is simply the sum of all local rewards
>>> team_reward = sum(reward_n)
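Putting the steps above together, a full episode loop might look like the sketch below. To keep it runnable without ma_gym installed, it uses StubTwoAgentEnv, a hypothetical stand-in that only mimics the reset/step return shapes shown above; run_episode and the policy argument are likewise illustrative names, not part of the library.

```python
import random


class StubTwoAgentEnv:
    """Hypothetical stand-in mimicking the reset/step shapes shown above."""
    n_agents = 2

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self._t = 0

    def sample_actions(self):
        # One random action index (0..4) per agent.
        return [random.randrange(5) for _ in range(self.n_agents)]

    def reset(self):
        self._t = 0
        return [[0.0, 0.17], [0.0, 0.83]]

    def step(self, actions):
        self._t += 1
        obs_n = [[0.0, 0.17], [0.0, 0.83]]
        reward_n = [-0.1] * self.n_agents
        # In this stub, every agent finishes after max_steps steps.
        done_n = [self._t >= self.max_steps] * self.n_agents
        return obs_n, reward_n, done_n, {}


def run_episode(env, policy):
    """Roll out one episode; return (team_return, number_of_steps)."""
    obs_n = env.reset()
    done_n = [False] * env.n_agents
    team_return, steps = 0.0, 0
    while not all(done_n):               # episode ends when all agents are done
        obs_n, reward_n, done_n, info = env.step(policy(obs_n))
        team_return += sum(reward_n)     # team reward = sum of local rewards
        steps += 1
    return team_return, steps


env = StubTwoAgentEnv(max_steps=5)
team_return, steps = run_episode(env, lambda obs_n: env.sample_actions())
print(steps, round(team_return, 2))
```

With a real environment such as gym.make('Switch2-v0'), the random policy would instead be lambda obs_n: env.action_space.sample().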
You can also register a customized variant of an environment under a new id:
import gym
gym.envs.register(
    id='MySwitch2-v0',
    entry_point='ma_gym.envs.switch:Switch',
    # It has a step cost of -0.2 now
    kwargs={'n_agents': 2, 'full_observable': False, 'step_cost': -0.2}
)
env = gym.make('MySwitch2-v0')
For more usage details, refer to: https://github.com/koulanurag/ma-gym/blob/master/ma_gym/__init__.py
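When several variants are needed, the registration arguments can be built programmatically. The sketch below only constructs the argument dictionaries, so it runs without gym installed; switch_variant_spec and the two ids are hypothetical names chosen for illustration, and only the kwargs that appear in the registration example above are used.

```python
# Hypothetical helper: build the keyword arguments for gym.envs.register
# for a Switch variant with a custom step cost.
def switch_variant_spec(env_id, step_cost, n_agents=2, full_observable=False):
    return {
        'id': env_id,
        'entry_point': 'ma_gym.envs.switch:Switch',
        'kwargs': {
            'n_agents': n_agents,
            'full_observable': full_observable,
            'step_cost': step_cost,
        },
    }


# Illustrative ids; any unused id ending in a version suffix should work.
specs = [
    switch_variant_spec('MySwitch2Cheap-v0', step_cost=-0.05),
    switch_variant_spec('MySwitch2Costly-v0', step_cost=-0.5),
]

# With gym and ma_gym installed, each spec could then be registered via:
#     gym.envs.register(**spec)
for spec in specs:
    print(spec['id'], spec['kwargs']['step_cost'])
```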
Please note that the Monitor wrapper used below is imported from ma_gym:
>>> from ma_gym.wrappers import Monitor
>>> env = gym.make('Switch2-v0')
>>> env = Monitor(env, directory='recordings')
This saves video files to the recordings folder.
Tip:
- Save video of every episode:
>>> env = Monitor(env, directory='recordings', video_callable=lambda episode_id: True)
- Save video of every 10th episode:
>>> env = Monitor(env, directory='recordings',
...               video_callable=lambda episode_id: episode_id % 10 == 0)
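As the lambdas above suggest, video_callable takes an episode id and returns whether that episode should be recorded. Named predicate factories can make other schedules easier to read; the sketch below assumes episode ids are integers counted from 0, and every_nth_episode and first_k_episodes are hypothetical helpers, not part of ma_gym.

```python
# Hypothetical predicate factories for the video_callable argument.
def every_nth_episode(n):
    """Record episodes 0, n, 2n, ... (assumes 0-based integer episode ids)."""
    def video_callable(episode_id):
        return episode_id % n == 0
    return video_callable


def first_k_episodes(k):
    """Record only the first k episodes."""
    return lambda episode_id: episode_id < k


record = every_nth_episode(10)
print([ep for ep in range(31) if record(ep)])  # [0, 10, 20, 30]
```

Such a predicate could then be passed as Monitor(env, directory='recordings', video_callable=every_nth_episode(10)).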
Contributions are welcome!