Multiagent CPP API #584

Open
wants to merge 8 commits into
base: master

Member

@jjshoots jjshoots commented Dec 14, 2024

This updates the C++ interface to support 2-player games. Support for 4-player games is left to a future PR because it is considerably more complicated.

AFAICT, these are the available 2-player games:

  1. Air Raid
  2. Combat
  3. Double Dunk
  4. Human Cannonball
  5. Ice Hockey
  6. Joust
  7. Maze Craze
  8. Surround
  9. Tennis
  10. Video Checkers
  11. Video Chess

And these are games with 4 players:

  1. Warlords
  2. Flag Capture

I believe we should be able to just copy the multiplayer-capable games from the MALE repo, since they already include the required modifications. I have modified Surround to support 2-player mode for the testing script below.

The harder question is what the Python interface for this should look like since, I presume, the Gymnasium API is not sufficient.
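
One possible shape for the Python side, sketched here purely as a discussion aid: a PettingZoo-style parallel interface where `step` takes and returns dicts keyed by agent. Everything in this block (`TwoPlayerStub`, the agent names, the three-step episode) is hypothetical and does not touch ale_py; it only illustrates the call signature.

```python
from typing import Dict, Tuple

# Hypothetical agent names, chosen only for this sketch.
AGENTS = ("first_0", "second_0")


class TwoPlayerStub:
    """Toy stand-in for a 2-player environment; NOT part of ale_py."""

    MAX_STEPS = 3  # arbitrary episode length for the sketch

    def __init__(self) -> None:
        self.steps = 0

    def reset(self) -> Dict[str, int]:
        self.steps = 0
        # Both players receive the same dummy observation.
        return {agent: 0 for agent in AGENTS}

    def step(
        self, actions: Dict[str, int]
    ) -> Tuple[Dict[str, int], Dict[str, float], bool]:
        # Require one action per agent instead of silently defaulting to NOOP.
        missing = set(AGENTS) - set(actions)
        if missing:
            raise ValueError(f"missing actions for: {sorted(missing)}")
        self.steps += 1
        observations = {agent: self.steps for agent in AGENTS}
        rewards = {agent: 0.0 for agent in AGENTS}
        terminated = self.steps >= self.MAX_STEPS
        return observations, rewards, terminated


if __name__ == "__main__":
    env = TwoPlayerStub()
    env.reset()
    terminated = False
    while not terminated:
        _, _, terminated = env.step({"first_0": 0, "second_0": 0})
```

A dict-keyed `step` keeps the loop close to the single-agent Gymnasium one while turning a missing player action into an explicit error rather than an implicit NOOP.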

Testing

import gymnasium as gym
import ale_py

gym.register_envs(ale_py)

# Initialise the environment
env = gym.make("ALE/Surround-v5", render_mode="human", mode=4)  # mode here controls multiplayer. I believe mode 2 is single player

# Reset the environment to generate the first observation
observation, info = env.reset(seed=42)
for _ in range(300):
    # this is where you would insert your policy
    action = env.action_space.sample()

    # step (transition) through the environment with the action
    # receiving the next observation, reward and if the episode has terminated or truncated
    observation, reward, terminated, truncated, info = env.step(action)

    # If the episode has ended then we can reset to start a new episode
    if terminated or truncated:
        observation, info = env.reset()

env.close()

Expected Behaviour:
With mode=4, player B receives no inputs, because the Gymnasium API sends NOOP for player B by default.
With mode=2, player B is controlled by the emulator and therefore moves in random directions.

@jjshoots jjshoots marked this pull request as ready for review December 14, 2024 11:02
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Is it necessary to change the actions? I don't see where that is needed.
Could you add tests showing that PettingZoo works with this?

@jjshoots
Member Author

jjshoots commented Dec 15, 2024

@pseudo-rnd-thoughts Yeah, we do: the two players use different action indices, and the game will throw an error here. Although now that you mention it, maybe that's not needed after all and we can just do the remapping within the C++ stack. I'll work on that.
Updated to use just one set of actions.

Roger on the PZ tests.
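
For reference, the kind of remapping discussed above could look roughly like this, written in Python for readability. It assumes player B's action indices are player A's shifted by a fixed offset of 18 (18 joystick actions per player). That offset is an assumption for illustration, not something taken from this PR, and `to_player_b` / `joint_action` are hypothetical helper names.

```python
from typing import Tuple

# Assumption: each player has 18 joystick actions, indexed 0..17 in the shared set.
NUM_PLAYER_ACTIONS = 18
# Assumption: player B's indices start directly after player A's.
PLAYER_B_OFFSET = 18


def to_player_b(action: int) -> int:
    """Map an action from the shared set onto player B's index range."""
    if not 0 <= action < NUM_PLAYER_ACTIONS:
        raise ValueError(f"action {action} is outside 0..{NUM_PLAYER_ACTIONS - 1}")
    return action + PLAYER_B_OFFSET


def joint_action(action_a: int, action_b: int) -> Tuple[int, int]:
    """Return (player A index, player B index) for two shared-set actions."""
    if not 0 <= action_a < NUM_PLAYER_ACTIONS:
        raise ValueError(f"action {action_a} is outside 0..{NUM_PLAYER_ACTIONS - 1}")
    return action_a, to_player_b(action_b)
```

With this, callers only ever deal with one set of actions, and the per-player indices stay an internal detail.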
