Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug Report] TerminateIllegalWrapper breaks agent selection #1176

Closed
1 task done
dm-ackerman opened this issue Feb 5, 2024 · 4 comments · Fixed by #1206 · May be fixed by #1180
Closed
1 task done

[Bug Report] TerminateIllegalWrapper breaks agent selection #1176

dm-ackerman opened this issue Feb 5, 2024 · 4 comments · Fixed by #1206 · May be fixed by #1180
Labels
bug Something isn't working

Comments

@dm-ackerman
Copy link
Contributor

Describe the bug

When running an env using TerminateIllegalWrapper and not using the action mask, the agent selection becomes corrupted when an illegal move is made.

Here is an example from tictactoe (code below). Notice in the first game that player 1 starts (as expected) and it alternates between players 1 and 2 (as expected) until player 1 makes an illegal move, which is caught by the wrapper.
However, in the second game, player 1 makes two moves in a row. That should not happen. Also note that the illegal move flagged is not actually illegal per the game rules.
This behaviour has been reported for other games.

New game
--------
calling reset(seed=42)
player_1 is making action: 1 current board:[0, 0, 0, 0, 0, 0, 0, 0, 0]
player_2 is making action: 5 current board:[0, 1, 0, 0, 0, 0, 0, 0, 0]
player_1 is making action: 7 current board:[0, 1, 0, 0, 0, 2, 0, 0, 0]
player_2 is making action: 4 current board:[0, 1, 0, 0, 0, 2, 0, 1, 0]
player_1 is making action: 1 current board:[0, 1, 0, 0, 2, 2, 0, 1, 0]
[WARNING]: Illegal move made, game terminating with current player losing. 
obs['action_mask'] contains a mask of all legal moves that can be chosen.

New game
--------
calling reset(seed=42)
player_1 is making action: 5 current board:[0, 0, 0, 0, 0, 0, 0, 0, 0]
player_1 is making action: 0 current board:[0, 0, 0, 0, 0, 1, 0, 0, 0]
[WARNING]: Illegal move made, game terminating with current player losing. 
obs['action_mask'] contains a mask of all legal moves that can be chosen.

Code example

from pettingzoo.classic import tictactoe_v3

env = tictactoe_v3.env()

def do_game(seed):
    print("\nNew game")
    print("--------")
    
    print(f"calling reset(seed={seed})")
    env.reset(seed)
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()

        if termination or truncation:
            env.step(None)
        else:
            mask = observation["action_mask"]
            # this is where you would insert your policy
            action = env.action_space(agent).sample()  # **no action_mask applied**
            print(f"{env.agent_selection} is making action: {action} current board:{env.board}")
            env.step(action)

do_game(42)
do_game(42)

System info

>>> import sys; sys.version
'3.9.12 (main, Apr  5 2022, 06:56:58) \n[GCC 7.5.0]'
>>> pettingzoo.__version__
'1.24.3'

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo
@dm-ackerman dm-ackerman added the bug Something isn't working label Feb 5, 2024
@dm-ackerman
Copy link
Contributor Author

The problem here is that env variables, including agent_selection, are set by calls from TerminateIllegalWrapper to env functions. However, they are called by the wrapper object, not the env so they are set in the wrapper object rather than the base env object. When the code later tries to run, the values get updated in the env code, but the wrapper pulls it's own values that shadow them.

There are several ways to fix this. I think the most reliable and robust fix is to ensure that base wrapper has all methods from AECEnv and redirects them to be called by the env.

An alternate option is to change TerminateIllegalWrapper to call the method from the unwrapped env, but that is less general because it relies on other wrappers to do that. Alternatively, agent_selection and other values can be set as properties in BaseWrapper, but that requires more items added to BaseWrapper.

@dm-ackerman
Copy link
Contributor Author

(for later reference when testing the fix)
Code that shows the problem:

from pettingzoo.classic import tictactoe_v3
env = tictactoe_v3.env()

def show_agent_selector(env):
    unwrapped = env
    while type(unwrapped) != type(env.unwrapped):
        # the actual value for this wrapper (or a placeholder if no value)
        agent_here = unwrapped.__dict__.get("agent_selection", "--------")
        print(f"agent_selection is {agent_here} in wrapper: {unwrapped.__class__}")
        unwrapped = unwrapped.env

    agent_here = env.unwrapped.__dict__.get("agent_selection", "------")
    print(f"agent_selection is {agent_here} in env    : {unwrapped.__class__}")

def do_game(seed):
    print("\nNew game\n")
    
    print(f"calling reset(seed={seed})")
    env.reset(seed)
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()

        if termination or truncation:
            env.step(None)
        else:
            mask = observation["action_mask"]
            # this is where you would insert your policy
            action = env.action_space(agent).sample()
            print(f"{env.agent_selection} is making action: {action} current board:{env.board}")
            env.step(action)

show_agent_selector(env)
do_game(42)
show_agent_selector(env)
do_game(42)
show_agent_selector(env)

Output
Before the game only raw_env had agent_selection defined. After the illegal move, the illegal wrapper had it defined as well which didn't get reset when resetting. A fix will prevent the wrapper from defining its own agent_selection (and any other variables that should be in raw_env)

agent_selection is -------- in wrapper: <class 'pettingzoo.utils.wrappers.order_enforcing.OrderEnforcingWrapper'>
agent_selection is -------- in wrapper: <class 'pettingzoo.utils.wrappers.assert_out_of_bounds.AssertOutOfBoundsWrapper'>
agent_selection is -------- in wrapper: <class 'pettingzoo.utils.wrappers.terminate_illegal.TerminateIllegalWrapper'>
agent_selection is player_1 in env : <class 'pettingzoo.classic.tictactoe.tictactoe.raw_env'>

New game

calling reset(seed=42)
player_1 is making action: 3 current board:[0, 0, 0, 0, 0, 0, 0, 0, 0]
player_2 is making action: 6 current board:[0, 0, 0, 1, 0, 0, 0, 0, 0]
player_1 is making action: 7 current board:[0, 0, 0, 1, 0, 0, 2, 0, 0]
player_2 is making action: 4 current board:[0, 0, 0, 1, 0, 0, 2, 1, 0]
player_1 is making action: 8 current board:[0, 0, 0, 1, 2, 0, 2, 1, 0]
player_2 is making action: 4 current board:[0, 0, 0, 1, 2, 0, 2, 1, 1]
[WARNING]: Illegal move made, game terminating with current player losing.
obs['action_mask'] contains a mask of all legal moves that can be chosen.
agent_selection is -------- in wrapper: <class 'pettingzoo.utils.wrappers.order_enforcing.OrderEnforcingWrapper'>
agent_selection is -------- in wrapper: <class 'pettingzoo.utils.wrappers.assert_out_of_bounds.AssertOutOfBoundsWrapper'>
agent_selection is player_2 in wrapper: <class 'pettingzoo.utils.wrappers.terminate_illegal.TerminateIllegalWrapper'>
agent_selection is player_2 in env : <class 'pettingzoo.classic.tictactoe.tictactoe.raw_env'>

New game

calling reset(seed=42)
player_2 is making action: 0 current board:[0, 0, 0, 0, 0, 0, 0, 0, 0]
[WARNING]: Illegal move made, game terminating with current player losing.
obs['action_mask'] contains a mask of all legal moves that can be chosen.
agent_selection is -------- in wrapper: <class 'pettingzoo.utils.wrappers.order_enforcing.OrderEnforcingWrapper'>
agent_selection is -------- in wrapper: <class 'pettingzoo.utils.wrappers.assert_out_of_bounds.AssertOutOfBoundsWrapper'>
agent_selection is player_2 in wrapper: <class 'pettingzoo.utils.wrappers.terminate_illegal.TerminateIllegalWrapper'>
agent_selection is player_1 in env : <class 'pettingzoo.classic.tictactoe.tictactoe.raw_env'>

dm-ackerman added a commit to dm-ackerman/PettingZoo that referenced this issue Feb 10, 2024
Adding this ensures that variables get set to the appropriate
location. By default, any public value set by the wrapper is sent
to the env to be set there instead of on the wrapper. If a variable
is meant to be set by the wrapper, it should be listed in the
_local_vars class variable of the wrapper. This is not ideal, but
seems to be the most reasonable design.

An example of needing to specify which vars to keep locally is here:
https://python-patterns.guide/gang-of-four/decorator-pattern/#implementing-dynamic-wrapper
The solution is to list which vars should be in the wrapper and
check them when setting a value. That is the approach used in this
commit, but more generalized.

In line with __getattr__, private values cannot be set on underlying
envs. There are two exceptions:
_cumulative_rewards was previously exempted in __getattr__ because it
is used by many envs.
_skip_agent_selection is added because is used byt the dead step
handling. If a wrapper can't set this, that functionality will break.
@elliottower
Copy link
Member

I think there should be a good way to fix this, will discuss in the other PR or make a new one

@elliottower
Copy link
Member

Thanks for the detailed code we can adapt this as a test case to ensure it doesn’t break in the future

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants