[Bug Report] TerminateIllegalWrapper breaks agent selection #1176

dm-ackerman · 2024-02-05T20:55:56Z

Describe the bug

When running an env using TerminateIllegalWrapper and not using the action mask, the agent selection becomes corrupted when an illegal move is made.

Here is an example from tictactoe (code below). Notice in the first game that player 1 starts (as expected) and it alternates between players 1 and 2 (as expected) until player 1 makes an illegal move, which is caught by the wrapper.
However, in the second game, player 1 makes two moves in a row. That should not happen. Also note that the illegal move flagged is not actually illegal per the game rules.
This behaviour has been reported for other games.

New game
--------
calling reset(seed=42)
player_1 is making action: 1 current board:[0, 0, 0, 0, 0, 0, 0, 0, 0]
player_2 is making action: 5 current board:[0, 1, 0, 0, 0, 0, 0, 0, 0]
player_1 is making action: 7 current board:[0, 1, 0, 0, 0, 2, 0, 0, 0]
player_2 is making action: 4 current board:[0, 1, 0, 0, 0, 2, 0, 1, 0]
player_1 is making action: 1 current board:[0, 1, 0, 0, 2, 2, 0, 1, 0]
[WARNING]: Illegal move made, game terminating with current player losing. 
obs['action_mask'] contains a mask of all legal moves that can be chosen.

New game
--------
calling reset(seed=42)
player_1 is making action: 5 current board:[0, 0, 0, 0, 0, 0, 0, 0, 0]
player_1 is making action: 0 current board:[0, 0, 0, 0, 0, 1, 0, 0, 0]
[WARNING]: Illegal move made, game terminating with current player losing. 
obs['action_mask'] contains a mask of all legal moves that can be chosen.

Code example

from pettingzoo.classic import tictactoe_v3

env = tictactoe_v3.env()

def do_game(seed):
    print("\nNew game")
    print("--------")
    
    print(f"calling reset(seed={seed})")
    env.reset(seed)
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()

        if termination or truncation:
            env.step(None)
        else:
            mask = observation["action_mask"]
            # this is where you would insert your policy
            action = env.action_space(agent).sample()  # **no action_mask applied**
            print(f"{env.agent_selection} is making action: {action} current board:{env.board}")
            env.step(action)

do_game(42)
do_game(42)

System info

>>> import sys; sys.version
'3.9.12 (main, Apr  5 2022, 06:56:58) \n[GCC 7.5.0]'
>>> pettingzoo.__version__
'1.24.3'

Additional context

No response

Checklist

I have checked that there is no similar issue in the repo

The text was updated successfully, but these errors were encountered:

dm-ackerman · 2024-02-06T16:44:53Z

The problem here is that env variables, including agent_selection, are set by calls from TerminateIllegalWrapper to env functions. However, they are called by the wrapper object, not the env so they are set in the wrapper object rather than the base env object. When the code later tries to run, the values get updated in the env code, but the wrapper pulls it's own values that shadow them.

There are several ways to fix this. I think the most reliable and robust fix is to ensure that base wrapper has all methods from AECEnv and redirects them to be called by the env.

An alternate option is to change TerminateIllegalWrapper to call the method from the unwrapped env, but that is less general because it relies on other wrappers to do that. Alternatively, agent_selection and other values can be set as properties in BaseWrapper, but that requires more items added to BaseWrapper.

dm-ackerman · 2024-02-06T16:56:54Z

(for later reference when testing the fix)
Code that shows the problem:

from pettingzoo.classic import tictactoe_v3
env = tictactoe_v3.env()

def show_agent_selector(env):
    unwrapped = env
    while type(unwrapped) != type(env.unwrapped):
        # the actual value for this wrapper (or a placeholder if no value)
        agent_here = unwrapped.__dict__.get("agent_selection", "--------")
        print(f"agent_selection is {agent_here} in wrapper: {unwrapped.__class__}")
        unwrapped = unwrapped.env

    agent_here = env.unwrapped.__dict__.get("agent_selection", "------")
    print(f"agent_selection is {agent_here} in env    : {unwrapped.__class__}")

def do_game(seed):
    print("\nNew game\n")
    
    print(f"calling reset(seed={seed})")
    env.reset(seed)
    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()

        if termination or truncation:
            env.step(None)
        else:
            mask = observation["action_mask"]
            # this is where you would insert your policy
            action = env.action_space(agent).sample()
            print(f"{env.agent_selection} is making action: {action} current board:{env.board}")
            env.step(action)

show_agent_selector(env)
do_game(42)
show_agent_selector(env)
do_game(42)
show_agent_selector(env)

Output
Before the game only raw_env had agent_selection defined. After the illegal move, the illegal wrapper had it defined as well which didn't get reset when resetting. A fix will prevent the wrapper from defining its own agent_selection (and any other variables that should be in raw_env)

agent_selection is -------- in wrapper: <class 'pettingzoo.utils.wrappers.order_enforcing.OrderEnforcingWrapper'>
agent_selection is -------- in wrapper: <class 'pettingzoo.utils.wrappers.assert_out_of_bounds.AssertOutOfBoundsWrapper'>
agent_selection is -------- in wrapper: <class 'pettingzoo.utils.wrappers.terminate_illegal.TerminateIllegalWrapper'>
agent_selection is player_1 in env : <class 'pettingzoo.classic.tictactoe.tictactoe.raw_env'>

New game

calling reset(seed=42)
player_1 is making action: 3 current board:[0, 0, 0, 0, 0, 0, 0, 0, 0]
player_2 is making action: 6 current board:[0, 0, 0, 1, 0, 0, 0, 0, 0]
player_1 is making action: 7 current board:[0, 0, 0, 1, 0, 0, 2, 0, 0]
player_2 is making action: 4 current board:[0, 0, 0, 1, 0, 0, 2, 1, 0]
player_1 is making action: 8 current board:[0, 0, 0, 1, 2, 0, 2, 1, 0]
player_2 is making action: 4 current board:[0, 0, 0, 1, 2, 0, 2, 1, 1]
[WARNING]: Illegal move made, game terminating with current player losing.
obs['action_mask'] contains a mask of all legal moves that can be chosen.
agent_selection is -------- in wrapper: <class 'pettingzoo.utils.wrappers.order_enforcing.OrderEnforcingWrapper'>
agent_selection is -------- in wrapper: <class 'pettingzoo.utils.wrappers.assert_out_of_bounds.AssertOutOfBoundsWrapper'>
agent_selection is player_2 in wrapper: <class 'pettingzoo.utils.wrappers.terminate_illegal.TerminateIllegalWrapper'>
agent_selection is player_2 in env : <class 'pettingzoo.classic.tictactoe.tictactoe.raw_env'>

New game

calling reset(seed=42)
player_2 is making action: 0 current board:[0, 0, 0, 0, 0, 0, 0, 0, 0]
[WARNING]: Illegal move made, game terminating with current player losing.
obs['action_mask'] contains a mask of all legal moves that can be chosen.
agent_selection is -------- in wrapper: <class 'pettingzoo.utils.wrappers.order_enforcing.OrderEnforcingWrapper'>
agent_selection is -------- in wrapper: <class 'pettingzoo.utils.wrappers.assert_out_of_bounds.AssertOutOfBoundsWrapper'>
agent_selection is player_2 in wrapper: <class 'pettingzoo.utils.wrappers.terminate_illegal.TerminateIllegalWrapper'>
agent_selection is player_1 in env : <class 'pettingzoo.classic.tictactoe.tictactoe.raw_env'>

Adding this ensures that variables get set to the appropriate location. By default, any public value set by the wrapper is sent to the env to be set there instead of on the wrapper. If a variable is meant to be set by the wrapper, it should be listed in the _local_vars class variable of the wrapper. This is not ideal, but seems to be the most reasonable design. An example of needing to specify which vars to keep locally is here: https://python-patterns.guide/gang-of-four/decorator-pattern/#implementing-dynamic-wrapper The solution is to list which vars should be in the wrapper and check them when setting a value. That is the approach used in this commit, but more generalized. In line with __getattr__, private values cannot be set on underlying envs. There are two exceptions: _cumulative_rewards was previously exempted in __getattr__ because it is used by many envs. _skip_agent_selection is added because is used byt the dead step handling. If a wrapper can't set this, that functionality will break.

elliottower · 2024-03-13T15:57:12Z

I think there should be a good way to fix this, will discuss in the other PR or make a new one

elliottower · 2024-03-13T15:57:39Z

Thanks for the detailed code we can adapt this as a test case to ensure it doesn’t break in the future

dm-ackerman added the bug Something isn't working label Feb 5, 2024

dm-ackerman mentioned this issue Feb 10, 2024

Add __setattr__ to wrappers. Fixes #1176 #1180

Open

7 tasks

dm-ackerman mentioned this issue May 7, 2024

TerminateIllegalWrapper fix #1206

Merged

7 tasks

jjshoots closed this as completed in #1206 Jun 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug Report] TerminateIllegalWrapper breaks agent selection #1176

[Bug Report] TerminateIllegalWrapper breaks agent selection #1176

dm-ackerman commented Feb 5, 2024

dm-ackerman commented Feb 6, 2024

dm-ackerman commented Feb 6, 2024

elliottower commented Mar 13, 2024

elliottower commented Mar 13, 2024

[Bug Report] TerminateIllegalWrapper breaks agent selection #1176

[Bug Report] TerminateIllegalWrapper breaks agent selection #1176

Comments

dm-ackerman commented Feb 5, 2024

Describe the bug

Code example

System info

Additional context

Checklist

dm-ackerman commented Feb 6, 2024

dm-ackerman commented Feb 6, 2024

elliottower commented Mar 13, 2024

elliottower commented Mar 13, 2024