Issue with Bid Selection Always Returning 7NT in Bridge Game #324

Open

ahmed-hararaa opened this issue Nov 8, 2024 · 2 comments

@ahmed-hararaa
I am using the Bridge environment in RLCard and made a small modification to the get_payoffs method to print the selected contract bid at the end of the game. The modified method is as follows:

def get_payoffs(self):
    ''' Get the payoffs of players.

    Returns:
        (list): A list of payoffs for each player.
    '''
    # added for debugging: print the final contract bid of the game
    round = self.game.round
    bid = self.game.round.contract_bid_move
    print(bid)

    return self.bridgePayoffDelegate.get_payoffs(game=self.game)

However, I have noticed that the bid selection is almost always the same, specifically 7NT (No Trump), regardless of the situation. Below is a sample of the output printed by my code:

E bids 7NT
E bids 7NT
N bids 7NT
N bids 7NT
N bids 7NT
S bids 7NT
N bids 7NT
E bids 7NT
E bids 7NT
W bids 7NT
N bids 7NT
N bids 7NT
W bids 7NT
S bids 7H
S bids 7NT
E bids 7NT
N bids 7NT
N bids 7NT
E bids 7NT

As you can see, the bid is almost always 7NT (No Trump), with only a few exceptions (like 7H). This seems like an issue since bids should vary depending on the game state, the hands of the players, and other factors.

Steps to Reproduce:

  1. Use the RLCard Bridge environment.
  2. Modify the get_payoffs method to print the selected bid:
def get_payoffs(self):
    round = self.game.round
    bid = self.game.round.contract_bid_move
    print(bid)
    return self.bridgePayoffDelegate.get_payoffs(game=self.game)
  3. Run several games and observe the printed bids (a minimal driver sketch follows below).
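
A minimal driver for step 3 might look like this. The RandomAgent setup is just an illustrative placeholder; substitute whatever agents you are actually training or evaluating. The modified get_payoffs above is invoked at the end of each env.run call, which triggers the print:

import rlcard
from rlcard.agents import RandomAgent

# Sketch: run a handful of games; the modified get_payoffs prints the final
# contract bid at the end of each one.
env = rlcard.make('bridge')
env.set_agents([RandomAgent(num_actions=env.num_actions) for _ in range(env.num_players)])
for _ in range(20):
    trajectories, payoffs = env.run(is_training=False)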

Expected Behavior:
The bid should vary based on the game state, and not always default to 7NT (No Trump). The agents should be making bids based on their hand strength, position, and other factors relevant to the Bridge game.

Actual Behavior:
The bid is almost always 7NT (No Trump), with very few instances of other bids like 7H (Hearts).

@billh0420
Contributor

billh0420 commented Dec 1, 2024

A) See point F at the end of this comment about how to handle the bidding.

B) You need to make sure that all four agents are "good". Using RandomAgents for the other three players will skew the bidding.

The easiest way to do this is to set all four agents to the dqn_agent. A better way is to create the other three agents from the weights of the training dqn_agent, and to keep them reasonably current with the training progress of the dqn_agent by refreshing those weights periodically.

I modified the training loop as follows; the added piece is the refresh of env.agents at the top of the episode loop:


from rlcard.utils import reorganize, tournament, Logger, plot_curve

# Start training (env and the dqn agent are assumed to have been created above)
num_episodes = 10000
evaluate_every = 1000
num_eval_games = 10
log_dir = '.'
algorithm = 'dqn'
with Logger(log_dir) as logger:
    for episode in range(num_episodes):
        if episode % 1000 == 0:
            for i in range(1, env.num_players):
                env.agents[i] = agent

        # Generate data from the environment
        trajectories, payoffs = env.run(is_training=True)

        # Reorganize the data to be state, action, reward, next_state, done
        trajectories = reorganize(trajectories, payoffs)

        # Feed transitions into agent memory, and train the agent.
        # Only the first position's trajectories are fed, since that is the
        # DQN agent being trained; the other seats now play with copies of it.
        for ts in trajectories[0]:
            agent.feed(ts)

        # Evaluate the performance against the current set of agents in env
        if episode % evaluate_every == 0:
            logger.log_performance(
                episode,
                tournament(
                    env,
                    num_eval_games,
                )[0]
            )

    # Get the paths
    csv_path, fig_path = logger.csv_path, logger.fig_path

# Plot the learning curve
plot_curve(csv_path, fig_path, algorithm)
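
The "better way" mentioned in B — giving the other three seats their own agents and refreshing their weights from the training agent periodically — might look roughly like the sketch below. It assumes the opponents were built with the same DQNAgent constructor arguments as agent and that, as in rlcard's DQNAgent, they expose q_estimator / target_estimator attributes:

from copy import deepcopy

# Inside the episode loop, instead of pointing every seat at the same object:
if episode % 1000 == 0:
    for i in range(1, env.num_players):
        # refresh the opponent's networks from the training agent
        env.agents[i].q_estimator = deepcopy(agent.q_estimator)
        env.agents[i].target_estimator = deepcopy(agent.target_estimator)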

C) You need a new BridgePayoffDelegate, since the default one doesn't take the final contract into account (only trick play matters to it).

Here is an example of such a BridgePayoffDelegate:


import numpy as np

from rlcard.envs.bridge import BridgePayoffDelegate
from rlcard.games.bridge.game import BridgeGame

class BetterBridgePayoffDelegate(BridgePayoffDelegate):

    def __init__(self):
        self.make_bid_bonus = 0

    def get_payoffs(self, game: BridgeGame):
        ''' Get the payoffs of players.

        Returns:
            (list): A list of payoffs for each player.
        '''
        contract_bid_move = game.round.contract_bid_move
        if contract_bid_move:
            declarer = contract_bid_move.player
            bid_trick_count = contract_bid_move.action.bid_amount + 6
            won_trick_counts = game.round.won_trick_counts
            declarer_won_trick_count = won_trick_counts[declarer.player_id % 2]
            defender_won_trick_count = won_trick_counts[(declarer.player_id + 1) % 2]
            if bid_trick_count <= declarer_won_trick_count:
                over_tricks = declarer_won_trick_count - bid_trick_count
                declarer_payoff = bid_trick_count + over_tricks / 10 + self.make_bid_bonus
            else:
                declarer_payoff = declarer_won_trick_count - bid_trick_count
            defender_payoff = defender_won_trick_count / 13
            payoffs = []
            for player_id in range(4):
                payoff = declarer_payoff if player_id % 2 == declarer.player_id % 2 else defender_payoff
                payoffs.append(payoff)
        else:
            payoffs = [0, 0, 0, 0]
        return np.array(payoffs)

This is used as follows:


from rlcard.envs.bridge import BridgeEnv

config = {"allow_step_back": False, "seed": None}
env = BridgeEnv(config)
betterBridgePayoffDelegate = BetterBridgePayoffDelegate()
env.bridgePayoffDelegate = betterBridgePayoffDelegate
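
Equivalently, if you create the environment through rlcard's registry, the delegate can be swapped in the same way (the agents list is assumed to have been built already):

import rlcard

env = rlcard.make('bridge', config={'seed': None})
env.bridgePayoffDelegate = BetterBridgePayoffDelegate()  # replace the default payoff delegate
env.set_agents(agents)  # the four agents used for training / evaluation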

D) After a training session, you can run the following to see what the contract bids are:


from rlcard.games.bridge.utils.action_event import ActionEvent

for i in range(100):
    game = env.game
    state, next_player_id = game.init_game()
    while not game.is_over() and not game.round.is_bidding_over():
        extracted_state = env._extract_state(game)
        player = game.round.get_current_player()
        current_agent = env.agents[player.player_id]
        action_id, info = current_agent.eval_step(extracted_state)
        action = ActionEvent.from_action_id(action_id)
        state, next_player_id = game.step(action)
    final_contract_bid_move = game.round.contract_bid_move
    # only print deals whose contract is below the 7 level (i.e. not a grand slam)
    if final_contract_bid_move and final_contract_bid_move.action.bid_amount < 7:
        print(f'{final_contract_bid_move}')

E) After the bidding becomes more reasonable, you can review the bidding with:


game = env.game
state, next_player_id = game.init_game()
print(f'state={state}')
while not game.is_over() and not game.round.is_bidding_over():
    extracted_state = env._extract_state(game)
    player = game.round.get_current_player()
    current_agent = env.agents[player.player_id]
    action_id, info = current_agent.eval_step(extracted_state)
    action = ActionEvent.from_action_id(action_id)
    print(f"{player}: {action}")
    state, next_player_id = game.step(action)

F) Here is my comment on how to handle the bidding. Using RL alone will not quite work, since the agents will learn a private bidding system. (The same is true of trick play, where they will learn a private signaling system, but that is less of a problem than with bidding.) The bidding has to follow a 'published' bidding system filled out on the partnership's 'bridge card'.

One way is to hard-code the bidding and use the agent only for trick play.
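
A minimal sketch of that first option: a wrapper that satisfies the agent interface rlcard expects (use_raw / step / eval_step), applies a fixed bidding rule during the auction, and delegates trick play to a trained agent. The bidding_rule callable and the first_play_action_id boundary are illustrative placeholders, not rlcard API:

class HardCodedBiddingAgent:
    ''' Sketch: fixed bidding rule during the auction, a trained agent for trick play. '''
    use_raw = False

    def __init__(self, play_agent, bidding_rule, first_play_action_id):
        self.play_agent = play_agent                      # e.g. a trained DQNAgent
        self.bidding_rule = bidding_rule                  # illustrative: callable(state) -> action_id
        self.first_play_action_id = first_play_action_id  # assumed boundary where play-card action ids start

    def _in_bidding_phase(self, state):
        # Assumption: during the auction every legal action id is a call
        # (bid / pass / dbl / rdbl), i.e. below the play-card id range.
        return all(action_id < self.first_play_action_id for action_id in state['legal_actions'])

    def step(self, state):
        if self._in_bidding_phase(state):
            return self.bidding_rule(state)
        return self.play_agent.step(state)

    def eval_step(self, state):
        if self._in_bidding_phase(state):
            return self.bidding_rule(state), {}
        return self.play_agent.eval_step(state)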

Another way is to choose a bidding system like SAYC and train an agent (it need not be DQN) to follow it. The major concern is to avoid 'psych' bids.

@billh0420
Contributor

billh0420 commented Dec 3, 2024

A much easier way to get started is to set the east, south, west agents to BridgeDefenderNoviceRuleAgent() rather than to RandomAgent(num_actions).

In this case, the north agent will have to 'guess' what the best contract is. The more overall strength and the more high card points its hand has, the more likely it is to bid something while the other three players simply pass.
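
For reference, 'high card points' here means the standard ace=4, king=3, queen=2, jack=1 count; a throwaway helper (not part of rlcard) to compute it from a hand's ranks might be:

def high_card_points(ranks):
    ''' Standard HCP count: A=4, K=3, Q=2, J=1. `ranks` is a list of rank characters. '''
    values = {'A': 4, 'K': 3, 'Q': 2, 'J': 1}
    return sum(values.get(rank, 0) for rank in ranks)

# Example: a hand holding A, K, Q and ten small cards -> 9 HCP
print(high_card_points(['A', 'K', 'Q'] + ['7'] * 10))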

You can combine this with my suggestions in the previous comment as follows:


from copy import deepcopy

from rlcard.agents import DQNAgent
from rlcard.models.bridge_rule_models import BridgeDefenderNoviceRuleAgent

# north_agent and the DQN hyperparameters below are assumed to have been defined already

agents = [north_agent]
for _ in range(1, env.num_players):
    other_agent = DQNAgent(num_actions=num_actions,
                     state_shape=state_shape[0],
                     mlp_layers=mlp_layers,
                     device=device,
                     batch_size=batch_size,
                     replay_memory_init_size=replay_memory_init_size,
                     learning_rate=learning_rate,
                     update_target_estimator_every=update_target_estimator_every)
    other_agent.target_estimator = deepcopy(north_agent.q_estimator)  # seed the opponent's target network from the training agent's weights
    agents.append(other_agent)
agents[1] = BridgeDefenderNoviceRuleAgent()
# agents[2] = BridgeDefenderNoviceRuleAgent()  # leave south (north's partner) as a DQN copy so the partnership bidding can improve
agents[3] = BridgeDefenderNoviceRuleAgent()
env.set_agents(agents)
