Added optional double down action to blackjack #1529
Conversation
Extends the game from just hit/stick to the 3rd action of doubling down, which means doubling the bet and getting only 1 additional card
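The mechanic this PR adds can be sketched independently of the Gym codebase. The helpers below mirror the ones in Gym's `blackjack.py` (`draw_card`, `sum_hand`, `score`, `cmp`); `double_down_step` is a hypothetical illustration of how the third action resolves (one card, then the dealer plays out, with the usual ±1 outcome doubled), not the PR's actual diff.

```python
import random

# Infinite deck, as in Gym's Blackjack-v0 (ace = 1, face cards = 10)
DECK = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]

def draw_card():
    return random.choice(DECK)

def usable_ace(hand):
    # An ace that can count as 11 without busting
    return 1 in hand and sum(hand) + 10 <= 21

def sum_hand(hand):
    return sum(hand) + 10 if usable_ace(hand) else sum(hand)

def is_bust(hand):
    return sum_hand(hand) > 21

def score(hand):
    return 0 if is_bust(hand) else sum_hand(hand)

def cmp(a, b):
    return float(a > b) - float(a < b)

def double_down_step(player, dealer):
    """Hypothetical resolution of the double-down action: the player
    receives exactly one more card, then the dealer draws to 17, and
    the normal +/-1 outcome is doubled (0 stays a push)."""
    player.append(draw_card())
    if is_bust(player):
        return -2.0
    while sum_hand(dealer) < 17:
        dealer.append(draw_card())
    return 2.0 * cmp(score(player), score(dealer))
```

This also makes the reward invariant discussed below easy to check: every double-down outcome lands in {-2.0, 0.0, 2.0}.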
Overall, looks good. Requesting a couple of minor changes (in the comments), and also maybe a simple test (say, initialize with double down, take action 2, and ensure the reward is in the set {-2, 0, 2}). Thanks!
1. Moved comments to the docstring and updated the docstring
2. Removed `if self.double_down`, since it is not needed with the assert
3. Made rewards always floats
4. Made a natural blackjack result in an instant win when `natural=True` (the standard casino rule), instead of allowing the dealer to reach 21 for a draw. It is now only a draw if the dealer also has a natural blackjack.
5. Test file, with output below, showing all rewards are in {-2, 0, 2}:

```python
import gym
from gym.envs.registration import register

ENV_NAME = "BlackjackMax-v0"
DOUBLE = 2
HIT = 1
STICK = 0

register(id='BlackjackMax-v0', entry_point='blackjack1:BlackjackEnv1')


class Player:
    def __init__(self):
        self.env = gym.make(ENV_NAME, natural=True, double_down=True)
        self.state = self.env.reset()

    def play_action(self, blackjack_state):
        return DOUBLE


if __name__ == "__main__":
    agent = Player()
    new_state = agent.state
    for i in range(100):
        while True:
            print('state', new_state)
            action = agent.play_action(new_state)
            new_state, reward, done, _ = agent.env.step(action)
            print('new_state', new_state)
            print('reward', reward)
            if done:
                new_state = agent.env.reset()
                print('===New hand===')
                break
```

Output:

```
state (13, 10, False)
new_state (18, 10, False)
reward 2.0
===New hand===
state (13, 8, False)
new_state (15, 8, False)
reward -2.0
===New hand===
state (13, 7, False)
new_state (23, 7, False)
reward -2.0
===New hand===
state (12, 10, False)
new_state (22, 10, False)
reward -2.0
===New hand===
state (13, 4, False)
new_state (14, 4, False)
reward 2.0
===New hand===
state (17, 1, True)
new_state (21, 1, True)
reward 2.0
===New hand===
state (13, 5, False)
new_state (20, 5, False)
reward 2.0
===New hand===
state (6, 9, False)
new_state (15, 9, False)
reward -2.0
===New hand===
[output truncated: the remaining ~92 hands all have reward in {-2.0, 0.0, 2.0}, or 1.5 on a natural blackjack, e.g. state (21, 1, True) -> reward 1.5]
```
@pzhokhov I made those changes, plus a couple of other small ones, in the new commit. I included a test of 100 double_down actions showing the reward is always in {-2, 0, 2} (except for a natural blackjack when natural=True, which gives an immediate reward of 1.5).
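Change 4 above (instant win on a natural) can be sketched as follows. `is_natural` and `settle_naturals` are hypothetical names for illustration, not the PR's actual code; the case of a dealer-only natural is left to the normal play-out for brevity.

```python
def is_natural(hand):
    """A natural blackjack: exactly an ace and a ten-value card."""
    return sorted(hand) == [1, 10]

def settle_naturals(player, dealer, natural=True):
    """Settle a player natural before any hit/stick/double decision.
    Returns the immediate reward, or None if play continues.
    Standard casino rule: a player natural wins at once (paying 3:2
    when the `natural` bonus flag is set) unless the dealer also has
    a natural, in which case the hand is a push."""
    if not is_natural(player):
        return None  # no player natural: play continues normally
    if is_natural(dealer):
        return 0.0   # both naturals: draw
    return 1.5 if natural else 1.0
```

Under this rule the dealer can no longer draw out to 21 and turn a player natural into a draw, which matches the behavior the test output shows (immediate reward 1.5 on a two-card 21).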
Do we really want to extend these built-in environments instead of having people make their own environments outside the gym repo, @pzhokhov?
Closing per #2259
That was a typo, sorry. I'm closing this after discussion with @cpnota, because this environment is intended to match the simplified blackjack game in Sutton and Barto.