Added optional double down action to blackjack #1529

Closed
wants to merge 2 commits into from

Conversation

chisness

Extends the game from just hit/stick to a third action, double down, which doubles the bet and deals the player exactly one additional card.
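As a rough illustration of the rule described above, here is a minimal, self-contained sketch of how a double down could be resolved. It does not reproduce the code in this PR: the function `double_down`, the infinite-deck `DECK` list, and the dealer standing on 17 are assumptions made for the example.

```python
import random

# Hypothetical infinite deck: ace = 1, face cards count as 10.
DECK = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]


def draw_card(rng):
    return rng.choice(DECK)


def usable_ace(hand):
    # An ace counts as 11 only if that does not bust the hand.
    return 1 in hand and sum(hand) + 10 <= 21


def sum_hand(hand):
    return sum(hand) + 10 if usable_ace(hand) else sum(hand)


def is_bust(hand):
    return sum_hand(hand) > 21


def double_down(player, dealer, rng):
    """Resolve a double down: one extra card, dealer plays out, stakes doubled."""
    player = player + [draw_card(rng)]
    if is_bust(player):
        return player, dealer, -2.0
    dealer = list(dealer)
    # Assumed dealer policy: draw until reaching 17 or more.
    while sum_hand(dealer) < 17:
        dealer.append(draw_card(rng))
    if is_bust(dealer):
        return player, dealer, 2.0
    p, d = sum_hand(player), sum_hand(dealer)
    # (p > d) - (p < d) is +1 for a win, 0 for a draw, -1 for a loss.
    return player, dealer, 2.0 * ((p > d) - (p < d))


rng = random.Random(0)
player, dealer, reward = double_down([5, 6], [10, 3], rng)
print(player, dealer, reward)
```

Because the player receives exactly one card and the hand ends immediately, the reward under these assumptions can only be -2.0, 0.0, or 2.0.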

@chisness chisness changed the title Added optional double down action Added optional double down action to blackjack Jun 11, 2019
Collaborator

@pzhokhov pzhokhov left a comment


Overall, looks good. Requesting a couple of minor changes (in the comments), and maybe also a simple test (say, initialize with double down, take action 2, and ensure the reward is in the set {-2, 0, 2}). Thanks!

1) Moved the comments into the docstring and updated the docstring

2) Removed the `if self.double_down` check, since the assert makes it unnecessary

3) Made rewards always floats

4) Made a natural blackjack an instant win when natural=True (a standard casino rule), instead of letting the dealer draw to 21 for a draw. It is now a draw only if the dealer also has a natural blackjack.

5) Test file with output below, showing that all double-down rewards are in {-2, 0, 2} (a natural blackjack pays 1.5 immediately)

import gym
from gym.envs.registration import register

ENV_NAME = "BlackjackMax-v0"
# Action codes: stick and hit already exist; double is the new action.
DOUBLE = 2
HIT = 1
STICK = 0

# Register the modified environment class from the blackjack1 module.
register(id=ENV_NAME, entry_point='blackjack1:BlackjackEnv1')


class Player:
    def __init__(self):
        self.env = gym.make(ENV_NAME, natural=True, double_down=True)
        self.state = self.env.reset()

    def play_action(self, blackjack_state):
        # Always double down so that every hand exercises the new action.
        return DOUBLE


if __name__ == "__main__":
    agent = Player()
    new_state = agent.state

    # Play 100 hands, printing the state and reward at each step.
    for _ in range(100):
        while True:
            print('state', new_state)
            action = agent.play_action(new_state)
            new_state, reward, done, _ = agent.env.step(action)
            print('new_state', new_state)
            print('reward', reward)
            if done:
                new_state = agent.env.reset()
                print('===New hand===')
                break

state (13, 10, False)
new_state (18, 10, False)
reward 2.0
===New hand===
state (13, 8, False)
new_state (15, 8, False)
reward -2.0
===New hand===
state (13, 7, False)
new_state (23, 7, False)
reward -2.0
===New hand===
state (12, 10, False)
new_state (22, 10, False)
reward -2.0
===New hand===
state (13, 4, False)
new_state (14, 4, False)
reward 2.0
===New hand===
state (17, 1, True)
new_state (21, 1, True)
reward 2.0
===New hand===
state (13, 5, False)
new_state (20, 5, False)
reward 2.0
===New hand===
state (6, 9, False)
new_state (15, 9, False)
reward -2.0
===New hand===
state (14, 9, True)
new_state (19, 9, True)
reward 2.0
===New hand===
state (15, 7, True)
new_state (15, 7, False)
reward -2.0
===New hand===
state (13, 10, False)
new_state (21, 10, False)
reward 2.0
===New hand===
state (13, 9, False)
new_state (15, 9, False)
reward -2.0
===New hand===
state (19, 10, False)
new_state (23, 10, False)
reward -2.0
===New hand===
state (11, 1, False)
new_state (20, 1, False)
reward 2.0
===New hand===
state (16, 1, False)
new_state (26, 1, False)
reward -2.0
===New hand===
state (9, 10, False)
new_state (20, 10, True)
reward 0.0
===New hand===
state (13, 4, True)
new_state (13, 4, False)
reward -2.0
===New hand===
state (18, 6, False)
new_state (21, 6, False)
reward 2.0
===New hand===
state (12, 3, False)
new_state (14, 3, False)
reward -2.0
===New hand===
state (20, 3, False)
new_state (21, 3, False)
reward 2.0
===New hand===
state (16, 10, False)
new_state (26, 10, False)
reward -2.0
===New hand===
state (17, 10, False)
new_state (20, 10, False)
reward -2.0
===New hand===
state (11, 9, False)
new_state (21, 9, False)
reward 2.0
===New hand===
state (14, 3, False)
new_state (15, 3, False)
reward -2.0
===New hand===
state (21, 1, True)
new_state (21, 1, True)
reward 1.5
===New hand===
state (14, 6, False)
new_state (17, 6, False)
reward 2.0
===New hand===
state (13, 8, False)
new_state (21, 8, False)
reward 2.0
===New hand===
state (21, 10, True)
new_state (21, 10, True)
reward 1.5
===New hand===
state (16, 2, False)
new_state (20, 2, False)
reward 2.0
===New hand===
state (20, 6, False)
new_state (25, 6, False)
reward -2.0
===New hand===
state (13, 9, True)
new_state (17, 9, True)
reward 2.0
===New hand===
state (21, 10, True)
new_state (21, 10, True)
reward 1.5
===New hand===
state (20, 10, False)
new_state (21, 10, False)
reward 2.0
===New hand===
state (9, 5, False)
new_state (14, 5, False)
reward -2.0
===New hand===
state (19, 2, False)
new_state (21, 2, False)
reward 2.0
===New hand===
state (17, 10, True)
new_state (17, 10, False)
reward -2.0
===New hand===
state (14, 1, False)
new_state (20, 1, False)
reward 2.0
===New hand===
state (13, 7, False)
new_state (23, 7, False)
reward -2.0
===New hand===
state (21, 7, True)
new_state (21, 7, True)
reward 1.5
===New hand===
state (20, 2, False)
new_state (21, 2, False)
reward 0.0
===New hand===
state (8, 8, False)
new_state (14, 8, False)
reward 2.0
===New hand===
state (12, 10, False)
new_state (19, 10, False)
reward 0.0
===New hand===
state (20, 5, False)
new_state (30, 5, False)
reward -2.0
===New hand===
state (14, 6, False)
new_state (22, 6, False)
reward -2.0
===New hand===
state (17, 9, True)
new_state (12, 9, False)
reward -2.0
===New hand===
state (20, 8, False)
new_state (24, 8, False)
reward -2.0
===New hand===
state (12, 8, False)
new_state (16, 8, False)
reward -2.0
===New hand===
state (16, 5, False)
new_state (18, 5, False)
reward 2.0
===New hand===
state (16, 10, False)
new_state (18, 10, False)
reward 2.0
===New hand===
state (19, 8, False)
new_state (27, 8, False)
reward -2.0
===New hand===
state (13, 4, False)
new_state (17, 4, False)
reward -2.0
===New hand===
state (12, 4, False)
new_state (14, 4, False)
reward 2.0
===New hand===
state (18, 9, False)
new_state (19, 9, False)
reward 0.0
===New hand===
state (17, 4, False)
new_state (21, 4, False)
reward 2.0
===New hand===
state (17, 7, True)
new_state (12, 7, False)
reward 2.0
===New hand===
state (11, 8, False)
new_state (13, 8, False)
reward 2.0
===New hand===
state (20, 3, False)
new_state (24, 3, False)
reward -2.0
===New hand===
state (16, 2, False)
new_state (23, 2, False)
reward -2.0
===New hand===
state (18, 4, False)
new_state (25, 4, False)
reward -2.0
===New hand===
state (11, 7, False)
new_state (21, 7, False)
reward 2.0
===New hand===
state (20, 10, False)
new_state (23, 10, False)
reward -2.0
===New hand===
state (12, 10, False)
new_state (15, 10, False)
reward -2.0
===New hand===
state (17, 10, True)
new_state (17, 10, False)
reward -2.0
===New hand===
state (14, 2, False)
new_state (24, 2, False)
reward -2.0
===New hand===
state (17, 10, True)
new_state (14, 10, False)
reward 2.0
===New hand===
state (11, 10, False)
new_state (21, 10, False)
reward 2.0
===New hand===
state (17, 6, False)
new_state (23, 6, False)
reward -2.0
===New hand===
state (10, 1, False)
new_state (20, 1, False)
reward -2.0
===New hand===
state (8, 10, False)
new_state (18, 10, False)
reward -2.0
===New hand===
state (17, 9, True)
new_state (17, 9, False)
reward -2.0
===New hand===
state (6, 2, False)
new_state (10, 2, False)
reward -2.0
===New hand===
state (15, 6, False)
new_state (17, 6, False)
reward 2.0
===New hand===
state (5, 10, False)
new_state (8, 10, False)
reward -2.0
===New hand===
state (14, 8, False)
new_state (24, 8, False)
reward -2.0
===New hand===
state (20, 10, False)
new_state (29, 10, False)
reward -2.0
===New hand===
state (14, 10, False)
new_state (24, 10, False)
reward -2.0
===New hand===
state (18, 10, False)
new_state (28, 10, False)
reward -2.0
===New hand===
state (12, 8, False)
new_state (22, 8, False)
reward -2.0
===New hand===
state (11, 5, False)
new_state (20, 5, False)
reward -2.0
===New hand===
state (12, 2, False)
new_state (17, 2, False)
reward -2.0
===New hand===
state (15, 10, False)
new_state (17, 10, False)
reward -2.0
===New hand===
state (11, 3, False)
new_state (12, 3, False)
reward 2.0
===New hand===
state (21, 10, True)
new_state (21, 10, True)
reward 1.5
===New hand===
state (15, 9, True)
new_state (16, 9, True)
reward -2.0
===New hand===
state (6, 3, False)
new_state (12, 3, False)
reward 2.0
===New hand===
state (21, 10, True)
new_state (21, 10, True)
reward 1.5
===New hand===
state (10, 2, False)
new_state (20, 2, False)
reward 2.0
===New hand===
state (8, 3, False)
new_state (14, 3, False)
reward -2.0
===New hand===
state (10, 5, False)
new_state (13, 5, False)
reward -2.0
===New hand===
state (13, 1, False)
new_state (15, 1, False)
reward 2.0
===New hand===
state (11, 3, False)
new_state (21, 3, False)
reward 2.0
===New hand===
state (18, 10, False)
new_state (28, 10, False)
reward -2.0
===New hand===
state (16, 9, True)
new_state (12, 9, False)
reward -2.0
===New hand===
state (15, 4, True)
new_state (15, 4, False)
reward -2.0
===New hand===
state (17, 3, True)
new_state (17, 3, False)
reward 2.0
===New hand===
state (14, 7, False)
new_state (21, 7, False)
reward 2.0
===New hand===
state (16, 6, False)
new_state (24, 6, False)
reward -2.0
===New hand===
state (11, 6, False)
new_state (21, 6, False)
reward 2.0
===New hand===
state (20, 6, False)
new_state (22, 6, False)
reward -2.0
===New hand===
state (13, 2, False)
new_state (18, 2, False)
reward 2.0
===New hand===
@chisness
Author

@pzhokhov I made those changes and a couple of other small ones in the new commit. I ran a test of 100 double_down actions to show that the reward is always in {-2, 0, 2} (except for a natural blackjack when natural=True, which gives an immediate reward of 1.5).
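The natural-blackjack exception mentioned above could be sketched roughly as follows. This is only an illustration of the stated rule (1.5 for a player natural, a draw if the dealer also has one), not the PR's implementation; the helper names `is_natural`, `natural_reward`, and `allowed_rewards` are hypothetical.

```python
def is_natural(hand):
    # A natural is an ace (1) plus a ten-valued card in the first two cards.
    return sorted(hand) == [1, 10]


def natural_reward(player, dealer):
    """Return the immediate reward for a dealt hand, or None if play continues."""
    if not is_natural(player):
        return None
    # A draw only if the dealer also holds a natural; otherwise an instant win.
    return 0.0 if is_natural(dealer) else 1.5


def allowed_rewards(natural):
    """Rewards an always-double-down policy can observe under the described rules."""
    doubled = {-2.0, 0.0, 2.0}
    return (doubled | {1.5}) if natural else doubled


# Reward values that appear in the posted test output.
logged = [2.0, -2.0, 0.0, 1.5]
print(all(r in allowed_rewards(natural=True) for r in logged))
```

A test along the lines the reviewer suggested could then assert membership in `allowed_rewards(natural=True)` rather than in {-2, 0, 2} alone.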

@christopherhesse
Contributor

Do we really want to extend these built-in environments instead of having people make their own environments outside the gym repo @pzhokhov?

@jkterry1
Collaborator

Closing per #2259

@jkterry1 jkterry1 closed this Jul 27, 2021
@jkterry1
Collaborator

That was a typo, sorry. I'm closing this after discussion with @cpnota because this environment is intended to match the simplified blackjack game in Sutton and Barto.
