gyx v0.1.1 Gyx.Environments.Pure.Blackjack
This module implements the game of Blackjack as an environment, following Example 5.1 of Sutton and Barto's RL book, quoted below.
Extract from Sutton and Barto's RL book:
The object of the popular casino card game of blackjack is to obtain
cards the sum of whose numerical values is as great as possible without
exceeding 21. All face cards count as 10, and an ace can count either
as 1 or as 11. We consider the version in which each player competes
independently against the dealer. The game begins with two cards dealt
to both dealer and player. One of the dealer’s cards is face up
and the other is face down. If the player has 21 immediately
(an ace and a 10-card), it is called a natural. He then wins unless
the dealer also has a natural, in which case the game is a draw. If
the player does not have a natural, then he can request
additional cards, one by one (hits), until he either stops (sticks)
or exceeds 21 (goes bust). If he goes bust, he loses; if he sticks,
then it becomes the dealer’s turn. The dealer hits or sticks according
to a fixed strategy without choice: he sticks on any sum of 17 or
greater, and hits otherwise. If the dealer goes bust, then the
player wins; otherwise, the outcome - win, lose, or draw - is
determined by whose final sum is closer to 21.
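The scoring and dealer rules above can be sketched in a few lines of Elixir. This is only an illustration under stated assumptions, not the Gyx implementation: the module `BlackjackSketch` and all of its function names are hypothetical, cards are assumed to be integers 1..10 (1 for an ace, face cards already mapped to 10), and the dealer's draws are supplied as a plain list so the example stays deterministic.

```elixir
defmodule BlackjackSketch do
  # Hypothetical helper module for illustration; not part of Gyx.
  # A hand is a list of integers 1..10, where 1 is an ace.

  # True when the hand holds an ace that can count as 11 without busting.
  def usable_ace?(hand), do: (1 in hand) and (Enum.sum(hand) + 10 <= 21)

  # Best total: one ace counts as 11 whenever that does not exceed 21.
  def sum_hand(hand) do
    if usable_ace?(hand), do: Enum.sum(hand) + 10, else: Enum.sum(hand)
  end

  def bust?(hand), do: sum_hand(hand) > 21

  # A natural is 21 on the first two cards: an ace and a 10-card.
  def natural?(hand), do: Enum.sort(hand) == [1, 10]

  # Fixed dealer strategy: hit while the sum is below 17, otherwise stick.
  # `upcoming` is the list of cards the dealer would draw next; play also
  # stops if that list runs out.
  def dealer_play(hand, upcoming) do
    case {sum_hand(hand) < 17, upcoming} do
      {true, [next | rest]} -> dealer_play([next | hand], rest)
      _ -> hand
    end
  end
end

IO.inspect(BlackjackSketch.sum_hand([1, 10]))            # 21
IO.inspect(BlackjackSketch.sum_hand([1, 5, 10]))         # 16
IO.inspect(BlackjackSketch.dealer_play([10, 6], [5, 9])) # [5, 10, 6]
```

Because `sum_hand/1` counts a usable ace as 11, the dealer in this sketch sticks on an ace plus a 6 (a sum of 17).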
Playing blackjack is naturally formulated as an episodic finite MDP. Each game
of blackjack is an episode. Rewards of +1, -1, and 0 are given
for winning, losing, and drawing, respectively. All rewards
within a game are zero, and we do not discount (gamma = 1); therefore
these terminal rewards are also the returns. The player’s actions
are to hit or to stick. The states depend on the player’s cards
and the dealer’s showing card. We assume that cards are dealt from an
infinite deck (i.e., with replacement) so that there is no advantage
to keeping track of the cards already dealt. If the player
holds an ace that he could count as 11 without going bust, then
the ace is said to be usable. In this case it is always counted as
11 because counting it as 1 would make the sum 11 or less, in
which case there is no decision to be made because, obviously,
the player should always hit. Thus, the player makes decisions
on the basis of three variables: his current sum (12–21),
the dealer’s one showing card (ace–10), and whether or not he
holds a usable ace. This makes for a total of 200 states.
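The state count and reward scheme described above can be checked with a short sketch. The module `BlackjackMDPSketch` and its functions are hypothetical names introduced only for this example, and the `reward/2` function is a simplification that compares final sums and ignores the special handling of naturals.

```elixir
defmodule BlackjackMDPSketch do
  # Hypothetical module for illustration only; not part of Gyx.

  # Every decision state: {player_sum, dealer_showing, usable_ace?}.
  # The dealer's showing card runs from 1 (ace) to 10.
  def states do
    for player_sum <- 12..21,
        dealer_showing <- 1..10,
        usable_ace <- [false, true],
        do: {player_sum, dealer_showing, usable_ace}
  end

  # Terminal reward from the player's point of view:
  # +1 for a win, -1 for a loss, 0 for a draw.
  # A player bust is checked first, since the player loses before
  # the dealer ever plays.
  def reward(player_sum, _dealer_sum) when player_sum > 21, do: -1
  def reward(_player_sum, dealer_sum) when dealer_sum > 21, do: 1
  def reward(player_sum, dealer_sum) when player_sum > dealer_sum, do: 1
  def reward(player_sum, dealer_sum) when player_sum < dealer_sum, do: -1
  def reward(_player_sum, _dealer_sum), do: 0
end

IO.inspect(length(BlackjackMDPSketch.states())) # 200
IO.inspect(BlackjackMDPSketch.reward(20, 18))   # 1
IO.inspect(BlackjackMDPSketch.reward(19, 19))   # 0
```

The count is 10 player sums times 10 dealer cards times 2 usable-ace values. Since intermediate rewards are zero and gamma = 1, the value returned by `reward/2` is also the return of the whole episode.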
This implementation is intended to behave like the OpenAI Gym Blackjack-v0 environment.
Summary
Functions
child_spec(init_arg)
Returns a specification to start this module under a supervisor.
Types
t()
t() :: %Gyx.Environments.Pure.Blackjack{
action_space: Gyx.Core.Spaces.Discrete.t(),
dealer: list(),
dealer_sum: term(),
done: bool(),
observation_space: Gyx.Core.Spaces.Tuple.t(),
player: list(),
player_sum: term()
}
Functions
child_spec(init_arg)
Returns a specification to start this module under a supervisor.
See Supervisor.
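As a rough illustration of how a `child_spec/1` like this is used, the sketch below defines a stand-in process and starts it under a supervisor. It assumes the function is the one generated by `use GenServer`, which this page does not confirm; `EnvStandIn` is a hypothetical module, not the Gyx environment.

```elixir
defmodule EnvStandIn do
  # Hypothetical stand-in for an environment process; not the Gyx module.
  # `use GenServer` generates a default child_spec/1 for this module.
  use GenServer

  def start_link(init_arg) do
    GenServer.start_link(__MODULE__, init_arg, name: __MODULE__)
  end

  @impl true
  def init(init_arg), do: {:ok, init_arg}
end

# The generated spec tells a supervisor how to start the process.
IO.inspect(EnvStandIn.child_spec([]))

# Passing a {module, init_arg} tuple makes the supervisor call
# module.child_spec(init_arg) for us.
{:ok, sup} = Supervisor.start_link([{EnvStandIn, []}], strategy: :one_for_one)
IO.inspect(Supervisor.count_children(sup))
```

With the real module, the child entry would presumably be `{Gyx.Environments.Pure.Blackjack, init_arg}`, assuming the module also defines a matching `start_link/1`; check the library source before relying on that.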