gyx v0.1.1 Gyx.Environments.Pure.Blackjack

This module implements the game of Blackjack as an environment, following Example 5.1 of Sutton and Barto's reinforcement learning book, quoted below.

Extract from Sutton and Barto's RL book: The object of the popular casino card game of blackjack is to obtain cards the sum of whose numerical values is as great as possible without exceeding 21. All face cards count as 10, and an ace can count either as 1 or as 11. We consider the version in which each player competes independently against the dealer. The game begins with two cards dealt to both dealer and player. One of the dealer’s cards is face up and the other is face down. If the player has 21 immediately (an ace and a 10-card), it is called a natural. He then wins unless the dealer also has a natural, in which case the game is a draw. If the player does not have a natural, then he can request additional cards, one by one (hits), until he either stops (sticks) or exceeds 21 (goes bust). If he goes bust, he loses; if he sticks, then it becomes the dealer’s turn. The dealer hits or sticks according to a fixed strategy without choice: he sticks on any sum of 17 or greater, and hits otherwise. If the dealer goes bust, then the player wins; otherwise, the outcome (win, lose, or draw) is determined by whose final sum is closer to 21.
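
The counting rules in this extract can be sketched in a few lines of Elixir. This is only an illustration under stated assumptions: the module `BlackjackRulesSketch` and all of its functions are hypothetical and are not part of Gyx, and cards are assumed to be integers from 1 to 10, with an ace stored as 1 and every face card stored as 10.

```elixir
defmodule BlackjackRulesSketch do
  # Hypothetical helper module for illustration; not part of Gyx.
  # Assumption: cards are integers 1..10 (ace = 1, face cards = 10).

  # True when the hand holds an ace that can count as 11 without busting.
  def usable_ace?(hand) do
    (1 in hand) and (Enum.sum(hand) + 10 <= 21)
  end

  # Hand value: one ace counts as 11 whenever that does not exceed 21.
  def hand_sum(hand) do
    if usable_ace?(hand) do
      Enum.sum(hand) + 10
    else
      Enum.sum(hand)
    end
  end

  # A hand is bust when its value exceeds 21.
  def bust?(hand), do: hand_sum(hand) > 21

  # A natural is an ace plus a 10-card on the first two cards.
  def natural?(hand), do: Enum.sort(hand) == [1, 10]

  # Fixed dealer strategy: stick on 17 or greater, hit otherwise.
  def dealer_action(hand) do
    if hand_sum(hand) >= 17, do: :stick, else: :hit
  end
end

IO.inspect(BlackjackRulesSketch.hand_sum([1, 6]))
IO.inspect(BlackjackRulesSketch.dealer_action([10, 6]))
```

In this sketch the dealer counts a usable ace as 11 as well, so an ace and a 6 make 17 and the dealer sticks.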

Playing blackjack is naturally formulated as an episodic finite MDP. Each game of blackjack is an episode. Rewards of +1, -1, and 0 are given for winning, losing, and drawing, respectively. All rewards within a game are zero, and we do not discount (gamma = 1); therefore these terminal rewards are also the returns. The player’s actions are to hit or to stick. The states depend on the player’s cards and the dealer’s showing card. We assume that cards are dealt from an infinite deck (i.e., with replacement) so that there is no advantage to keeping track of the cards already dealt. If the player holds an ace that he could count as 11 without going bust, then the ace is said to be usable. In this case it is always counted as 11 because counting it as 1 would make the sum 11 or less, in which case there is no decision to be made because, obviously, the player should always hit. Thus, the player makes decisions on the basis of three variables: his current sum (12–21), the dealer’s one showing card (ace–10), and whether or not he holds a usable ace. This makes for a total of 200 states.
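
The MDP formulation above can be checked with a short sketch: enumerating the three state variables gives 10 x 10 x 2 = 200 states, and the terminal reward follows from the two final sums. The module `BlackjackMdpSketch` and its functions are hypothetical names introduced for illustration and are not part of Gyx. The reward function here deliberately ignores the special handling of naturals.

```elixir
defmodule BlackjackMdpSketch do
  # Hypothetical helper module for illustration; not part of Gyx.

  # Infinite deck: every draw is independent of earlier draws.
  # Ace is 1; the four 10-valued ranks (10, J, Q, K) all appear as 10.
  @deck [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]

  # Draw one card with replacement.
  def draw_card, do: Enum.random(@deck)

  # Every decision state as {player_sum, dealer_showing, usable_ace}.
  def states do
    for player_sum <- 12..21,
        dealer_showing <- 1..10,
        usable_ace <- [false, true] do
      {player_sum, dealer_showing, usable_ace}
    end
  end

  # Terminal reward from the final sums: +1 win, -1 loss, 0 draw.
  # A player bust loses regardless of the dealer's hand.
  def reward(player_sum, _dealer_sum) when player_sum > 21, do: -1
  def reward(_player_sum, dealer_sum) when dealer_sum > 21, do: 1
  def reward(player_sum, dealer_sum) when player_sum > dealer_sum, do: 1
  def reward(player_sum, dealer_sum) when player_sum < dealer_sum, do: -1
  def reward(_player_sum, _dealer_sum), do: 0
end

IO.inspect(length(BlackjackMdpSketch.states()))
```

Because there is no discounting and all intermediate rewards are zero, the value returned by `reward/2` in this sketch is also the return of the whole episode.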

This implementation must behave like the OpenAI Gym Blackjack-v0 implementation.

Summary

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

Types

t()
t() :: %Gyx.Environments.Pure.Blackjack{
  action_space: Gyx.Core.Spaces.Discrete.t(),
  dealer: list(),
  dealer_sum: term(),
  done: bool(),
  observation_space: Gyx.Core.Spaces.Tuple.t(),
  player: list(),
  player_sum: term()
}

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.
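
As a rough illustration of how such a child specification is used, the sketch below relies on a stand-in module rather than the Blackjack environment itself. It assumes the environment follows the usual `use GenServer` pattern, which is where this generated `child_spec/1` documentation normally comes from. `EnvStandIn` is a hypothetical name, and the arguments that Gyx's own `start_link` expects are not documented here, so they are not shown.

```elixir
defmodule EnvStandIn do
  # Hypothetical stand-in for an environment process; not part of Gyx.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts), do: {:ok, opts}
end

# `use GenServer` generates child_spec/1. A supervisor calls it when a
# child is listed in the `{module, init_arg}` tuple form.
spec = EnvStandIn.child_spec([])
IO.inspect(spec)

# Start the stand-in under a supervisor that restarts it on failure.
{:ok, sup} = Supervisor.start_link([{EnvStandIn, []}], strategy: :one_for_one)
IO.inspect(Supervisor.count_children(sup))
```

Under the same assumption, the environment module would be listed in the supervisor's children in the same tuple form.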

get_state_abstraction(environment)

start_link(_, opts)