seldonian.RL.environments.gridworld.Gridworld

class Gridworld(size=3)

Bases: Environment

__init__(size=3)

Square 2D gridworld RL environment of arbitrary size. Actions 0, 1, 2, 3 correspond to up, right, down, left. Rewards are hardcoded: entering state 7 returns a reward of -1, entering the terminal state (bottom right) returns a reward of 1, and every other transition returns a reward of 0.
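To make the reward rule above concrete, here is a minimal sketch. It assumes states are numbered row by row starting from 0 at the top left, which this page does not state explicitly. The names ACTION_NAMES, state_to_row_col and reward_for_entering are hypothetical helpers for illustration and are not part of the seldonian API.

```python
# Hypothetical helpers for illustration only; not part of the seldonian API.
ACTION_NAMES = {0: "up", 1: "right", 2: "down", 3: "left"}


def state_to_row_col(state, size=3):
    """Map a state index to (row, col), assuming row-major numbering from 0."""
    return divmod(state, size)


def reward_for_entering(state, size=3):
    """Reward rule from the description: -1 for state 7, +1 for the goal, else 0."""
    goal_state = size**2 - 1  # bottom-right cell under the row-major assumption
    if state == 7:
        return -1
    if state == goal_state:
        return 1
    return 0


for s in (0, 7, 8):
    print(s, state_to_row_col(s), reward_for_entering(s))
```

Under these assumptions, state 7 in the default 3x3 grid is the cell directly left of the goal, so an agent walking along the bottom row passes through the penalized cell.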

Parameters:

size – The number of grid cells on a side

Variables:
  • num_states – The number of distinct grid cells

  • env_description (Env_Description) – contains attributes describing the environment

  • state – The location in the gridworld, ranging from 0 (top left) to size**2 - 1 (bottom right)

  • terminal_state (bool) – Whether the agent currently occupies the terminal obs

  • time (int) – The current timestep

  • max_time (int) – Maximum allowed timestep

  • gamma (float) – The discount factor in calculating the expected return
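As a reminder of what a discount factor does, the sketch below computes the standard discounted return of one episode, the sum of gamma**t times the reward at step t. The function discounted_return is a hypothetical helper written for this illustration. How the library itself applies gamma is not shown on this page.

```python
def discounted_return(rewards, gamma):
    """Sum gamma**t * r_t over the rewards of a single episode."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma**t) * reward
    return total


# Example episode: two neutral steps, the -1 penalty, then the +1 goal reward.
print(discounted_return([0, 0, -1, 1], gamma=0.9))
```

With gamma=1.0 the function reduces to a plain sum of rewards. Smaller values of gamma make late rewards count less.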

__repr__()

Return repr(self).

Methods

create_env_description(num_states)

Creates the environment description object.

Parameters:

num_states – The number of states

Returns:

Environment description for the obs and action spaces

Return type:

Env_Description

get_env_description()

Get the environment description. Override this method in the child class implementation.

get_observation()

Get the current obs

is_in_goal_state()

Check whether the current obs is the goal obs

Returns:

True if the current obs is the goal obs, False otherwise

reset()

Go back to the initial obs and timestep

start_visualizing()

Turn on visualization debugger.

stop_visualizing()

Turn off visualization debugger.

terminated()

Get the terminal flag (terminal_state), i.e. whether the terminal obs has been reached

transition(action)

Transition between states given an action and return the resulting reward.

Parameters:

action – A possible action at the current obs

Returns:

reward for reaching the next obs
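The following self-contained sketch shows one way a transition step like this could work. ToyGridworld is a hypothetical stand-in written for this page, not the seldonian implementation. It assumes row-major state numbering from 0, that walls block movement, and that an episode also ends once a time limit is reached. The max_time default of 100 is an arbitrary illustration value.

```python
class ToyGridworld:
    """Illustrative stand-in; NOT the seldonian Gridworld implementation."""

    def __init__(self, size=3, max_time=100):
        self.size = size
        self.num_states = size**2
        self.max_time = max_time  # arbitrary value chosen for this sketch
        self.reset()

    def reset(self):
        """Go back to the initial state and timestep."""
        self.state = 0
        self.time = 0
        self.terminal_state = False

    def is_in_goal_state(self):
        return self.state == self.num_states - 1

    def update_position(self, action):
        """Move one cell, clamping at the grid edges (assumed wall behavior)."""
        row, col = divmod(self.state, self.size)
        if action == 0:  # up
            row = max(row - 1, 0)
        elif action == 1:  # right
            col = min(col + 1, self.size - 1)
        elif action == 2:  # down
            row = min(row + 1, self.size - 1)
        elif action == 3:  # left
            col = max(col - 1, 0)
        else:
            raise ValueError(f"unknown action: {action}")
        self.state = row * self.size + col

    def transition(self, action):
        """Advance one timestep and return the reward for the state entered."""
        self.time += 1
        self.update_position(action)
        reward = 0
        if self.state == 7:
            reward = -1
        if self.is_in_goal_state():
            reward = 1
            self.terminal_state = True
        elif self.time >= self.max_time:
            self.terminal_state = True  # assumed time-limit cutoff
        return reward


env = ToyGridworld()
rewards = [env.transition(a) for a in (2, 2, 1, 1)]  # down, down, right, right
print(rewards, env.terminal_state)  # -> [0, 0, -1, 1] True
```

The action sequence walks down the left column and then along the bottom row, so it collects the -1 penalty at state 7 before reaching the goal at state 8.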

update_position(action)

Helper function for transition() that updates the current position given an action

Parameters:

action – A possible action at the current obs
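A position update of this kind can be sketched as a pure function. The version below is a hypothetical illustration, not the library's code. It assumes row-major state numbering from 0 and that an action pointing off the grid leaves the position unchanged. MOVES and next_state are names invented for this example.

```python
# Hypothetical (row, col) offsets for actions 0, 1, 2, 3 -> up, right, down, left.
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}


def next_state(state, action, size=3):
    """Return the state reached from `state`, staying put if a wall blocks the move."""
    row, col = divmod(state, size)
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < size and 0 <= new_col < size:
        return new_row * size + new_col
    return state  # assumed: moving off the grid leaves the position unchanged


print(next_state(0, 0))  # up from the top-left corner -> 0
print(next_state(4, 1))  # right from the center of a 3x3 grid -> 5
```

Keeping the movement rule separate from the reward logic makes the wall behavior easy to test in isolation.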

visualize()

Print out current obs information