seldonian.RL.environments.gridworld.Gridworld
- class Gridworld(size=3)
Bases: Environment
- __init__(size=3)
Square 2D gridworld RL environment of arbitrary size. Actions 0, 1, 2, and 3 correspond to up, right, down, and left, respectively. Rewards are hardcoded: entering state 7 yields a reward of -1, entering the terminal state (bottom right) yields a reward of 1, and entering any other state yields a reward of 0.
- Parameters:
size – The number of grid cells on a side
- Variables:
num_states – The number of distinct grid cells
env_description (Env_Description) – Contains attributes describing the environment
state – The location in the gridworld, ranging from 0 (top left) to size**2 - 1 (bottom right)
terminal_state (bool) – Whether the terminal state is occupied
time (int) – The current timestep
max_time (int) – Maximum allowed timestep
gamma (float) – The discount factor in calculating the expected return
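As a minimal sketch of the state numbering and reward rule described above, the helpers below assume row-major numbering, so state 0 is the top-left cell and state size**2 - 1 is the bottom-right terminal cell. The function names are hypothetical and are not part of the seldonian package. Under this assumption, state 7 in the default 3x3 grid is the bottom-middle cell, directly left of the terminal state.

```python
from __future__ import annotations


def state_to_coords(state: int, size: int) -> tuple[int, int]:
    """Map a state index to (row, col), ASSUMING row-major numbering."""
    if not 0 <= state < size * size:
        raise ValueError(f"state must be in [0, {size * size - 1}]")
    return divmod(state, size)


def coords_to_state(row: int, col: int, size: int) -> int:
    """Inverse of state_to_coords under the same row-major assumption."""
    return row * size + col


def reward_for_entering(next_state: int, size: int, penalty_state: int = 7) -> float:
    """Reward rule from the docstring: +1 for the bottom-right terminal cell,
    -1 for the hardcoded penalty state (7), and 0 everywhere else."""
    terminal_state = size * size - 1
    if next_state == terminal_state:
        return 1.0
    if next_state == penalty_state:
        return -1.0
    return 0.0


# In the default 3x3 grid, state 7 sits at row 2, column 1.
print(state_to_coords(7, 3), reward_for_entering(7, 3), reward_for_entering(8, 3))
```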
- __repr__()
Return repr(self).
Methods
- create_env_description(num_states)
Creates the environment description object.
- Parameters:
num_states – The number of states
- Returns:
Environment description for the observation and action spaces
- Return type:
Env_Description
- get_env_description()
Get the environment description. Override this method in a child class implementation.
- get_observation()
Get the current observation
- is_in_goal_state()
Check whether the current state is the goal state
- Returns:
True if the current state is the goal state, False otherwise
- reset()
Reset to the initial state and timestep
- start_visualizing()
Turn on visualization debugger.
- stop_visualizing()
Turn off visualization debugger.
- terminated()
Check whether the episode has reached the terminal state
- transition(action)
Transition to the next state given an action and return the resulting reward.
- Parameters:
action – An action available in the current state (0 = up, 1 = right, 2 = down, 3 = left)
- Returns:
The reward for entering the next state
- update_position(action)
Helper function for transition() that updates the current position given an action.
- Parameters:
action – An action available in the current state (0 = up, 1 = right, 2 = down, 3 = left)
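One way to picture how update_position() and transition() fit together is the standalone sketch below. The function names are hypothetical, and the sketch rests on three assumptions: states are numbered row-major, an action that would push the agent off the grid leaves it in place, and bumping a wall while in state 7 counts as entering state 7 again. The actual class may handle walls differently.

```python
from __future__ import annotations

# Action encoding from the class docstring: 0=up, 1=right, 2=down, 3=left,
# expressed as (row delta, col delta).
ACTION_DELTAS = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}


def update_position(state: int, action: int, size: int) -> int:
    """Return the next state index.

    ASSUMPTION: moves that would leave the grid are clamped, so the agent
    stays in its current row/column instead of wrapping around."""
    row, col = divmod(state, size)
    d_row, d_col = ACTION_DELTAS[action]
    new_row = min(max(row + d_row, 0), size - 1)
    new_col = min(max(col + d_col, 0), size - 1)
    return new_row * size + new_col


def transition(state: int, action: int, size: int, penalty_state: int = 7) -> tuple[int, float]:
    """Move once and return (next_state, reward) using the documented reward rule."""
    next_state = update_position(state, action, size)
    if next_state == size * size - 1:  # bottom-right terminal cell
        reward = 1.0
    elif next_state == penalty_state:  # hardcoded penalty cell
        reward = -1.0
    else:
        reward = 0.0
    return next_state, reward


print(transition(4, 2, 3))  # moving down from the center cell lands on state 7
```

Returning the next state alongside the reward keeps these helpers pure; the documented methods instead mutate the environment's stored state and return only the reward.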
- visualize()
Print information about the current state
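The methods above are typically combined in an episode loop: call reset(), then call transition() repeatedly until terminated() reports that the episode is over, accumulating the return discounted by gamma. The sketch below uses a hypothetical ToyGridworld stand-in that mirrors the documented method names. It is not the seldonian class, and its defaults are illustrative. It assumes that episodes start in state 0 and end either at the terminal cell or once max_time steps have elapsed.

```python
from __future__ import annotations

import random
from typing import Callable


class ToyGridworld:
    """Hypothetical stand-in mirroring the documented interface (not seldonian's class)."""

    # 0=up, 1=right, 2=down, 3=left, as (row delta, col delta)
    DELTAS = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}

    def __init__(self, size: int = 3, max_time: int = 100, gamma: float = 0.9):
        self.size = size
        self.num_states = size * size
        self.max_time = max_time
        self.gamma = gamma
        self.reset()

    def reset(self) -> None:
        # ASSUMPTION: episodes start in the top-left cell (state 0) at time 0.
        self.state = 0
        self.time = 0
        self.terminal_state = False

    def get_observation(self) -> int:
        return self.state

    def terminated(self) -> bool:
        return self.terminal_state

    def transition(self, action: int) -> float:
        # ASSUMPTION: off-grid moves are clamped so the agent stays in place.
        row, col = divmod(self.state, self.size)
        d_row, d_col = self.DELTAS[action]
        row = min(max(row + d_row, 0), self.size - 1)
        col = min(max(col + d_col, 0), self.size - 1)
        self.state = row * self.size + col
        self.time += 1
        if self.state == self.num_states - 1:
            self.terminal_state = True
            return 1.0
        if self.time >= self.max_time:
            # ASSUMPTION: hitting max_time ends the episode with no extra reward.
            self.terminal_state = True
        return -1.0 if self.state == 7 else 0.0


def run_episode(env: ToyGridworld, policy: Callable[[int], int]) -> float:
    """Roll out one episode and return the discounted return sum_t gamma**t * r_t."""
    env.reset()
    discounted_return, t = 0.0, 0
    while not env.terminated():
        reward = env.transition(policy(env.get_observation()))
        discounted_return += env.gamma**t * reward
        t += 1
    return discounted_return


env = ToyGridworld(size=3, gamma=0.9)
# Move right until the last column, then down: 0 -> 1 -> 2 -> 5 -> 8.
print(run_episode(env, lambda obs: 1 if obs % env.size < env.size - 1 else 2))
```

With gamma = 0.9, the right-then-down policy reaches the terminal cell on its fourth step, so its return is gamma**3, about 0.729. A down-then-right policy passes through state 7 on its third step and earns -(0.9**2) + 0.9**3 instead.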