seldonian.RL.Agents.Policies.Policy.Discrete_Action_Policy

class Discrete_Action_Policy(hyperparam_and_setting_dict, env_description)

__init__(hyperparam_and_setting_dict, env_description)

General policy class where actions are discrete. Converts input actions into 0-indexed actions.

Parameters:

hyperparam_and_setting_dict – Specifies the environment, agent, number of episodes per trial, and number of trials
env_description (Env_Description) – an object for accessing attributes of the environment

Methods

choose_action(obs): Defines how to select an action given an observation, obs

construct_basis_and_linear_FA(env_description, hyperparam_and_setting_dict)

Create a basis and linear function approximator from an environment description and dictionary specification

Parameters:

env_description (Env_Description) – an object for accessing attributes of the environment
hyperparam_and_setting_dict – Specifies the environment, agent, number of episodes per trial, and number of trials

from_0_indexed_action_to_environment_action(action_0_indexed): Convert a 0-indexed action to an env-specific action

from_environment_action_to_0_indexed_action(env_action): Convert an env-specific action to a 0-indexed action
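
The two conversions above can be sketched as a simple offset, assuming the environment's actions are contiguous integers. The class ActionIndexer and its min_action and num_actions attributes are hypothetical names used only for illustration; they are not confirmed parts of the library, and the actual implementation may differ.

```python
class ActionIndexer:
    """Toy stand-in (not the library class) for 0-indexing discrete actions."""

    def __init__(self, min_action, num_actions):
        # Assumption: env actions are min_action, min_action + 1, ...
        self.min_action = min_action
        self.num_actions = num_actions

    def from_environment_action_to_0_indexed_action(self, env_action):
        # Shift so that the smallest env action maps to 0
        return env_action - self.min_action

    def from_0_indexed_action_to_environment_action(self, action_0_indexed):
        # Inverse shift back into the environment's own numbering
        return action_0_indexed + self.min_action


# Example: an environment whose actions are -1, 0, 1
indexer = ActionIndexer(min_action=-1, num_actions=3)
zero_indexed = [
    indexer.from_environment_action_to_0_indexed_action(a) for a in (-1, 0, 1)
]
round_trip = [
    indexer.from_0_indexed_action_to_environment_action(i) for i in zero_indexed
]
```

Under this assumption the two methods are exact inverses, so a policy can work internally with indices 0..num_actions-1 regardless of how the environment numbers its actions.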

get_action_values_given_state(obs): Get the parameter weights for all possible actions in a given observation

get_params()

Get the current parameters (weights) of the agent

get_prob_this_action(obs, action): Get probability of taking an action given an observation. Does not necessarily need to be overridden, but is often called from self.get_probs_from_observations_and_actions()

get_probs_from_observations_and_actions(observations, actions, behavior_action_probs): Get probabilities for each observation and action in the input arrays
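
To illustrate how the probability methods above might relate to one another, here is a minimal sketch of a subclass-like toy policy. It assumes a tabular softmax over action values, integer observation indices, and 0-indexed actions. None of this is stated by the documentation: ToyTabularSoftmax and its weights attribute are hypothetical, and behavior_action_probs is accepted only to mirror the documented signature.

```python
import numpy as np


class ToyTabularSoftmax:
    """Hypothetical policy with one weight per (observation, action) pair."""

    def __init__(self, num_observations, num_actions):
        self.weights = np.zeros((num_observations, num_actions))

    def get_action_values_given_state(self, obs):
        # One weight per action for this observation
        return self.weights[obs]

    def get_prob_this_action(self, obs, action):
        values = self.get_action_values_given_state(obs)
        # Subtract the max before exponentiating for numerical stability
        exp_values = np.exp(values - np.max(values))
        return exp_values[action] / np.sum(exp_values)

    def get_probs_from_observations_and_actions(
        self, observations, actions, behavior_action_probs=None
    ):
        # behavior_action_probs is unused in this sketch
        return np.array(
            [self.get_prob_this_action(o, a) for o, a in zip(observations, actions)]
        )


policy = ToyTabularSoftmax(num_observations=2, num_actions=2)
policy.weights[1] = [0.0, np.log(3.0)]  # favor action 1 in observation 1
probs = policy.get_probs_from_observations_and_actions([0, 1, 1], [0, 0, 1], None)
```

Here the batched method simply delegates to get_prob_this_action for each (observation, action) pair, which matches the note that the per-action method is often called from the batched one.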

make_state_action_FA(env_description, hyperparam_and_setting_dict)

Create a function approximator from an environment description and dictionary specification

Parameters:

env_description (Env_Description) – an object for accessing attributes of the environment
hyperparam_and_setting_dict – Specifies the environment, agent, number of episodes per trial, and number of trials

Returns:

Function approximator; its type depends on whether the observation space is discrete or continuous

set_new_params(new_params)

Set the parameters of the agent
