seldonian.RL.Agents.Policies.Policy.Discrete_Action_Policy¶
- class Discrete_Action_Policy(hyperparam_and_setting_dict, env_description)¶
Bases:
Policy
- __init__(hyperparam_and_setting_dict, env_description)¶
General policy class where actions are discrete. Converts input actions into 0-indexed actions.
- Parameters:
hyperparam_and_setting_dict – Specifies the environment, agent, number of episodes per trial, and number of trials
env_description (Env_Description) – an object for accessing attributes of the environment
- __repr__()¶
Return repr(self).
Methods
- choose_action(obs)¶
Defines how to select an action given an observation, obs
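A minimal sketch of what action selection for a discrete-action policy can look like: given a probability vector over the 0-indexed actions (e.g., the output of get_action_values_given_state() passed through a softmax), sample one index. The helper name and the assumption that selection is a weighted sample are illustrative; the actual Discrete_Action_Policy subclass may select actions differently.

```python
import numpy as np

def choose_action_sketch(action_probs, rng):
    """Sample a 0-indexed action from a probability vector.

    Illustrative stand-in for choose_action(obs); assumes the policy
    has already reduced the observation to action probabilities.
    """
    return rng.choice(len(action_probs), p=action_probs)

rng = np.random.default_rng(0)
action = choose_action_sketch([0.2, 0.5, 0.3], rng)
```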
- construct_basis_and_linear_FA(env_description, hyperparam_and_setting_dict)¶
Create a basis and linear function approximator from an environment description and dictionary specification
- Parameters:
env_description (Env_Description) – an object for accessing attributes of the environment
hyperparam_and_setting_dict – Specifies the environment, agent, number of episodes per trial, and number of trials
- from_0_indexed_action_to_environment_action(action_0_indexed)¶
Convert 0-indexed action to env-specific action
- from_environment_action_to_0_indexed_action(env_action)¶
Convert env-specific action to 0-indexed action
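The two conversion methods are inverses of each other. For an environment whose discrete actions do not start at 0 (say, actions -1, 0, 1), a minimal sketch is an offset shift; here the minimum environment action is an assumed constant, whereas the real class derives it from env_description.

```python
MIN_ACTION = -1  # assumed lowest environment action, e.g. actions are -1, 0, 1

def to_0_indexed(env_action):
    # Shift so the smallest environment action maps to index 0
    return env_action - MIN_ACTION

def to_env_action(action_0_indexed):
    # Inverse shift back to the environment's own action labels
    return action_0_indexed + MIN_ACTION
```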
- get_action_values_given_state(obs)¶
Get the values of all possible actions given an observation
- get_params()¶
Get the current parameters (weights) of the agent
- Returns:
array of weights
- get_prob_this_action(obs, action)¶
Get probability of taking an action given an observation. Does not necessarily need to be overridden, but is often called from self.get_probs_from_observations_and_actions()
- get_probs_from_observations_and_actions(observations, actions, behavior_action_probs)¶
Get probabilities for each observation and action in the input arrays
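One common way a discrete-action policy turns action values into action probabilities is a softmax over the values for a single observation; the sketch below shows that reduction and how the probability of one taken action falls out of it. This is an assumption about the parameterization (concrete subclasses may define the distribution differently), not the library's confirmed implementation.

```python
import numpy as np

def softmax_probs(action_values):
    # Numerically stable softmax over the action values for one observation
    z = np.asarray(action_values, dtype=float)
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def prob_this_action(action_values, action_0_indexed):
    # Probability of one specific (0-indexed) action under the softmax
    return softmax_probs(action_values)[action_0_indexed]
```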
- make_state_action_FA(env_description, hyperparam_and_setting_dict)¶
Create a function approximator from an environment description and dictionary specification
- Parameters:
env_description (Env_Description) – an object for accessing attributes of the environment
hyperparam_and_setting_dict – Specifies the environment, agent, number of episodes per trial, and number of trials
- Returns:
function approximator, type depends on whether observation space is discrete or continuous
- set_new_params(new_params)¶
Set the parameters of the agent
- Parameters:
new_params – array of weights
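The get_params() / set_new_params() pair reads and writes the policy's weight array. A minimal stand-in holder, assuming the weights are a flat NumPy array (the real class stores them inside its function approximator):

```python
import numpy as np

class WeightHolder:
    """Illustrative stand-in for the policy's parameter storage."""

    def __init__(self, n_weights):
        self.weights = np.zeros(n_weights)

    def get_params(self):
        # Return the current weight array
        return self.weights

    def set_new_params(self, new_params):
        # Copy so later external mutations don't alias internal state
        self.weights = np.array(new_params, dtype=float)

holder = WeightHolder(3)
holder.set_new_params([0.1, -0.2, 0.3])
```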