seldonian.RL.Agents.Policies.Softmax.DiscreteSoftmax¶
- class DiscreteSoftmax(hyperparam_and_setting_dict, env_description)¶
Bases:
Softmax
- __init__(hyperparam_and_setting_dict, env_description)¶
Softmax policy where both observations and actions are discrete. Faster than using the base Softmax class because a cache is used for Q-table lookups.
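A minimal construction sketch, modeled on the gridworld tutorial pattern; the Env_Description import paths and Discrete_Space bounds shown here are assumptions and may differ across toolkit versions:

```python
from seldonian.RL.Agents.Policies.Softmax import DiscreteSoftmax
from seldonian.RL.Env_Description.Env_Description import Env_Description
from seldonian.RL.Env_Description.Spaces import Discrete_Space

# Hypothetical 3x3 gridworld: 9 discrete observations, 4 discrete actions
observation_space = Discrete_Space(0, 8)
action_space = Discrete_Space(0, 3)
env_description = Env_Description(observation_space, action_space)

policy = DiscreteSoftmax(
    hyperparam_and_setting_dict={},   # no extra hyperparameters in this sketch
    env_description=env_description,
)
```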
- __repr__()¶
Return repr(self).
Methods
- _arg(observation, action)¶
Helper function to accelerate action probability calculation
- Parameters:
observation (int) – An observation of the environment
action (int) – A possible action at the given observation
- _denom(observation)¶
Helper function to accelerate action probability calculation
- Parameters:
observation (int) – An observation of the environment
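Conceptually, _arg is the exponent of the softmax numerator for one (observation, action) pair and _denom is the normalizer summed over actions. A plain-NumPy sketch of that decomposition, assuming a tabular (n_observations, n_actions) weight array rather than the class's internal function approximator:

```python
import numpy as np

weights = np.zeros((9, 4))  # hypothetical (n_observations, n_actions) Q table

def arg(observation, action):
    # Exponent used in the softmax numerator for this (observation, action) pair
    return weights[observation, action]

def denom(observation):
    # Normalizer: sum of exponentiated arguments over all actions
    return np.sum(np.exp(weights[observation]))

prob = np.exp(arg(2, 1)) / denom(2)  # P(action=1 | observation=2)
```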
- choose_action(obs)¶
Select an action given an observation
- Parameters:
obs – An observation of the environment
- Returns:
the selected action
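A hedged usage sketch chaining the documented methods: look up the action values for the observation, convert them to probabilities, then sample. `policy` is the instance constructed above, and the explicit sampling step is shown only for illustration; choose_action performs it internally:

```python
import numpy as np

obs = 0
action_values = policy.get_action_values_given_state(obs)
action_probs = policy.get_action_probs_from_action_values(action_values)

# Equivalent in spirit to policy.choose_action(obs)
action = np.random.choice(len(action_probs), p=action_probs)
```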
- choose_action_from_action_values(action_values)¶
Select an action given a list of action values
- Parameters:
action_values – List of action values (param weights)
- construct_basis_and_linear_FA(env_description, hyperparam_and_setting_dict)¶
Create a basis and linear function approximator from an environment description and dictionary specification
- Parameters:
env_description (Env_Description) – an object for accessing attributes of the environment
hyperparam_and_setting_dict – Specifies the environment, agent, number of episodes per trial, and number of trials
- from_0_indexed_action_to_environment_action(action_0_indexed)¶
Convert 0-indexed action to env-specific action
- from_environment_action_to_0_indexed_action(env_action)¶
Convert env-specific action to 0-indexed action
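If the environment's action labels do not start at 0, these conversions amount to offsetting by the action space minimum; a sketch under that assumption (the offset logic is illustrative, not quoted from the library):

```python
action_space_min = -1  # hypothetical environment with actions labeled -1, 0, 1, 2

def to_0_indexed(env_action):
    return env_action - action_space_min

def to_environment_action(action_0_indexed):
    return action_0_indexed + action_space_min

assert to_environment_action(to_0_indexed(2)) == 2
```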
- get_action_probs_from_action_values(action_values)¶
Get action probabilities given a list of action values
- Parameters:
action_values – List of action values (param weights)
- Returns:
array of action probabilities
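The underlying computation is a standard softmax over the action values. A self-contained NumPy sketch with max-subtraction for numerical stability (whether the class itself applies that shift is not stated here):

```python
import numpy as np

def softmax_probs(action_values):
    shifted = np.array(action_values) - np.max(action_values)  # numerical stability
    exp_terms = np.exp(shifted)
    return exp_terms / np.sum(exp_terms)

print(softmax_probs([1.0, 2.0, 0.5]))  # probabilities summing to 1
```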
- get_action_values_given_state(obs)¶
Get the action values (parameter weights), one per possible action, for a given observation
- get_e_to_the_something_terms(action_values)¶
Exponentiate list of action values
- Parameters:
action_values – List of action values (param weights)
- Returns:
array of exponentiated action values
- get_params()¶
Get the current parameters (weights) of the agent
- Returns:
array of weights
- get_prob_this_action(observation, action)¶
Get the probability of a selected action in a given observation
- Parameters:
observation – The current observation of the environment
action – The selected action
- Returns:
probability of action
- Return type:
float
- get_probs_from_observations_and_actions(observations, actions, behavior_action_probs)¶
Get the action probabilities of selected actions and observations under the new policy.
- Parameters:
observations – array of observations of the environment
actions – array of selected actions
behavior_action_probs – The probability of the selected actions under the behavior policy
- Returns:
action probabilities of the (observation, action) pairs under the new policy
- Return type:
numpy.ndarray(float)
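These per-step probabilities are the ingredient off-policy estimators consume. A hedged sketch of how the returned array would typically be combined with the behavior probabilities to form per-step importance ratios (the numbers are made up; `policy` is the instance constructed earlier):

```python
import numpy as np

observations = np.array([0, 1, 2])
actions = np.array([1, 0, 3])
behavior_action_probs = np.array([0.25, 0.50, 0.25])  # logged by the behavior policy

pi_new = policy.get_probs_from_observations_and_actions(
    observations, actions, behavior_action_probs
)
importance_ratios = pi_new / behavior_action_probs  # used in importance sampling
```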
- make_state_action_FA(env_description, hyperparam_and_setting_dict)¶
Create a function approximator from an environment description and dictionary specification
- Parameters:
env_description (Env_Description) – an object for accessing attributes of the environment
hyperparam_and_setting_dict – Specifies the environment, agent, number of episodes per trial, and number of trials
- Returns:
function approximator, type depends on whether the observation space is discrete or continuous
- set_new_params(new_params)¶
Set the parameters of the agent
- Parameters:
new_params – array of weights
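A minimal sketch of the get_params / set_new_params round trip, as a candidate-selection loop might use it; the perturbation step is purely illustrative:

```python
import numpy as np

theta = policy.get_params()                                    # current weight array
candidate = theta + 0.01 * np.random.randn(*np.shape(theta))   # illustrative proposal
policy.set_new_params(candidate)                               # install candidate weights
```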