seldonian.RL.Agents.Policies.Softmax.DiscreteSoftmax¶
- class DiscreteSoftmax(hyperparam_and_setting_dict, env_description)¶
Bases:
Softmax
- __init__(hyperparam_and_setting_dict, env_description)¶
Softmax policy where both observations and actions are discrete. Faster than using the base Softmax class because a cache is used for Q-table lookups.
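A minimal construction sketch, modeled on the gridworld tutorial pattern; the Env_Description import paths and Discrete_Space bounds shown here are assumptions and may differ across toolkit versions:

```python
from seldonian.RL.Agents.Policies.Softmax import DiscreteSoftmax
from seldonian.RL.Env_Description.Env_Description import Env_Description
from seldonian.RL.Env_Description.Spaces import Discrete_Space

# Hypothetical 3x3 gridworld: 9 discrete observations, 4 discrete actions
observation_space = Discrete_Space(0, 8)
action_space = Discrete_Space(0, 3)
env_description = Env_Description(observation_space, action_space)

policy = DiscreteSoftmax(
    hyperparam_and_setting_dict={},   # no extra hyperparameters in this sketch
    env_description=env_description,
)
```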
- __repr__()¶
Return repr(self).
Methods
- _arg(observation, action)¶
Helper function to accelerate action probability calculation
- Parameters:
observation (int) – An observation of the environment
action (int) – A possible action at the given observation
- _denom(observation)¶
Helper function to accelerate action probability calculation
- Parameters:
observation (int) – An observation of the environment
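Conceptually, _arg is the exponent of the softmax numerator for one (observation, action) pair and _denom is the normalizer summed over actions. A plain-NumPy sketch of that decomposition, assuming a tabular (n_observations, n_actions) weight array rather than the class's internal function approximator:

```python
import numpy as np

weights = np.zeros((9, 4))  # hypothetical (n_observations, n_actions) Q table

def arg(observation, action):
    # Exponent used in the softmax numerator for this (observation, action) pair
    return weights[observation, action]

def denom(observation):
    # Normalizer: sum of exponentiated arguments over all actions
    return np.sum(np.exp(weights[observation]))

prob = np.exp(arg(2, 1)) / denom(2)  # P(action=1 | observation=2)
```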
- choose_action(obs)¶
Select an action given an observation
- Parameters:
obs – An observation of the environment
- Returns:
the selected action
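A hedged usage sketch chaining the documented methods: look up the action values for the observation, convert them to probabilities, then sample. `policy` is the instance constructed above, and the explicit sampling step is shown only for illustration; choose_action performs it internally:

```python
import numpy as np

obs = 0
action_values = policy.get_action_values_given_state(obs)
action_probs = policy.get_action_probs_from_action_values(action_values)

# Equivalent in spirit to policy.choose_action(obs)
action = np.random.choice(len(action_probs), p=action_probs)
```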
- choose_action_from_action_values(action_values)¶
Select an action given a list of action values
- Parameters:
action_values – List of action values (param weights)
- construct_basis_and_linear_FA(env_description, hyperparam_and_setting_dict)¶
Create a basis and linear function approximator from an environment description and dictionary specification
- Parameters:
env_description (Env_Description) – an object for accessing attributes of the environment
hyperparam_and_setting_dict – Specifies the environment, agent, number of episodes per trial, and number of trials
- from_0_indexed_action_to_environment_action(action_0_indexed)¶
Convert 0-indexed action to env-specific action
- from_environment_action_to_0_indexed_action(env_action)¶
Convert env-specific action to 0-indexed action
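If the environment's action labels do not start at 0, these conversions amount to offsetting by the action space minimum; a sketch under that assumption (the offset logic is illustrative, not quoted from the library):

```python
action_space_min = -1  # hypothetical environment with actions labeled -1, 0, 1, 2

def to_0_indexed(env_action):
    return env_action - action_space_min

def to_environment_action(action_0_indexed):
    return action_0_indexed + action_space_min

assert to_environment_action(to_0_indexed(2)) == 2
```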
- get_action_probs_from_action_values(action_values)¶
Get action probabilities given a list of action values
- Parameters:
action_values – List of action values (param weights)
- Returns:
array of action probabilities
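The underlying computation is a standard softmax over the action values. A self-contained NumPy sketch with max-subtraction for numerical stability (whether the class itself applies that shift is not stated here):

```python
import numpy as np

def softmax_probs(action_values):
    shifted = np.array(action_values) - np.max(action_values)  # numerical stability
    exp_terms = np.exp(shifted)
    return exp_terms / np.sum(exp_terms)

print(softmax_probs([1.0, 2.0, 0.5]))  # probabilities summing to 1
```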
- get_action_values_given_state(obs)¶
Get the action values (parameter weights), one per possible action, for a given observation
- get_e_to_the_something_terms(action_values)¶
Exponentiate list of action values
- Parameters:
action_values – List of action values (param weights)
- Returns:
array of exponentiated action values
- get_params()¶
Get the current parameters (weights) of the agent
- Returns:
array of weights
- get_prob_this_action(observation, action)¶
Get the probability of a selected action in a given observation
- Parameters:
observation – The current observation of the environment
action – The selected action
- Returns:
probability of action
- Return type:
float
- get_probs_from_observations_and_actions(observations, actions, behavior_action_probs)¶
Get the action probabilities of selected actions and observations under the new policy.
- Parameters:
observations – array of observations of the environment
actions – array of selected actions
behavior_action_probs – The probability of the selected actions under the behavior policy
- Returns:
action probabilities of the (observation, action) pairs under the new policy
- Return type:
numpy.ndarray(float)
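These per-step probabilities are the ingredient off-policy estimators consume. A hedged sketch of how the returned array would typically be combined with the behavior probabilities to form per-step importance ratios (the numbers are made up; `policy` is the instance constructed earlier):

```python
import numpy as np

observations = np.array([0, 1, 2])
actions = np.array([1, 0, 3])
behavior_action_probs = np.array([0.25, 0.50, 0.25])  # logged by the behavior policy

pi_new = policy.get_probs_from_observations_and_actions(
    observations, actions, behavior_action_probs
)
importance_ratios = pi_new / behavior_action_probs  # used in importance sampling
```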
- make_state_action_FA(env_description, hyperparam_and_setting_dict)¶
Create a function approximator from an environment description and dictionary specification
- Parameters:
env_description (Env_Description) – an object for accessing attributes of the environment
hyperparam_and_setting_dict – Specifies the environment, agent, number of episodes per trial, and number of trials
- Returns:
function approximator, type depends on whether the observation space is discrete or continuous
- set_new_params(new_params)¶
Set the parameters of the agent
- Parameters:
new_params – array of weights
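A minimal sketch of the get_params / set_new_params round trip, as a candidate-selection loop might use it; the perturbation step is purely illustrative:

```python
import numpy as np

theta = policy.get_params()                                    # current weight array
candidate = theta + 0.01 * np.random.randn(*np.shape(theta))   # illustrative proposal
policy.set_new_params(candidate)                               # install candidate weights
```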