seldonian.RL.Agents.Parameterized_non_learning_softmax_agent.Parameterized_non_learning_softmax_agent

class Parameterized_non_learning_softmax_agent(env_description, hyperparam_and_setting_dict)

Bases: Agent

__init__(env_description, hyperparam_and_setting_dict)

RL agent that takes actions using a parameterized softmax policy: \(\pi(s,a) = \frac{e^{p(s,a)}}{\sum_{a'}{e^{p(s,a')}}}\), where \(p(s,a)\) is the agent's weight for taking action \(a\) in state \(s\)

Parameters:
  • env_description (Env_Description) – an object for accessing attributes of the environment

  • hyperparam_and_setting_dict – Contains additional info about the environment and data generation

Variables:

softmax (Softmax) – The policy
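
A minimal sketch of how this policy turns parameters into action probabilities, written in plain NumPy rather than the toolkit's Softmax class; the parameter table p below is hypothetical:

    import numpy as np

    # Hypothetical parameter table p(s, a): 3 states x 2 actions.
    p = np.array([[0.5, -0.2],
                  [1.0,  1.0],
                  [-0.3,  0.7]])

    def softmax_probs(p, s):
        """Return pi(s, .) = exp(p(s, a)) / sum over a' of exp(p(s, a'))."""
        prefs = p[s] - np.max(p[s])  # subtract the max for numerical stability
        e = np.exp(prefs)
        return e / e.sum()

    print(softmax_probs(p, 0))  # -> [0.668..., 0.331...]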

__repr__()

Return repr(self).

Methods

choose_action(obs)

Select an action given an observation

Parameters:

obs – The current observation of the agent, type depends on environment.

Returns:

the selected action
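
A sketch of the sampling step, assuming a discrete action set; both actions and action_values are hypothetical stand-ins (the latter for the output of get_action_values(obs)):

    import numpy as np

    rng = np.random.default_rng(0)
    actions = np.array([0, 1, 2, 3])                 # hypothetical discrete action set
    action_values = np.array([0.1, 0.4, -0.2, 0.0])  # stand-in for get_action_values(obs)

    e = np.exp(action_values - action_values.max())
    probs = e / e.sum()                    # softmax over the action values
    action = rng.choice(actions, p=probs)  # sample one action from pi(s, .)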

get_action_values(obs)

Get the values of all possible actions in this state using the function approximator (FA)

Parameters:

obs – The current observation of the agent, type depends on environment.

get_params()

Get the current parameters (weights) of the agent

Returns:

array of weights

get_policy()

Retrieve the agent’s policy object

get_prob_this_action(observation, action)

Get the probability of the selected action given an observation under the softmax policy.

Parameters:
  • observation – The current obs of the agent, type depends on environment.

  • action – The action selected, type depends on environment

Returns:

probability of action

Return type:

float
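
The same softmax arithmetic yields the probability of one chosen action; a sketch, where action_values again stands in for the output of get_action_values(observation):

    import numpy as np

    action_values = np.array([0.1, 0.4, -0.2, 0.0])  # stand-in for get_action_values(observation)
    action = 1                                       # index of the selected action

    e = np.exp(action_values - action_values.max())
    prob_this_action = float(e[action] / e.sum())    # pi(s, a) as a float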

set_new_params(new_params)

Set the parameters of the agent

Parameters:

new_params – array of weights
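
A sketch of the get/set round trip an external optimizer can use to evaluate candidate solutions; perturb_and_install is a hypothetical helper, and construction of the agent is omitted:

    import numpy as np

    def perturb_and_install(agent, scale=0.01, seed=0):
        """Read the agent's weights, perturb them, and install the candidate.

        `agent` is assumed to be an already-constructed
        Parameterized_non_learning_softmax_agent.
        """
        rng = np.random.default_rng(seed)
        weights = agent.get_params()      # current weight array
        candidate = weights + scale * rng.standard_normal(weights.shape)
        agent.set_new_params(candidate)   # agent now acts under the new weights
        return candidate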

update(observation, next_observation, reward, terminated)

Updates the agent’s parameters according to the learning rule. Not implemented for this agent.