seldonian.RL.Agents.Parameterized_non_learning_softmax_agent.Parameterized_non_learning_softmax_agent
- class Parameterized_non_learning_softmax_agent(env_description, hyperparam_and_setting_dict)
Bases: Agent
- __init__(env_description, hyperparam_and_setting_dict)
RL agent that takes actions using a parameterized softmax policy: \(\pi(s,a) = \frac{e^{p(s,a)}}{\sum_{a'}{e^{p(s,a')}}}\)
- Parameters:
env_description (Env_Description) – an object for accessing attributes of the environment
hyperparam_and_setting_dict – contains additional info about the environment and data generation
- Variables:
softmax (Softmax) – The policy
- __repr__()
Return repr(self).
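The softmax formula above can be illustrated with a short standalone sketch. This is a minimal NumPy illustration of the policy, not the toolkit's implementation; the helper name softmax_probs is hypothetical.

import numpy as np

def softmax_probs(action_params):
    # Hypothetical helper: map per-action parameters p(s, a) for a fixed
    # state s to action probabilities pi(s, a) via the softmax formula above.
    shifted = action_params - np.max(action_params)  # stabilize; probabilities unchanged
    exps = np.exp(shifted)
    return exps / exps.sum()

p_s = np.array([1.0, 2.0, 0.5])  # parameters p(s, a) for three actions
print(softmax_probs(p_s))  # approximately [0.23 0.63 0.14]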
Methods
- choose_action(obs)
Select an action given an observation
- Parameters:
obs – The current observation of the agent, type depends on environment
- Returns:
array of actions
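A hedged sketch of the sampling step, assuming the per-action parameters for the current observation are already in hand; choose_action_sketch is a hypothetical stand-in, not the toolkit's code.

import numpy as np

rng = np.random.default_rng(0)

def choose_action_sketch(action_params):
    # Sample an action index from the softmax distribution over the
    # per-action parameters p(s, .) (hypothetical stand-in for choose_action).
    shifted = action_params - np.max(action_params)  # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return rng.choice(len(probs), p=probs)

print(choose_action_sketch(np.array([1.0, 2.0, 0.5])))  # e.g. 1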
- get_action_values(obs)
Get the values of all possible actions in this state using the function approximator (FA)
- Parameters:
obs – The current observation of the agent, type depends on environment.
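For a tabular (one-hot) function approximator, the action values p(s, ·) amount to a row of the weight matrix. The weight layout below is an assumption for illustration, not the toolkit's internal representation.

import numpy as np

def get_action_values_sketch(weights, state_index):
    # Return p(s, a) for every action a in the given state, assuming a
    # tabular FA with weights of shape (num_states, num_actions).
    return weights[state_index]

W = np.zeros((4, 3))    # 4 states, 3 actions
W[2] = [1.0, 2.0, 0.5]  # parameters for state 2
print(get_action_values_sketch(W, 2))  # [1.  2.  0.5]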
- get_params()
Get the current parameters (weights) of the agent
- Returns:
array of weights
- get_policy()
Retrieve the agent’s policy object
- get_prob_this_action(observation, action)
Get the probability of a selected action given an observation, using the softmax policy.
- Parameters:
observation – The current observation of the agent, type depends on environment
action – The action selected, type depends on environment
- Returns:
probability of action
- Return type:
float
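The returned value is the softmax probability \(\pi(s,a)\) of the one selected action. A self-contained sketch under the same assumptions as above (hypothetical names):

import numpy as np

def prob_of_action_sketch(action_params, action):
    # Return pi(s, a) for one action under the softmax policy
    # (hypothetical stand-in for get_prob_this_action).
    shifted = action_params - np.max(action_params)  # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return float(probs[action])

print(prob_of_action_sketch(np.array([1.0, 2.0, 0.5]), 1))  # ~0.63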
- set_new_params(new_params)
Set the parameters of the agent
- Parameters:
new_params – array of weights
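get_params() and set_new_params() form a plain getter/setter pair over the weight array, which is how a candidate-selection loop would swap policies in and out. A runnable stand-in for that contract (the class below is hypothetical, not the agent itself):

import numpy as np

class ParamStoreSketch:
    # Hypothetical stand-in illustrating the get_params/set_new_params contract.
    def __init__(self, weights):
        self.weights = np.asarray(weights, dtype=float)

    def get_params(self):
        return self.weights

    def set_new_params(self, new_params):
        self.weights = np.asarray(new_params, dtype=float)

store = ParamStoreSketch(np.zeros(3))
candidate = store.get_params() + 0.1   # perturb the current weights
store.set_new_params(candidate)        # install the candidate weights
print(store.get_params())              # [0.1 0.1 0.1]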
- update(observation, next_observation, reward, terminated)
Update the agent's parameters according to the learning rule. Not implemented for this agent, which is non-learning.