seldonian.RL.Agents.Parameterized_non_learning_softmax_agent.Parameterized_non_learning_softmax_agent
- class Parameterized_non_learning_softmax_agent(env_description, hyperparam_and_setting_dict)
Bases: Agent
- __init__(env_description, hyperparam_and_setting_dict)
RL agent that takes actions using a parameterized softmax policy: \(\pi(s,a) = \frac{e^{p(s,a)}}{\sum_{a'}{e^{p(s,a')}}}\)
- Parameters:
env_description (Env_Description) – an object for accessing attributes of the environment
hyperparam_and_setting_dict – contains additional info about the environment and data generation
- Variables:
softmax (Softmax) – The policy
- __repr__()
Return repr(self).
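The softmax formula above can be illustrated with a short standalone sketch. This is a minimal NumPy illustration of the policy, not the toolkit's implementation; the helper name softmax_probs is hypothetical.

import numpy as np

def softmax_probs(action_params):
    # Hypothetical helper: map per-action parameters p(s, a) for a fixed
    # state s to action probabilities pi(s, a) via the softmax formula above.
    shifted = action_params - np.max(action_params)  # stabilize; probabilities unchanged
    exps = np.exp(shifted)
    return exps / exps.sum()

p_s = np.array([1.0, 2.0, 0.5])  # parameters p(s, a) for three actions
print(softmax_probs(p_s))  # approximately [0.23 0.63 0.14]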
Methods
- choose_action(obs)
Select an action given an observation
- Parameters:
obs – The current observation of the agent, type depends on environment
- Returns:
array of actions
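A hedged sketch of the sampling step, assuming the per-action parameters for the current observation are already in hand; choose_action_sketch is a hypothetical stand-in, not the toolkit's code.

import numpy as np

rng = np.random.default_rng(0)

def choose_action_sketch(action_params):
    # Sample an action index from the softmax distribution over the
    # per-action parameters p(s, .) (hypothetical stand-in for choose_action).
    shifted = action_params - np.max(action_params)  # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return rng.choice(len(probs), p=probs)

print(choose_action_sketch(np.array([1.0, 2.0, 0.5])))  # e.g. 1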
- get_action_values(obs)
Get the values of all possible actions in this state using the function approximator (FA)
- Parameters:
obs – The current observation of the agent, type depends on environment.
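For a tabular (one-hot) function approximator, the action values p(s, ·) amount to a row of the weight matrix. The weight layout below is an assumption for illustration, not the toolkit's internal representation.

import numpy as np

def get_action_values_sketch(weights, state_index):
    # Return p(s, a) for every action a in the given state, assuming a
    # tabular FA with weights of shape (num_states, num_actions).
    return weights[state_index]

W = np.zeros((4, 3))    # 4 states, 3 actions
W[2] = [1.0, 2.0, 0.5]  # parameters for state 2
print(get_action_values_sketch(W, 2))  # [1.  2.  0.5]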
- get_params()
Get the current parameters (weights) of the agent
- Returns:
array of weights
- get_policy()
Retrieve the agent’s policy object
- get_prob_this_action(observation, action)
Get the probability of a selected action given an observation, using the softmax policy.
- Parameters:
observation – The current observation of the agent, type depends on environment
action – The action selected, type depends on environment
- Returns:
probability of action
- Return type:
float
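The returned value is the softmax probability \(\pi(s,a)\) of the one selected action. A self-contained sketch under the same assumptions as above (hypothetical names):

import numpy as np

def prob_of_action_sketch(action_params, action):
    # Return pi(s, a) for one action under the softmax policy
    # (hypothetical stand-in for get_prob_this_action).
    shifted = action_params - np.max(action_params)  # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return float(probs[action])

print(prob_of_action_sketch(np.array([1.0, 2.0, 0.5]), 1))  # ~0.63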
- set_new_params(new_params)
Set the parameters of the agent
- Parameters:
new_params – array of weights
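get_params() and set_new_params() form a plain getter/setter pair over the weight array, which is how a candidate-selection loop would swap policies in and out. A runnable stand-in for that contract (the class below is hypothetical, not the agent itself):

import numpy as np

class ParamStoreSketch:
    # Hypothetical stand-in illustrating the get_params/set_new_params contract.
    def __init__(self, weights):
        self.weights = np.asarray(weights, dtype=float)

    def get_params(self):
        return self.weights

    def set_new_params(self, new_params):
        self.weights = np.asarray(new_params, dtype=float)

store = ParamStoreSketch(np.zeros(3))
candidate = store.get_params() + 0.1   # perturb the current weights
store.set_new_params(candidate)        # install the candidate weights
print(store.get_params())              # [0.1 0.1 0.1]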
- update(observation, next_observation, reward, terminated)
Update the agent's parameters according to the learning rule. Not implemented for this agent, which is non-learning.