seldonian.RL.RL_model.RL_model

class RL_model(policy, env_kwargs)

Bases: SeldonianModel

__init__(policy, env_kwargs)

Base class for all RL models.

Parameters:
  • policy (Policy) – A policy parameterization

  • env_kwargs (dict) – Keyword arguments describing the environment, such as gamma, the discount factor
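As an illustration of what the discount factor in env_kwargs controls, the sketch below computes the discounted return of one episode. The helper discounted_return and the reward values are hypothetical and not part of the library. Only the "gamma" key is taken from the parameter description above, and how the library uses it internally is not shown here.

```python
def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t] for one episode."""
    total = 0.0
    # Work backwards so each step needs one multiply and one add
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical env_kwargs; "gamma" is the only key named in the docs above
env_kwargs = {"gamma": 0.9}
episode_return = discounted_return([1.0, 1.0, 1.0], env_kwargs["gamma"])
```

With gamma below 1, later rewards contribute less to the return than earlier ones.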

__repr__()

Return repr(self).

Methods

get_probs_from_observations_and_actions(new_params, observations, actions, action_probs)

Get action probabilities under the policy with new parameters. This is a thin wrapper that calls the policy method of the same name.

Parameters:
  • new_params – New policy parameter weights to set

  • observations – Array of observations

  • actions – Array of actions

  • action_probs – Array of action probabilities from the behavior policy

Returns:

Array of action probabilities under the new policy
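
The wrapper pattern described for this method can be sketched as follows. This is a minimal, self-contained illustration, not the library's implementation. ToySoftmaxPolicy and ToyRLModel are hypothetical stand-ins written for this example, and the tabular softmax parameterization is an assumption. Only the method name and its argument order come from the documentation above.

```python
import math


class ToySoftmaxPolicy:
    """Hypothetical tabular softmax policy; not the library's Policy class."""

    def __init__(self, n_observations, n_actions):
        # One weight per (observation, action) pair, initialised to zero
        self.params = [[0.0] * n_actions for _ in range(n_observations)]

    def get_probs_from_observations_and_actions(
        self, new_params, observations, actions, action_probs
    ):
        # Set the new weights, then evaluate pi(action | observation) for
        # each logged pair. action_probs (behavior-policy probabilities) is
        # accepted to mirror the documented signature but is unused here.
        self.params = new_params
        probs = []
        for obs, act in zip(observations, actions):
            row = self.params[obs]
            max_w = max(row)  # subtract the max for numerical stability
            exps = [math.exp(w - max_w) for w in row]
            probs.append(exps[act] / sum(exps))
        return probs


class ToyRLModel:
    """Hypothetical stand-in for RL_model: stores a policy and delegates."""

    def __init__(self, policy, env_kwargs):
        self.policy = policy
        self.env_kwargs = env_kwargs

    def get_probs_from_observations_and_actions(
        self, new_params, observations, actions, action_probs
    ):
        # Forward the call unchanged to the policy method of the same name
        return self.policy.get_probs_from_observations_and_actions(
            new_params, observations, actions, action_probs
        )


policy = ToySoftmaxPolicy(n_observations=2, n_actions=2)
model = ToyRLModel(policy, env_kwargs={"gamma": 0.9})

# Observation 1 prefers action 0 with odds 3:1 under these weights
new_params = [[0.0, 0.0], [math.log(3.0), 0.0]]
probs = model.get_probs_from_observations_and_actions(
    new_params,
    observations=[0, 1, 1],
    actions=[1, 0, 1],
    action_probs=[0.5, 0.5, 0.5],
)
```

The returned list has one entry per (observation, action) pair, which is the shape needed to form importance weights against the behavior policy's action_probs.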