experiments.baselines.diabetes_US_baseline.RLDiabetesUSAgentBaseline¶
- class RLDiabetesUSAgentBaseline(initial_solution, env_kwargs, bb_crmin=5.0, bb_crmax=15.0, bb_cfmin=15.0, bb_cfmax=25.0, cr_shrink_factor=1.7320508075688772, cf_shrink_factor=1.7320508075688772)¶
Bases: object
- __init__(initial_solution, env_kwargs, bb_crmin=5.0, bb_crmax=15.0, bb_cfmin=15.0, bb_cfmax=25.0, cr_shrink_factor=1.7320508075688772, cf_shrink_factor=1.7320508075688772)¶
Implements an RL baseline that uses importance sampling with unequal support (US) and a fixed-area policy (see the construction sketch after the parameter list).
- Parameters:
initial_solution – Initial policy parameters
env_kwargs – Environment-specific keyword arguments
bb_crmin – Bounding box minimum in CR dimension
bb_crmax – Bounding box maximum in CR dimension
bb_cfmin – Bounding box minimum in CF dimension
bb_cfmax – Bounding box maximum in CF dimension
cr_shrink_factor – Factor by which to shrink the CR dimension of the bounding box for this fixed-area policy
cf_shrink_factor – Factor by which to shrink the CF dimension of the bounding box for this fixed-area policy
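A minimal construction sketch, assuming an already-built env_kwargs dict for the diabetes environment; the shape of initial_solution and the keys inside env_kwargs below are illustrative assumptions, not documented values.

```python
import numpy as np

from experiments.baselines.diabetes_US_baseline import RLDiabetesUSAgentBaseline

# Illustrative inputs: the shape of initial_solution and the contents of
# env_kwargs are assumptions, not documented values.
initial_solution = np.zeros(2)   # e.g., one weight per (CR, CF) dimension (assumption)
env_kwargs = {"gamma": 1.0}      # environment-specific settings (placeholder)

baseline = RLDiabetesUSAgentBaseline(
    initial_solution=initial_solution,
    env_kwargs=env_kwargs,
    bb_crmin=5.0,
    bb_crmax=15.0,
    bb_cfmin=15.0,
    bb_cfmax=25.0,
    cr_shrink_factor=np.sqrt(3),  # default, 3**0.5 ≈ 1.732
    cf_shrink_factor=np.sqrt(3),
)
```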
- __repr__()¶
Return repr(self).
Methods
- primary_objective_fn(theta)¶
This is the function we want to minimize. In RL we want to maximize the expected return, so we minimize the negative expected return (a sign-convention sketch follows the returns description below).
- Parameters:
theta – Model weights
- Returns:
The negative expected return of the episodes stored in self.episodes
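A rough illustration of the sign convention only, not the library's actual implementation (which evaluates the policy via importance sampling over self.episodes); the helper estimate_return below is hypothetical.

```python
def primary_objective_sketch(theta, episodes, estimate_return):
    """Sign-convention sketch. `estimate_return` is a hypothetical helper that
    returns the (importance-weighted) expected return of `episodes` under the
    policy parameterized by `theta`."""
    expected_return = estimate_return(theta, episodes)
    return -expected_return  # maximizing return == minimizing negative return
```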
- set_new_params(new_params)¶
Set the parameters of the agent; a usage sketch follows below.
- Parameters:
new_params – array of weights
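A one-line usage sketch; the weight values and their shape are assumptions.

```python
baseline.set_new_params(np.array([0.5, -1.2]))  # illustrative weights; shape matching initial_solution is an assumption
```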
- train(dataset, **kwargs)¶
Run CMA-ES starting with a random initial policy parameterization; see the usage sketch after the returns description.
- Parameters:
dataset – A seldonian.dataset.RLDataSet object containing the episodes
- Returns:
The fitted policy parameters
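A usage sketch; the episode-generation step is omitted, and the RLDataSet construction shown is an assumption based on the parameter description above.

```python
from seldonian.dataset import RLDataSet

# `episodes` is a previously generated list of Seldonian RL episodes (assumption).
dataset = RLDataSet(episodes=episodes)

# Runs CMA-ES and returns the fitted policy parameters.
solution = baseline.train(dataset)
```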