experiments.baselines.diabetes_US_baseline.RLDiabetesUSAgentBaseline¶
- class RLDiabetesUSAgentBaseline(initial_solution, env_kwargs, bb_crmin=5.0, bb_crmax=15.0, bb_cfmin=15.0, bb_cfmax=25.0, cr_shrink_factor=1.7320508075688772, cf_shrink_factor=1.7320508075688772)¶
Bases: object
- __init__(initial_solution, env_kwargs, bb_crmin=5.0, bb_crmax=15.0, bb_cfmin=15.0, bb_cfmax=25.0, cr_shrink_factor=1.7320508075688772, cf_shrink_factor=1.7320508075688772)¶
Implements an RL baseline that uses importance sampling with unequal support (US) and a fixed-area policy (see the construction sketch after the parameter list).
- Parameters:
initial_solution – Initial policy parameters
env_kwargs – Environment-specific keyword arguments
bb_crmin – Bounding box minimum in CR dimension
bb_crmax – Bounding box maximum in CR dimension
bb_cfmin – Bounding box minimum in CF dimension
bb_cfmax – Bounding box maximum in CF dimension
cr_shrink_factor – Factor by which to shrink the CR dimension of the bounding box for this fixed-area policy
cf_shrink_factor – Factor by which to shrink the CF dimension of the bounding box for this fixed-area policy
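A minimal construction sketch, assuming an already-built env_kwargs dict for the diabetes environment; the shape of initial_solution and the keys inside env_kwargs below are illustrative assumptions, not documented values.

```python
import numpy as np

from experiments.baselines.diabetes_US_baseline import RLDiabetesUSAgentBaseline

# Illustrative inputs: the shape of initial_solution and the contents of
# env_kwargs are assumptions, not documented values.
initial_solution = np.zeros(2)   # e.g., one weight per (CR, CF) dimension (assumption)
env_kwargs = {"gamma": 1.0}      # environment-specific settings (placeholder)

baseline = RLDiabetesUSAgentBaseline(
    initial_solution=initial_solution,
    env_kwargs=env_kwargs,
    bb_crmin=5.0,
    bb_crmax=15.0,
    bb_cfmin=15.0,
    bb_cfmax=25.0,
    cr_shrink_factor=np.sqrt(3),  # default, 3**0.5 ≈ 1.732
    cf_shrink_factor=np.sqrt(3),
)
```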
- __repr__()¶
Return repr(self).
Methods
- primary_objective_fn(theta)¶
This is the function we want to minimize. In RL we want to maximize the expected return, so we minimize the negative expected return (a sign-convention sketch follows the returns description below).
- Parameters:
theta – Model weights
- Returns:
The negative expected return of the episodes stored in self.episodes
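A rough illustration of the sign convention only, not the library's actual implementation (which evaluates the policy via importance sampling over self.episodes); the helper estimate_return below is hypothetical.

```python
def primary_objective_sketch(theta, episodes, estimate_return):
    """Sign-convention sketch. `estimate_return` is a hypothetical helper that
    returns the (importance-weighted) expected return of `episodes` under the
    policy parameterized by `theta`."""
    expected_return = estimate_return(theta, episodes)
    return -expected_return  # maximizing return == minimizing negative return
```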
- set_new_params(new_params)¶
Set the parameters of the agent; a usage sketch follows below.
- Parameters:
new_params – array of weights
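A one-line usage sketch; the weight values and their shape are assumptions.

```python
baseline.set_new_params(np.array([0.5, -1.2]))  # illustrative weights; shape matching initial_solution is an assumption
```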
- train(dataset, **kwargs)¶
Run CMA-ES starting with a random initial policy parameterization; see the usage sketch after the returns description.
- Parameters:
dataset – A seldonian.dataset.RLDataSet object containing the episodes
- Returns:
The fitted policy parameters
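A usage sketch; the episode-generation step is omitted, and the RLDataSet construction shown is an assumption based on the parameter description above.

```python
from seldonian.dataset import RLDataSet

# `episodes` is a previously generated list of Seldonian RL episodes (assumption).
dataset = RLDataSet(episodes=episodes)

# Runs CMA-ES and returns the fitted policy parameters.
solution = baseline.train(dataset)
```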