experiments.baselines.diabetes_US_baseline.RLDiabetesUSAgentBaseline

class RLDiabetesUSAgentBaseline(initial_solution, env_kwargs, bb_crmin=5.0, bb_crmax=15.0, bb_cfmin=15.0, bb_cfmax=25.0, cr_shrink_factor=1.7320508075688772, cf_shrink_factor=1.7320508075688772)

Bases: object

__init__(initial_solution, env_kwargs, bb_crmin=5.0, bb_crmax=15.0, bb_cfmin=15.0, bb_cfmax=25.0, cr_shrink_factor=1.7320508075688772, cf_shrink_factor=1.7320508075688772)

Implements an RL baseline that uses unequal support (US) importance sampling with a fixed-area policy.

Parameters:
  • initial_solution – Initial policy parameters

  • env_kwargs – Environment-specific keyword arguments

  • bb_crmin – Bounding box minimum in CR dimension

  • bb_crmax – Bounding box maximum in CR dimension

  • bb_cfmin – Bounding box minimum in CF dimension

  • bb_cfmax – Bounding box maximum in CF dimension

  • cr_shrink_factor – Factor by which to shrink the CR dimension of the bounding box for this fixed-area policy

  • cf_shrink_factor – Factor by which to shrink the CF dimension of the bounding box for this fixed-area policy
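
A minimal instantiation sketch is shown below. The initial policy parameters and the contents of env_kwargs are placeholder assumptions for illustration, not values prescribed by this class; note that the default shrink factors equal sqrt(3).

    import numpy as np

    # Placeholder inputs (assumptions for illustration only)
    initial_solution = np.zeros(2)   # initial policy parameters
    env_kwargs = {"gamma": 1.0}      # environment-specific keyword arguments (assumed)

    baseline = RLDiabetesUSAgentBaseline(
        initial_solution=initial_solution,
        env_kwargs=env_kwargs,
        bb_crmin=5.0,
        bb_crmax=15.0,
        bb_cfmin=15.0,
        bb_cfmax=25.0,
        cr_shrink_factor=np.sqrt(3),  # default 1.7320508075688772 == sqrt(3)
        cf_shrink_factor=np.sqrt(3),
    )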

__repr__()

Return repr(self).

Methods

primary_objective_fn(theta)

This is the function we want to minimize. In RL we want to maximize the expected return, so we minimize the negative expected return.

Parameters:

theta – Model weights

Returns:

The negative expected return of the episodes in self.episodes
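
For intuition, the sketch below shows a minimal negative expected return objective. It uses a hypothetical episode attribute (episode.rewards) and a plain discounted-return average rather than this class's actual unequal-support importance-sampling computation.

    import numpy as np

    def negative_expected_return(episodes, gamma=1.0):
        # Discounted return of each episode, averaged and negated so that
        # minimizing this value maximizes the expected return.
        returns = []
        for episode in episodes:
            rewards = np.asarray(episode.rewards)        # hypothetical attribute
            discounts = gamma ** np.arange(len(rewards))
            returns.append(np.dot(discounts, rewards))
        return -np.mean(returns)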

set_new_params(new_params)

Set the parameters of the agent.

Parameters:

new_params – array of weights
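
For example (the weight values below are arbitrary, and baseline is an instance constructed as in the earlier sketch):

    import numpy as np

    # Overwrite the agent's current policy weights with a new array
    baseline.set_new_params(np.array([0.1, -0.2]))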

train(dataset, **kwargs)

Run CMA-ES starting with a random initial policy parameterization.

Parameters:

dataset – A seldonian.dataset.RLDataSet object containing the episodes

Return solution:

The fitted policy parameters
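
An end-to-end usage sketch follows. How the episodes are collected and the exact RLDataSet constructor arguments are assumptions here; consult the Seldonian Engine documentation for the authoritative interface.

    from seldonian.dataset import RLDataSet

    # 'collect_episodes' is a hypothetical helper returning a list of episodes.
    episodes = collect_episodes()

    # Wrap the episodes in an RLDataSet (constructor arguments assumed).
    dataset = RLDataSet(episodes=episodes)

    # Run CMA-ES to fit the policy parameters.
    solution = baseline.train(dataset)
    print("Fitted policy parameters:", solution)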