seldonian.dataset.Episode

class Episode(observations, actions, rewards, action_probs, alt_rewards=[])

Bases: object

__init__(observations, actions, rewards, action_probs, alt_rewards=[])

Object for holding a single reinforcement learning (RL) episode.

Parameters:
  • observations – List of observations at each timestep.

  • actions – List of actions at each timestep.

  • rewards – List of primary rewards at each timestep.

  • action_probs – List of action probabilities from the behavior policy at each timestep.

  • alt_rewards (numpy.ndarray) – A 2D numpy array in which each column contains the rewards from one alternative reward function (a reward function other than the primary one). Defaults to an empty list ([]).
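The following is a minimal sketch of the data layout these parameters describe. It does not use the seldonian library. `EpisodeSketch` is a hypothetical stand-in that mirrors the documented constructor. It assumes that every per-timestep list has one entry per timestep, and that the rows of alt_rewards index timesteps while its columns index alternative reward functions. The documented signature uses `alt_rewards=[]`. The sketch uses `None` instead to avoid Python's shared-mutable-default pitfall.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

import numpy as np


@dataclass
class EpisodeSketch:
    """Hypothetical stand-in mirroring the documented Episode constructor.

    Not the seldonian implementation; it only illustrates the data layout.
    """
    observations: List[Any]
    actions: List[Any]
    rewards: List[float]
    action_probs: List[float]
    # None instead of [] so instances never share one mutable default.
    alt_rewards: Optional[np.ndarray] = None

    def __post_init__(self):
        n = len(self.observations)
        # Assumption: every per-timestep list has exactly one entry per timestep.
        for name in ("actions", "rewards", "action_probs"):
            length = len(getattr(self, name))
            if length != n:
                raise ValueError(f"{name} has length {length}, expected {n}")
        if self.alt_rewards is None:
            # No alternative reward functions: n timesteps, zero columns.
            self.alt_rewards = np.empty((n, 0))
        else:
            self.alt_rewards = np.asarray(self.alt_rewards, dtype=float)
            # Assumption: rows are timesteps, columns are alternative reward functions.
            if self.alt_rewards.ndim != 2 or self.alt_rewards.shape[0] != n:
                raise ValueError("alt_rewards must be 2D with one row per timestep")


# A 3-timestep episode with two alternative reward functions.
ep = EpisodeSketch(
    observations=[0, 1, 2],
    actions=[1, 0, 1],
    rewards=[0.0, 0.0, 1.0],
    action_probs=[0.5, 0.5, 0.5],
    alt_rewards=np.array([[1.0, -1.0],
                          [0.0, 0.0],
                          [2.0, 0.5]]),
)

print(ep.alt_rewards.shape)                # (3, 2)
# Summing down each column totals the rewards of one alternative reward function.
print(ep.alt_rewards.sum(axis=0).tolist())  # [3.0, -0.5]
```

Column j of alt_rewards holds the per-timestep rewards of alternative reward function j, so column-wise operations act on one reward function at a time.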

__repr__()

Return repr(self).

Methods