seldonian.dataset.Episode
- class Episode(observations, actions, rewards, action_probs, alt_rewards=[])
Bases: object
- __init__(observations, actions, rewards, action_probs, alt_rewards=[])
Object for holding the data of one reinforcement learning (RL) episode.
- Parameters:
observations – List of observations at each timestep.
actions – List of actions at each timestep.
rewards – List of primary rewards at each timestep.
action_probs – List of action probabilities from the behavior policy at each timestep.
alt_rewards (numpy.ndarray) – A 2D numpy array in which each column holds the rewards for one alternate reward function, that is, a reward function other than the primary one. Defaults to an empty list.
- __repr__()
Return repr(self).
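The constructor arguments documented above can be illustrated with a minimal sketch. It does not import the library: `EpisodeSketch` and `alt_reward_column` are hypothetical names that only mirror the documented argument list, the equal-length check is an assumption of the sketch rather than documented behavior, and `alt_rewards` is shown as a nested list where the real class documents a 2D numpy array.

```python
from dataclasses import dataclass, field


@dataclass
class EpisodeSketch:
    """Hypothetical stand-in that mirrors the documented Episode arguments."""

    observations: list
    actions: list
    rewards: list
    action_probs: list
    # Rows are timesteps, columns are alternate reward functions.
    alt_rewards: list = field(default_factory=list)

    def __post_init__(self):
        # Assumption of this sketch: every per-timestep sequence has equal length.
        n_steps = len(self.observations)
        for name in ("actions", "rewards", "action_probs"):
            if len(getattr(self, name)) != n_steps:
                raise ValueError(f"{name} must have {n_steps} entries")
        if len(self.alt_rewards) not in (0, n_steps):
            raise ValueError(f"alt_rewards must have 0 or {n_steps} rows")


def alt_reward_column(episode, index):
    """Return the per-timestep rewards of one alternate reward function."""
    return [row[index] for row in episode.alt_rewards]


# A three-step episode with two alternate reward functions (made-up values).
episode = EpisodeSketch(
    observations=[0, 1, 2],
    actions=[1, 0, 1],
    rewards=[0.0, 0.0, 1.0],
    action_probs=[0.5, 0.5, 0.5],
    alt_rewards=[[0.1, -1.0], [0.2, -1.0], [0.3, 0.0]],
)

# Column 1 is the second alternate reward function.
print(alt_reward_column(episode, 1))
```

With the library installed, the same five values would presumably be passed to `Episode(observations, actions, rewards, action_probs, alt_rewards=...)` in the order given by the signature. The sketch uses `default_factory` so that separate instances do not share one default list.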
Methods