seldonian.spec.SupervisedSpec

class SupervisedSpec(dataset, model, parse_trees, sub_regime, frac_data_in_safety=0.6, primary_objective=None, initial_solution_fn=None, base_node_bound_method_dict={}, use_builtin_primary_gradient_fn=True, custom_primary_gradient_fn=None, optimization_technique='gradient_descent', optimizer='adam', optimization_hyperparams={'alpha_lamb': 0.005, 'alpha_theta': 0.005, 'beta_rmsprop': 0.95, 'beta_velocity': 0.9, 'gradient_library': 'autograd', 'hyper_search': None, 'lambda_init': 0.5, 'num_iters': 200, 'use_batches': False, 'verbose': False}, regularization_hyperparams={}, batch_size_safety=None, candidate_dataset=None, additional_datasets={}, safety_dataset=None, verbose=False)

Bases: Spec

Specification object for running supervised learning Seldonian algorithms.

Parameters:
  • dataset (DataSet object) – The dataset object containing the data, which will be split into candidate and safety datasets

  • model – The SeldonianModel object

  • parse_trees – List of parse tree objects containing the behavioral constraints

  • sub_regime – “classification” or “regression”

  • frac_data_in_safety (float) – Fraction of data used in safety test. The remaining fraction will be used in candidate selection

  • primary_objective – The objective function that would be solely optimized in the absence of behavioral constraints, i.e. the loss function

  • initial_solution_fn – Function to provide initial model weights in candidate selection

  • base_node_bound_method_dict (dict, defaults to {}) – A dictionary specifying the bounding method to use for each base node

  • use_builtin_primary_gradient_fn (bool, defaults to True) – Whether to use the built-in function for the gradient of the primary objective, if one exists. If False, uses autograd

  • custom_primary_gradient_fn (function, defaults to None) – A function for computing the gradient of the primary objective. If None, falls back on builtin function or autograd

  • optimization_technique (str, defaults to 'gradient_descent') – The method for optimization during candidate selection. E.g. ‘gradient_descent’, ‘barrier_function’

  • optimizer (str, defaults to 'adam') – The string name of the optimizer used during candidate selection

  • optimization_hyperparams (dict) – Hyperparameters for optimization during candidate selection. See Candidate Selection.

  • regularization_hyperparams (dict) – Hyperparameters for regularization during candidate selection. See Candidate Selection.

  • batch_size_safety (int, defaults to None) – The number of samples that are forward passed through the model at a time during the safety test. Its value does not change the result, but batching is sometimes necessary to avoid memory overflow when the dataset is large.

  • candidate_dataset (DataSet, defaults to None) – A dataset to use explicitly for candidate selection. If provided, the automatic data split is overridden and dataset is not used.

  • safety_dataset (DataSet, defaults to None) – A dataset to use explicitly for the safety test. If provided in conjunction with candidate_dataset, the automatic data split is overridden and dataset is not used.

  • additional_datasets (dict, defaults to {}) – Specifies optional additional datasets to use for bounding the base nodes of the parse trees.
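The frac_data_in_safety parameter governs how the primary dataset is partitioned. A minimal sketch of that split (the function name and the exact rounding rule here are illustrative assumptions, not the library's internal API):

```python
# Illustrative sketch: how frac_data_in_safety could partition a dataset
# into safety and candidate portions. Names and rounding are assumptions.
def split_data(num_datapoints, frac_data_in_safety=0.6):
    """Return (num_safety, num_candidate) point counts for the split."""
    n_safety = int(round(frac_data_in_safety * num_datapoints))
    n_candidate = num_datapoints - n_safety
    return n_safety, n_candidate

n_safety, n_candidate = split_data(1000, frac_data_in_safety=0.6)
```

Supplying candidate_dataset and safety_dataset directly bypasses this split entirely.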

__init__(dataset, model, parse_trees, sub_regime, frac_data_in_safety=0.6, primary_objective=None, initial_solution_fn=None, base_node_bound_method_dict={}, use_builtin_primary_gradient_fn=True, custom_primary_gradient_fn=None, optimization_technique='gradient_descent', optimizer='adam', optimization_hyperparams={'alpha_lamb': 0.005, 'alpha_theta': 0.005, 'beta_rmsprop': 0.95, 'beta_velocity': 0.9, 'gradient_library': 'autograd', 'hyper_search': None, 'lambda_init': 0.5, 'num_iters': 200, 'use_batches': False, 'verbose': False}, regularization_hyperparams={}, batch_size_safety=None, candidate_dataset=None, additional_datasets={}, safety_dataset=None, verbose=False)
__repr__()

Return repr(self).

Methods

validate_additional_datasets(additional_datasets)

Ensure that the additional datasets dict is valid. It is valid if 1) the strings for the parse trees and base nodes match what is in the parse trees, and 2) either a “dataset” key is present or BOTH a “candidate_dataset” and a “safety_dataset” key are present in each subdict.

Also, for any missing parse trees and base nodes, fill those entries with the primary dataset or the candidate/safety split, if provided.

Parameters:

additional_datasets (dict, defaults to {}) – Specifies optional additional datasets to use for bounding the base nodes of the parse trees.
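The key-presence rule described above can be sketched as a simple check on each subdict (this is a simplified illustration; the real method also cross-checks parse tree and base node names and fills in missing entries):

```python
# Simplified sketch of the validity rule for one additional-datasets subdict:
# either a "dataset" key is present, or BOTH "candidate_dataset" and
# "safety_dataset" keys are present.
def is_valid_subdict(subdict):
    if "dataset" in subdict:
        return True
    return "candidate_dataset" in subdict and "safety_dataset" in subdict
```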

validate_custom_datasets(candidate_dataset, safety_dataset)

Ensure that if either candidate_dataset or safety_dataset is specified, both are specified.

Parameters:
  • candidate_dataset – The dataset provided by the user to be used for candidate selection.

  • safety_dataset – The dataset provided by the user to be used for the safety test.
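The both-or-neither requirement amounts to rejecting the case where exactly one of the two datasets is given. A hedged sketch (function name and error type are illustrative assumptions):

```python
# Illustrative sketch: require that candidate_dataset and safety_dataset
# are either both provided or both omitted.
def validate_custom_datasets_sketch(candidate_dataset, safety_dataset):
    if (candidate_dataset is None) != (safety_dataset is None):
        raise ValueError(
            "candidate_dataset and safety_dataset must be provided together"
        )
```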

validate_parse_trees(parse_trees)

Ensure that there are no duplicate constraints in a list of parse trees

Parameters:

parse_trees – List of ParseTree objects
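Duplicate detection here can be thought of as a uniqueness check over the constraint strings attached to the parse trees. A minimal sketch, assuming constraints are compared by their string form (the function name and error type are illustrative):

```python
# Illustrative sketch: flag duplicate behavioral constraints,
# comparing constraints by their string representation.
def check_duplicate_constraints(constraint_strs):
    seen = set()
    for s in constraint_strs:
        if s in seen:
            raise RuntimeError(f"Duplicate constraint found: {s}")
        seen.add(s)
```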