seldonian.hyperparam_search.HyperparamSearch¶

class HyperparamSearch(spec, hyperparam_spec, results_dir, write_logfile=False)¶

Bases: object

__init__(spec, hyperparam_spec, results_dir, write_logfile=False)¶

Class for finding the hyperparameters that maximize the probability of returning a safe solution for Seldonian algorithms.

Parameters:

spec (Spec object) – The specification object with the complete set of parameters for running the Seldonian algorithm
hyperparam_spec (HyperparameterSelectionSpec object) – The specification object with the complete set of parameters for doing hyperparameter selection
results_dir (str) – The directory where results will be saved
write_logfile (bool) – Whether to write out logs from hyperparameter optimization

__repr__()¶: Return repr(self).

Methods

_get_theta_init_from_hyper_dict()¶: Utility function for packing hyperparam initial values into a 1D vector for CMA-ES.

_unpack_theta_to_hyperparam_values(theta)¶

Utility function for unpacking hyperparam values from a 1D vector used in CMA-ES to values we can inject into a Seldonian Spec object.

Parameters:

theta – Vector of hyperparameters
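
The pack and unpack utilities above are two halves of a round trip, which can be sketched as follows. This is only an illustration: the dictionary layout (an `initial_value` key per hyperparameter) and the helper names `pack_theta_init` and `unpack_theta` are hypothetical and are not taken from the library.

```python
# Hypothetical hyperparameter dictionary; the key names are illustrative only.
hyper_dict = {
    "alpha_theta": {"initial_value": 0.005},
    "num_iters": {"initial_value": 100.0},
}


def pack_theta_init(hyper_dict):
    """Flatten the initial values into a 1D list, in insertion order."""
    return [entry["initial_value"] for entry in hyper_dict.values()]


def unpack_theta(theta, hyper_dict):
    """Map each element of the vector back to its hyperparameter name."""
    if len(theta) != len(hyper_dict):
        raise ValueError("theta length must match the number of hyperparameters")
    return dict(zip(hyper_dict.keys(), theta))


theta_init = pack_theta_init(hyper_dict)
unpacked = unpack_theta(theta_init, hyper_dict)
```

The fixed ordering is what makes the round trip safe: an optimizer only sees positions in the vector, so both helpers must agree on which position belongs to which hyperparameter.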

aggregate_est_prob_pass(est_frac_data_in_safety, bootstrap_savedir)¶

Compute the estimated probability of passing using the result files in bootstrap_savedir.

Parameters:

est_frac_data_in_safety (float) – fraction of data in safety set that we want to estimate the probability of returning a solution for
bootstrap_savedir (str) – root directory from which to load the bootstrap trial results and in which to write the aggregated result
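
One plausible form of the aggregation is the fraction of bootstrap trials that passed the safety test. The sketch below assumes that estimator and uses an in-memory list of booleans in place of the result files in bootstrap_savedir. The function name `aggregate_pass_rate` is hypothetical.

```python
def aggregate_pass_rate(trial_passed):
    """Return the fraction of bootstrap trials whose safety test passed."""
    if not trial_passed:
        raise ValueError("need at least one bootstrap trial result")
    n_passed = sum(1 for passed in trial_passed if passed)
    return n_passed / len(trial_passed)


# Four hypothetical trials, three of which passed the safety test.
est_prob_pass = aggregate_pass_rate([True, False, True, True])
```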

candidate_safety_combine(candidate_dataset, safety_dataset)¶

Combine candidate_dataset and safety_dataset into a full dataset. The data will be joined so that the candidate data comes before the safety data.

Parameters:

candidate_dataset (DataSet object) – a dataset object containing data
safety_dataset (DataSet object) – a dataset object containing data

Returns:

combined_dataset, a dataset containing the candidate and safety data

Return type:

DataSet object

candidate_safety_split(dataset, frac_data_in_safety)¶

Split features, labels and sensitive attributes into candidate and safety sets according to frac_data_in_safety

Parameters:

dataset (DataSet object) – a dataset object containing data
frac_data_in_safety (float) – Fraction of data used in safety test. The remaining fraction will be used in candidate selection

Returns:

F_c,F_s,L_c,L_s,S_c,S_s where F=features, L=labels, S=sensitive attributes

Return type:

Tuple
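
A minimal sketch of the split on plain Python lists, together with the inverse join described for candidate_safety_combine. It assumes the candidate rows come first and the safety rows last, and that the safety size is obtained by truncation. Neither assumption is confirmed by this page, and the helper name `split_candidate_safety` is hypothetical.

```python
def split_candidate_safety(rows, frac_data_in_safety):
    """Split rows so that the trailing fraction forms the safety set."""
    if not 0.0 < frac_data_in_safety < 1.0:
        raise ValueError("frac_data_in_safety must lie strictly between 0 and 1")
    n_safety = int(frac_data_in_safety * len(rows))  # assumed rounding rule
    n_candidate = len(rows) - n_safety
    return rows[:n_candidate], rows[n_candidate:]


features = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8]]
labels = [0, 1, 0, 1, 1, 0, 0, 1]

# Apply the same split to every array so that rows stay aligned.
F_c, F_s = split_candidate_safety(features, 0.25)
L_c, L_s = split_candidate_safety(labels, 0.25)

# Joining candidate-then-safety recovers the original ordering.
recombined_features = F_c + F_s
```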

cmaes_objective(theta, frac_data_in_safety, fixed_hyperparam_setting)¶

The objective function that CMA-ES tries to minimize. We want to minimize (1-prob_pass) in order to maximize prob_pass, so the function returns (1-prob_pass).

Parameters:

theta – Vector of hyperparameters
frac_data_in_safety – Fraction of data going to safety test
fixed_hyperparam_setting – The hyperparameters from grid search that are frozen for this CMA-ES run.
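
The minimize-to-maximize trick can be sketched as below. The probability estimator here is a toy stand-in (`toy_prob_pass`, hypothetical), not the bootstrap estimate the class computes, and the candidates are compared directly instead of being handed to CMA-ES.

```python
def make_objective(estimate_prob_pass):
    """Wrap a pass-probability estimator as a quantity to be minimized."""

    def objective(theta):
        prob_pass = estimate_prob_pass(theta)
        return 1.0 - prob_pass  # minimizing this maximizes prob_pass

    return objective


def toy_prob_pass(theta):
    """Toy stand-in: the pass probability peaks when theta[0] equals 0.3."""
    return max(0.0, 1.0 - abs(theta[0] - 0.3))


objective = make_objective(toy_prob_pass)
candidates = [[0.0], [0.3], [0.9]]
best = min(candidates, key=objective)
```

Any minimizer that accepts a function of a 1D vector can consume `objective` unchanged, which is why the sign flip is done inside the objective and not in the optimizer.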

create_bootstrap_trial_spec(bootstrap_trial_i, frac_data_in_safety, bootstrap_savedir, hyperparam_setting=None)¶

Create the spec to run this iteration of the bootstrap trial.

Parameters:

bootstrap_trial_i (int) – Indicates which trial we are currently running
frac_data_in_safety (float) – fraction of data used in safety test to split the datasets for the trial.
bootstrap_savedir (str) – The root directory to save all the bootstrapped datasets.

Returns:

spec_for_bootstrap_trial

Return type:

Spec

create_dataset(dataset, frac_data_in_safety, shuffle=False)¶

Partition data to create candidate and safety dataset according to: frac_data_in_safety.

Parameters:

dataset (DataSet object) – a dataset object containing data
frac_data_in_safety (float) – fraction of data used in safety test, the remaining fraction will be used in candidate selection
shuffle (bool) – bool indicating if we should shuffle the dataset before splitting it into candidate and safety datasets

Returns:

(candidate_dataset, safety_dataset). candidate_dataset and safety_dataset are the resulting datasets after partitioning the dataset.

Return type:

Tuple containing two DataSet objects.

find_best_frac_data_in_safety(threshold=0.01)¶

Find the best frac_data_in_safety to use for the Seldonian algorithm.

Returns:

(frac_data_in_safety, candidate_dataset, safety_dataset). frac_data_in_safety indicates the fraction of total data that is included in the safety dataset. candidate_dataset and safety_dataset are dataset objects containing data from self.dataset split according to frac_data_in_safety

Return type:

Tuple

find_best_hyperparameters(frac_data_in_safety, **kwargs)¶: Does hyperparameter tuning for all hyperparameters in HyperSchema.hyper_dict. Figures out which ones are to be grid-searched and which are to be optimized with CMA-ES, constructs the grid, then runs the tuning.

generate_all_bootstrap_datasets(candidate_dataset, frac_data_in_safety, n_bootstrap_samples_candidate, n_bootstrap_samples_safety, bootstrap_savedir)¶

Utility function for supervised learning to generate the resampled datasets to use in each bootstrap trial. Resamples (with replacement) features, labels and sensitive attributes to create self.hyperparam_spec.n_bootstrap_trials versions of these. Saves pickle files.

Parameters:

candidate_dataset (DataSet object) – Dataset object containing candidate solution dataset. This is the dataset we will be bootstrap sampling from.
frac_data_in_safety (float) – fraction of data in safety set that we want to estimate the probability of returning a solution for
n_bootstrap_samples_candidate (int) – The size of the candidate selection bootstrapped dataset
n_bootstrap_samples_safety (int) – The size of the safety bootstrapped dataset
bootstrap_savedir (str) – The root directory to save all the bootstrapped datasets.
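
Resampling with replacement while keeping features, labels and sensitive attributes aligned can be sketched as follows. The sketch uses plain lists and the standard library's random module instead of the library's DataSet objects, and it omits the pickling step. The function name `bootstrap_resample` is hypothetical.

```python
import random


def bootstrap_resample(features, labels, sensitive_attrs, n_samples, rng):
    """Draw n_samples row indices with replacement and apply them to every array."""
    n = len(features)
    idx = [rng.randrange(n) for _ in range(n_samples)]
    return (
        [features[i] for i in idx],
        [labels[i] for i in idx],
        [sensitive_attrs[i] for i in idx],
    )


# Toy data in which the label and attribute are functions of the feature,
# so that alignment after resampling is easy to check.
features = list(range(5))
labels = [f * 10 for f in features]
sensitive_attrs = [f % 2 for f in features]

rng = random.Random(0)  # seeded for reproducibility
F_b, L_b, S_b = bootstrap_resample(features, labels, sensitive_attrs, 8, rng)
```

Sampling one index list and reusing it for all three arrays is the important step: drawing separately per array would break the pairing between a row's features, label and sensitive attributes.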

get_all_greater_est_prob_pass()¶: Compute the estimated probability of passing for all safety fractions in self.all_frac_data_in_safety.

get_bootstrap_dataset_size(frac_data_in_safety)¶

Computes the number of datapoints that should go into the bootstrapped candidate and safety datasets according to frac_data_in_safety.

Parameters:

frac_data_in_safety (float) – fraction of data in safety set that we want to estimate the probability of returning a solution for

get_est_prob_pass(frac_data_in_safety, bootstrap_savedir, hyperparam_setting=None)¶

Estimates probability of returning a solution with rho_prime fraction of data in candidate selection.

Parameters:

frac_data_in_safety (float) – fraction of data in safety set that we want to estimate the probability of returning a solution for
n_bootstrap_samples_candidate (int) – size of candidate dataset sampled in bootstrap
n_bootstrap_samples_safety (int) – size of safety dataset sampled in bootstrap
bootstrap_savedir (str) – root directory to store bootstrap datasets and results

get_gridsearchable_hyperparameter_iterator()¶

Create iterator for every combination of grid-searchable hyperparameter values that we want to optimize for.
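
An iterator over every combination of grid values can be built with itertools.product, as in the sketch below. The grid contents and the function name `grid_iterator` are hypothetical, and the real method may yield a different structure than a dictionary per combination.

```python
import itertools

# Hypothetical grid: each hyperparameter maps to the values to try.
grid = {
    "alpha_theta": [0.001, 0.01],
    "lambda_init": [0.5, 1.0, 2.0],
}


def grid_iterator(grid):
    """Yield one {name: value} dict per combination of grid values."""
    names = list(grid)
    for combo in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, combo))


settings = list(grid_iterator(grid))
```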

get_safety_size(n_total, frac_data_in_safety)¶

Determine the number of data points in the safety dataset.

Parameters:

n_total (int) – the size of the total dataset
frac_data_in_safety (float) – fraction of data used in safety test, the remaining fraction will be used in candidate selection

Returns:

n_safety, the desired size of the safety dataset

Return type:

int
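
This page does not state how the fraction is rounded or whether the result is bounded. The sketch below assumes truncation plus a clamp that leaves at least two points in each of the safety and candidate sets. Both choices are assumptions, and the function name `safety_size` is hypothetical.

```python
def safety_size(n_total, frac_data_in_safety, min_points=2):
    """Number of safety points, clamped so that both sets stay non-trivial."""
    if n_total < 2 * min_points:
        raise ValueError("dataset too small to keep min_points in each set")
    n_safety = int(frac_data_in_safety * n_total)  # assumed: truncate
    n_safety = max(n_safety, min_points)  # assumed: floor for the safety set
    n_safety = min(n_safety, n_total - min_points)  # assumed: floor for the candidate set
    return n_safety


typical = safety_size(100, 0.25)
tiny_fraction = safety_size(10, 0.01)
huge_fraction = safety_size(10, 0.99)
```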

powell_objective(theta, frac_data_in_safety, fixed_hyperparam_setting)¶

The objective function that Powell tries to minimize. We want to minimize (1-prob_pass) in order to maximize prob_pass, so the function returns (1-prob_pass).

Parameters:

theta – Vector of hyperparameters
frac_data_in_safety – Fraction of data going to safety test
fixed_hyperparam_setting – The hyperparameters from grid search that are frozen for this run.

run_bootstrap_trial(bootstrap_trial_i, frac_data_in_safety, parent_savedir, hyperparam_setting=None)¶

Run bootstrap trial bootstrap_trial_i to estimate the probability of passing with frac_data_in_safety.

Returns a boolean indicating whether the bootstrap trial was actually run. If the trial has already been run, returns False.

Parameters:

bootstrap_trial_i (int) – integer indicating which trial of the bootstrap experiment we are currently running. Allows us to identify which bootstrapped dataset to load and run
frac_data_in_safety (float) – fraction of data in safety set that we want to estimate the probability of returning a solution for
bootstrap_savedir (str) – The root directory to load bootstrapped dataset and save the result of this bootstrap trial
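
The run-once behaviour described above (return False when a trial's result already exists) can be sketched with a file-existence check. The result filename pattern and the helper name `run_trial_once` are hypothetical, and the lambda is a placeholder for the actual Seldonian run.

```python
import os
import pickle
import tempfile


def run_trial_once(trial_i, savedir, run_fn):
    """Run run_fn(trial_i) unless a saved result exists; return True if it ran."""
    # Hypothetical naming scheme for per-trial result files.
    result_path = os.path.join(savedir, f"trial_{trial_i}_result.pkl")
    if os.path.exists(result_path):
        return False  # already computed, so skip the expensive run
    result = run_fn(trial_i)
    with open(result_path, "wb") as f:
        pickle.dump(result, f)
    return True


with tempfile.TemporaryDirectory() as savedir:
    first = run_trial_once(0, savedir, lambda i: {"passed_safety": True})
    second = run_trial_once(0, savedir, lambda i: {"passed_safety": True})
```

Caching on disk this way lets an interrupted search resume without repeating finished trials, at the cost of having to clear the directory when the data or settings change.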

run_cmaes(frac_data_in_safety, fixed_hyperparam_setting, **kwargs)¶: Run CMA-ES over the hyperparameters that we specified in hyper_dict to have tuning_method = “CMA-ES”. Use fixed values for all other hyperparams.

run_powell(frac_data_in_safety, fixed_hyperparam_setting, **kwargs)¶: Run Powell minimization over a single hyperparameter. This is the fallback optimizer we use when we only have 1 hyperparameter and CMA-ES tuning is specified. CMA-ES is not intended for use in 1D. Use fixed values for all other hyperparams.
