seldonian.hyperparam_search.HyperparamSearch¶
- class HyperparamSearch(spec, hyperparam_spec, results_dir, write_logfile=False)¶
Bases: object
- __init__(spec, hyperparam_spec, results_dir, write_logfile=False)¶
Class for finding the hyperparameters that maximize the probability that a Seldonian algorithm returns a safe solution.
- Parameters:
spec (Spec object) – The specification object with the complete set of parameters for running the Seldonian algorithm
hyperparam_spec (HyperparameterSelectionSpec object) – The specification object with the complete set of parameters for doing hyperparameter selection
results_dir (str) – The directory where results will be saved
write_logfile (bool) – Whether to write out logs from hyperparameter optimization
- __repr__()¶
Return repr(self).
Methods
- _get_theta_init_from_hyper_dict()¶
Utility function for packing hyperparam initial values into a 1D vector for CMA-ES.
- _unpack_theta_to_hyperparam_values(theta)¶
Utility function for unpacking hyperparam values from a 1D vector used in CMA-ES to values we can inject into a Seldonian Spec object.
- Parameters:
theta – Vector of hyperparameters
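The packing and unpacking described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper names `pack_theta_init` and `unpack_theta`, the hyperparameter names, and the `initial_value` field are all assumptions made for the example.

```python
# Hypothetical initial values for hyperparameters tuned with CMA-ES.
# The names and the "initial_value" field are assumptions for illustration.
cmaes_hyperparams = {
    "alpha_theta": {"initial_value": 0.005},
    "lambda_init": {"initial_value": 0.5},
}


def pack_theta_init(hyperparams):
    """Collect the initial values into a flat list, remembering the name order."""
    names = list(hyperparams)
    theta_init = [float(hyperparams[name]["initial_value"]) for name in names]
    return names, theta_init


def unpack_theta(names, theta):
    """Map a flat vector back to named hyperparameter values."""
    if len(names) != len(theta):
        raise ValueError("theta length does not match the number of hyperparameters")
    return dict(zip(names, theta))


names, theta_init = pack_theta_init(cmaes_hyperparams)
hyper_values = unpack_theta(names, theta_init)
```

Keeping the name order alongside the vector is what lets the optimizer work on a plain 1D vector while the values can still be written back to named fields afterwards.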
- aggregate_est_prob_pass(est_frac_data_in_safety, bootstrap_savedir)¶
Compute the estimated probability of passing using the result files in bootstrap_savedir.
- Parameters:
est_frac_data_in_safety (float) – fraction of data in the safety set for which we want to estimate the probability of returning a solution
bootstrap_savedir (str) – root directory from which to load bootstrap trial results and to which the aggregated result is written
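As a rough sketch of the aggregation step, assume each bootstrap trial yields a pass/fail flag and that the estimate is the fraction of trials that passed. That assumption, the helper name `aggregate_pass_rate`, and the in-memory list (standing in for the result files) are illustrative only.

```python
def aggregate_pass_rate(passed_flags):
    """Return the fraction of bootstrap trials that passed the safety test."""
    if not passed_flags:
        raise ValueError("no bootstrap results to aggregate")
    return sum(bool(p) for p in passed_flags) / len(passed_flags)


# Stand-in for results that would be loaded from files under bootstrap_savedir.
trial_results = [True, False, True, True]
est_prob_pass = aggregate_pass_rate(trial_results)
```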
- candidate_safety_combine(candidate_dataset, safety_dataset)¶
Combine candidate_dataset and safety_dataset into a full dataset. The data will be joined so that the candidate data comes before the safety data.
- candidate_safety_split(dataset, frac_data_in_safety)¶
Split features, labels and sensitive attributes into candidate and safety sets according to frac_data_in_safety
- Parameters:
dataset (DataSet object) – a dataset object containing data
frac_data_in_safety (float) – Fraction of data used in the safety test. The remaining fraction will be used in candidate selection
- Returns:
F_c,F_s,L_c,L_s,S_c,S_s where F=features, L=labels, S=sensitive attributes
- Return type:
Tuple
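A minimal sketch of such a split on plain Python lists is shown below. It assumes the candidate rows come first (matching the ordering used by candidate_safety_combine) and that the safety size is obtained by truncating `frac_data_in_safety * n_total`; the rounding rule and the helper name `split_candidate_safety` are assumptions, not the library's documented behavior.

```python
def split_candidate_safety(features, labels, sensitive_attrs, frac_data_in_safety):
    """Split three aligned sequences into candidate and safety parts."""
    n_total = len(features)
    n_safety = int(frac_data_in_safety * n_total)  # assumed rounding rule
    n_candidate = n_total - n_safety
    # Candidate rows come first, safety rows last.
    F_c, F_s = features[:n_candidate], features[n_candidate:]
    L_c, L_s = labels[:n_candidate], labels[n_candidate:]
    S_c, S_s = sensitive_attrs[:n_candidate], sensitive_attrs[n_candidate:]
    return F_c, F_s, L_c, L_s, S_c, S_s


features = [[float(i)] for i in range(8)]
labels = [i % 2 for i in range(8)]
sensitive_attrs = [i < 4 for i in range(8)]
F_c, F_s, L_c, L_s, S_c, S_s = split_candidate_safety(
    features, labels, sensitive_attrs, frac_data_in_safety=0.5
)
```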
- cmaes_objective(theta, frac_data_in_safety, fixed_hyperparam_setting)¶
The objective function that CMA-ES tries to minimize. We want to maximize prob_pass, so the function returns (1-prob_pass), the quantity to be minimized.
- Parameters:
theta – Vector of hyperparameters
frac_data_in_safety – Fraction of data going to safety test
fixed_hyperparam_setting – The hyperparameters from grid search that are frozen for this CMA-ES run.
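The relationship between the objective and prob_pass can be sketched as follows. The estimator here is a toy stand-in, and `make_objective` and `toy_estimator` are hypothetical names; in the class itself the probability would come from the bootstrap procedure.

```python
def make_objective(estimate_prob_pass):
    """Wrap a probability estimator into a function suitable for a minimizer."""

    def objective(theta, frac_data_in_safety, fixed_hyperparam_setting):
        prob_pass = estimate_prob_pass(
            theta, frac_data_in_safety, fixed_hyperparam_setting
        )
        # Minimizing 1 - prob_pass is equivalent to maximizing prob_pass.
        return 1.0 - prob_pass

    return objective


def toy_estimator(theta, frac_data_in_safety, fixed_hyperparam_setting):
    """Toy stand-in: the probability peaks when theta[0] equals 0.1."""
    return max(0.0, 1.0 - abs(theta[0] - 0.1))


objective = make_objective(toy_estimator)
best = objective([0.1], 0.5, {})
worse = objective([0.6], 0.5, {})
```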
- create_bootstrap_trial_spec(bootstrap_trial_i, frac_data_in_safety, bootstrap_savedir, hyperparam_setting=None)¶
Create the spec to run this iteration of the bootstrap trial.
- Parameters:
bootstrap_trial_i (int) – Indicates which trial we are currently running
frac_data_in_safety (float) – fraction of data used in safety test to split the datasets for the trial.
bootstrap_savedir (str) – The root directory to save all the bootstrapped datasets.
- Returns:
spec_for_bootstrap_trial
- Return type:
- create_dataset(dataset, frac_data_in_safety, shuffle=False)¶
Partition data to create candidate and safety datasets according to frac_data_in_safety.
- Parameters:
dataset (DataSet object) – a dataset object containing data
frac_data_in_safety (float) – fraction of data used in the safety test. The remaining fraction will be used in candidate selection
shuffle (bool) – bool indicating if we should shuffle the dataset before splitting it into candidate and safety datasets
- Returns:
(candidate_dataset, safety_dataset). candidate_dataset and safety_dataset are the datasets that result from partitioning dataset.
- Return type:
Tuple containing two DataSet objects.
- find_best_frac_data_in_safety(threshold=0.01)¶
Find the best frac_data_in_safety to use for the Seldonian algorithm.
- Returns:
(frac_data_in_safety, candidate_dataset, safety_dataset). frac_data_in_safety is the fraction of the total data that is included in the safety dataset. candidate_dataset and safety_dataset are dataset objects containing data from self.dataset, split according to frac_data_in_safety
- Return type:
Tuple
- find_best_hyperparameters(frac_data_in_safety, **kwargs)¶
Does hyperparameter tuning for all hyperparameters in HyperSchema.hyper_dict. Figures out which ones are to be grid-searched and which are to be optimized with CMA-ES, constructs the grid, then runs the tuning.
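The first step, deciding which hyperparameters are grid-searched and which go to CMA-ES, might look like the sketch below. The `tuning_method` key and the "CMA-ES" value appear in this documentation; the "grid_search" value, the other dictionary fields, and the helper name `partition_by_tuning_method` are assumptions for illustration.

```python
def partition_by_tuning_method(hyper_dict):
    """Separate hyperparameters by how they are to be tuned."""
    grid, cmaes = {}, {}
    for name, info in hyper_dict.items():
        method = info["tuning_method"]
        if method == "grid_search":  # assumed label for grid-searched entries
            grid[name] = info
        elif method == "CMA-ES":
            cmaes[name] = info
        else:
            raise ValueError(f"unknown tuning_method for {name!r}: {method!r}")
    return grid, cmaes


# Hypothetical hyperparameter dictionary.
hyper_dict = {
    "alpha_theta": {"initial_value": 0.005, "tuning_method": "CMA-ES"},
    "num_iters": {"values": [100, 200], "tuning_method": "grid_search"},
    "lambda_init": {"initial_value": 0.5, "tuning_method": "CMA-ES"},
}
grid_hyperparams, cmaes_hyperparams = partition_by_tuning_method(hyper_dict)
```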
- generate_all_bootstrap_datasets(candidate_dataset, frac_data_in_safety, n_bootstrap_samples_candidate, n_bootstrap_samples_safety, bootstrap_savedir)¶
Utility function for supervised learning to generate the resampled datasets to use in each bootstrap trial. Resamples (with replacement) features, labels and sensitive attributes to create self.hyperparam_spec.n_bootstrap_trials versions of these. Saves pickle files.
- Parameters:
candidate_dataset (DataSet object) – Dataset object containing the candidate solution dataset. This is the dataset we will be bootstrap sampling from.
frac_data_in_safety (float) – fraction of data in the safety set for which we want to estimate the probability of returning a solution
n_bootstrap_samples_candidate (int) – The size of the candidate selection bootstrapped dataset
n_bootstrap_samples_safety (int) – The size of the safety bootstrapped dataset
bootstrap_savedir (str) – The root directory to save all the bootstrapped datasets.
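Resampling with replacement and saving one pickle per trial can be sketched with the standard library alone. The file naming scheme, the dictionary layout of each pickle, and the helper name `generate_bootstrap_datasets` are assumptions made for this example, not the layout the class actually writes.

```python
import os
import pickle
import random
import tempfile


def generate_bootstrap_datasets(rows, n_trials, n_candidate, n_safety, savedir, seed=0):
    """Resample rows with replacement and pickle one dataset pair per trial."""
    rng = random.Random(seed)
    os.makedirs(savedir, exist_ok=True)
    paths = []
    for trial_i in range(n_trials):
        # random.choices draws with replacement, as bootstrapping requires.
        candidate = rng.choices(rows, k=n_candidate)
        safety = rng.choices(rows, k=n_safety)
        # Hypothetical file name for this trial's datasets.
        path = os.path.join(savedir, f"bootstrap_trial_{trial_i}.pkl")
        with open(path, "wb") as f:
            pickle.dump({"candidate": candidate, "safety": safety}, f)
        paths.append(path)
    return paths


# Each row bundles (features, label, sensitive attribute) so they stay aligned.
rows = [([0.1 * i], i % 2, i % 3 == 0) for i in range(10)]
with tempfile.TemporaryDirectory() as tmpdir:
    paths = generate_bootstrap_datasets(
        rows, n_trials=3, n_candidate=6, n_safety=4, savedir=tmpdir
    )
    with open(paths[0], "rb") as f:
        first_trial = pickle.load(f)
```

Resampling whole rows, rather than features, labels and sensitive attributes separately, keeps the three aligned within each bootstrapped dataset.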
- get_all_greater_est_prob_pass()¶
Compute the estimated probability of passing for all safety fractions in self.all_frac_data_in_safety.
- get_bootstrap_dataset_size(frac_data_in_safety)¶
Computes the number of datapoints that should go into the bootstrapped candidate and safety datasets according to frac_data_in_safety.
- Parameters:
frac_data_in_safety (float) – fraction of data in the safety set for which we want to estimate the probability of returning a solution
- get_est_prob_pass(frac_data_in_safety, bootstrap_savedir, hyperparam_setting=None)¶
Estimates probability of returning a solution with rho_prime fraction of data in candidate selection.
- Parameters:
frac_data_in_safety (float) – fraction of data in the safety set for which we want to estimate the probability of returning a solution
n_bootstrap_samples_candidate (int) – size of candidate dataset sampled in bootstrap
n_bootstrap_samples_safety (int) – size of safety dataset sampled in bootstrap
bootstrap_savedir (str) – root directory to store bootstrap datasets and results
- get_gridsearchable_hyperparameter_iterator()¶
Create an iterator over every combination of grid-searchable hyperparameter values that we want to optimize for.
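One common way to build such an iterator is `itertools.product`. The sketch below assumes the grid values are available as a plain dictionary of lists; the hyperparameter names and the helper name `grid_iterator` are hypothetical.

```python
import itertools


def grid_iterator(grid_values):
    """Yield one dict per combination of grid-searchable hyperparameter values."""
    names = list(grid_values)
    for combo in itertools.product(*(grid_values[name] for name in names)):
        yield dict(zip(names, combo))


# Hypothetical grid: 2 x 3 = 6 combinations.
grid_values = {"alpha_theta": [0.001, 0.005], "num_iters": [100, 200, 300]}
settings = list(grid_iterator(grid_values))
```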
- get_safety_size(n_total, frac_data_in_safety)¶
Determine the number of data points in the safety dataset.
- Parameters:
n_total (int) – the size of the total dataset
frac_data_in_safety (float) – fraction of data used in safety test, the remaining fraction will be used in candidate selection
- Returns:
n_safety, the desired size of the safety dataset
- Return type:
int
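A minimal sketch of this computation is given below. It assumes the size is obtained by truncating the product to an integer; the actual rounding rule and any minimum-size constraints in the library are not stated here, so treat both as assumptions. The helper name `safety_size` is hypothetical.

```python
def safety_size(n_total, frac_data_in_safety):
    """Number of points assigned to the safety dataset (assumed: truncation)."""
    if not 0.0 <= frac_data_in_safety <= 1.0:
        raise ValueError("frac_data_in_safety must lie in [0, 1]")
    return int(frac_data_in_safety * n_total)


n_total = 100
n_safety = safety_size(n_total, 0.25)
n_candidate = n_total - n_safety  # the remainder goes to candidate selection
```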
- powell_objective(theta, frac_data_in_safety, fixed_hyperparam_setting)¶
The objective function that Powell tries to minimize. We want to maximize prob_pass, so the function returns (1-prob_pass), the quantity to be minimized.
- Parameters:
theta – Vector of hyperparameters
frac_data_in_safety – Fraction of data going to safety test
fixed_hyperparam_setting – The hyperparameters from grid search that are frozen for this run.
- run_bootstrap_trial(bootstrap_trial_i, frac_data_in_safety, parent_savedir, hyperparam_setting=None)¶
Run bootstrap trial bootstrap_trial_i to estimate the probability of passing with frac_data_in_safety. Returns a boolean indicating whether the bootstrap trial was actually run. If the trial has already been run, returns False.
- Parameters:
bootstrap_trial_i (int) – integer indicating which trial of the bootstrap experiment we are currently running. Allows us to identify which bootstrapped dataset to load and run
frac_data_in_safety (float) – fraction of data in the safety set for which we want to estimate the probability of returning a solution
bootstrap_savedir (str) – The root directory to load the bootstrapped dataset from and save the result of this bootstrap trial to
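The "skip if already run" behavior can be sketched as a check for an existing result file. The file name pattern, the result contents, and the helper name `run_trial_once` are assumptions for illustration only.

```python
import os
import pickle
import tempfile


def run_trial_once(trial_i, savedir, run_fn):
    """Run trial_i unless its result file exists; return True only if it ran."""
    os.makedirs(savedir, exist_ok=True)
    # Hypothetical result file name for this trial.
    result_path = os.path.join(savedir, f"trial_{trial_i}_result.pkl")
    if os.path.exists(result_path):
        return False  # already run, nothing to do
    result = run_fn(trial_i)
    with open(result_path, "wb") as f:
        pickle.dump(result, f)
    return True


with tempfile.TemporaryDirectory() as tmpdir:
    first = run_trial_once(0, tmpdir, lambda i: {"passed_safety": True})
    second = run_trial_once(0, tmpdir, lambda i: {"passed_safety": True})
```

Caching results this way lets an interrupted hyperparameter search resume without repeating completed trials.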
- run_cmaes(frac_data_in_safety, fixed_hyperparam_setting, **kwargs)¶
Run CMA-ES over the hyperparameters that we specified in hyper_dict to have tuning_method = “CMA-ES”. Use fixed values for all other hyperparams.
- run_powell(frac_data_in_safety, fixed_hyperparam_setting, **kwargs)¶
Run Powell minimization over a single hyperparameter. This is the fallback optimizer we use when we only have 1 hyperparameter and CMA-ES tuning is specified. CMA-ES is not intended for use in 1D. Use fixed values for all other hyperparams.
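The fallback rule described above amounts to a simple dispatch on the number of hyperparameters marked for CMA-ES tuning. The helper below is a hypothetical illustration of that rule, not the class's own code.

```python
def choose_optimizer(n_cmaes_hyperparams):
    """Pick the optimizer: Powell in 1D, CMA-ES in two or more dimensions."""
    if n_cmaes_hyperparams < 1:
        raise ValueError("need at least one hyperparameter to optimize")
    # CMA-ES is not intended for 1D problems, so fall back to Powell there.
    return "Powell" if n_cmaes_hyperparams == 1 else "CMA-ES"


one_dim = choose_optimizer(1)
multi_dim = choose_optimizer(3)
```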