experiments.generate_plots.CustomPlotGenerator
- class CustomPlotGenerator(spec, n_trials, data_fracs, datagen_method, perf_eval_fn, results_dir, n_workers, constraint_eval_fns=[], perf_eval_kwargs={}, constraint_eval_kwargs={}, batch_epoch_dict={})
Bases: PlotGenerator
- __init__(spec, n_trials, data_fracs, datagen_method, perf_eval_fn, results_dir, n_workers, constraint_eval_fns=[], perf_eval_kwargs={}, constraint_eval_kwargs={}, batch_epoch_dict={})
- Class for running supervised custom experiments and generating the three plots. Use with spec.regime = "custom".
- Parameters:
spec (seldonian.spec.Spec object) – Specification object for running the Seldonian algorithm
n_trials (int) – The number of times the Seldonian algorithm is run for each data fraction. Used for generating error bars
data_fracs (List(float)) – Proportions of the overall size of the dataset to use (the horizontal axis on the three plots).
datagen_method (str, e.g. "resample") – Method for generating data that is used to run the Seldonian algorithm for each trial
perf_eval_fn (function or class method) – Function used to evaluate the performance of the model obtained in each trial, with signature: func(theta,**kwargs), where theta is the solution from candidate selection
results_dir (str) – The directory in which to save the results
n_workers (int) – The number of workers to use if using multiprocessing
constraint_eval_fns (List(function or class method), defaults to []) – List of functions used to evaluate the constraints on ground truth. If an empty list is provided, the constraints are evaluated using the parse tree
perf_eval_kwargs (dict) – Extra keyword arguments to pass to perf_eval_fn
constraint_eval_kwargs (dict) – Extra keyword arguments to pass to the constraint_eval_fns
batch_epoch_dict (dict) – Specifies the batch size and number of epochs (n_epochs) to use for each data fraction
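The perf_eval_fn signature described above, func(theta, **kwargs), can be illustrated with a minimal sketch. The function below is hypothetical: it assumes a linear threshold classifier and assumes that the keyword arguments carry the evaluation data under the names X and y. Neither the model nor these key names come from the library; in practice the keywords are whatever you pass through perf_eval_kwargs.

```python
def accuracy_perf_eval_fn(theta, **kwargs):
    """Hypothetical perf_eval_fn: accuracy of a linear threshold classifier.

    Assumes kwargs["X"] is a list of feature rows and kwargs["y"] is a list
    of 0/1 labels. These key names are illustrative assumptions only.
    """
    X = kwargs["X"]
    y = kwargs["y"]
    correct = 0
    for row, label in zip(X, y):
        # theta[0] is treated as the intercept, theta[1:] as the feature weights
        score = theta[0] + sum(w * x for w, x in zip(theta[1:], row))
        prediction = 1 if score >= 0 else 0
        correct += int(prediction == label)
    # Fraction of rows whose predicted label matches the true label
    return correct / len(y)
```

A function of this shape would be passed as perf_eval_fn, with the evaluation data supplied in perf_eval_kwargs.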
- __repr__()
Return repr(self).
Methods
- generate_resampled_datasets(verbose=False)
Generate the resampled datasets to use in each trial. Features, labels and sensitive attributes are resampled with replacement to create n_trials versions, each with the same shape as the inputs. The resampled datasets are saved in self.results_dir/resampled_datasets
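The resampling step can be sketched as follows. This is not the library's implementation, only a minimal illustration of resampling with replacement under the assumption that features, labels and sensitive attributes are row-aligned sequences; the function name and the seed parameter are hypothetical, and saving to disk is omitted.

```python
import random


def resample_with_replacement(features, labels, sensitive_attrs, n_trials, seed=0):
    """Hypothetical helper: build n_trials bootstrap copies of a dataset."""
    rng = random.Random(seed)
    n = len(labels)
    datasets = []
    for _ in range(n_trials):
        # Draw one set of row indices and reuse it for all three sequences,
        # so each resampled row keeps its own label and sensitive attribute.
        idx = [rng.randrange(n) for _ in range(n)]
        datasets.append(
            (
                [features[i] for i in idx],
                [labels[i] for i in idx],
                [sensitive_attrs[i] for i in idx],
            )
        )
    return datasets
```

Sharing one index draw across the three sequences is what keeps each resampled dataset the same shape as the input while preserving the pairing between rows.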
- generate_trial_datasets(verbose=False)
Generate the datasets to be used in each trial.
- make_plots(tot_data_size=None, model_label_dict={}, ignore_models=[], fontsize=12, title_fontsize=12, legend_fontsize=8, ncols_legend=3, performance_label='accuracy', sr_label='Prob. of solution', fr_label='Prob. of violation', performance_yscale='linear', performance_ylims=[], hoz_axis_label='Amount of data', show_confidence_level=True, marker_size=20, save_format='pdf', show_title=True, custom_title=None, include_legend=True, savename=None)
Make the three plots of the experiment. Looks up any experiments run in self.results_dir and plots them on the same three plots.
- Parameters:
tot_data_size (int) – The total number of datapoints in the experiment. This is used, together with the data_fracs array, to construct the horizontal axes of the three plots. If None, the value is taken from the dataset.
model_label_dict (dict) – An optional dictionary where keys are model names and values are the names you want shown in the legend. Note that if you specify this dict, then only the models in this dictionary will appear in the legend, and they will show up in the legend in the order that you specify them in the dict.
ignore_models (List) – Do not plot any models whose .model_name attribute appears in this list.
fontsize (int) – The font size to use for the axis labels
title_fontsize (int) – The font size to use for the title of each subplot
legend_fontsize (int) – The font size to use for text in the legend
ncols_legend (int, defaults to 3) – The number of columns to use in the legend
performance_label (str, defaults to "accuracy") – The y axis label on the performance plot (left plot) you want to use.
sr_label (str, defaults to "Prob. of solution") – The y axis label on the solution rate plot (middle plot) you want to use.
fr_label (str, defaults to "Prob. of violation") – The y axis label on the failure rate plot (right plot) you want to use.
performance_yscale (str, defaults to "linear") – The y axis scaling of the performance plot, "log" or "linear"
performance_ylims (List, defaults to []) – The y limits of the performance plot. Default is to use matplotlib's automatic determination.
hoz_axis_label (str, defaults to "Amount of data") – What you want to show as the horizontal axis label for all plots.
show_confidence_level (bool, defaults to True) – Whether to show the black dotted line for the value of delta in the failure rate plot (right plot)
marker_size (float, defaults to 20.) – The size of the points in each plot (matplotlib "s" parameter)
save_format (str, defaults to "pdf") – The file type for the saved plot
show_title (bool) – Whether to show the title at the top of the figure
custom_title (str, defaults to None) – A custom title
include_legend (bool, defaults to True) – Whether to include the legend
savename (str, defaults to None) – If not None, the filename to which the figure will be saved on disk.
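As a rough illustration of how tot_data_size and data_fracs combine into the horizontal axis, the sketch below scales each fraction by the total dataset size. The helper name is hypothetical and the library's own computation may differ in detail.

```python
def horizontal_axis_values(data_fracs, tot_data_size):
    """Hypothetical helper: convert data fractions to numbers of datapoints."""
    # Each fraction of the dataset corresponds to frac * tot_data_size points
    return [frac * tot_data_size for frac in data_fracs]
```

For instance, with a dataset of 1000 points, a data fraction of 0.25 would be plotted at 250 on the "Amount of data" axis.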
- run_seldonian_experiment(verbose=False)
Run a supervised Seldonian experiment using the spec attribute assigned to the class in __init__().
- Parameters:
verbose (bool, defaults to False) – Whether to display results to stdout while the Seldonian algorithms are running in each trial
- validate_constraint_eval_kwargs(constraint_eval_kwargs)
Ensure that, if the spec object contains additional datasets, constraint_eval_kwargs contains a held-out dataset for each additional dataset.
- Parameters:
constraint_eval_kwargs (dict) – The keyword arguments used when evaluating the constraints for the failure rate plot (right plot).