experiments.generate_plots.PlotGenerator
- class PlotGenerator(spec, n_trials, data_fracs, datagen_method, perf_eval_fn, results_dir, n_workers, perf_eval_kwargs={}, constraint_eval_fns=[], constraint_eval_kwargs={}, batch_epoch_dict={})
Bases: object
- __init__(spec, n_trials, data_fracs, datagen_method, perf_eval_fn, results_dir, n_workers, perf_eval_kwargs={}, constraint_eval_fns=[], constraint_eval_kwargs={}, batch_epoch_dict={})
Class for running Seldonian experiments and generating the three plots: 1) performance, 2) solution rate, and 3) failure rate, each plotted against the amount of data used by the algorithm (candidate + safety).
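As a rough illustration of what the three plotted quantities summarize, the sketch below aggregates per-trial outcomes at a single data fraction. The record keys (passed_safety, performance, failed), the helper name summarize_trials, and the use of the standard error for error bars are assumptions made for this example; they are not the library's internal result format.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_trials(trials):
    """Aggregate hypothetical per-trial records for one data fraction."""
    n_trials = len(trials)
    # Performance is only defined for trials that returned a solution.
    solved = [t for t in trials if t["passed_safety"]]
    performances = [t["performance"] for t in solved]

    def mean_and_ste(values):
        # Standard error needs at least two values; otherwise report NaN.
        if not values:
            return float("nan"), float("nan")
        if len(values) == 1:
            return values[0], float("nan")
        return mean(values), stdev(values) / sqrt(len(values))

    perf_mean, perf_ste = mean_and_ste(performances)
    return {
        "performance": perf_mean,
        "performance_ste": perf_ste,
        # Fraction of trials in which a solution was returned.
        "solution_rate": len(solved) / n_trials,
        # Fraction of trials whose solution violated a constraint.
        "failure_rate": sum(t["failed"] for t in trials) / n_trials,
    }


# Four made-up trials at one data fraction.
trials = [
    {"passed_safety": True, "performance": 0.8, "failed": False},
    {"passed_safety": True, "performance": 0.9, "failed": False},
    {"passed_safety": True, "performance": 1.0, "failed": True},
    {"passed_safety": False, "performance": None, "failed": False},
]
summary = summarize_trials(trials)
print(summary)
```

Repeating this summary for every entry in data_fracs would give one point, with an error bar, per data fraction on each plot.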
- Parameters:
spec (seldonian.spec.Spec object) – Specification object for running the Seldonian algorithm
n_trials (int) – The number of times the Seldonian algorithm is run for each data fraction. Used for generating error bars
data_fracs (List(float)) – Proportions of the overall size of the dataset to use (used to make the horizontal axis on the three plots).
datagen_method (str) – Method for generating data that is used to run the Seldonian algorithm for each trial, e.g., “resample”
perf_eval_fn (function or class method) – Function used to evaluate the performance of the model obtained in each trial. Its signature varies depending on the regime. See the tutorials: https://seldonian.cs.umass.edu/Tutorials/tutorials/
results_dir (str) – The directory in which to save the results
n_workers (int) – The number of workers to use if using multiprocessing
perf_eval_kwargs (dict) – Extra keyword arguments to pass to perf_eval_fn
constraint_eval_fns (List(function or class method), defaults to []) – List of custom functions used to evaluate the constraints on ground truth. If an empty list is provided, the constraints are evaluated using the parse tree.
constraint_eval_kwargs (dict) – Extra keyword arguments to pass to the constraint_eval_fns
batch_epoch_dict (dict) – Specifies the batch size and number of epochs (n_epochs) to use for each data fraction
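The constructor arguments above might be assembled as in the following sketch. The argument names come from the signature; the concrete values, the placeholder perf_eval_fn, the hypothetical helper trial_grid, and the check that each data fraction lies in (0, 1] are assumptions made for illustration. The actual PlotGenerator call is left commented out because it needs a real spec object.

```python
def perf_eval_fn(*args, **kwargs):
    """Placeholder only; the real signature depends on the regime."""
    raise NotImplementedError


def trial_grid(data_fracs, n_trials):
    """List every (data_frac, trial_index) pair an experiment would cover."""
    for frac in data_fracs:
        # Assumption: a proportion of the dataset must lie in (0, 1].
        if not 0.0 < frac <= 1.0:
            raise ValueError(f"data fraction {frac} is outside (0, 1]")
    return [(frac, trial_i) for frac in data_fracs for trial_i in range(n_trials)]


generator_kwargs = dict(
    spec=None,  # placeholder for a seldonian.spec.Spec object
    n_trials=5,
    data_fracs=[0.01, 0.1, 0.5, 1.0],
    datagen_method="resample",
    perf_eval_fn=perf_eval_fn,
    results_dir="./results",  # hypothetical output directory
    n_workers=1,
)

# The algorithm is run n_trials times for each data fraction.
grid = trial_grid(generator_kwargs["data_fracs"], generator_kwargs["n_trials"])
print(len(grid))  # 4 fractions x 5 trials = 20 runs

# With the library installed and a real spec in place of None:
# from experiments.generate_plots import PlotGenerator
# plot_generator = PlotGenerator(**generator_kwargs)
```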
- __repr__()
Return repr(self).
Methods
- make_plots(tot_data_size=None, model_label_dict={}, ignore_models=[], fontsize=12, title_fontsize=12, legend_fontsize=8, ncols_legend=3, performance_label='accuracy', sr_label='Prob. of solution', fr_label='Prob. of violation', performance_yscale='linear', performance_ylims=[], hoz_axis_label='Amount of data', show_confidence_level=True, marker_size=20, save_format='pdf', show_title=True, custom_title=None, include_legend=True, savename=None)
Make the three plots of the experiment. Looks up the results of any experiments saved in self.results_dir and plots them together on the same three plots.
- Parameters:
tot_data_size (int) – The total number of datapoints in the experiment. This is used, alongside the data_fracs array, to construct the horizontal axes of the three plots. If None, the value is taken from the dataset.
model_label_dict (dict) – An optional dictionary where keys are model names and values are the names you want shown in the legend. Note that if you specify this dict, then only the models in this dictionary will appear in the legend, and they will show up in the legend in the order that you specify them in the dict.
ignore_models (List) – Do not plot any models whose .model_name attribute appears in this list.
fontsize (int) – The font size to use for the axis labels
title_fontsize (int) – The font size to use for the title of each subplot
legend_fontsize (int) – The font size to use for text in the legend
ncols_legend (int, defaults to 3) – The number of columns to use in the legend
performance_label (str, defaults to "accuracy") – The y axis label on the performance plot (left plot) you want to use.
sr_label (str, defaults to "Prob. of solution") – The y axis label on the solution rate plot (middle plot) you want to use.
fr_label (str, defaults to "Prob. of violation") – The y axis label on the failure rate plot (right plot) you want to use.
performance_yscale (str, defaults to "linear") – The y axis scaling of the performance plot, “log” or “linear”
performance_ylims (List, defaults to []) – The y limits of the performance plot. If empty, matplotlib’s automatically determined limits are used.
hoz_axis_label (str, defaults to "Amount of data") – What you want to show as the horizontal axis label for all plots.
show_confidence_level (bool, defaults to True) – Whether to show the black dotted line for the value of delta in the failure rate plot (right plot)
marker_size (float, defaults to 20.) – The size of the points in each plot (matplotlib “s” parameter)
save_format (str, defaults to "pdf") – The file type for the saved plot
show_title (bool) – Whether to show the title at the top of the figure
custom_title (str, defaults to None) – A custom title
include_legend (bool, defaults to True) – Whether to include the legend
savename (str, defaults to None) – If not None, the filename to which the figure will be saved on disk.
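The interaction of tot_data_size, data_fracs, model_label_dict, and ignore_models described above can be sketched in plain Python. The helpers horizontal_axis and legend_entries and the model names are hypothetical; this illustrates the documented behavior and is not the library's plotting code.

```python
def horizontal_axis(data_fracs, tot_data_size):
    """Convert data fractions into the amount of data shown on the x axis."""
    return [frac * tot_data_size for frac in data_fracs]


def legend_entries(available_models, model_label_dict=None, ignore_models=()):
    """Return (model_name, legend_label) pairs in legend order."""
    # Models listed in ignore_models are not plotted at all.
    kept = [m for m in available_models if m not in ignore_models]
    if not model_label_dict:
        # No relabeling requested: every remaining model keeps its own name.
        return [(m, m) for m in kept]
    # Only models named in the dict appear, in the dict's insertion order.
    return [(m, label) for m, label in model_label_dict.items() if m in kept]


x_values = horizontal_axis([0.25, 0.5, 1.0], tot_data_size=1000)
print(x_values)  # [250.0, 500.0, 1000.0]

entries = legend_entries(
    available_models=["baseline_a", "qsa", "baseline_b"],  # made-up names
    model_label_dict={"qsa": "Seldonian", "baseline_a": "Baseline A"},
)
print(entries)  # [('qsa', 'Seldonian'), ('baseline_a', 'Baseline A')]

# A corresponding call on an existing PlotGenerator instance might look like:
# plot_generator.make_plots(
#     tot_data_size=1000,
#     model_label_dict={"qsa": "Seldonian", "baseline_a": "Baseline A"},
#     performance_yscale="log",
#     savename="three_plots.pdf",
# )
```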
- validate_constraint_eval_kwargs(constraint_eval_kwargs)
Ensure that, if the spec object contains additional datasets, constraint_eval_kwargs contains a held-out dataset for each additional dataset.
- Parameters:
constraint_eval_kwargs (dict) – The keyword arguments used when evaluating the constraints for the failure rate plot (right plot).
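The kind of check described above could look like the following sketch. The function name check_held_out_datasets, the key "held_out_additional_datasets", and the use of plain dicts keyed by dataset name are all assumptions made for this example; the library's actual keys and data structures may differ.

```python
def check_held_out_datasets(
    spec_additional_datasets,
    constraint_eval_kwargs,
    key="held_out_additional_datasets",  # hypothetical key name
):
    """Raise ValueError if any additional dataset lacks a held-out counterpart."""
    if not spec_additional_datasets:
        # Nothing to validate when the spec has no additional datasets.
        return
    held_out = constraint_eval_kwargs.get(key)
    if held_out is None:
        raise ValueError(f"constraint_eval_kwargs is missing the '{key}' entry")
    missing = [name for name in spec_additional_datasets if name not in held_out]
    if missing:
        raise ValueError(f"no held-out dataset provided for: {missing}")


# Made-up dataset names with stand-in dataset objects.
additional = {"constraint_1": object(), "constraint_2": object()}
complete = {"held_out_additional_datasets": {"constraint_1": object(), "constraint_2": object()}}
partial = {"held_out_additional_datasets": {"constraint_1": object()}}

check_held_out_datasets(additional, complete)  # passes silently
try:
    check_held_out_datasets(additional, partial)
except ValueError as err:
    print(err)
```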