seldonian.parse_tree.nodes.CVaRSQeBaseNode

class CVaRSQeBaseNode(name, lower=-inf, upper=inf, **kwargs)

Bases: BaseNode

__init__(name, lower=-inf, upper=inf, **kwargs)

Custom base node that calculates upper and lower confidence bounds on CVaR_alpha of the squared error, with alpha fixed to 0.1. It uses the positive definition of CVaR_alpha, i.e., “…the expected value if we only consider the samples that are at least VaR_alpha,” where VaR_alpha “… is the largest value such that at least 100*alpha% of samples will be larger than it” (Thomas & Learned-Miller 2019: https://people.cs.umass.edu/~pthomas/papers/Thomas2019.pdf). See Theorem 3 of that paper for the upper bound and Theorem 4 for the lower bound.

Overrides several parent class methods.

Parameters:
  • name (str) – The name of the node

  • lower (float) – Lower confidence bound

  • upper (float) – Upper confidence bound

Variables:
  • delta (float) – The share of the confidence put into this node

  • alpha (float) – The probability threshold used to define CVaR

__repr__()

Overrides Node.__repr__()

Methods

calculate_bounds(**kwargs)

Calculate confidence bounds given a bound_method, such as t-test.

Returns:

A dictionary mapping the bound name to its value, e.g., {"lower": -1.0, "upper": 1.0}

calculate_data_forbound(**kwargs)

Prepare data inputs for confidence bound calculation.

Returns:

data_dict, a dictionary containing the prepared data

calculate_value(**kwargs)

Calculate the actual value of CVaR_alpha, not a bound on it.
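To illustrate the definition quoted in the class description, here is a minimal sketch of a plug-in estimate of CVaR_alpha for squared errors. It is not the library's implementation of calculate_value(); the helper name empirical_cvar and the use of np.quantile to locate VaR_alpha are assumptions made for this example.

```python
import numpy as np


def empirical_cvar(samples, alpha=0.1):
    """Plug-in estimate of upper-tail CVaR_alpha (hypothetical helper)."""
    z = np.asarray(samples, dtype=float)
    # VaR_alpha: roughly 100*alpha% of the samples lie above this value.
    var_alpha = np.quantile(z, 1.0 - alpha)
    # CVaR_alpha: mean of the samples that are at least VaR_alpha.
    return float(z[z >= var_alpha].mean())


# Squared errors of a toy predictor that always outputs zero.
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
y_pred = np.zeros_like(y_true)
squared_errors = (y_true - y_pred) ** 2

cvar = empirical_cvar(squared_errors, alpha=0.1)
```

With ten samples and alpha = 0.1, only the largest squared error lies at or above VaR_alpha, so the estimate is the mean of that single worst-case sample.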

compute_HC_lowerbound(Z, delta, datasize, a, **kwargs)

Calculate the high-confidence lower bound. Used in the safety test.

Parameters:
  • Z (numpy ndarray of length datasize) – Vector containing sorted squared errors

  • delta (float) – Confidence level, e.g. 0.05

  • datasize (int) – The number of observations in the safety dataset

  • a (float) – The minimum possible value of the squared error
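The parameters above can be related to a concrete computation with a rough sketch of a DKW-style lower bound, written in the spirit of Theorem 4 of the paper cited in the class description. The exact expression is an assumption and should be checked against the paper and the library source. The function name cvar_lower_bound_sketch is hypothetical, and the sample count is taken from Z rather than passed as datasize.

```python
import numpy as np


def cvar_lower_bound_sketch(Z, delta, a, alpha=0.1):
    """Illustrative (1 - delta)-confidence lower bound on CVaR_alpha."""
    Z = np.sort(np.asarray(Z, dtype=float))
    n = Z.size
    # Prepend the minimum possible value a: Z_0 <= Z_1 <= ... <= Z_n.
    Z_ext = np.concatenate(([a], Z))
    # Width of the one-sided DKW confidence band around the empirical CDF.
    eps = np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    # The empirical CDF equals i/n on [Z_i, Z_{i+1}); raise it by eps, cap at 1.
    cdf_upper = np.minimum(1.0, np.arange(n) / n + eps)
    weights = np.maximum(0.0, cdf_upper - (1.0 - alpha))
    return float(Z_ext[-1] - np.sum(np.diff(Z_ext) * weights) / alpha)


rng = np.random.default_rng(0)
errors = rng.uniform(0.0, 1.0, size=1000) ** 2  # synthetic squared errors in [0, 1]
lower = cvar_lower_bound_sketch(errors, delta=0.05, a=0.0)
lower_strict = cvar_lower_bound_sketch(errors, delta=0.01, a=0.0)
```

In this sketch a smaller delta widens the confidence band, so lower_strict can never exceed lower.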

compute_HC_upper_and_lowerbound(data, datasize, delta_lower, delta_upper, **kwargs)

Calculate the high-confidence lower and upper bounds. Used in the safety test. The confidence levels for the lower and upper bounds do not have to be equal.

Depending on the bound_method, this is not always equivalent to calling compute_HC_lowerbound() and compute_HC_upperbound() independently.

Parameters:
  • data (numpy ndarray) – Vector containing base variable evaluated at each observation in dataset

  • datasize (int) – The number of observations in the safety dataset

  • delta_lower (float) – Confidence level for the lower bound, e.g. 0.05

  • delta_upper (float) – Confidence level for the upper bound, e.g. 0.05

Returns:

(lower,upper) the high-confidence lower and upper bounds.

compute_HC_upperbound(Z, delta, datasize, b, **kwargs)

Calculate the high-confidence upper bound. Used in the safety test.

Parameters:
  • Z (numpy ndarray of length datasize) – Vector containing sorted squared errors

  • delta (float) – Confidence level, e.g. 0.05

  • datasize (int) – The number of observations in the safety dataset

  • b (float) – The maximum possible value of the squared error
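As a rough illustration of how Z, delta, and b could combine, the following sketches a DKW-style upper bound in the spirit of Theorem 3 of the paper cited in the class description. The exact expression is an assumption to verify against the paper and the library source. The function name cvar_upper_bound_sketch is hypothetical, and the sample count is taken from Z rather than passed as datasize.

```python
import numpy as np


def cvar_upper_bound_sketch(Z, delta, b, alpha=0.1):
    """Illustrative (1 - delta)-confidence upper bound on CVaR_alpha."""
    Z = np.sort(np.asarray(Z, dtype=float))
    n = Z.size
    # Append the maximum possible value b: Z_1 <= ... <= Z_n <= Z_{n+1}.
    Z_ext = np.concatenate((Z, [b]))
    # Width of the one-sided DKW confidence band around the empirical CDF.
    eps = np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    # The empirical CDF equals i/n on [Z_i, Z_{i+1}); lower it by eps.
    cdf_lower = np.arange(1, n + 1) / n - eps
    weights = np.maximum(0.0, cdf_lower - (1.0 - alpha))
    return float(b - np.sum(np.diff(Z_ext) * weights) / alpha)


rng = np.random.default_rng(0)
errors = rng.uniform(0.0, 1.0, size=1000) ** 2  # synthetic squared errors in [0, 1]
upper = cvar_upper_bound_sketch(errors, delta=0.05, b=1.0)
upper_strict = cvar_upper_bound_sketch(errors, delta=0.01, b=1.0)
```

When the dataset is small the band width eps can exceed alpha, in which case every weight is zero and the sketch falls back to the trivial bound b.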

mask_data(dataset, conditional_columns)

Mask features and labels using a joint AND mask where each of the conditional columns is True.

Parameters:
  • dataset (dataset.Dataset object) – The candidate or safety dataset

  • conditional_columns (List(str)) – List of columns for which to create the joint AND mask on the dataset

Returns:

The masked dataframe

Return type:

numpy ndarray
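The joint AND mask described above can be sketched with plain NumPy arrays. This is an illustration only: a dictionary of arrays stands in for the library's dataset.Dataset object, and the helper joint_and_mask and the column names are hypothetical.

```python
import numpy as np


def joint_and_mask(columns, conditional_columns):
    """Return a boolean mask that is True where every listed column is True."""
    n_rows = len(next(iter(columns.values())))
    mask = np.ones(n_rows, dtype=bool)
    for name in conditional_columns:
        # AND each conditional column into the running mask.
        mask &= np.asarray(columns[name]).astype(bool)
    return mask


# Toy indicator columns plus features and labels, one row per observation.
columns = {
    "M": np.array([1, 0, 1, 1]),
    "F": np.array([0, 1, 0, 0]),
    "senior": np.array([1, 1, 0, 1]),
}
features = np.array([[0.1], [0.2], [0.3], [0.4]])
labels = np.array([1.0, 2.0, 3.0, 4.0])

mask = joint_and_mask(columns, ["M", "senior"])
masked_features, masked_labels = features[mask], labels[mask]
```

Only the rows where both "M" and "senior" are set survive the mask, and the same mask is applied to features and labels so the two stay aligned.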

predict_HC_lowerbound(Z, delta, datasize, a, **kwargs)

Calculate the high-confidence lower bound that we expect to pass the safety test. Used in candidate selection.

Parameters:
  • Z (numpy ndarray of length n_candidate) – Vector containing sorted squared errors

  • delta (float) – Confidence level, e.g. 0.05

  • datasize (int) – The (predicted) number of observations in the safety dataset

  • a (float) – The minimum possible value of the squared error

predict_HC_upper_and_lowerbound(data, datasize, delta_lower, delta_upper, **kwargs)

Calculate the high-confidence lower and upper bounds that we expect to pass the safety test. Used in candidate selection. The confidence levels for the lower and upper bounds do not have to be equal.

Depending on the bound_method, this is not always equivalent to calling predict_HC_lowerbound() and predict_HC_upperbound() independently.

Parameters:
  • data (numpy ndarray) – Vector containing base variable evaluated at each observation in dataset

  • datasize (int) – The number of observations in the safety dataset

  • delta_lower (float) – Confidence level for the lower bound, e.g. 0.05

  • delta_upper (float) – Confidence level for the upper bound, e.g. 0.05

Returns:

(lower,upper) the predicted high-confidence lower and upper bounds.

predict_HC_upperbound(Z, delta, datasize, b, **kwargs)

Calculate the high-confidence upper bound that we expect to pass the safety test. Used in candidate selection.

Parameters:
  • Z (numpy ndarray of length n_candidate) – Vector containing sorted squared errors

  • delta (float) – Confidence level, e.g. 0.05

  • datasize (int) – The (predicted) number of observations in the safety dataset

  • b (float) – The maximum possible value of the squared error

zhat(model, theta, data_dict, sub_regime, **kwargs)

Calculate an unbiased estimate of the base variable node.

Parameters:
  • model (models.SeldonianModel object) – The machine learning model

  • theta (numpy ndarray) – model weights

  • data_dict (dict) – Contains inputs to model, such as features and labels

Returns:

A vector of unbiased estimates of the measure function