seldonian.parse_tree.nodes.MEDCustomBaseNode
- class MEDCustomBaseNode(name, lower=-inf, upper=inf, **kwargs)
Bases:
BaseNode
- __init__(name, lower=-inf, upper=inf, **kwargs)
Custom base node that calculates pairwise mean error differences between male and female points. This node was used in the Seldonian regression algorithm presented by Thomas et al. (2019) (https://www.science.org/stoken/author-tokens/ST-119/full); see Figure 2 of that paper.
Overrides several parent class methods.
- Parameters:
name (str) – The name of the node
lower (float) – Lower confidence bound
upper (float) – Upper confidence bound
- Variables:
delta (float) – The share of the confidence put into this node
- __repr__()
Overrides Node.__repr__()
Methods
- calculate_bounds(**kwargs)
Calculate confidence bounds given a bound_method, such as t-test.
- Returns:
A dictionary mapping the bound name to its value, e.g., {"lower": -1.0, "upper": 1.0}
- calculate_value(**kwargs)
Calculate the value of the node given model weights, etc. This is the expected value of the base variable, not the bound.
- compute_HC_lowerbound(data, datasize, delta, **kwargs)
Calculate the high-confidence lower bound. Used in the safety test.
- Parameters:
data (numpy ndarray) – Vector containing base variable evaluated at each observation in dataset
datasize (int) – The number of observations in the safety dataset
delta (float) – Confidence level, e.g. 0.05
- Returns:
lower, the high-confidence lower bound
- compute_HC_upper_and_lowerbound(data, datasize, delta_lower, delta_upper, **kwargs)
Calculate the high-confidence lower and upper bounds. Used in the safety test. The confidence levels for the lower and upper bounds do not have to be equal.
Depending on the bound_method, this is not always equivalent to calling compute_HC_lowerbound() and compute_HC_upperbound() independently.
- Parameters:
data (numpy ndarray) – Vector containing base variable evaluated at each observation in dataset
datasize (int) – The number of observations in the safety dataset
delta_lower – Confidence level for the lower bound, e.g. 0.05
delta_upper – Confidence level for the upper bound, e.g. 0.05
- Returns:
(lower, upper), the high-confidence lower and upper bounds.
- compute_HC_upperbound(data, datasize, delta, **kwargs)
Calculate the high-confidence upper bound. Used in the safety test.
- Parameters:
data (numpy ndarray) – Vector containing base variable evaluated at each observation in dataset
datasize (int) – The number of observations in the safety dataset
delta (float) – Confidence level, e.g. 0.05
- Returns:
upper, the high-confidence upper bound
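As an illustration of what a t-test bound_method could look like for the safety-test bounds above, here is a minimal sketch of one-sided Student's t bounds on the mean. The function names `ttest_lower_bound` and `ttest_upper_bound` are hypothetical and are not part of the library; the actual implementation may differ in detail.

```python
import numpy as np
from scipy import stats


def ttest_lower_bound(data, datasize, delta):
    """One-sided (1 - delta) lower bound on the mean (hypothetical helper)."""
    stderr = np.std(data, ddof=1) / np.sqrt(datasize)
    return np.mean(data) - stderr * stats.t.ppf(1.0 - delta, datasize - 1)


def ttest_upper_bound(data, datasize, delta):
    """One-sided (1 - delta) upper bound on the mean (hypothetical helper)."""
    stderr = np.std(data, ddof=1) / np.sqrt(datasize)
    return np.mean(data) + stderr * stats.t.ppf(1.0 - delta, datasize - 1)


# Made-up base-variable values, one per observation in the safety dataset.
zhat = np.array([0.1, -0.2, 0.05, 0.3, -0.1])
lower = ttest_lower_bound(zhat, len(zhat), delta=0.05)
upper = ttest_upper_bound(zhat, len(zhat), delta=0.05)
```

In the safety test the data vector and datasize describe the same dataset, so `datasize` equals `len(data)` in this sketch.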
- mask_data(dataset, conditional_columns)
Mask features and labels using a joint AND mask where each of the conditional columns is True.
- Parameters:
dataset (dataset.Dataset object) – The candidate or safety dataset
conditional_columns (List(str)) – List of columns for which to create the joint AND mask on the dataset
- Returns:
The masked dataframe
- Return type:
numpy ndarray
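The joint AND mask can be sketched with plain NumPy arrays. This is an illustration only: the helper `mask_rows`, the `sensitive` dictionary, and the column names are assumptions made for the example and do not reflect the layout of the library's Dataset object.

```python
import numpy as np


def mask_rows(features, labels, sensitive, conditional_columns):
    """Keep only rows where every conditional column is True (hypothetical helper)."""
    mask = np.ones(len(labels), dtype=bool)
    for col in conditional_columns:
        # AND the running mask with each indicator column in turn.
        mask &= sensitive[col].astype(bool)
    return features[mask], labels[mask]


features = np.array([[1.0], [2.0], [3.0], [4.0]])
labels = np.array([10.0, 20.0, 30.0, 40.0])
# Made-up 0/1 indicator columns, keyed by column name.
sensitive = {
    "M": np.array([1, 0, 1, 1]),
    "over_30": np.array([1, 1, 0, 1]),
}
masked_X, masked_y = mask_rows(features, labels, sensitive, ["M", "over_30"])
```

Only the first and last rows satisfy both conditions here, so two rows survive the mask.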
- precalculate_data(X, Y, S)
Preconfigure the dataset for candidate selection or the safety test so that it does not need to be recalculated on each iteration through the parse tree.
- Parameters:
X (pandas dataframe) – features
Y (pandas dataframe) – labels
- predict_HC_lowerbound(data, datasize, delta, **kwargs)
Calculate the high-confidence lower bound that we expect to pass the safety test. Used in candidate selection.
- Parameters:
data (numpy ndarray) – Vector containing base variable evaluated at each observation in dataset
datasize (int) – The number of observations in the safety dataset
delta (float) – Confidence level, e.g. 0.05
- Returns:
lower, the predicted high-confidence lower bound
- predict_HC_upper_and_lowerbound(data, datasize, delta_lower, delta_upper, **kwargs)
Calculate the high-confidence lower and upper bounds that we expect to pass the safety test. Used in candidate selection. The confidence levels for the lower and upper bounds do not have to be equal.
Depending on the bound_method, this is not always equivalent to calling predict_HC_lowerbound() and predict_HC_upperbound() independently.
- Parameters:
data (numpy ndarray) – Vector containing base variable evaluated at each observation in dataset
datasize (int) – The number of observations in the safety dataset
delta_lower – Confidence level for the lower bound, e.g. 0.05
delta_upper – Confidence level for the upper bound, e.g. 0.05
- Returns:
(lower, upper), the predicted high-confidence lower and upper bounds.
- predict_HC_upperbound(data, datasize, delta, **kwargs)
Calculate the high-confidence upper bound that we expect to pass the safety test. Used in candidate selection.
- Parameters:
data (numpy ndarray) – Vector containing base variable evaluated at each observation in dataset
datasize (int) – The number of observations in the safety dataset
delta (float) – Confidence level, e.g. 0.05
- Returns:
upper, the predicted high-confidence upper bound
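The predicted bounds take datasize separately from data because the vector is evaluated on the candidate dataset while the bound is meant to anticipate the safety test. A minimal sketch of one way to do this for a t-test bound follows. It assumes the approach of widening the interval by a constant factor, as described by Thomas et al. (2019); the function name `predict_ttest_upper_bound` and the `inflation` parameter are hypothetical, and the library's actual formula may differ.

```python
import numpy as np
from scipy import stats


def predict_ttest_upper_bound(data, datasize, delta, inflation=2.0):
    """Predicted upper bound for the safety test (hypothetical helper).

    data:      base variable evaluated on the candidate dataset
    datasize:  number of observations in the safety dataset
    inflation: assumed widening factor that makes the prediction conservative
    """
    stderr = np.std(data, ddof=1) / np.sqrt(datasize)
    t_quantile = stats.t.ppf(1.0 - delta, datasize - 1)
    return np.mean(data) + inflation * stderr * t_quantile


# Made-up candidate-set values; the safety set is assumed to hold 100 points.
candidate_zhat = np.array([0.1, -0.2, 0.05, 0.3, -0.1])
predicted_upper = predict_ttest_upper_bound(candidate_zhat, datasize=100, delta=0.05)
plain_upper = predict_ttest_upper_bound(
    candidate_zhat, datasize=100, delta=0.05, inflation=1.0
)
```

With `inflation=1.0` the expression reduces to an ordinary one-sided t bound scaled to the safety-set size, so the inflated prediction is always at least as wide.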
- zhat(model, theta, data_dict, **kwargs)
Pair up male and female columns and compute the vector of differences (y_i - y_hat_i | M) - (y_j - y_hat_j | F). The numbers of male and female rows may differ, so the number of pairs is min(N_male, N_female).
- Parameters:
model (models.SeldonianModel object) – machine learning model
theta (numpy ndarray) – model weights
data_dict (dict) – contains inputs to model, such as features and labels
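The pairing described for zhat can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the library's code: the helper `paired_error_differences`, the 0/1 indicator arrays, and the choice to pair rows in order and drop the surplus rows of the larger group are all hypothetical.

```python
import numpy as np


def paired_error_differences(y, y_hat, is_male, is_female):
    """Return (y_i - y_hat_i | M) - (y_j - y_hat_j | F) for each pair (hypothetical helper)."""
    err = y - y_hat
    err_m = err[is_male.astype(bool)]
    err_f = err[is_female.astype(bool)]
    # The groups may differ in size, so only min(N_male, N_female) pairs exist.
    n_pairs = min(len(err_m), len(err_f))
    return err_m[:n_pairs] - err_f[:n_pairs]


# Made-up labels, predictions, and group indicators (3 male rows, 2 female rows).
y = np.array([3.0, 2.5, 4.0, 3.5, 2.0])
y_hat = np.array([2.8, 2.9, 3.5, 3.6, 2.4])
is_male = np.array([1, 0, 1, 0, 1])
is_female = 1 - is_male
zhat = paired_error_differences(y, y_hat, is_male, is_female)
```

Because there are three male rows and two female rows, the result has two entries; the mean of this vector estimates the mean error difference between the groups.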