seldonian.parse_tree.parse_tree.ParseTree

class ParseTree(delta, regime, sub_regime, columns=[], custom_measure_functions={})

Bases: object

__init__(delta, regime, sub_regime, columns=[], custom_measure_functions={})

Class to represent a parse tree for a single behavioral constraint

Parameters:
  • delta (float) – Confidence level for the constraint. Specifies the maximum probability that the algorithm can return a solution that violates the behavioral constraint. This gets split into smaller deltas for the base nodes.

  • regime (str) – The category of the machine learning algorithm, e.g., supervised_learning or reinforcement_learning

  • sub_regime (str) – The sub-category of the ML algorithm, e.g., classification or regression for supervised learning. Use ‘all’ for reinforcement learning.

  • columns (List(str)) – The names of the columns in the dataframe. Used to determine if conditional columns provided by user are appropriate.

Variables:
  • root (nodes.Node object) – Root node which contains the whole tree via left and right child attributes. Gets assigned when tree is built by create_from_ast()

  • constraint_str (str) – The string expression for the behavioral constraint

  • n_nodes (int) – Total number of nodes in the parse tree

  • n_base_nodes (int) – Number of base variable nodes in the parse tree. Does not include constants. If a base variable, such as PR | [M], appears more than once in constraint_str, each appearance contributes to n_base_nodes.

  • base_node_dict (dict) – Keeps track of the unique base variable nodes, their confidence bounds, and whether the bounds have already been calculated for a given base node. Helpful for handling duplicate base nodes.

  • n_unique_bounds_tot (int) – The total number of unique confidence bounds that need to be computed over all unique base nodes. This is set by assign_bounds_needed()

  • node_fontsize (int) – Fontsize used for graphviz visualizations

  • available_measure_functions (list) – A list of measure functions for the given regime and sub-regime, e.g., “Mean_Error” for supervised regression or “PR” (Positive Rate) for supervised classification.

__repr__()

Return repr(self).

Methods

_abs(a)

Absolute value of a confidence interval

Parameters:

a (tuple) – Confidence interval like: (lower,upper)
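
The rule for the absolute value of an interval depends on whether the interval straddles zero. The following is a minimal sketch of that rule, not the library's implementation; the function name interval_abs is hypothetical.

```python
def interval_abs(a):
    """Absolute value of an interval (lower, upper). Illustrative sketch only."""
    lower, upper = a
    if lower >= 0:
        # Entirely non-negative: unchanged
        return (lower, upper)
    if upper <= 0:
        # Entirely non-positive: negate and swap the endpoints
        return (-upper, -lower)
    # Straddles zero: the smallest absolute value is 0
    return (0, max(-lower, upper))


print(interval_abs((-3, 2)))   # (0, 3)
print(interval_abs((-5, -2)))  # (2, 5)
```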

_add(a, b)

Add two confidence intervals

Parameters:
  • a (tuple) – Confidence interval like: (lower,upper)

  • b (tuple) – Confidence interval like: (lower,upper)
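
Under standard interval arithmetic, adding two intervals adds the lower endpoints together and the upper endpoints together. A minimal sketch, assuming plain (lower, upper) tuples; interval_add is a hypothetical name and this is not the library's code.

```python
def interval_add(a, b):
    """Sum of two intervals given as (lower, upper) tuples. Illustrative sketch only."""
    return (a[0] + b[0], a[1] + b[1])


print(interval_add((1, 2), (3, 5)))  # (4, 7)
```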

_assign_bounds_helper(node, lower_needed, upper_needed, **kwargs)

Helper function to traverse the parse tree and assign which bounds we need to calculate on the base nodes.

Parameters:
  • node (Node object) – node in the parse tree

  • lower_needed (bool) – Whether lower bound needs to be calculated

  • upper_needed (bool) – Whether upper bound needs to be calculated

_assign_deltas_helper(node, weight_method, **kwargs)

Helper function to traverse the parse tree and assign delta values to base nodes.

Parameters:
  • node (Node object) – node in the parse tree

  • weight_method (str) – How you want to assign the deltas to the base nodes

_assign_infl_factors_helper(node, method, factors)

Helper function to traverse the parse tree and assign bound inflation factors values to base nodes.

Parameters:
  • node (Node object) – node in the parse tree

  • method (str) – How you want to assign the bound inflation factors to the base nodes

  • factors – If an integer and method=="constant", that integer is applied to all bounds. If method=="manual", this needs to be a vector of bound inflation factors to assign to each unique base node.

_ast2pt_node(ast_node)

From ast.AST node object, create one of the node objects from Nodes

Parameters:

ast_node (ast.AST node object) – node in the ast tree

_ast_tree_helper(ast_node)

From a given node in the ast tree, make a node in the tree and recurse to children of this node.

Parameters:

ast_node (ast.AST node object) – node in the ast tree

_div(a, b)

Divide two confidence intervals

Parameters:
  • a (tuple) – Confidence interval like: (lower,upper)

  • b (tuple) – Confidence interval like: (lower,upper)
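
Interval division needs special care when the denominator interval contains zero. The sketch below shows one common convention: return an unbounded interval in that case, otherwise take the extremes of the four endpoint quotients. How the library itself treats a denominator that contains zero is not stated here, so that branch is an assumption; interval_div is a hypothetical name.

```python
import math


def interval_div(a, b):
    """Quotient a / b of two (lower, upper) intervals. Illustrative sketch only."""
    if b[0] <= 0 <= b[1]:
        # Assumption: a denominator containing zero gives an unbounded result
        return (-math.inf, math.inf)
    quotients = [a[0] / b[0], a[0] / b[1], a[1] / b[0], a[1] / b[1]]
    return (min(quotients), max(quotients))


print(interval_div((2, 4), (1, 2)))  # (1.0, 4.0)
```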

_evaluator_helper(node, **kwargs)

Helper function for traversing through the tree to evaluate the constraint

Parameters:

node (Node object) – node in the parse tree

_exp(a)

Exponentiate a confidence interval

Parameters:

a (tuple) – Confidence interval like: (lower,upper)

_log(a)

Take log of a confidence interval

Parameters:

a (tuple) – Confidence interval like: (lower,upper)
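
Because exp and log are both increasing functions, applying them to the endpoints of an interval gives the endpoints of the result. The sketch below illustrates this for both operations. Its handling of a non-positive lower bound in the log (mapping it to negative infinity) is an assumption, and the names interval_exp and interval_log are hypothetical.

```python
import math


def interval_exp(a):
    """exp of an interval (lower, upper). Illustrative sketch only."""
    return (math.exp(a[0]), math.exp(a[1]))


def interval_log(a):
    """Natural log of an interval (lower, upper). Illustrative sketch only."""
    lower, upper = a
    if upper <= 0:
        raise ValueError("log is undefined when the whole interval is <= 0")
    # Assumption: a non-positive lower endpoint maps to -inf
    new_lower = math.log(lower) if lower > 0 else -math.inf
    return (new_lower, math.log(upper))


print(interval_exp((0, 0)))  # (1.0, 1.0)
print(interval_log((0, 1)))  # (-inf, 0.0)
```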

_max(a, b)

Get the maximum of two confidence intervals

Parameters:
  • a (tuple) – Confidence interval like: (lower,upper)

  • b (tuple) – Confidence interval like: (lower,upper)

_min(a, b)

Get the minimum of two confidence intervals

Parameters:
  • a (tuple) – Confidence interval like: (lower,upper)

  • b (tuple) – Confidence interval like: (lower,upper)
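
For the maximum and minimum of two intervals, the usual rule applies the operation endpoint-wise. A minimal sketch under that assumption, with hypothetical function names:

```python
def interval_max(a, b):
    """Interval containing max(x, y) for x in a, y in b. Illustrative sketch only."""
    return (max(a[0], b[0]), max(a[1], b[1]))


def interval_min(a, b):
    """Interval containing min(x, y) for x in a, y in b. Illustrative sketch only."""
    return (min(a[0], b[0]), min(a[1], b[1]))


print(interval_max((1, 4), (2, 3)))  # (2, 4)
print(interval_min((1, 4), (2, 3)))  # (1, 3)
```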

_mult(a, b)

Multiply two confidence intervals

Parameters:
  • a (tuple) – Confidence interval like: (lower,upper)

  • b (tuple) – Confidence interval like: (lower,upper)
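
The product of two intervals can have its extremes at any pairing of endpoints, depending on the signs involved. A minimal sketch that takes the minimum and maximum of the four endpoint products, assuming finite endpoints (a product such as 0 * inf would yield nan); interval_mult is a hypothetical name.

```python
def interval_mult(a, b):
    """Product of two (lower, upper) intervals. Illustrative sketch only."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))


print(interval_mult((-2, 3), (4, 5)))  # (-10, 15)
```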

_parse_subscript(ast_node)

Helper function for dealing with base nodes with subscripts.

Parameters:

ast_node (ast.AST node object) – node in the ast tree

Returns:

  • node_class - which node class to use for this base node

  • node_kwargs - keyword arguments used to build the base node

_pow(a, b)

Get the confidence interval on pow(a,b), where a and b are both intervals

Parameters:
  • a (tuple) – Confidence interval like: (lower,upper)

  • b (tuple) – Confidence interval like: (lower,upper)
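
When the base interval is strictly positive, x**y is monotonic in each argument separately, so its extremes over the two intervals occur at endpoint combinations. The sketch below covers only that restricted case and rejects anything else. It is an illustration under that assumption and does not describe which cases the library supports; interval_pow is a hypothetical name.

```python
def interval_pow(a, b):
    """Interval on x**y for x in a, y in b, strictly positive base only. Sketch."""
    if a[0] <= 0:
        raise ValueError("this sketch only handles strictly positive bases")
    corners = [a[0] ** b[0], a[0] ** b[1], a[1] ** b[0], a[1] ** b[1]]
    return (min(corners), max(corners))


print(interval_pow((2, 3), (1, 2)))  # (2, 9)
```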

_preprocess_constraint_str(s)

Check whether inequalities are present and move everything to one side so that the final constraint string has the form: {constraint_str} <= 0

Also performs validation checks to make sure the string that was passed is valid

Parameters:

s (str) – mathematical expression written in Python syntax from which we build the parse tree

Returns:

The preprocessed string for g, where the constraint is g <= 0

Return type:

str
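
The rearrangement described above can be illustrated with plain string handling. This is a simplified sketch, not the library's code: the function name move_to_one_side, the output format, and the single-inequality restriction are all assumptions made for illustration.

```python
def move_to_one_side(s):
    """Rewrite 'lhs <= rhs' or 'lhs >= rhs' so the result is implicitly <= 0."""
    n_le = s.count("<=")
    n_ge = s.count(">=")
    if n_le + n_ge > 1:
        raise ValueError("at most one inequality is supported in this sketch")
    if n_le == 1:
        lhs, rhs = s.split("<=")
        # lhs <= rhs is equivalent to lhs - rhs <= 0
        return f"({lhs.strip()})-({rhs.strip()})"
    if n_ge == 1:
        lhs, rhs = s.split(">=")
        # lhs >= rhs is equivalent to rhs - lhs <= 0
        return f"({rhs.strip()})-({lhs.strip()})"
    # No inequality: the string is already assumed to mean 'expression <= 0'
    return s.strip()


print(move_to_one_side("Mean_Squared_Error <= 2.0"))  # (Mean_Squared_Error)-(2.0)
print(move_to_one_side("PR >= 0.5"))                  # (0.5)-(PR)
```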

_propagate_value(node)

Helper function for propagating values

Parameters:

node (Node object) – node in the parse tree

_propagator_helper(node, **kwargs)

Helper function for traversing through the tree and propagating confidence bounds

Parameters:

node (Node object) – node in the parse tree

_protect_nan(bound, bound_type)

Handle nan as negative infinity if in the lower bound and positive infinity if in the upper bound

Parameters:
  • bound (float) – The value of the upper or lower bound

  • bound_type (str) – ‘lower’ or ‘upper’
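
Replacing nan with the least informative bound keeps the interval valid, only wider. A minimal sketch of that behavior as described above; the standalone function name protect_nan and its error handling are assumptions.

```python
import math


def protect_nan(bound, bound_type):
    """Map nan to -inf for a lower bound and +inf for an upper bound. Sketch."""
    if bound_type not in ("lower", "upper"):
        raise ValueError("bound_type must be 'lower' or 'upper'")
    if math.isnan(bound):
        return -math.inf if bound_type == "lower" else math.inf
    return bound


print(protect_nan(float("nan"), "lower"))  # -inf
print(protect_nan(1.5, "upper"))           # 1.5
```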

_sub(a, b)

Subtract two confidence intervals

Parameters:
  • a (tuple) – Confidence interval like: (lower,upper)

  • b (tuple) – Confidence interval like: (lower,upper)
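
Unlike addition, interval subtraction crosses the endpoints: the smallest difference pairs the lower endpoint of a with the upper endpoint of b, and the largest pairs the upper endpoint of a with the lower endpoint of b. A minimal sketch of this standard rule; interval_sub is a hypothetical name.

```python
def interval_sub(a, b):
    """Difference a - b of two (lower, upper) intervals. Illustrative sketch only."""
    return (a[0] - b[1], a[1] - b[0])


print(interval_sub((1, 2), (3, 5)))  # (-4, -1)
```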

_validate_delta_vector(delta_vector)

Checks that the supplied delta vector has the correct length. If it does not sum to self.delta, it is normalized so that it does.

Parameters:

delta_vector – 1D array of delta values to assign to the unique base nodes
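
One way to picture the length check and normalization is proportional rescaling, sketched below. The library's exact normalization scheme is not specified here, so the rescaling rule, the function name normalize_delta_vector, and its arguments are assumptions.

```python
def normalize_delta_vector(delta_vector, delta, n_unique_base_nodes):
    """Check the length, then rescale so the entries sum to delta. Sketch only."""
    if len(delta_vector) != n_unique_base_nodes:
        raise ValueError("delta_vector needs one entry per unique base node")
    total = sum(delta_vector)
    if total <= 0:
        raise ValueError("delta_vector must have a positive sum")
    # Assumption: rescale proportionally so the entries sum to delta
    return [delta * d / total for d in delta_vector]


scaled = normalize_delta_vector([1, 3], delta=0.05, n_unique_base_nodes=2)
print(scaled)
```

In this example the entries keep their 1:3 ratio and together sum to 0.05.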

_validate_infl_factors(method, factors)

Checks to make sure supplied factors has correct dtype and size given the method.

Parameters:
  • method (str) – How you want to assign the bound inflation factors to the base nodes

  • factors – If an integer and method=="constant", that integer is applied to all bounds. If method=="manual", this needs to be a vector of bound inflation factors to assign to each unique base node.

assign_bounds_needed(**kwargs)

Depth-first search through the tree to decide which bounds must be computed on each child node. It is not always necessary to compute both the lower and upper bounds, because in the end all that matters is the upper bound of the root node.
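
To see why some bounds can be skipped, consider a toy tree that supports only addition and subtraction. For a sum, the children need the same bounds as the parent. For a difference, the right child needs the opposite bounds. The sketch below uses nested tuples and a hypothetical function bounds_needed; it does not use the library's node classes.

```python
def bounds_needed(node, lower_needed, upper_needed, needed=None):
    """Record which bounds each leaf requires. Leaves are strings. Sketch only."""
    if needed is None:
        needed = {}
    if isinstance(node, str):
        prev_lower, prev_upper = needed.get(node, (False, False))
        # A leaf that appears more than once accumulates its requirements
        needed[node] = (prev_lower or lower_needed, prev_upper or upper_needed)
        return needed
    op, left, right = node
    if op == "add":
        bounds_needed(left, lower_needed, upper_needed, needed)
        bounds_needed(right, lower_needed, upper_needed, needed)
    elif op == "sub":
        bounds_needed(left, lower_needed, upper_needed, needed)
        # upper(a - b) uses lower(b), so the requirements swap for the right child
        bounds_needed(right, upper_needed, lower_needed, needed)
    else:
        raise ValueError(f"unsupported operator: {op}")
    return needed


tree = ("sub", "A", ("add", "B", "C"))
# Only the upper bound of the root is required
print(bounds_needed(tree, lower_needed=False, upper_needed=True))
# {'A': (False, True), 'B': (True, False), 'C': (True, False)}
```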

assign_deltas(weight_method='equal', **kwargs)

Assign the delta values to the base nodes in the tree.

Parameters:

weight_method (str) – How you want to assign the deltas to the base nodes. Defaults to ‘equal’, which splits delta equally among the unique base nodes. ‘manual’ allows specifying the weights as an array.
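
The ‘equal’ method can be pictured as dividing delta by the number of unique base nodes, so a base node that appears twice in the constraint is counted once. A minimal sketch under that reading; split_delta_equally and its inputs are hypothetical, and any further subdivision the library may perform per bound is not shown.

```python
def split_delta_equally(delta, base_node_names):
    """Give each unique base node an equal share of delta. Illustrative sketch only."""
    # dict.fromkeys removes duplicates while preserving order
    unique_names = list(dict.fromkeys(base_node_names))
    share = delta / len(unique_names)
    return {name: share for name in unique_names}


deltas = split_delta_equally(0.05, ["PR | [M]", "PR | [F]", "PR | [M]"])
print(deltas)  # {'PR | [M]': 0.025, 'PR | [F]': 0.025}
```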

assign_infl_factors(method='constant', factors=2)

Assign the bound inflation factors (for candidate selection) to the base nodes in the tree.

Parameters:
  • method (str) – How you want to assign the bound inflation factors. Defaults to ‘constant’, which assigns the value of ‘factors’ to all bounds. If method=="manual", then factors should be a vector of values to use for the bounds, whose length is equal to the total number of unique bounds across all base nodes.

  • factors – If an integer and method=="constant", that integer is applied to all bounds. If method=="manual", this needs to be a vector of bound inflation factors to assign to each unique base node.

build_tree(constraint_str, delta_weight_method='equal', delta_vector=[], infl_factor_method='constant', infl_factors=2)

Convenience function for building the tree from a constraint string, subdividing the tree delta to deltas for each base node, and assigning which nodes need upper and lower bounding.

Parameters:
  • constraint_str (str) – mathematical expression written in Python syntax from which we build the parse tree

  • delta_weight_method (str, defaults to 'equal') – How you want to assign the deltas to the base nodes. The default ‘equal’ splits up delta equally among unique base nodes.

  • delta_vector – 1D array of delta values to assign to the unique base nodes.

  • infl_factor_method (str, defaults to 'constant') – How you want to assign the inflation factors to the base nodes. The default ‘constant’ applies a constant factor to each base node bound. ‘manual’ allows assigning unique inflation factors to each base node bound.

  • infl_factors – The bound inflation factors. Int if infl_factor_method=="constant", array-like if infl_factor_method=="manual".

create_from_ast(s)

Create the node structure of the tree given a mathematical string expression, s

Parameters:

s (str) – mathematical expression written in Python syntax from which we build the parse tree
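
Since the expression is written in Python syntax, the standard-library ast module can turn it into a tree of operator and operand nodes. The sketch below shows that general idea on a simplified expression and converts the result to nested tuples. The helper describe and the variable names a and b are hypothetical, and the library's own node classes are not used.

```python
import ast


def describe(node):
    """Convert a small subset of ast nodes into nested tuples. Sketch only."""
    if isinstance(node, ast.BinOp):
        return (type(node.op).__name__, describe(node.left), describe(node.right))
    if isinstance(node, ast.Call):
        return (node.func.id, *[describe(arg) for arg in node.args])
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError(f"unsupported node type: {type(node).__name__}")


# mode="eval" parses a single expression; .body is its root node
root = ast.parse("abs(a - b) - 0.15", mode="eval").body
print(describe(root))  # ('Sub', ('abs', ('Sub', 'a', 'b')), 0.15)
```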

evaluate_constraint(**kwargs)

Evaluate the constraint itself (not the bounds). Postorder traverse (left, right, root) through the tree and calculate the values of the base nodes, then propagate the values up the tree using propagation logic.

make_viz(title)

Make a graphviz diagram from a root node

Parameters:

title (str) – The title you want to display at the top of the graph

make_viz_helper(root, graph)

Helper function for make_viz(). Recurses through the parse tree and adds nodes and edges to the graph.

Parameters:
  • root (Node object) – root of the parse tree

  • graph (graphviz.Digraph object) – The graphviz graph object

propagate(node)

Helper function for propagating confidence bounds

Parameters:

node (Node object) – node in the parse tree

propagate_bounds(**kwargs)

Postorder traverse (left, right, root) through the tree and calculate confidence bounds on base nodes, then propagate bounds using propagation logic
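
A postorder traversal visits both children before combining their intervals at the parent, so the root interval is computed last. The toy sketch below does this for a tree of nested tuples with only addition and subtraction. The function propagate_sketch and the leaf intervals are made up for illustration, and computing the leaf bounds from data is not shown.

```python
def propagate_sketch(node, leaf_bounds):
    """Postorder propagation of (lower, upper) intervals. Illustrative sketch only."""
    if isinstance(node, str):
        # Leaf: look up its precomputed confidence interval
        return leaf_bounds[node]
    op, left, right = node
    a = propagate_sketch(left, leaf_bounds)   # left subtree first
    b = propagate_sketch(right, leaf_bounds)  # then right subtree
    if op == "add":                           # finally combine at this node
        return (a[0] + b[0], a[1] + b[1])
    if op == "sub":
        return (a[0] - b[1], a[1] - b[0])
    raise ValueError(f"unsupported operator: {op}")


tree = ("sub", ("add", "A", "B"), "C")
leaf_bounds = {"A": (1, 2), "B": (0, 1), "C": (4, 5)}
root_interval = propagate_sketch(tree, leaf_bounds)
print(root_interval)  # (-4, -1)
```

In this example the root's upper bound is -1, which is at most 0, so a constraint of the form expression <= 0 would hold at the stated confidence.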

reset_base_node_dict(reset_data=False)

Reset base node dict so that any bounds or values stored are removed. However, keeps the delta values and bound inflation factors for each bound that were set when the tree was built. If those need to be reset, create an entirely new parse tree.

Parameters:

reset_data (bool) – Whether to reset the cached data for each base node. This is needed less frequently than one needs to reset the bounds.