seldonian.models.trees.skrandomforest_model.SeldonianRandomForest

class SeldonianRandomForest(**rf_kwargs)

Bases: ClassificationModel

__init__(**rf_kwargs)

A Seldonian random forest model that re-labels the leaf node probabilities of the vanilla decision trees built by SKLearn’s RandomForestClassifier object.

Variables:
  • classifier – The SKLearn classifier object

  • n_trees – The number of decision trees in the forest
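
A minimal construction sketch is shown below. It assumes that the **rf_kwargs are forwarded to SKLearn’s RandomForestClassifier (the keyword names n_estimators and max_depth are therefore assumptions here, not part of this class’s documented signature).

    from seldonian.models.trees.skrandomforest_model import SeldonianRandomForest

    # Hypothetical keyword arguments, assumed to be passed through to
    # sklearn.ensemble.RandomForestClassifier
    model = SeldonianRandomForest(n_estimators=5, max_depth=3)
    print(model.n_trees)     # number of decision trees in the forest
    print(model.classifier)  # the underlying SKLearn classifier object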

__repr__()

Return repr(self).

Methods

fit(features, labels, **kwargs)

A wrapper around SKLearn’s fit() method. Returns the leaf node probabilities of the trees SKLearn builds for the forest. Assigns leaf node ids to a list of lists, where each sublist contains the ids for a single tree, ordered from left to right.

Parameters:
  • features (numpy ndarray) – Features

  • labels (1D numpy array) – Labels

Returns:

Flattened array of leaf node probabilities (of predicting the positive class) for all trees, ordered left to right within a given tree.
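
A minimal usage sketch for fit(), again assuming the constructor keyword arguments are passed through to SKLearn’s RandomForestClassifier; the synthetic data below is purely illustrative.

    import numpy as np

    from seldonian.models.trees.skrandomforest_model import SeldonianRandomForest

    rng = np.random.default_rng(0)
    features = rng.normal(size=(200, 4))       # 2D feature matrix
    labels = (features[:, 0] > 0).astype(int)  # 1D binary labels

    model = SeldonianRandomForest(n_estimators=5, max_depth=3)
    leaf_probs = model.fit(features, labels)   # flattened positive-class leaf probabilities
    print(leaf_probs.shape)                    # one entry per leaf node, across all trees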

forward_pass(X)

Predict the probability of the positive class for each sample in X.

Parameters:

X – Feature matrix

Returns:

probs_pos_class, the vector of positive-class probabilities, and leaf_nodes_hit, the ids of the leaf nodes that were hit by each sample. The leaf node ids are needed for computing the Jacobian.
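
Continuing from the fit() sketch above (so model and features are assumed to be defined as there); unpacking the result into two values is an assumption based on the description of the return value.

    probs_pos_class, leaf_nodes_hit = model.forward_pass(features)
    print(probs_pos_class.shape)  # one positive-class probability per sample
    print(len(leaf_nodes_hit))    # leaf node ids hit by the samples, kept for the Jacobian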

get_jacobian(ans, theta, X)

Return the Jacobian d(forward_pass)_i/dtheta_{j+1}, where i runs over datapoints and j runs over model parameters. Here, the forward pass is 1/n * sum_k { forward_k(theta, X) }, where forward_k is the forward pass of a single decision tree. We can therefore compute the Jacobian for each tree separately, horizontally stack the results, and scale by 1/n.

Parameters:
  • ans – The result of the forward pass function evaluated on theta and X

  • theta – The weight vector, which isn’t used in this method

  • X – The features

Returns:

J, the Jacobian matrix
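
The stacking described above can be illustrated with plain NumPy, independent of the toolkit: dummy per-tree Jacobians J_k are horizontally stacked and scaled by 1/n, where n is the number of trees. This sketches only the structure of the result, not the toolkit’s implementation.

    import numpy as np

    n_trees, n_samples, n_leaves_per_tree = 3, 5, 4
    rng = np.random.default_rng(0)

    # Dummy per-tree Jacobians: J_k[i, j] = d forward_k_i / d theta_kj
    per_tree_jacobians = [rng.random((n_samples, n_leaves_per_tree)) for _ in range(n_trees)]

    # Full Jacobian of the averaged forward pass: horizontally stack and scale by 1/n
    J = np.hstack(per_tree_jacobians) / n_trees
    print(J.shape)  # (n_samples, n_trees * n_leaves_per_tree)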

get_leaf_node_probs()

Retrieve the leaf node probabilities from the current forest of trees, ordered left to right within each tree.

Returns:

Flattened array of leaf node probabilities (of predicting the positive class) for all trees, ordered left to right within a given tree.
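
Continuing from the fit() sketch above, the current leaf probabilities can be read back at any point:

    current_probs = model.get_leaf_node_probs()
    print(current_probs)  # flattened positive-class leaf probabilities, tree by tree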

predict(theta, X, **kwargs)

Call the autograd primitive (a workaround, since our forward pass involves an external library).

Parameters:
  • theta (numpy ndarray) – model weights (not probabilities)

  • X (numpy ndarray) – model features

Returns:

model predictions

Return type:

numpy ndarray same shape as labels
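
A usage sketch, continuing from the fit() sketch above. The parameterization of theta is an assumption: a weight vector with one entry per leaf node across all trees (the docs note these are weights, not probabilities), initialized to zeros purely for illustration.

    import numpy as np

    theta = np.zeros(len(leaf_probs))        # hypothetical initial weights, one per leaf node
    y_pred = model.predict(theta, features)  # positive-class probability per sample
    print(y_pred.shape)                      # numpy ndarray, same shape as labels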

set_leaf_node_values(probs)

Update the leaf node values, i.e., the number of samples in each leaf that get categorized as 0 or 1, using the new probabilities, probs.

Parameters:

probs – A flattened array of the leaf node probabilities from all trees
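
Continuing from the fit() sketch above, new probabilities are written back into the SKLearn trees. Re-using the fitted probabilities here is purely illustrative; in practice these would presumably be the re-labeled probabilities produced by the Seldonian optimization.

    model.set_leaf_node_values(leaf_probs)  # write new probabilities into the leaf nodes
    print(model.get_leaf_node_probs())      # should now reflect the values just set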