
class SeldonianRandomForest(**rf_kwargs)

Bases: ClassificationModel


A Seldonian random forest model that re-labels the leaf node probabilities of the decision trees in a vanilla random forest built using SKLearn’s RandomForestClassifier object.

  • classifier – The SKLearn classifier object

  • n_trees – The number of decision trees in the forest


Return repr(self).


fit(features, labels, **kwargs)

A wrapper around SKLearn’s fit() method. Returns the leaf node probabilities of the trees that SKLearn builds for the forest. Also records the leaf node ids as a list of lists, where each sublist contains the ids for a single tree, ordered from left to right.

  • features (numpy ndarray) – Features

  • labels (1D numpy array) – Labels


Flattened array of leaf node probabilities (of predicting the positive class) for all trees, ordered left to right within each tree.
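The left-to-right leaf ordering and the flattened probability array can be illustrated with a minimal sketch. It is not this class's implementation: the helper names (leaves_left_to_right, positive_class_probs) and the hand-built arrays are hypothetical. The arrays mimic the children_left, children_right and value attributes that scikit-learn exposes on a fitted tree's tree_ object, where a child index of -1 marks a leaf.

```python
import numpy as np

TREE_LEAF = -1  # scikit-learn marks "no child" with -1


def leaves_left_to_right(children_left, children_right):
    """Return the leaf node ids of one tree, ordered from left to right."""
    leaves, stack = [], [0]  # node 0 is the root
    while stack:
        node = stack.pop()
        if children_left[node] == TREE_LEAF:
            leaves.append(int(node))
        else:
            # Push the right child first so the left subtree is visited first.
            stack.append(children_right[node])
            stack.append(children_left[node])
    return leaves


def positive_class_probs(value, leaf_ids):
    """Fraction of the positive class (column 1) stored at each leaf."""
    v = np.asarray(value, dtype=float)[leaf_ids, 0, :]  # (n_leaves, n_classes)
    return v[:, 1] / v.sum(axis=1)


# Hypothetical stand-ins for two fitted trees (value has shape
# (n_nodes, n_outputs, n_classes), as in scikit-learn).
forest = [
    {"children_left": np.array([1, 2, -1, -1, -1]),
     "children_right": np.array([4, 3, -1, -1, -1]),
     "value": np.array([[[10, 10]], [[6, 4]], [[5, 1]], [[1, 3]], [[4, 6]]])},
    {"children_left": np.array([1, -1, -1]),
     "children_right": np.array([2, -1, -1]),
     "value": np.array([[[8, 12]], [[6, 2]], [[2, 10]]])},
]

# One sublist of leaf ids per tree, then one flat probability array.
leaf_node_ids = [
    leaves_left_to_right(t["children_left"], t["children_right"]) for t in forest
]
flat_probs = np.concatenate(
    [positive_class_probs(t["value"], ids) for t, ids in zip(forest, leaf_node_ids)]
)
```

With a real fitted forest, the same helpers could presumably be fed est.tree_.children_left, est.tree_.children_right and est.tree_.value for each est in the classifier's estimators_ list.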


Predict the probability of the positive class for each sample in X.


X – Feature matrix


probs_pos_class: the vector of probabilities; leaf_nodes_hit: the ids of the leaf nodes that were hit by each sample. These are needed for computing the Jacobian.
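As a rough illustration of how the two return values relate, the sketch below averages, over trees, the probability stored at the leaf each sample reaches. The function name forest_forward and the toy numbers are hypothetical. It also assumes that leaf_nodes_hit holds positions within each tree's left-to-right leaf list rather than raw node ids, which may differ from what this class stores.

```python
import numpy as np


def forest_forward(leaf_probs_per_tree, leaf_nodes_hit):
    """Average the hit-leaf probabilities over all trees.

    leaf_probs_per_tree: list of 1D arrays, one per tree.
    leaf_nodes_hit: int array of shape (n_samples, n_trees); entry [i, k] is
        the position of the leaf in tree k that sample i falls into.
    """
    n_trees = leaf_nodes_hit.shape[1]
    per_tree = np.column_stack(
        [leaf_probs_per_tree[k][leaf_nodes_hit[:, k]] for k in range(n_trees)]
    )  # shape (n_samples, n_trees)
    return per_tree.mean(axis=1)


# Toy forest: tree 0 has three leaves, tree 1 has two.
leaf_probs_per_tree = [np.array([0.2, 0.75, 0.6]), np.array([0.25, 0.8])]
leaf_nodes_hit = np.array([[0, 1], [2, 0], [1, 1]])  # three samples

probs_pos_class = forest_forward(leaf_probs_per_tree, leaf_nodes_hit)
```

For a fitted scikit-learn forest, the apply(X) method returns the node id of the leaf each sample lands in for every tree; those ids would need to be mapped to leaf positions before using a sketch like this one.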

get_jacobian(ans, theta, X)

Return the Jacobian d(forward_pass)_i/dtheta_{j+1}, where i runs over datapoints and j runs over model parameters. Here, the forward pass is 1/n * sum_k { forward_k(theta,X) }, where n is the number of trees and forward_k is the forward pass of the k-th decision tree. The Jacobian of each tree can therefore be computed separately; the per-tree Jacobians are then stacked horizontally and multiplied by 1/n.

  • ans – The result of the forward pass function evaluated on theta and X

  • theta – The weight vector, which isn’t used in this method

  • X – The features


J, the Jacobian matrix
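The stack-and-scale construction can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes each tree's forward pass simply returns the value stored at the leaf a sample hits, and that the derivative is taken with respect to those flattened leaf values, so each per-tree Jacobian is a one-hot indicator matrix. The names tree_jacobian and forest_jacobian and the toy data are hypothetical.

```python
import numpy as np


def tree_jacobian(hit_positions, n_leaves):
    """One-hot matrix: entry [i, j] is 1 if sample i lands in leaf j."""
    J = np.zeros((len(hit_positions), n_leaves))
    J[np.arange(len(hit_positions)), hit_positions] = 1.0
    return J


def forest_jacobian(leaf_nodes_hit, leaves_per_tree):
    """Horizontally stack the per-tree Jacobians and scale by 1/n_trees."""
    n_trees = len(leaves_per_tree)
    blocks = [
        tree_jacobian(leaf_nodes_hit[:, k], leaves_per_tree[k])
        for k in range(n_trees)
    ]
    return np.hstack(blocks) / n_trees


# Three samples, two trees with three and two leaves respectively.
leaf_nodes_hit = np.array([[0, 1], [2, 0], [1, 1]])
J = forest_jacobian(leaf_nodes_hit, leaves_per_tree=[3, 2])

# Under the assumptions above the forward pass is linear in the leaf values,
# so J times the flattened leaf values reproduces the averaged prediction.
flat_leaf_values = np.array([0.2, 0.75, 0.6, 0.25, 0.8])
forward = J @ flat_leaf_values
```

Because the matrix depends only on which leaves the samples hit, a construction like this needs neither theta nor ans, which is consistent with theta being unused in this method.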


Retrieve the leaf node probabilities from the current forest of trees from left to right.


Flattened array of leaf node probabilities (of predicting the positive class) for all trees, ordered left to right within each tree.

predict(theta, X, **kwargs)

Call the autograd primitive (a workaround, since our forward pass involves an external library).

  • theta (numpy ndarray) – model weights (not probabilities)

  • X (numpy ndarray) – model features


model predictions

Return type:

numpy ndarray, same shape as labels


Update the leaf node values, i.e., the number of samples that get categorized as 0 or 1, using the new probabilities, probs.


probs – A flattened array of the leaf node probabilities from all trees
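One way to picture this update, as a sketch only: keep the total stored at each leaf and split it between the two classes according to the new probability. The function name relabel_leaf_values and the toy array are hypothetical; the array mimics the (n_nodes, n_outputs, n_classes) layout of a scikit-learn tree's tree_.value for a single tree.

```python
import numpy as np


def relabel_leaf_values(value, leaf_ids, probs):
    """Return a copy of value whose leaves encode the new probabilities.

    Each leaf keeps its stored total; the class-0 and class-1 entries are
    reset to total * (1 - p) and total * p respectively.
    """
    new_value = np.array(value, dtype=float)  # copy; the input is not modified
    for leaf, p in zip(leaf_ids, probs):
        total = new_value[leaf, 0, :].sum()
        new_value[leaf, 0, :] = [total * (1.0 - p), total * p]
    return new_value


# Toy tree: node 0 is the root, nodes 1 and 2 are its leaves.
value = np.array([[[10, 10]], [[6, 4]], [[4, 6]]])
new_value = relabel_leaf_values(value, leaf_ids=[1, 2], probs=[0.9, 0.5])
```

For a forest, the flattened probs array would first be split into one chunk per tree, using the number of leaves in each tree, before applying an update like this tree by tree.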