seldonian.models.trees.skrandomforest_model.SeldonianRandomForest¶
- class SeldonianRandomForest(**rf_kwargs)¶
Bases:
ClassificationModel
- __init__(**rf_kwargs)¶
A Seldonian random forest model that re-labels the leaf node probabilities of vanilla decision trees built using SKLearn’s RandomForestClassifier object.
- Variables:
classifier – The SKLearn classifier object
n_trees – The number of decision trees in the forest
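A minimal construction sketch, assuming the rf_kwargs are forwarded to SKLearn’s RandomForestClassifier (the keyword names below belong to that classifier and are illustrative only):

    from seldonian.models.trees.skrandomforest_model import SeldonianRandomForest

    # Assumption: keyword arguments are passed through to
    # sklearn.ensemble.RandomForestClassifier
    model = SeldonianRandomForest(n_estimators=10, max_depth=3)
    print(model.n_trees)     # number of decision trees in the forest
    print(model.classifier)  # the underlying SKLearn classifier object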
- __repr__()¶
Return repr(self).
Methods
- fit(features, labels, **kwargs)¶
A wrapper around SKLearn’s fit() method. Returns the leaf node probabilities of the trees SKLearn builds for the forest. Assigns leaf node ids in a list of lists, where each sublist contains the ids for a single tree, ordered from left to right.
- Parameters:
features (numpy ndarray) – Features
labels (1D numpy array) – Labels
- Returns:
Flattened array of leaf node probabilities (of predicting the positive class) for all trees, ordered left to right within each tree.
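A short usage sketch of fit(), continuing the construction sketch above; the synthetic data is illustrative only:

    import numpy as np

    rng = np.random.default_rng(0)
    features = rng.normal(size=(100, 4))   # numpy ndarray of features
    labels = rng.integers(0, 2, size=100)  # 1D numpy array of binary labels

    # Returns the flattened leaf node probabilities of the positive class,
    # ordered left to right within each tree, with trees concatenated
    leaf_probs = model.fit(features, labels)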
- forward_pass(X)¶
Predict the probability of the positive class for each sample in X.
- Parameters:
X – Feature matrix
- Returns:
probs_pos_class: the vector of probabilities; leaf_nodes_hit: the ids of the leaf nodes that were hit by each sample. These are needed for computing the Jacobian.
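Continuing the sketch above, forward_pass() yields both the positive-class probabilities and the leaf node ids hit by each sample; unpacking the result as a pair is an assumption based on the description above:

    # Assumption: the two documented return values come back as a pair
    probs_pos_class, leaf_nodes_hit = model.forward_pass(features)
    # probs_pos_class: one probability per sample
    # leaf_nodes_hit: ids of the leaf nodes hit by each sample (used for the Jacobian)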
- get_jacobian(ans, theta, X)¶
Return the Jacobian d(forward_pass)_i/dtheta_{j+1}, where i runs over data points and j runs over model parameters. Here, the forward pass is 1/n * sum_k { forward_k(theta,X) }, where forward_k is the forward pass of a single decision tree and n is the number of trees. We can compute the Jacobian for each tree separately, horizontally stack the per-tree Jacobians, and apply the factor of 1/n out front.
- Parameters:
ans – The result of the forward pass function evaluated on theta and X
theta – The weight vector, which isn’t used in this method
X – The features
- Returns:
J, the Jacobian matrix
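The documented structure of the Jacobian can be illustrated with a standalone NumPy sketch (not the library’s implementation). It assumes each tree’s forward pass simply reads off the probability stored at the leaf a sample lands in, so each per-tree Jacobian is an indicator matrix; the per-tree blocks are stacked horizontally and scaled by 1/n:

    import numpy as np

    def jacobian_sketch(leaf_hits_per_tree, n_leaves_per_tree):
        # Illustrative only. leaf_hits_per_tree[k][i] is the index (within
        # tree k) of the leaf that sample i hit; n_leaves_per_tree[k] is the
        # number of leaves in tree k.
        n_trees = len(leaf_hits_per_tree)
        blocks = []
        for k in range(n_trees):
            hits = np.asarray(leaf_hits_per_tree[k])
            J_k = np.zeros((hits.shape[0], n_leaves_per_tree[k]))
            J_k[np.arange(hits.shape[0]), hits] = 1.0  # d forward_k_i / d theta_j
            blocks.append(J_k)
        return np.hstack(blocks) / n_trees  # the 1/n out front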
- get_leaf_node_probs()¶
Retrieve the leaf node probabilities of the current forest of trees, ordered left to right within each tree.
- Returns:
Flattened array of leaf node probabilities (of predicting the positive class) for all trees, ordered left to right within each tree.
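For example, continuing the sketch above:

    current_probs = model.get_leaf_node_probs()
    # one entry per leaf node, trees concatenated left to right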
- predict(theta, X, **kwargs)¶
Call the autograd primitive (a workaround since our forward pass involves an external library)
- Parameters:
theta (numpy ndarray) – model weights (not probabilities)
X (numpy ndarray) – model features
- Returns:
model predictions
- Return type:
numpy ndarray same shape as labels
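A usage sketch of predict(), continuing from above and assuming theta has one entry per leaf node (note that these are model weights, not probabilities; a zero vector is used purely as a placeholder):

    import numpy as np

    # Placeholder weight vector of the assumed length (one weight per leaf)
    theta = np.zeros_like(model.get_leaf_node_probs())
    y_pred = model.predict(theta, features)  # one prediction per sample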
- set_leaf_node_values(probs)¶
Update the leaf node values, i.e., the number of samples in each leaf that are categorized as 0 or 1, using the new probabilities, probs.
- Parameters:
probs – A flattened array of the leaf node probabilities from all trees
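A round-trip sketch, continuing from above: read out the current leaf probabilities, perturb them, and write them back (the clipping to [0, 1] is an illustrative safeguard, not part of the API):

    import numpy as np

    probs = model.get_leaf_node_probs()
    new_probs = np.clip(probs + 0.05, 0.0, 1.0)
    model.set_leaf_node_values(new_probs)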