Example: efficiently training a Seldonian facial recognition system
We will now go through an example to make the steps described in the outline above more concrete. We will use the same dataset and model from the Gender bias in facial recognition example. In that example, we trained a convolutional neural network (CNN) to classify gender from images of faces from the UTKFace dataset, subject to a fairness constraint enforcing that accuracy should be similar when predicting male and female faces. Before following along with the steps above, we need to set up our computing environment properly. We recommend following along with these steps in the Colab notebook linked at the top of this tutorial. However, we reproduce the steps here if you simply want to read along rather than run the cells yourself.
Preliminaries
Make sure GPU is enabled
Make sure that whatever system you're on is capable of using the GPU. The Colab notebook (linked at the top of this page) shows how to enable the GPU in Colab; in general, this amounts to installing the correct GPU drivers and a GPU-enabled build of PyTorch or Tensorflow.
Imports
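A minimal set of imports for following along might look like this (the Seldonian-specific imports appear later, in the cells where they are first used):

```python
import pickle

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
```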
Dataset preparation
First, download the dataset from here. Unzip that file, revealing the data file called age_gender.csv. The following code loads the dataset, shuffles it, and clips off 5 samples to make the dataset size more easily divisible when making mini-batches.
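A sketch of that cell (the random seed is an assumption):

```python
# Load the raw CSV, shuffle it, and drop the last 5 samples so the dataset
# size divides more evenly into mini-batches.
data = pd.read_csv("age_gender.csv")
data = data.sample(frac=1, random_state=42).reset_index(drop=True)
data = data.iloc[:-5]
print(len(data))
```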
The next steps are to make the features and labels that we will use to train the model. This requires converting the flattened image data from the dataframe into the shape and data type that the model expects. After creating these, we save them to disk for fast loading later.
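A sketch of that cell; the column names and the 48x48 grayscale image shape are those of age_gender.csv, and the output file names are assumptions:

```python
# Convert the flattened pixel strings into float32 image arrays with the
# NCHW shape the CNN expects, and pull out the gender labels.
features = np.stack(
    [np.asarray(p.split(), dtype=np.float32) for p in data["pixels"]]
)
features = features.reshape(-1, 1, 48, 48) / 255.0  # scale pixels to [0, 1]
labels = data["gender"].values.astype(np.int64)

# Save to disk for fast loading later.
np.save("features.npy", features)
np.save("labels.npy", labels)
```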
Step 1. Split data into two datasets
We'll call these the candidate ("cand") and safety datasets, and use a 50/50 split. The data are already shuffled, so we'll split right down the middle: the first half will be candidate data and the second half will be safety data. We'll also make the PyTorch data loaders, which come in handy for training with PyTorch.
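A sketch of that cell (the batch size is an assumption):

```python
# 50/50 candidate ("cand") / safety split; the data are already shuffled.
N = len(features)
N_cand = N // 2

F_cand, F_safety = features[:N_cand], features[N_cand:]
L_cand, L_safety = labels[:N_cand], labels[N_cand:]

batch_size = 100  # assumed

cand_dataset = TensorDataset(torch.from_numpy(F_cand), torch.from_numpy(L_cand))
safety_dataset = TensorDataset(torch.from_numpy(F_safety), torch.from_numpy(L_safety))

cand_loader = DataLoader(cand_dataset, batch_size=batch_size, shuffle=True)
safety_loader = DataLoader(safety_dataset, batch_size=batch_size, shuffle=False)
```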
Step 2. Train the full network on the candidate data only
Let's define the full network below.
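The exact architecture is not critical for this tutorial; what matters is that the penultimate fully connected layer has 256 units and that the head is a single fully connected layer followed by a softmax. Below is a sketch in that spirit (the convolutional stack and layer sizes are assumptions, not the exact architecture from the original notebook):

```python
class FacialRecogCNN(nn.Module):
    """CNN for binary gender classification. The key property for this
    tutorial is the 256-unit penultimate layer followed by a single fully
    connected output layer ("the head") and a softmax."""

    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 48x48 -> 24x24
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 24x24 -> 12x12
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 12x12 -> 6x6
        )
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(64 * 6 * 6, 256)  # 256 latent features
        self.fc2 = nn.Linear(256, 2)           # the "head"
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.cnn(x)
        x = self.flatten(x)
        x = torch.relu(self.fc1(x))
        return self.softmax(self.fc2(x))
```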
Next, we instantiate the model, and put it on the GPU.
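For example:

```python
# Use the GPU if one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = FacialRecogCNN().to(device)
print(device)
```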
Here, we set up the training parameters and the training function.
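A sketch of those cells; the learning rate and loss function are assumptions. Because the model's forward pass already applies a softmax, the sketch trains on the log of its output with NLLLoss:

```python
# Training hyperparameters (assumed values).
learning_rate = 0.001
criterion = nn.NLLLoss()  # the model outputs probabilities, so train on their log
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

def train(model, loader, num_epochs):
    model.train()
    for epoch in range(num_epochs):
        running_loss = 0.0
        for images, targets in loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            probs = model(images)
            loss = criterion(torch.log(probs + 1e-10), targets)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f"Epoch {epoch + 1}/{num_epochs}, loss: {running_loss / len(loader):.4f}")
```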
Let's train for ten epochs.
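For example:

```python
num_epochs = 10
train(model, cand_loader, num_epochs)  # train on candidate data only
```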
Running that code produces the following output:
We evaluate the performance on the safety dataset using the following code.
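A sketch of that evaluation cell:

```python
# Evaluate classification accuracy on the safety dataset.
model.eval()
n_correct, n_total = 0, 0
with torch.no_grad():
    for images, targets in safety_loader:
        images, targets = images.to(device), targets.to(device)
        preds = model(images).argmax(dim=1)
        n_correct += (preds == targets).sum().item()
        n_total += targets.size(0)
print(f"Accuracy on safety data: {n_correct / n_total:.4f}")
```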
Running that code produces the following output:
The result may differ slightly depending on your machine and random seed. We need to save the parameters of this trained model so we can apply them to the body-only model.
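For example (the file name is an assumption):

```python
# Save the trained parameters so we can load them into the body-only model.
torch.save(model.state_dict(), "full_model.pth")
```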
Step 3. Separate out the "body" and the "head" of the full network into two separate models
The body-only model is the full network minus the final fully connected layer (and the softmax):
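A sketch of that headless network, mirroring the assumed architecture above: everything up to and including the 256-unit layer, with the final fully connected layer and softmax removed.

```python
class FacialRecogCNNHeadless(nn.Module):
    """The full network minus the final fully connected layer and softmax.
    Its output is the 256-dimensional latent representation."""

    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(64 * 6 * 6, 256)

    def forward(self, x):
        x = self.cnn(x)
        x = self.flatten(x)
        return torch.relu(self.fc1(x))
```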
Let's instantiate this model and put it on the GPU.
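For example:

```python
headless_model = FacialRecogCNNHeadless().to(device)
```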
Step 4. Assign the weights from the trained full network to the new body-only model
This ensures that the body-only model is "trained." We remove the weights and bias of the final fully connected layer from the trained full model's state dictionary so that the remaining state dictionary can be loaded into the new headless model.
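A sketch of that weight transfer, using the layer names from the sketches above ("fc2" is the head layer; substitute the layer names from your own network):

```python
# Copy the trained weights into the headless model, dropping the head's
# weight and bias so the remaining keys match the headless architecture.
full_state_dict = torch.load("full_model.pth")
del full_state_dict["fc2.weight"]
del full_state_dict["fc2.bias"]
headless_model.load_state_dict(full_state_dict)
```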
Step 5. Pass all of the data (both datasets from step 1) through the trained "body-only" model
Save the outputs of passing the data through the model. These are your new "latent features" that you will use as input to the Seldonian Toolkit. First, notice that the output of the last layer of the headless model has size: 256. Therefore, we will have 256 features for each image.
Pass candidate data in first, followed by safety data. When we use these features/labels in the Seldonian Toolkit, the candidate data are taken first during the candidate/safety split. This code also fills the labels we will save. The last part of the code saves the features and labels to pickle files.
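A sketch of that cell; the pickle file names are assumptions:

```python
# Pass candidate data first, then safety data, so the toolkit's
# (unshuffled) candidate/safety split lines up with the split used to
# train the full model.
headless_model.eval()

def get_latent_features(loader):
    feature_chunks, label_chunks = [], []
    with torch.no_grad():
        for images, targets in loader:
            latent = headless_model(images.to(device))
            feature_chunks.append(latent.cpu().numpy())
            label_chunks.append(targets.numpy())
    return np.concatenate(feature_chunks), np.concatenate(label_chunks)

# Use a non-shuffling loader here so the latent features keep the
# original candidate-data order.
cand_loader_noshuffle = DataLoader(cand_dataset, batch_size=batch_size, shuffle=False)
latent_cand, labels_cand = get_latent_features(cand_loader_noshuffle)
latent_safety, labels_safety = get_latent_features(safety_loader)

all_latent = np.vstack([latent_cand, latent_safety])       # shape (N, 256)
all_labels = np.concatenate([labels_cand, labels_safety])

with open("facial_gender_latent_features.pkl", "wb") as f:
    pickle.dump(all_latent, f)
with open("facial_gender_labels.pkl", "wb") as f:
    pickle.dump(all_labels, f)
```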
Step 6. The head-only model is the model we will use in the toolkit
The head-only model should be initially untrained when used in the toolkit, so don't apply the weights learned in step 2 to the head. The model also needs to be compatible with the toolkit, so regardless of the programming language used to define the full network, the head-only model must be implemented in Python. Specifically, the toolkit supports Numpy, PyTorch, and Tensorflow models. We will simply take the PyTorch implementation of the head from the full network and implement it as its own model.
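The toolkit wraps PyTorch models in its own base class; the sketch below assumes the SupervisedPytorchBaseModel interface from seldonian.models.pytorch_model, so check the class and method names against the toolkit version you have installed:

```python
import torch.nn as nn
from seldonian.models.pytorch_model import SupervisedPytorchBaseModel

class FacialRecogHeadCNNModel(SupervisedPytorchBaseModel):
    """Head-only model: a single fully connected layer followed by a
    softmax, operating on the 256 latent features."""

    def __init__(self, device):
        super().__init__(device)

    def create_model(self, **kwargs):
        # Maps the 256 latent features to 2 class probabilities.
        # Left untrained (random initialization), as required.
        return nn.Sequential(nn.Linear(256, 2), nn.Softmax(dim=1))
```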
Step 7. The data we will use are the latent features created in step 5
Let's create a Seldonian dataset object from these features, the labels, and the sensitive attributes.
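A sketch of that cell, assuming the SupervisedDataSet constructor used in the Engine tutorials (argument names may differ slightly between toolkit versions) and assuming the sensitive attributes are one-hot male/female columns derived from the gender labels:

```python
from seldonian.dataset import SupervisedDataSet

# Load the latent features and labels saved in step 5.
with open("facial_gender_latent_features.pkl", "rb") as f:
    features = pickle.load(f)
with open("facial_gender_labels.pkl", "rb") as f:
    labels = pickle.load(f)

# One-hot male/female sensitive attributes, built from the labels in the
# same (candidate-first) order as the latent features. The mapping of
# label 0 to male follows the age_gender.csv convention (assumed).
mask = labels == 0
sensitive_attrs = np.column_stack([mask, ~mask]).astype(int)  # columns ["M", "F"]

meta_information = {
    "feature_col_names": [f"feat{i}" for i in range(features.shape[1])],
    "label_col_names": ["label"],
    "sensitive_col_names": ["M", "F"],
    "sub_regime": "classification",
}

dataset = SupervisedDataSet(
    features=features,
    labels=labels,
    sensitive_attrs=sensitive_attrs,
    num_datapoints=len(labels),
    meta_information=meta_information,
)
```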
Step 8. Assign the frac_data_in_safety parameter of the spec object to be the same split fraction as you used in step 1
We used a 50/50 split in step 1, so we just need to set frac_data_in_safety=0.5. Recall that the candidate data that we use in the toolkit must match the dataset that we used to train the full model in step 2. In other words, no safety data should come from the dataset that was used to train the full model in step 2, because that would invalidate the safety/fairness guarantees. The data split that the toolkit performs does not reshuffle the data, so the candidate data will be the first half of the data we pass to the dataset object. This is the same half on which we trained the full model. That means that the latent features that will be used as candidate data in the toolkit came from the candidate data on which we trained the full model.
Step 9. Run the Seldonian Engine/Experiments as normal, except now the model is a simple linear model instead of a deep network
As we will see, using the head-only model in the toolkit will be much faster than using the full network as we did in the Gender bias in facial recognition example. Let's set up the spec object we need to run the Engine. We already have the dataset object, so we just need the parse trees and the hyperparameters for the optimization. Note that we don't need to use mini-batches in gradient descent/ascent because the model is now a small linear model and no longer a deep network.
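The sketch below assembles the spec object. The constraint string, delta, primary objective, and optimizer hyperparameters are assumptions meant to mirror the gender classifier example; check them against that tutorial and the toolkit documentation:

```python
from seldonian.parse_tree.parse_tree import make_parse_trees_from_constraints
from seldonian.spec import SupervisedSpec
from seldonian.models import objectives

# Disparate-accuracy fairness constraint (assumed to match the gender
# classifier example).
constraint_strs = ["min((ACC | [M])/(ACC | [F]),(ACC | [F])/(ACC | [M])) >= 0.8"]
deltas = [0.05]

parse_trees = make_parse_trees_from_constraints(
    constraint_strs,
    deltas,
    regime="supervised_learning",
    sub_regime="classification",
    columns=["M", "F"],
)

device = torch.device("cpu")  # the head-only model is tiny; CPU is fine
model = FacialRecogHeadCNNModel(device)

spec = SupervisedSpec(
    dataset=dataset,
    model=model,
    parse_trees=parse_trees,
    frac_data_in_safety=0.5,  # must match the 50/50 split from step 1
    sub_regime="classification",
    primary_objective=objectives.binary_logistic_loss,  # assumed; use the same objective as the gender classifier example
    use_builtin_primary_gradient_fn=False,
    optimization_technique="gradient_descent",
    optimizer="adam",
    optimization_hyperparams={
        "lambda_init": np.array([0.5]),
        "alpha_theta": 0.001,
        "alpha_lamb": 0.001,
        "beta_velocity": 0.9,
        "beta_rmsprop": 0.95,
        "num_iters": 1200,       # assumed
        "use_batches": False,    # no mini-batches needed for a linear model
        "gradient_library": "autograd",
        "hyper_search": None,
        "verbose": True,
    },
)
```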
Now we are ready to run the Seldonian Engine.
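A minimal sketch of that cell, assuming the spec object above:

```python
from seldonian.seldonian_algorithm import SeldonianAlgorithm

SA = SeldonianAlgorithm(spec)
passed_safety, solution = SA.run(write_cs_logfile=True)
print(passed_safety, solution)
```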
If we run the above code, we can see that it passed the safety test, and it took less than 10 seconds on the CPU. Let's visualize the gradient descent process. Unless you are running this in Google Colab, you will probably need to change the path to the log file.
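A sketch of that visualization cell, assuming the plot_gradient_descent helper from seldonian.utils.plot_utils and an illustrative log-file path (adjust it to wherever the candidate selection log was written on your machine):

```python
from seldonian.utils.io_utils import load_pickle
from seldonian.utils.plot_utils import plot_gradient_descent

cs_file = "./logs/candidate_selection_log0.p"  # path is an assumption
solution_dict = load_pickle(cs_file)
plot_gradient_descent(solution_dict, primary_objective_name="log loss", save=False)
```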
Running this code produces the following plot:
Run a Seldonian Experiment
Note: Running the following experiments is compute-intensive on the CPU. The experiments library is parallelized across multiple CPUs to speed up the computation. However, free-tier Colab notebooks, such as the default one that opens when clicking the button at the top of this tutorial, lack enough CPUs to run the experiments in a reasonable amount of time. In the Colab notebook, we prepopulated the results so that the experiments did not actually have to run. However, if you want to run these experiments yourself in full, we recommend using a machine with at least 4 CPUs. For reference, on a Mac M1 with 7 CPU cores the experiment takes between 5 and 10 minutes to complete. Though we have not tested this code in Google Colab PRO or PRO+ notebooks, we expect that the resources allocated in those paid-tier notebooks will be sufficient to run the full experiment.
Now, we set up the parameters of the experiment. We will use 10 trials with six data fractions. This setup is similar to the setup in the gender classifier example. Set n_workers to the number of CPUs you want to use; each CPU will get one trial at a time. Change results_dir to where you want to save the results.
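A sketch of that parameter setup (the data fractions, worker count, and results directory below are placeholders):

```python
import os

n_trials = 10
data_fracs = [0.001, 0.005, 0.01, 0.1, 0.5, 1.0]  # six data fractions (assumed values)
n_workers = 4                                     # number of CPUs to use
results_dir = "./results/facial_gender_latent"    # change to your own path
os.makedirs(results_dir, exist_ok=True)

performance_metric = "Accuracy"
verbose = True
```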
Here we define the ground truth dataset, which is the original dataset.
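In code, this can be as simple as reusing the dataset object from step 7 (attribute names assumed):

```python
# The "ground truth" dataset used to evaluate performance and constraints
# in each trial is the original dataset built in step 7.
ground_truth_dataset = dataset
test_features = ground_truth_dataset.features
test_labels = ground_truth_dataset.labels
```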
Next, we define the function used for evaluating the performance and its keyword arguments. Above, we set performance_metric='Accuracy', so that's the metric we will use for the left-most plot.
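A sketch of the evaluation function and its keyword arguments, following the perf_eval_fn(y_pred, y, **kwargs) pattern used in the Experiments tutorials:

```python
def perf_eval_fn(y_pred, y, **kwargs):
    # Deterministic accuracy: y_pred is assumed to be the model's predicted
    # probabilities; compare the predicted class against the true labels.
    if y_pred.ndim == 2:
        predicted_class = np.argmax(y_pred, axis=1)
    else:
        predicted_class = (y_pred >= 0.5).astype(int)
    return np.mean(predicted_class == y)

perf_eval_kwargs = {
    "X": test_features,  # evaluate on the ground truth dataset defined above
    "y": test_labels,
    "performance_metric": performance_metric,
}
```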
We will use the default constraint evaluation function (built-in to the Engine), so we don't need to specify anything for that, but we can batch the model forward pass when evaluating the constraints. To specify the batch size, we use the following dictionary:
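For example (the batch size and the eval_batch_size key are assumptions; check the Experiments library documentation):

```python
# Batch the model forward pass when evaluating the constraints on the
# ground truth data.
constraint_eval_kwargs = {
    "eval_batch_size": 2000,
}
```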
Now we can make the plot generator and run the Seldonian experiment:
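A sketch of that cell, assuming the SupervisedPlotGenerator interface from experiments.generate_plots:

```python
from experiments.generate_plots import SupervisedPlotGenerator

plot_generator = SupervisedPlotGenerator(
    spec=spec,
    n_trials=n_trials,
    data_fracs=data_fracs,
    n_workers=n_workers,
    datagen_method="resample",  # resample the dataset to generate trial datasets
    perf_eval_fn=perf_eval_fn,
    constraint_eval_fns=[],     # [] means use the built-in constraint evaluation
    results_dir=results_dir,
    perf_eval_kwargs=perf_eval_kwargs,
    constraint_eval_kwargs=constraint_eval_kwargs,
)

plot_generator.run_seldonian_experiment(verbose=verbose)
```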
Running the above code will produce 10 trials for each data fraction, resulting in a total of 60 files. These will be saved as CSV files in ${results_dir}/qsa_results/trial_data, where ${results_dir} is whatever you set that variable to be. We want to compare these results to the same experiment using the full deep network with the constraint, as well as the full network without the constraint; these are the two curves shown in Figure 3 of the Gender bias in facial recognition example. To do this, we can copy the results from that experiment into the results_dir folder of this experiment. We did that, but renamed the qsa_results/ folder from the other experiment to qsa_fullmodel_results/ so that it wouldn't overwrite our qsa_results/ folder. The other folder we need to copy from that experiment is the facial_recog_cnn/ folder, which contains the results for the experiment on the full network without the constraint.
You'll notice in the parameter setup section of the current experiment that we defined the dictionary:
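Its exact contents depend on how you named your results folders; a plausible version (the prefixes and legend labels here are illustrative, not taken from the original notebook) is:

```python
# Maps each results-folder prefix to the label shown in the plot legend.
model_label_dict = {
    "qsa": "Seldonian model (head only)",
    "qsa_fullmodel": "Seldonian model (full network)",
    "facial_recog_cnn": "CNN (no constraint)",
}
```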
This dictionary maps the model name (the prefix to _results in the model results folder path) to the name you want displayed in the legend of the plot. Let's now run the code to make the three plots for this experiment and the two experiments we copied over.
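A sketch of that plotting cell (keyword-argument names, including model_label_dict, are assumptions to check against the Experiments library):

```python
plot_generator.make_plots(
    fontsize=12,
    legend_fontsize=8,
    performance_label=performance_metric,
    model_label_dict=model_label_dict,
    savename=os.path.join(results_dir, "facial_gender_experiment.png"),
)
```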
Running that code produces the following three plots.