seldonian.optimizers.gradient_descent.gradient_descent_adam¶
- gradient_descent_adam(primary_objective, n_constraints, upper_bounds_function, theta_init, lambda_init, batch_calculator, n_batches, batch_size=100, n_epochs=1, alpha_theta=0.05, alpha_lamb=0.05, beta_velocity=0.9, beta_rmsprop=0.9, gradient_library='autograd', clip_theta=None, verbose=False, debug=False, **kwargs)¶
Implements KKT optimization, i.e., simultaneous gradient descent/ascent using the Adam optimizer on a Lagrangian: L(theta, lambda) = f(theta) + lambda^T g(theta), where f is the primary objective, lambda is a vector of Lagrange multipliers, and g is the vector of upper-bound functions. Gradient descent is performed on theta and gradient ascent on lambda to find a saddle point of L; only the optimal theta is of interest. Because this routine is part of candidate selection, "NSF" (no solution found) is returned if a NaN or inf occurs during optimization. The optimal solution is defined as the feasible solution (i.e., one satisfying all constraints) with the smallest primary objective value. A minimal sketch of this update scheme is shown below.
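The following is a minimal, self-contained sketch of simultaneous Adam descent/ascent on a toy Lagrangian, assuming a single linear constraint. The toy objective f, constraint function g, and the projection keeping lambda nonnegative are illustrative assumptions; the library's actual implementation adds batching, NaN/inf handling, and feasible-solution bookkeeping.

```python
# Sketch of simultaneous Adam descent/ascent on a toy Lagrangian
# L(theta, lambda) = f(theta) + lambda^T g(theta). All names here are
# illustrative, not the library's implementation.
import numpy as np

def f(theta):            # toy primary objective, minimized at theta = (2, 2)
    return np.sum((theta - 2.0) ** 2)

def grad_f(theta):
    return 2.0 * (theta - 2.0)

def g(theta):            # toy upper-bound function; constraint is g(theta) <= 0
    return np.array([np.sum(theta) - 1.0])

def grad_g(theta):       # Jacobian of g w.r.t. theta, shape (1, d)
    return np.ones((1, theta.size))

theta, lamb = np.zeros(2), np.ones(1)
alpha_theta, alpha_lamb = 0.05, 0.05
beta_velocity, beta_rmsprop, eps = 0.9, 0.9, 1e-8
v_t, s_t = np.zeros_like(theta), np.zeros_like(theta)  # Adam moments for theta
v_l, s_l = np.zeros_like(lamb), np.zeros_like(lamb)    # Adam moments for lambda

for t in range(1, 501):
    # dL/dtheta = f'(theta) + lambda^T g'(theta); dL/dlambda = g(theta)
    d_theta = grad_f(theta) + lamb @ grad_g(theta)
    d_lamb = g(theta)

    # Adam update on theta (descent), with bias-corrected moments
    v_t = beta_velocity * v_t + (1 - beta_velocity) * d_theta
    s_t = beta_rmsprop * s_t + (1 - beta_rmsprop) * d_theta ** 2
    v_hat = v_t / (1 - beta_velocity ** t)
    s_hat = s_t / (1 - beta_rmsprop ** t)
    theta = theta - alpha_theta * v_hat / (np.sqrt(s_hat) + eps)

    # Adam update on lambda (ascent), projected onto lambda >= 0
    v_l = beta_velocity * v_l + (1 - beta_velocity) * d_lamb
    s_l = beta_rmsprop * s_l + (1 - beta_rmsprop) * d_lamb ** 2
    v_lhat = v_l / (1 - beta_velocity ** t)
    s_lhat = s_l / (1 - beta_rmsprop ** t)
    lamb = np.maximum(0.0, lamb + alpha_lamb * v_lhat / (np.sqrt(s_lhat) + eps))

print(theta, lamb)  # theta approaches the feasible minimizer (0.5, 0.5)
```

Projecting lambda onto the nonnegative orthant in this sketch reflects the standard KKT requirement that multipliers for inequality constraints be nonnegative.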
- Parameters:
primary_objective (function or class method) – The objective function that would be solely optimized in the absence of behavioral constraints, i.e., the loss function
n_constraints – The number of constraints
upper_bounds_function (function or class method) – The function that calculates the upper bounds on the constraints
theta_init (numpy.ndarray) – Initial model weights
lambda_init – Initial values for Lagrange multiplier terms
batch_calculator (function or class method) – Sets the current batch and returns whether the batch is viable for generating a candidate solution
n_batches (int) – The number of batches per epoch
batch_size (int, defaults to 100) – The number of data points in each batch
n_epochs (int) – The number of epochs to run. The total number of gradient descent iterations is n_batches * n_epochs.
alpha_theta (float) – Initial learning rate for theta
alpha_lamb (float) – Initial learning rate for lambda
beta_velocity (float) – Exponential decay rate for velocity term
beta_rmsprop (float) – Exponential decay rate for rmsprop term
gradient_library (str, defaults to "autograd") – The name of the library to use for computing automatic gradients.
clip_theta (tuple, list or numpy.ndarray, defaults to None) – Optional, the min and max values between which to clip all values in the theta vector
verbose – Boolean flag to control verbosity
debug – Boolean flag to print out info useful for debugging
- Returns:
solution, a dictionary containing the candidate solution and the values of the KKT optimization parameters at each step.
- Return type:
dict
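Below is a hedged usage sketch. The call and its keyword arguments mirror the signature above; the callback bodies and the returned dictionary key are assumptions about the expected interfaces, not verified library behavior.

```python
# Hypothetical usage sketch. The keyword arguments mirror the documented
# signature; the callback bodies below are illustrative stand-ins.
import numpy as np
from seldonian.optimizers.gradient_descent import gradient_descent_adam

def primary_objective(theta):
    # Assumed interface: weights -> scalar loss on the current batch.
    return np.sum(theta ** 2)

def upper_bounds_function(theta):
    # Assumed interface: weights -> vector of constraint upper bounds.
    return np.array([np.sum(theta) - 1.0])

def batch_calculator(*args, **kwargs):
    # Assumed interface: sets the current batch; returning True signals the
    # batch is viable for generating a candidate solution. The exact call
    # signature is determined by the engine.
    return True

solution = gradient_descent_adam(
    primary_objective=primary_objective,
    n_constraints=1,
    upper_bounds_function=upper_bounds_function,
    theta_init=np.zeros(2),
    lambda_init=0.5 * np.ones(1),  # assumed shape: one multiplier per constraint
    batch_calculator=batch_calculator,
    n_batches=1,
    batch_size=100,
    n_epochs=10,
    verbose=True,
)
best_theta = solution["candidate_solution"]  # assumed key name
```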