seldonian.optimizers.gradient_descent.gradient_descent_adam

gradient_descent_adam(primary_objective, n_constraints, upper_bounds_function, theta_init, lambda_init, batch_calculator, n_batches, batch_size=100, n_epochs=1, alpha_theta=0.05, alpha_lamb=0.05, beta_velocity=0.9, beta_rmsprop=0.9, gradient_library='autograd', clip_theta=None, verbose=False, debug=False, **kwargs)

Implements KKT optimization, i.e., simultaneous gradient descent/ascent using the Adam optimizer on a Lagrangian: L(theta, lambda) = f(theta) + lambda*g(theta), where f is the primary objective, lambda is a vector of Lagrange multipliers, and g is the vector of upper bound functions on the constraints. Gradient descent is performed on theta and gradient ascent on lambda to find a saddle point of L; only the optimal theta is of interest. This function is part of candidate selection, so if a NaN or inf is encountered during optimization, 'NSF' (No Solution Found) is returned. The optimal solution is defined as the feasible solution (i.e., one with all constraints satisfied) that has the smallest primary objective value.
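
To make the update rule concrete, here is a minimal, self-contained sketch of the simultaneous Adam descent on theta and gradient ascent on lambda for a toy problem. The f and g below are illustrative stand-ins for the primary objective and upper-bound functions (they are not part of the toolkit), and lambda is updated with plain gradient ascent rather than Adam for brevity:

    import autograd.numpy as np
    from autograd import grad, jacobian

    # Illustrative stand-ins for the primary objective and upper-bound functions.
    f = lambda theta: np.sum((theta - 2.0) ** 2)   # primary objective f(theta), unconstrained minimum at [2, 2]
    g = lambda theta: theta - 0.5                  # upper bounds g(theta); constraint i is satisfied when g_i(theta) <= 0

    df = grad(f)        # gradient of f with respect to theta
    Jg = jacobian(g)    # Jacobian of g with respect to theta

    theta = np.zeros(2)
    lamb = np.ones(2)                              # one Lagrange multiplier per constraint
    alpha_theta, alpha_lamb = 0.05, 0.05
    beta_velocity, beta_rmsprop, eps = 0.9, 0.9, 1e-8
    velocity = np.zeros_like(theta)
    rmsprop = np.zeros_like(theta)

    for t in range(1, 501):
        # Gradient of the Lagrangian L(theta, lambda) = f(theta) + lambda . g(theta) with respect to theta
        grad_theta = df(theta) + Jg(theta).T @ lamb

        # Adam-style update on theta (descent): bias-corrected velocity and rmsprop terms
        velocity = beta_velocity * velocity + (1 - beta_velocity) * grad_theta
        rmsprop = beta_rmsprop * rmsprop + (1 - beta_rmsprop) * grad_theta ** 2
        v_hat = velocity / (1 - beta_velocity ** t)
        s_hat = rmsprop / (1 - beta_rmsprop ** t)
        theta = theta - alpha_theta * v_hat / (np.sqrt(s_hat) + eps)

        # Gradient ascent on lambda (dL/dlambda = g(theta)), kept non-negative
        lamb = np.maximum(0.0, lamb + alpha_lamb * g(theta))

    print(theta, lamb)   # theta should settle near the constrained optimum [0.5, 0.5]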

Parameters:
  • primary_objective (function or class method) – The objective function that would be solely optimized in the absence of behavioral constraints, i.e., the loss function

  • n_constraints – The number of constraints

  • upper_bounds_function (function or class method) – The function that calculates the upper bounds on the constraints

  • theta_init (float) – Initial model weights

  • lambda_init – Initial values for Lagrange multiplier terms

  • batch_calculator – A function/class method that sets the current batch and returns whether the batch is viable for generating a candidate solution

  • n_batches (int) – The number of batches per epoch (see the sketch following this parameter list)

  • batch_size (int, defaults to 100) – The number of data points in each batch

  • n_epochs (int) – The number of epochs to run

  • alpha_theta (float) – Initial learning rate for theta

  • alpha_lamb (float) – Initial learning rate for lambda

  • beta_velocity (float) – Exponential decay rate for velocity term

  • beta_rmsprop (float) – Exponential decay rate for rmsprop term

  • gradient_library (str, defaults to "autograd") – The name of the library to use for computing automatic gradients.

  • clip_theta (tuple, list or numpy.ndarray, defaults to None) – Optional, the min and max values between which to clip all values in the theta vector

  • verbose (bool, defaults to False) – Boolean flag to control verbosity

  • debug (bool, defaults to False) – Boolean flag to print out info useful for debugging
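
For orientation on the batching parameters, here is a small sketch of the implied bookkeeping (it mirrors the parameter descriptions above, not the library's internal loop): each epoch iterates over n_batches batches of batch_size points, each (epoch, batch) pair yields one simultaneous theta/lambda update, and clip_theta, when given, bounds theta elementwise:

    import numpy as np

    n_batches, batch_size, n_epochs = 4, 100, 2
    clip_theta = (-5.0, 5.0)          # optional (min, max) bounds on theta

    theta = np.zeros(3)
    total_updates = 0
    for epoch in range(n_epochs):
        for batch_i in range(n_batches):
            # ... set the current batch of `batch_size` points, compute gradients,
            # and take one simultaneous theta (descent) / lambda (ascent) step ...
            total_updates += 1
            if clip_theta is not None:
                theta = np.clip(theta, clip_theta[0], clip_theta[1])

    print(total_updates)              # 8 updates in total, i.e. n_batches * n_epochs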

Returns:

solution, a dictionary containing the candidate solution and the values of the KKT optimization parameters (e.g., theta and lambda) at each step.

Return type:

dict
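
Per the selection rule described above, the candidate in the returned dictionary is the feasible iterate with the smallest primary objective value. A minimal sketch of that rule applied to recorded per-step values (the array names here are illustrative, not the dictionary's actual keys):

    import numpy as np

    # Illustrative per-iteration history (the returned dict stores such values per step).
    f_vals = np.array([3.2, 2.1, 1.7, 1.9, 1.5])              # primary objective at each step
    g_vals = np.array([[0.4], [0.1], [-0.2], [-0.3], [0.6]])  # constraint upper bounds; feasible when all <= 0

    feasible = np.all(g_vals <= 0, axis=1)                    # steps where every constraint bound is satisfied
    if not feasible.any():
        candidate = "NSF"                                     # no feasible step: No Solution Found
    else:
        best_index = np.argmin(np.where(feasible, f_vals, np.inf))
        candidate = best_index                                # theta from this step is the candidate solution

    print(candidate)   # -> 2 (f = 1.7 is the smallest primary objective among feasible steps)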