How to specify algorithms and algorithm-specific options

The algorithm argument

The algorithm argument can either be a string with the name of an algorithm that is implemented in estimagic, or a function that fulfills the interface laid out in Internal optimizers for estimagic.

Which algorithms are available in estimagic depends on the packages a user has installed. We list all implemented algorithms below.

The algo_options argument

algo_options is a dictionary with optional keyword arguments that are passed to the optimization algorithm.

We align the names of all algo_options across algorithms.

Since some optimizers support many tuning parameters, we group related options by the first part of their name (e.g. all convergence criteria names start with convergence_).

All option names contain only underscores. However, to make the group membership more visible, you can also specify them with a . separating the group from the rest of the option’s name. For example, if you wanted to set some tuning parameters of nag_dfols you could specify your algo_options like this:

algo_options = {
    "trustregion.threshold_successful": 0.2,
    "trustregion.threshold_very_successful": 0.9,
    "trustregion.shrinking_factor.not_successful": 0.4,
    "trustregion.shrinking_factor.lower_radius": 0.2,
    "trustregion.shrinking_factor.upper_radius": 0.8,
    "convergence.scaled_criterion_tolerance": 0.0,
    "convergence.noise_corrected_criterion_tolerance": 1.1,
}

Estimagic then automatically replaces the . with _ before passing them to the internal optimizer.
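
For illustration, here is a minimal sketch of how these options are passed to minimize. The sphere criterion and start parameters are made up, and we assume an estimagic version whose minimize accepts plain NumPy arrays as params (older versions require a pandas DataFrame):

import numpy as np
from estimagic import minimize

def sphere(params):
    # nag_dfols exploits least-squares structure, so the criterion
    # must return the "root_contributions" (here: the parameters).
    return {"value": params @ params, "root_contributions": params}

res = minimize(
    criterion=sphere,
    params=np.arange(5, dtype=float),
    algorithm="nag_dfols",
    algo_options=algo_options,  # the dictionary defined above
)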

However, not all algorithms support all options. Which options are supported, and the precise meaning of each option, are documented for each algorithm.

To make it easier to switch between algorithms, we simply ignore non-supported options and issue a warning that explains which options have been ignored.

For more information on algo_options that are supported by more than one algorithm, see The default algorithm options.

How to read the algorithms documentation

Below we document the supported algorithms. The documentation refers to the internal optimizer interface (see Internal optimizers for estimagic). However, those functions are not intended to be called by the user. Instead, users should invoke them by calling maximize or minimize with the corresponding algorithm argument.

The arguments criterion_and_derivative, x, lower_bounds and upper_bounds of the signatures below should simply be ignored.

The other arguments can be set as algo_options when calling maximize or minimize.

Supported Algorithms

Algorithms from scipy

estimagic supports most scipy algorithms. You do not need to install additional dependencies to use them:

estimagic.optimization.scipy_optimizers.scipy_lbfgsb(criterion_and_derivative, x, lower_bounds, upper_bounds, *, convergence_relative_criterion_tolerance=2e-09, convergence_absolute_gradient_tolerance=1e-05, stopping_max_criterion_evaluations=1000000, stopping_max_iterations=1000000, limited_memory_storage_length=10, max_line_search_steps=20)[source]

Minimize a scalar function of one or more variables using the L-BFGS-B algorithm.

Do not call this function directly but pass its name “scipy_lbfgsb” to estimagic’s maximize or minimize function as algorithm argument. Specify your desired arguments as a dictionary and pass them as algo_options to minimize or maximize.

The optimizer is taken from scipy, which calls the Fortran code written by the original authors of the algorithm. The Fortran code includes the corrections and improvements that were introduced in a follow-up paper.

L-BFGS-B is a limited-memory version of the original BFGS algorithm that deals with lower and upper bounds via an active set approach.

The L-BFGS-B algorithm is well suited for differentiable scalar optimization problems with up to several hundred parameters.

It is a quasi-Newton line search algorithm. At each trial point it evaluates the criterion function and its gradient to find a search direction. It then approximates the Hessian using the stored history of gradients and uses the Hessian to calculate a candidate step size. Then it uses a gradient-based line search algorithm to determine the actual step length. Since the algorithm always evaluates the gradient and criterion function jointly, the user should provide a criterion_and_derivative function that exploits the synergies in the calculation of criterion and gradient.
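
To illustrate, here is a minimal sketch of jointly supplying criterion and gradient, assuming your estimagic version exposes a criterion_and_derivative argument on minimize (the sphere functions are made up):

import numpy as np
from estimagic import minimize

def sphere(params):
    return params @ params

def sphere_and_gradient(params):
    # Value and gradient share intermediate computations; for expensive
    # criteria this joint evaluation is what L-BFGS-B benefits from.
    value = params @ params
    gradient = 2.0 * params
    return value, gradient

res = minimize(
    criterion=sphere,
    criterion_and_derivative=sphere_and_gradient,
    params=np.arange(3, dtype=float),
    algorithm="scipy_lbfgsb",
)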

The L-BFGS-B algorithm is almost perfectly scale invariant. Thus, it is not necessary to scale the parameters.

Below, only details of the optional algorithm options are listed. For the mandatory arguments see Internal optimizers for estimagic. For more background on those options, see Naming conventions for optional arguments.

Parameters
  • convergence_relative_criterion_tolerance (float) –

    Stop when the relative improvement between two iterations is smaller than this. More formally, this is expressed as

    \[\frac{f^k - f^{k+1}}{\max\{|f^k|, |f^{k+1}|, 1\}} \leq \text{relative_criterion_tolerance}\]

  • convergence_absolute_gradient_tolerance (float) – Stop if all elements of the projected gradient are smaller than this.

  • stopping_max_criterion_evaluations (int) – If the maximum number of function evaluations is reached, the optimization stops, but we do not count this as convergence.

  • stopping_max_iterations (int) – If the maximum number of iterations is reached, the optimization stops, but we do not count this as convergence.

  • limited_memory_storage_length (int) – Maximum number of saved gradients used to approximate the Hessian matrix.

Returns

See Output of internal optimizers for details.

Return type

dict

estimagic.optimization.scipy_optimizers.scipy_slsqp(criterion_and_derivative, x, lower_bounds, upper_bounds, *, convergence_absolute_criterion_tolerance=1e-08, stopping_max_iterations=1000000)[source]

Minimize a scalar function of one or more variables using the SLSQP algorithm.

Do not call this function directly but pass its name “scipy_slsqp” to estimagic’s maximize or minimize function as algorithm argument. Specify your desired arguments as a dictionary and pass them as algo_options to minimize or maximize.

SLSQP stands for Sequential Least Squares Programming.

SLSQP is a line search algorithm. It is well suited for continuously differentiable scalar optimization problems with up to several hundred parameters.

The optimizer is taken from scipy which wraps the SLSQP optimization subroutine originally implemented by [nag1].

Note

SLSQP’s general nonlinear constraints are not supported yet by estimagic.

Below, only details of the optional algorithm options are listed. For the mandatory arguments see Internal optimizers for estimagic. For more background on those options, see Naming conventions for optional arguments.

Parameters
  • convergence_absolute_criterion_tolerance (float) – Precision goal for the value of f in the stopping criterion.

  • stopping_max_iterations (int) – If the maximum number of iterations is reached, the optimization stops, but we do not count this as convergence.

Returns

See Output of internal optimizers for details.

Return type

dict

estimagic.optimization.scipy_optimizers.scipy_neldermead(criterion_and_derivative, x, *, stopping_max_iterations=1000000, stopping_max_criterion_evaluations=1000000, convergence_absolute_criterion_tolerance=1e-08, convergence_absolute_params_tolerance=1e-08, adaptive=False)[source]

Minimize a scalar function using the Nelder-Mead algorithm.

Do not call this function directly but pass its name “scipy_neldermead” to estimagic’s maximize or minimize function as algorithm argument. Specify your desired arguments as a dictionary and pass them as algo_options to minimize or maximize.

The Nelder-Mead algorithm is a direct search method (based on function comparison) and is often applied to nonlinear optimization problems for which derivatives are not known. Unlike most modern optimization methods, the Nelder–Mead heuristic can converge to a non-stationary point, unless the problem satisfies stronger conditions than are necessary for modern methods.

Nelder-Mead is never the best algorithm to solve a problem but rarely the worst. Its popularity is likely due to historic reasons and much larger than its properties warrant.

The argument initial_simplex is not supported by estimagic as it is not compatible with estimagic’s handling of constraints.

Below, only details of the optional algorithm options are listed. For the mandatory arguments see Internal optimizers for estimagic. For more background on those options, see Naming conventions for optional arguments.

Parameters
  • stopping_max_iterations (int) – If the maximum number of iterations is reached, the optimization stops, but we do not count this as convergence.

  • stopping_max_criterion_evaluations (int) – If the maximum number of function evaluations is reached, the optimization stops, but we do not count this as convergence.

  • convergence_absolute_params_tolerance (float) – Absolute difference in parameters between iterations that is tolerated to declare convergence. As no relative tolerances can be passed to Nelder-Mead, estimagic sets a nonzero default for this.

  • convergence_absolute_criterion_tolerance (float) – Absolute difference in the criterion value between iterations that is tolerated to declare convergence. As no relative tolerances can be passed to Nelder-Mead, estimagic sets a nonzero default for this.

  • adaptive (bool) – Adapt algorithm parameters to dimensionality of problem. Useful for high-dimensional minimization ([nag2], pp. 259–277). scipy’s default is False.

Returns

See Output of internal optimizers for details.

Return type

dict

estimagic.optimization.scipy_optimizers.scipy_powell(criterion_and_derivative, x, lower_bounds, upper_bounds, *, convergence_relative_params_tolerance=1e-05, convergence_relative_criterion_tolerance=2e-09, stopping_max_criterion_evaluations=1000000, stopping_max_iterations=1000000)[source]

Minimize a scalar function using the modified Powell method.

Do not call this function directly but pass its name “scipy_powell” to estimagic’s maximize or minimize function as algorithm argument. Specify your desired arguments as a dictionary and pass them as algo_options to minimize or maximize.

Warning

In our benchmark using a quadratic objective function, the Powell algorithm did not find the optimum very precisely (less than 4 decimal places). If you require high precision, you should refine an optimum found with Powell with another local optimizer.

The criterion function need not be differentiable.

Powell’s method is a conjugate direction method, minimizing the function by a bidirectional search along each direction of its direction set.

The argument direc, which is the initial set of direction vectors and which is part of the scipy interface, is not supported by estimagic because it is incompatible with how estimagic handles constraints.

Below, only details of the optional algorithm options are listed. For the mandatory arguments see Internal optimizers for estimagic. For more background on those options, see Naming conventions for optional arguments.

Parameters
  • convergence_relative_params_tolerance (float) – Stop when the relative movement between parameter vectors is smaller than this.

  • convergence_relative_criterion_tolerance (float) –

    Stop when the relative improvement between two iterations is smaller than this. More formally, this is expressed as

    \[\frac{f^k - f^{k+1}}{\max\{|f^k|, |f^{k+1}|, 1\}} \leq \text{relative_criterion_tolerance}\]

  • stopping_max_criterion_evaluations (int) – If the maximum number of function evaluations is reached, the optimization stops, but we do not count this as convergence.

  • stopping_max_iterations (int) – If the maximum number of iterations is reached, the optimization stops, but we do not count this as convergence.

Returns

See Output of internal optimizers for details.

Return type

dict

estimagic.optimization.scipy_optimizers.scipy_bfgs(criterion_and_derivative, x, lower_bounds, upper_bounds, *, convergence_absolute_gradient_tolerance=1e-05, stopping_max_iterations=1000000, norm=numpy.inf)[source]

Minimize a scalar function of one or more variables using the BFGS algorithm.

Do not call this function directly but pass its name “scipy_bfgs” to estimagic’s maximize or minimize function as algorithm argument. Specify your desired arguments as a dictionary and pass them as algo_options to minimize or maximize.

BFGS stands for Broyden-Fletcher-Goldfarb-Shanno algorithm. It is a quasi-Newton method that can be used for solving unconstrained nonlinear optimization problems.

BFGS is not guaranteed to converge unless the function has a quadratic Taylor expansion near an optimum. However, BFGS can have acceptable performance even for non-smooth optimization instances.

Below, only details of the optional algorithm options are listed. For the mandatory arguments see Internal optimizers for estimagic. For more background on those options, see Naming conventions for optional arguments.

Parameters
  • convergence_absolute_gradient_tolerance (float) – Stop if all elements of the gradient are smaller than this.

  • stopping_max_iterations (int) – If the maximum number of iterations is reached, the optimization stops, but we do not count this as convergence.

  • norm (float) – Order of the vector norm that is used to calculate the gradient’s “score” that is compared to the gradient tolerance to determine convergence. Default is infinity, which means that the largest absolute entry of the gradient vector is compared to the gradient tolerance.
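
For intuition about the norm option, here is a small sketch of the gradient “score” under different orders (plain NumPy; the gradient values are made up):

import numpy as np

gradient = np.array([0.5, -3.0, 1.0])
# norm=np.inf compares the largest absolute entry to the tolerance ...
print(np.linalg.norm(gradient, ord=np.inf))  # 3.0
# ... while norm=2 would compare the Euclidean length instead.
print(np.linalg.norm(gradient, ord=2))  # about 3.2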

Returns

See Output of internal optimizers for details.

Return type

dict

estimagic.optimization.scipy_optimizers.scipy_conjugate_gradient(criterion_and_derivative, x, *, convergence_absolute_gradient_tolerance=1e-05, stopping_max_iterations=1000000, norm=numpy.inf)[source]

Minimize a function using a nonlinear conjugate gradient algorithm.

Do not call this function directly but pass its name “scipy_conjugate_gradient” to estimagic’s maximize or minimize function as algorithm argument. Specify your desired arguments as a dictionary and pass them as algo_options to minimize or maximize.

The conjugate gradient method finds local optima of a function using just its gradient.

This conjugate gradient algorithm is based on that of Polak and Ribiere, detailed in [nag3], pp. 120-122.

Conjugate gradient methods tend to work better when:

  • the criterion has a unique global minimizing point, and no local minima or other stationary points.

  • the criterion is, at least locally, reasonably well approximated by a quadratic function.

  • the criterion is continuous and has a continuous gradient.

  • the gradient is not too large, e.g., has a norm less than 1000.

  • the initial guess is reasonably close to the criterion’s global minimizer.

Below, only details of the optional algorithm options are listed. For the mandatory arguments see Internal optimizers for estimagic. For more background on those options, see Naming conventions for optional arguments.

Parameters
  • convergence_absolute_gradient_tolerance (float) – Stop if all elements of the gradient are smaller than this.

  • stopping_max_iterations (int) – If the maximum number of iterations is reached, the optimization stops, but we do not count this as convergence.

  • norm (float) – Order of the vector norm that is used to calculate the gradient’s “score” that is compared to the gradient tolerance to determine convergence. Default is infinity, which means that the largest absolute entry of the gradient vector is compared to the gradient tolerance.

Returns

See Output of internal optimizers for details.

Return type

dict

estimagic.optimization.scipy_optimizers.scipy_newton_cg(criterion_and_derivative, x, *, convergence_relative_params_tolerance=1e-05, stopping_max_iterations=1000000)[source]

Minimize a scalar function using Newton’s conjugate gradient algorithm.

Do not call this function directly but pass its name “scipy_newton_cg” to estimagic’s maximize or minimize function as algorithm argument. Specify your desired arguments as a dictionary and pass them as algo_options to minimize or maximize.

Warning

In our benchmark using a quadratic objective function, the scipy_newton_cg algorithm did not find the optimum very precisely (less than 4 decimal places). If you require high precision, you should refine an optimum found with scipy_newton_cg with another local optimizer.

Newton’s conjugate gradient algorithm uses an approximation of the Hessian to find the minimum of a function. It is practical for small and large problems (see [nag3], p. 140).

Newton-CG methods are also called truncated Newton methods. This function differs from scipy_truncated_newton because

  • scipy_newton_cg’s algorithm is written purely in Python using NumPy and scipy while scipy_truncated_newton’s algorithm calls a C function.

  • scipy_newton_cg’s algorithm is only for unconstrained minimization while scipy_truncated_newton’s algorithm supports bounds.

Conjugate gradient methods tend to work better when:

  • the criterion has a unique global minimizing point, and no local minima or other stationary points.

  • the criterion is, at least locally, reasonably well approximated by a quadratic function.

  • the criterion is continuous and has a continuous gradient.

  • the gradient is not too large, e.g., has a norm less than 1000.

  • the initial guess is reasonably close to the criterion’s global minimizer.

Below, only details of the optional algorithm options are listed. For the mandatory arguments see Internal optimizers for estimagic. For more background on those options, see Naming conventions for optional arguments.

Parameters
  • convergence_relative_params_tolerance (float) – Stop when the relative movement between parameter vectors is smaller than this. Newton-CG uses the average relative change in the parameters to determine convergence.

  • stopping_max_iterations (int) – If the maximum number of iterations is reached, the optimization stops, but we do not count this as convergence.

Returns

See Output of internal optimizers for details.

Return type

dict

estimagic.optimization.scipy_optimizers.scipy_cobyla(criterion_and_derivative, x, *, stopping_max_iterations=1000000, convergence_relative_params_tolerance=1e-05, trustregion_initial_radius=None)[source]

Minimize a scalar function of one or more variables using the COBYLA algorithm.

Do not call this function directly but pass its name “scipy_cobyla” to estimagic’s maximize or minimize function as algorithm argument. Specify your desired arguments as a dictionary and pass them as algo_options to minimize or maximize.

COBYLA stands for Constrained Optimization By Linear Approximation. It is derivative-free and supports nonlinear inequality and equality constraints.

Note

COBYLA’s general nonlinear constraints are not yet supported by estimagic.

scipy wraps the original Fortran implementation of the algorithm.

For more information on COBYLA see [nag4], [nag5] and [nag6].

Below, only details of the optional algorithm options are listed. For the mandatory arguments see Internal optimizers for estimagic. For more background on those options, see Naming conventions for optional arguments.

Parameters
  • stopping_max_iterations (int) – If the maximum number of iterations is reached, the optimization stops, but we do not count this as convergence.

  • convergence_relative_params_tolerance (float) – Stop when the relative movement between parameter vectors is smaller than this. In the case of COBYLA this is a lower bound on the size of the trust region and can be seen as the required accuracy in the variables, but this accuracy is not guaranteed.

  • trustregion_initial_radius (float) – Initial value of the trust region radius. Since a linear approximation is likely only good near the current simplex, the linear program is given the further requirement that the solution, which will become the next evaluation point, must be within a radius RHO_j from x_j. RHO_j only decreases, never increases. The initial RHO_j is the trustregion_initial_radius. In this way COBYLA’s iterations behave like a trust region algorithm.

Returns

See Output of internal optimizers for details.

Return type

dict

estimagic.optimization.scipy_optimizers.scipy_truncated_newton(criterion_and_derivative, x, lower_bounds, upper_bounds, *, stopping_max_criterion_evaluations=1000000, stopping_max_iterations=1000000, convergence_absolute_criterion_tolerance=0, convergence_absolute_params_tolerance=0, convergence_absolute_gradient_tolerance=1e-05, func_min_estimate=0, max_hess_evaluations_per_iteration=-1, max_step_for_line_search=0, line_search_severity=-1, finitie_difference_precision=0, criterion_rescale_factor=-1)[source]

Minimize a scalar function using the truncated Newton algorithm.

Do not call this function directly but pass its name “scipy_truncated_newton” to estimagic’s maximize or minimize function as algorithm argument. Specify your desired arguments as a dictionary and pass them as algo_options to minimize or maximize.

This function differs from scipy_newton_cg because

  • scipy_newton_cg’s algorithm is written purely in Python using NumPy and scipy while scipy_truncated_newton’s algorithm calls a C function.

  • scipy_newton_cg’s algorithm is only for unconstrained minimization while scipy_truncated_newton’s algorithm supports bounds.

Conjugate gradient methods tend to work better when:

  • the criterion has a unique global minimizing point, and no local minima or other stationary points.

  • the criterion is, at least locally, reasonably well approximated by a quadratic function.

  • the criterion is continuous and has a continuous gradient.

  • the gradient is not too large, e.g., has a norm less than 1000.

  • the initial guess is reasonably close to the criterion’s global minimizer.

estimagic does not support the scale and offset arguments as they are not compatible with the way estimagic handles constraints. It also does not support messg_num, which is an additional way to control the verbosity of the optimizer.

Below, only details of the optional algorithm options are listed. For the mandatory arguments see Internal optimizers for estimagic. For more background on those options, see Naming conventions for optional arguments.

Parameters
  • func_min_estimate (float) – Minimum function value estimate. Defaults to 0.

  • stopping_max_iterations (int) – If the maximum number of iterations is reached, the optimization stops, but we do not count this as convergence.

  • stopping_max_criterion_evaluations (int) – If the maximum number of function evaluations is reached, the optimization stops, but we do not count this as convergence.

  • convergence_absolute_params_tolerance (float) – Absolute difference in parameters between iterations after scaling that is tolerated to declare convergence.

  • convergence_absolute_criterion_tolerance (float) – Absolute difference in the criterion value between iterations after scaling that is tolerated to declare convergence.

  • convergence_absolute_gradient_tolerance (float) – Stop if the value of the projected gradient (after applying x scaling factors) is smaller than this. If convergence_absolute_gradient_tolerance < 0.0, convergence_absolute_gradient_tolerance is set to 1e-2 * sqrt(accuracy).

  • max_hess_evaluations_per_iteration (int) – Maximum number of Hessian-vector evaluations per main iteration. If max_hess_evaluations == 0, the direction chosen is minus the gradient. If max_hess_evaluations < 0, max_hess_evaluations is set to max(1, min(50, n/2)) where n is the length of the parameter vector. This is also the default.

  • max_step_for_line_search (float) – Maximum step for the line search. It may be increased during the optimization. If too small, it will be set to 10.0. By default we use scipy’s default.

  • line_search_severity (float) – Severity of the line search. If < 0 or > 1, set to 0.25. Estimagic defaults to scipy’s default.

  • finitie_difference_precision (float) – Relative precision for finite difference calculations. If <= machine_precision, set to sqrt(machine_precision). Estimagic defaults to scipy’s default.

  • criterion_rescale_factor (float) – Scaling factor (in log10) used to trigger criterion rescaling. If 0, rescale at each iteration. If a large value, never rescale. If < 0, rescale is set to 1.3. Estimagic defaults to scipy’s default.

Returns

See Output of internal optimizers for details.

Return type

dict

estimagic.optimization.scipy_optimizers.scipy_trust_constr(criterion_and_derivative, x, lower_bounds, upper_bounds, *, convergence_absolute_gradient_tolerance=1e-08, convergence_relative_params_tolerance=1e-05, stopping_max_iterations=1000000, trustregion_initial_radius=None)[source]

Minimize a scalar function of one or more variables subject to constraints.

Do not call this function directly but pass its name “scipy_trust_constr” to estimagic’s maximize or minimize function as algorithm argument. Specify your desired arguments as a dictionary and pass them as algo_options to minimize or maximize.

Warning

In our benchmark using a quadratic objective function, the trust_constr algorithm did not find the optimum very precisely (less than 4 decimal places). If you require high precision, you should refine an optimum found with trust_constr with another local optimizer.

Note

estimagic does not yet support trust_constr’s general nonlinear constraints.

It switches between two implementations depending on the problem definition. It is the most versatile constrained minimization algorithm implemented in SciPy and the most appropriate for large-scale problems. For equality-constrained problems it is an implementation of the Byrd-Omojokun Trust-Region SQP method described in [nag7] and in [nag8], p. 549. When inequality constraints are imposed as well, it switches to the trust-region interior point method described in [nag9]. This interior point algorithm, in turn, solves inequality constraints by introducing slack variables and solving a sequence of equality-constrained barrier problems for progressively smaller values of the barrier parameter. The previously described equality-constrained SQP method is used to solve the subproblems with increasing levels of accuracy as the iterate gets closer to a solution.

It approximates the Hessian using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) Hessian update strategy.

Below, only details of the optional algorithm options are listed. For the mandatory arguments see Internal optimizers for estimagic. For more background on those options, see Naming conventions for optional arguments.

Parameters
  • convergence_absolute_gradient_tolerance (float) – Tolerance for termination by the norm of the Lagrangian gradient. The algorithm will terminate when both the infinity norm (i.e., max abs value) of the Lagrangian gradient and the constraint violation are smaller than convergence_absolute_gradient_tolerance. For this algorithm we use scipy’s default gradient tolerance for trust_constr, which is smaller than estimagic’s default gradient tolerance for other algorithms. This smaller tolerance is needed for the sum of squares tests to pass.

  • stopping_max_iterations (int) – If the maximum number of iterations is reached, the optimization stops, but we do not count this as convergence.

  • convergence_relative_params_tolerance (float) – Tolerance for termination by the change of the independent variable. The algorithm will terminate when the radius of the trust region used in the algorithm is smaller than the convergence_relative_params_tolerance.

  • trustregion_initial_radius (float) – Initial value of the trust region radius. The trust radius gives the maximum distance between solution points in consecutive iterations. It reflects the trust the algorithm puts in the local approximation of the optimization problem. For an accurate local approximation the trust region should be large; for an approximation valid only close to the current point it should be small. The trust radius is automatically updated throughout the optimization process, with trustregion_initial_radius being its initial value.

Returns

See Output of internal optimizers for details.

Return type

dict

Algorithms from the Toolkit for Advanced Optimization (TAO)

At the moment, estimagic only supports TAO’s POUNDERs algorithm.

The POUNDERs algorithm by Stefan Wild is tailored to minimize a non-linear sum of squares objective function. Remember to cite [nag10] in addition to estimagic when using POUNDERs.

To use POUNDERs you need to have petsc4py installed.

estimagic.optimization.tao_optimizers.tao_pounders(criterion_and_derivative, x, lower_bounds, upper_bounds, *, convergence_absolute_gradient_tolerance=1e-05, convergence_relative_gradient_tolerance=1e-08, convergence_scaled_gradient_tolerance=1e-08, trustregion_initial_radius=None, stopping_max_iterations=1000000)[source]

Minimize a function using the POUNDERs algorithm.

Do not call this function directly but pass its name “tao_pounders” to estimagic’s maximize or minimize function as algorithm argument. Specify your desired arguments as a dictionary and pass them as algo_options to minimize or maximize.

POUNDERs ([nag11], [nag10], GitHub repository) can be a useful tool for economists who estimate structural models using indirect inference, because unlike commonly used algorithms such as Nelder-Mead, POUNDERs is tailored for minimizing a non-linear sum of squares objective function, and therefore may require fewer iterations to arrive at a local optimum than Nelder-Mead.

The criterion function should return a dictionary with the following fields:

  1. "value": The sum of squared (potentially weighted) errors.

  2. "root_contributions": An array containing the root (weighted) contributions.

The problem must be scaled such that the bounds correspond to the unit hypercube \([0, 1]^n\). For unconstrained problems, scale each parameter such that a unit change results in a similar order-of-magnitude change in the criterion value(s).

POUNDERs has several convergence criteria. Let \(X\) be the current parameter vector, \(X_0\) the initial parameter vector, \(g\) the gradient, and \(f\) the criterion function.

convergence_absolute_gradient_tolerance stops the optimization if the norm of the gradient falls below \(\epsilon\).

\[||g(X)|| < \epsilon\]

convergence_relative_gradient_tolerance stops the optimization if the norm of the gradient relative to the criterion value falls below \(\epsilon\).

\[||g(X)|| / |f(X)| < \epsilon\]

convergence_scaled_gradient_tolerance stops the optimization if the norm of the gradient falls below a fraction \(\epsilon\) of the norm of the gradient at the initial parameters.

\[||g(X)|| / ||g(X_0)|| < \epsilon\]

Parameters
  • convergence_absolute_gradient_tolerance (float) – Stop if the norm of the gradient is less than this. If set to False the algorithm will not consider this criterion.

  • convergence_relative_gradient_tolerance (float) – Stop if the norm of the gradient relative to the criterion value is less than this. If set to False the algorithm will not consider this criterion.

  • convergence_scaled_gradient_tolerance (float) – Stop if the norm of the gradient, divided by the norm of the gradient at the initial parameters, is less than this. If set to False the algorithm will not consider this criterion.

  • trustregion_initial_radius (float) – Initial value of the trust region radius. It must be \(> 0\).

  • stopping_max_iterations (int) – Alternative stopping criterion. If set, the routine stops after the specified number of iterations or when the step size becomes sufficiently small. If this is set, all default convergence criteria are ignored.

Returns

Dictionary with processed optimization results.

Return type

results (dict)

Algorithms from the Numerical Algorithms Group (NAG)

Currently, estimagic supports the Derivative-Free Optimizer for Least-Squares Minimization (DFO-LS) and BOBYQA by the Numerical Algorithms Group.

To use DFO-LS you need to have the dfols package installed. BOBYQA requires the pybobyqa package.

estimagic.optimization.nag_optimizers.nag_dfols(criterion_and_derivative, x, lower_bounds, upper_bounds, *, clip_criterion_if_overflowing=True, convergence_minimal_trustregion_radius_tolerance=1e-08, convergence_noise_corrected_criterion_tolerance=1.0, convergence_scaled_criterion_tolerance=0.0, convergence_slow_progress=None, initial_directions='coordinate', interpolation_rounding_error=0.1, noise_additive_level=None, noise_multiplicative_level=None, noise_n_evals_per_point=None, random_directions_orthogonal=True, stopping_max_criterion_evaluations=1000000, threshold_for_safety_step=0.5, trustregion_expansion_factor_successful=2.0, trustregion_expansion_factor_very_successful=4.0, trustregion_fast_start_options=None, trustregion_initial_radius=None, trustregion_method_to_replace_extra_points='geometry_improving', trustregion_n_extra_points_to_replace_successful=0, trustregion_n_interpolation_points=None, trustregion_precondition_interpolation=True, trustregion_reset_options=None, trustregion_shrinking_factor_not_successful=None, trustregion_shrinking_factor_lower_radius=None, trustregion_shrinking_factor_upper_radius=None, trustregion_threshold_successful=0.1, trustregion_threshold_very_successful=0.7)[source]

Minimize a function with least squares structure using DFO-LS.

Do not call this function directly but pass its name “nag_dfols” to estimagic’s maximize or minimize function as algorithm argument. Specify your desired arguments as a dictionary and pass them as algo_options to minimize or maximize.

The DFO-LS algorithm [nag12] is designed to solve the nonlinear least-squares minimization problem (with optional bound constraints). Remember to cite [nag12] in addition to estimagic when using DFO-LS.

\[\begin{split}\min_{x\in\mathbb{R}^n} &\quad f(x) := \sum_{i=1}^{m}r_{i}(x)^2 \\ \text{s.t.} &\quad \text{lower_bounds} \leq x \leq \text{upper_bounds}\end{split}\]

The \(r_{i}\) are called root contributions in estimagic.

DFO-LS is a derivative-free optimization algorithm, which means it does not require the user to provide the derivatives of \(f(x)\) or \(r_{i}(x)\), nor does it attempt to estimate them internally (by using finite differencing, for instance).

There are two main situations when using a derivative-free algorithm (such as DFO-LS) is preferable to a derivative-based algorithm (which is the vast majority of least-squares solvers):

  1. If the residuals are noisy, then calculating or even estimating their derivatives may be impossible (or at least very inaccurate). By noisy, we mean that if we evaluate \(r_{i}(x)\) multiple times at the same value of x, we get different results. This may happen when a Monte Carlo simulation is used, for instance.

  2. If the residuals are expensive to evaluate, then estimating derivatives (which requires n evaluations of each \(r_{i}(x)\) for every point of interest x) may be prohibitively expensive. Derivative-free methods are designed to solve the problem with as few evaluations of the criterion as possible.

For the detailed documentation of the algorithm, see the DFO-LS documentation.

There are four possible convergence criteria:

  1. when the lower trust region radius is shrunk below a minimum (convergence_minimal_trustregion_radius_tolerance).

  2. when the improvements of iterations become very small (convergence_slow_progress). This is very similar to relative_criterion_tolerance but convergence_slow_progress is more general, allowing you to specify not only the threshold for convergence but also the period over which the improvements must have been small.

  3. when a sufficient reduction to the criterion value at the start parameters has been reached, i.e. when \(\frac{f(x)}{f(x_0)} \leq \text{convergence_scaled_criterion_tolerance}\)

  4. when all evaluations on the interpolation points fall within a scaled version of the noise level of the criterion function. This is only applicable if the criterion function is noisy. You can specify this criterion with convergence_noise_corrected_criterion_tolerance.

DFO-LS supports resetting the optimization and doing a fast start by starting with a smaller interpolation set and growing it dynamically. For more information see the detailed DFO-LS documentation and [nag12].

Parameters
  • clip_criterion_if_overflowing (bool) – see The default algorithm options.

  • convergence_minimal_trustregion_radius_tolerance (float) – see The default algorithm options.

  • convergence_noise_corrected_criterion_tolerance (float) –

    Stop when the evaluations on the set of interpolation points all fall within this factor of the noise level. The default is 1, i.e. when all evaluations are within the noise level. If you do not want to use this criterion but still want to flag your criterion function as noisy, set this tolerance to 0.0.

    Warning

    Unlike for most other tolerances, very small values do not make sense here.

  • convergence_scaled_criterion_tolerance (float) – Terminate if a point is reached where the ratio of the criterion value to the criterion value at the start params is below this value, i.e. if \(f(x_k)/f(x_0) \leq \text{convergence_scaled_criterion_tolerance}\). Note that with the default of 0.0 this criterion is effectively deactivated, as it only triggers when the lowest mathematically possible criterion value (0.0) is actually achieved.

  • convergence_slow_progress (dict) – Arguments for converging when the evaluations over several iterations only yield small improvements on average, see The default algorithm options for details.

  • initial_directions (str) – see The default algorithm options.

  • interpolation_rounding_error (float) – see The default algorithm options.

  • noise_additive_level (float) – Used for determining the presence of noise and the convergence by all interpolation points being within noise level. 0 means no additive noise. Only one of additive or multiplicative noise can be specified.

  • noise_multiplicative_level (float) – Used for determining the presence of noise and the convergence by all interpolation points being within noise level. 0 means no multiplicative noise. Only one of additive or multiplicative noise can be specified.

  • noise_n_evals_per_point (callable) – How often to evaluate the criterion function at each point. This is only applicable for criterion functions with noise, when averaging multiple evaluations at the same point produces a more accurate value. The input parameters are the upper_trustregion_radius (\(\Delta\)), the lower_trustregion_radius (\(\rho\)), how many iterations the algorithm has been running for, n_iterations, and how many resets have been performed, n_resets. The function must return an integer. Default is no averaging (i.e. noise_n_evals_per_point(...) = 1). A sketch of such a callable is shown below this algorithm’s documentation.

  • random_directions_orthogonal (bool) – see The default algorithm options.

  • stopping_max_criterion_evaluations (int) – see The default algorithm options.

  • threshold_for_safety_step (float) – see The default algorithm options.

  • trustregion_expansion_factor_successful (float) – see The default algorithm options.

  • trustregion_expansion_factor_very_successful (float) – see The default algorithm options.

  • trustregion_fast_start_options (dict) – see The default algorithm options.

  • trustregion_initial_radius (float) – Initial value of the trust region radius.

  • trustregion_method_to_replace_extra_points (str) – If replacing extra points in successful iterations, whether to use geometry improving steps or the momentum method. Can be “geometry_improving” or “momentum”.

  • trustregion_n_extra_points_to_replace_successful (int) – The number of extra points (other than accepting the trust region step) to replace. Useful when trustregion_n_interpolation_points > len(x) + 1.

  • trustregion_n_interpolation_points (int) – The number of interpolation points to use. The default is len(x) + 1. If using resets, this is the number of points to use in the first run of the solver, before any resets.

  • trustregion_precondition_interpolation (bool) – see The default algorithm options.

  • trustregion_shrinking_factor_not_successful (float) – see The default algorithm options.

  • trustregion_shrinking_factor_lower_radius (float) – see The default algorithm options.

  • trustregion_shrinking_factor_upper_radius (float) – see The default algorithm options.

  • trustregion_threshold_successful (float) – Share of the predicted improvement that has to be achieved for a trust region iteration to count as successful.

  • trustregion_threshold_very_successful (float) – Share of the predicted improvement that has to be achieved for a trust region iteration to count as very successful.

Returns

See Output of internal optimizers for details.

Return type

results (dict)
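
As referenced above, noise_n_evals_per_point takes a callable with the stated signature. A minimal sketch (the averaging rule itself is made up for illustration):

def noise_n_evals_per_point(
    upper_trustregion_radius, lower_trustregion_radius, n_iterations, n_resets
):
    # Evaluate each point once at first and average more evaluations
    # after each reset; must always return an integer >= 1.
    return 1 + 2 * n_resets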

estimagic.optimization.nag_optimizers.nag_pybobyqa(criterion_and_derivative, x, lower_bounds, upper_bounds, *, clip_criterion_if_overflowing=True, convergence_criterion_value=None, convergence_minimal_trustregion_radius_tolerance=1e-08, convergence_noise_corrected_criterion_tolerance=1.0, convergence_slow_progress=None, initial_directions='coordinate', interpolation_rounding_error=0.1, noise_additive_level=None, noise_multiplicative_level=None, noise_n_evals_per_point=None, random_directions_orthogonal=True, seek_global_optimum=False, stopping_max_criterion_evaluations=1000000, threshold_for_safety_step=0.5, trustregion_expansion_factor_successful=2.0, trustregion_expansion_factor_very_successful=4.0, trustregion_initial_radius=None, trustregion_minimum_change_hession_for_underdetermined_interpolation=True, trustregion_n_interpolation_points=None, trustregion_precondition_interpolation=True, trustregion_reset_options=None, trustregion_shrinking_factor_not_successful=None, trustregion_shrinking_factor_lower_radius=None, trustregion_shrinking_factor_upper_radius=None, trustregion_threshold_successful=0.1, trustregion_threshold_very_successful=0.7)[source]

Minimize a function using the BOBYQA algorithm.

Do not call this function directly but pass its name “nag_pybobyqa” to estimagic’s maximize or minimize function as algorithm argument. Specify your desired arguments as a dictionary and pass them as algo_options to minimize or maximize.

BOBYQA ([nag13], [nag14], [nag15]) is a derivative-free trust-region method. It is designed to solve nonlinear local minimization problems.

Remember to cite [nag13] and [nag14] in addition to estimagic when using pybobyqa. If you take advantage of the seek_global_optimum option, cite [nag15] additionally.
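
For example, here is a hedged sketch of a noisy minimization using the global optimum heuristic (the noisy criterion and the option values are made up; running this requires the pybobyqa package):

import numpy as np
from estimagic import minimize

def noisy_sphere(params):
    # Repeated evaluations at the same params give different results.
    return params @ params + 0.01 * np.random.normal()

res = minimize(
    criterion=noisy_sphere,
    params=np.array([1.0, 2.0]),
    algorithm="nag_pybobyqa",
    algo_options={
        "seek_global_optimum": True,
        "noise_additive_level": 0.01,  # flags the criterion as noisy
    },
)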

There are two main situations when using a derivative-free algorithm like BOBYQA is preferable to derivative-based algorithms:

  1. The criterion function is not deterministic, i.e. if we evaluate the criterion function multiple times at the same parameter vector we get different results.

  2. The criterion function is very expensive to evaluate and only finite differences are available to calculate its derivative.

The detailed documentation of the algorithm can be found in the Py-BOBYQA documentation.

There are four possible convergence criteria:

  1. when the trust region radius is shrunk below a minimum. This is approximately equivalent to an absolute parameter tolerance.

  2. when the criterion value falls below an absolute, user-specified value, the optimization terminates successfully.

  3. when insufficient improvements have been gained over a certain number of iterations. The (absolute) threshold for what constitutes an insufficient improvement, how many iterations have to be insufficient and with which iteration to compare can all be specified by the user.

  4. when all evaluations on the interpolation points fall within a scaled version of the noise level of the criterion function. This is only applicable if the criterion function is noisy.

Parameters
  • clip_criterion_if_overflowing (bool) – see The default algorithm options.

  • convergence_criterion_value (float) – Terminate successfully if the criterion value falls below this threshold. This is deactivated (i.e. set to -inf) by default.

  • convergence_minimal_trustregion_radius_tolerance (float) – Minimum allowed value of the trust region radius, which determines when a successful termination occurs.

  • convergence_noise_corrected_criterion_tolerance (float) –

    Stop when the evaluations on the set of interpolation points all fall within this factor of the noise level. The default is 1, i.e. when all evaluations are within the noise level. If you do not want to use this criterion but still want to flag your criterion function as noisy, set this tolerance to 0.0.

    Warning

    Unlike for most other tolerances, very small values do not make sense here.

  • convergence_slow_progress (dict) – Arguments for converging when the evaluations over several iterations only yield small improvements on average, see The default algorithm options for details.

  • initial_directions (str) – see The default algorithm options.

  • interpolation_rounding_error (float) – see The default algorithm options.

  • noise_additive_level (float) – Used for determining the presence of noise and the convergence by all interpolation points being within noise level. 0 means no additive noise. Only one of additive or multiplicative noise can be specified.

  • noise_multiplicative_level (float) – Used for determining the presence of noise and the convergence by all interpolation points being within noise level. 0 means no multiplicative noise. Only one of additive or multiplicative noise can be specified.

  • noise_n_evals_per_point (callable) – How often to evaluate the criterion function at each point. This is only applicable for criterion functions with noise, when averaging multiple evaluations at the same point produces a more accurate value. The input parameters are the upper_trustregion_radius (\(\Delta\)), the lower_trustregion_radius (\(\rho\)), how many iterations the algorithm has been running for, n_iterations, and how many resets have been performed, n_resets. The function must return an integer. Default is no averaging (i.e. noise_n_evals_per_point(...) = 1). See the sketch after the nag_dfols documentation above.

  • random_directions_orthogonal (bool) – see The default algorithm options.

  • seek_global_optimum (bool) – Whether to apply the heuristic to escape local minima presented in [nag15]. Only applies to noisy criterion functions.

  • stopping_max_criterion_evaluations (int) – see The default algorithm options.

  • threshold_for_safety_step (float) – see The default algorithm options.

  • trustregion_expansion_factor_successful (float) – see The default algorithm options.

  • trustregion_expansion_factor_very_successful (float) – see The default algorithm options.

  • trustregion_initial_radius (float) – Initial value of the trust region radius.

  • trustregion_minimum_change_hession_for_underdetermined_interpolation (bool) – Whether to solve the underdetermined quadratic interpolation problem by minimizing the Frobenius norm of the Hessian or of the change in the Hessian.

  • trustregion_n_interpolation_points (int) –

    The number of interpolation points to use. With \(n = len(x)\), the default is \(2n+1\) if the criterion is not noisy. Otherwise, it is set to \((n+1)(n+2)/2\).

    Larger values are particularly useful for noisy problems. Py-BOBYQA requires

    \[n + 1 \leq \text{trustregion_n_interpolation_points} \leq (n+1)(n+2)/2.\]

  • trustregion_precondition_interpolation (bool) – see The default algorithm options.

  • trustregion_reset_options (dict) – Options for resetting the optimization, see The default algorithm options for details.

  • trustregion_shrinking_factor_not_successful (float) – see The default algorithm options.

  • trustregion_shrinking_factor_upper_radius (float) – see The default algorithm options.

  • trustregion_shrinking_factor_lower_radius (float) – see The default algorithm options.

  • trustregion_threshold_successful (float) – see The default algorithm options.

  • trustregion_threshold_very_successful (float) – see The default algorithm options.

Returns

See Output of internal optimizers for details.

Return type

results (dict)

References

nag1

Dieter Kraft. A software package for sequential quadratic programming. Technical Report, DLR German Aerospace Center – Institute for Flight Mechanics, Köln, Germany, 1988. URL: http://degenerateconic.com/wp-content/uploads/2018/03/DFVLR_FB_88_28.pdf.

nag2

Fuchang Gao and Lixing Han. Implementing the Nelder-Mead simplex algorithm with adaptive parameters. Computational Optimization and Applications, 51(1):259–277, 2012.

nag3

Jorge Nocedal and Stephen Wright. Numerical Optimization. Springer Science & Business Media, 2006.

nag4

Michael J. D. Powell. A direct search optimization method that models the objective and constraint functions by linear interpolation. In Advances in Optimization and Numerical Analysis, pages 51–67. Springer, 1994.

nag5

Michael J. D. Powell. Direct search algorithms for optimization calculations. Acta Numerica, pages 287–336, 1998.

nag6

Michael J. D. Powell. A view of algorithms for optimization without derivatives. Mathematics Today-Bulletin of the Institute of Mathematics and its Applications, 43(5):170–174, 2007.

nag7

Marucha Lalee, Jorge Nocedal, and Todd Plantenga. On the implementation of an algorithm for large-scale equality constrained optimization. SIAM Journal on Optimization, 8(3):682–706, 1998.

nag8

A. R. Conn, N. I. Gould, and P. L. Toint. Nonlinear equations and nonlinear fitting. In Trust Region Methods, volume 1, pages 749–774. SIAM, 2000.

nag9

Richard H. Byrd, Mary E. Hribar, and Jorge Nocedal. An interior point algorithm for large-scale nonlinear programming. SIAM Journal on Optimization, 9(4):877–900, 1999.

nag10

Stefan M. Wild. Solving derivative-free nonlinear least squares problems with POUNDERS. Technical Report, Argonne National Laboratory, 2015. URL: https://www.mcs.anl.gov/papers/P5120-0414.pdf.

nag11

S. Benson, L. C. McInnes, J. J. Moré, T. Munson, and J. Sarich. TAO User Manual (Revision 3.7). Technical Report ANL/MCS-TM-322, Argonne National Laboratory, 2017. URL: http://web.mit.edu/tao-petsc_v3.7/tao_manual.pdf.

nag12

Coralia Cartis, Jan Fiala, Benjamin Marteau, and Lindon Roberts. Improving the flexibility and robustness of model-based derivative-free optimization solvers. 2018. arXiv:1804.00154.

nag13

Michael J. D. Powell. The BOBYQA algorithm for bound constrained optimization without derivatives. Cambridge NA Report NA2009/06, University of Cambridge, Cambridge, pages 26–46, 2009.

nag14

Coralia Cartis, Jan Fiala, Benjamin Marteau, and Lindon Roberts. Improving the flexibility and robustness of model-based derivative-free optimization solvers. 2018. arXiv:1804.00154.

nag15

Coralia Cartis, Lindon Roberts, and Oliver Sheridan-Methven. Escaping local minima with derivative-free methods: a numerical investigation. 2018. arXiv:1812.11343.