Download the notebook here

[1]:
import numpy as np
import pandas as pd

from estimagic import minimize

Which optimizer to use

This is the very, very short guide to selecting a suitable optimization algorithm based on a minimum of information. We are working on a longer version with more background information, which can be found here.

However, we will keep this short guide around for very impatient people who are feeling lucky.

To select an optimizer, you need to answer two questions:

  1. Is your criterion function differentiable?

  2. Do you have a nonlinear least squares structure (i.e. do you sum some kind of squared residuals at the end of your criterion function)?

Define some inputs

Again, we use versions of the sphere function to illustrate how to select these algorithms in practice.

[2]:
def sphere(params):
    """Spherical criterion function.

    The unique local and global optimum of this function is at
    the zero vector. It is differentiable, convex and extremely
    well behaved in any possible sense.

    Args:
        params (pandas.DataFrame): DataFrame with the columns
            "value", "lower_bound", "upper_bound" and potentially more.

    Returns:
        dict: A dictionary with the entries "value" and "root_contributions".

    """
    out = {
        "value": (params["value"] ** 2).sum(),
        "root_contributions": params["value"],
    }
    return out


def sphere_gradient(params):
    """Gradient of spherical criterion function"""
    return params["value"] * 2


start_params = pd.DataFrame(
    data=np.arange(5) + 1,
    columns=["value"],
    index=[f"x_{i}" for i in range(5)],
)
start_params
[2]:
     value
x_0      1
x_1      2
x_2      3
x_3      4
x_4      5

Differentiable criterion function

Use scipy_lbfgsb as the optimizer and provide the closed-form derivative if you can. If you do not provide a derivative, estimagic will calculate it numerically. However, this is less precise and slower.

[3]:
minimize(
    criterion=sphere,
    params=start_params,
    algorithm="scipy_lbfgsb",
    derivative=sphere_gradient,
)
[3]:
{'solution_x': array([ 1.11022302e-16,  2.22044605e-16,  0.00000000e+00,  4.44089210e-16,
        -8.88178420e-16]),
 'solution_criterion': 1.0477058897466563e-30,
 'solution_derivative': array([ 2.22044605e-16,  4.44089210e-16,  0.00000000e+00,  8.88178420e-16,
        -1.77635684e-15]),
 'solution_hessian': None,
 'n_criterion_evaluations': 3,
 'n_derivative_evaluations': None,
 'n_iterations': 2,
 'success': True,
 'reached_convergence_criterion': None,
 'message': b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL',
 'solution_params':      lower_bound  upper_bound         value
 x_0         -inf          inf  1.110223e-16
 x_1         -inf          inf  2.220446e-16
 x_2         -inf          inf  0.000000e+00
 x_3         -inf          inf  4.440892e-16
 x_4         -inf          inf -8.881784e-16}

Note that this solves a 5-dimensional problem with just 3 criterion evaluations. For higher dimensions it will need more, but it scales very well to dozens and hundreds of parameters.
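
If you cannot provide a closed-form derivative, simply omit the derivative argument. As noted above, estimagic then computes the gradient numerically, at the cost of extra criterion evaluations and some precision. A minimal sketch:

# Same problem as above, but without the closed-form derivative;
# estimagic falls back to a numerical gradient.
minimize(
    criterion=sphere,
    params=start_params,
    algorithm="scipy_lbfgsb",
)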

If you are worried about being stuck in a local optimum, start the optimization several times from random start values and take the best solution of all runs. This will still be much faster than using a global optimizer.
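
A minimal sketch of such a multistart loop, reusing the result dict shown above; the 10 runs and the sampling interval [-10, 10] are arbitrary choices for illustration:

# Draw random start values, run the local optimizer from each,
# and keep the run with the lowest criterion value.
rng = np.random.default_rng(seed=0)

results = []
for _ in range(10):  # illustrative number of runs
    random_params = start_params.copy()
    random_params["value"] = rng.uniform(low=-10, high=10, size=5)
    results.append(
        minimize(
            criterion=sphere,
            params=random_params,
            algorithm="scipy_lbfgsb",
            derivative=sphere_gradient,
        )
    )

best = min(results, key=lambda res: res["solution_criterion"])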

Not differentiable, only scalar output

Use nag_pybobyqa. Note that you need to install the Py-BOBYQA package first if you do not already have it:

pip install Py-BOBYQA

Then you select the algorithm as follows:

[4]:
minimize(criterion=sphere, params=start_params, algorithm="nag_pybobyqa")
[4]:
{'solution_criterion': 1.2999837387099211e-43,
 'n_criterion_evaluations': 33,
 'message': 'Success: rho has reached rhoend',
 'success': True,
 'reached_convergence_criterion': None,
 'solution_x': array([-2.68103994e-22, -7.69644900e-23,  1.87575580e-22, -1.07763474e-22,
        -7.34678507e-23]),
 'solution_derivative': array([9.99200776e-15, 2.22044606e-14, 1.42108543e-14, 2.15526949e-22,
        1.24344980e-14]),
 'solution_hessian': array([[ 2.00000000e+00, -2.39216631e-15,  5.40507868e-16,
         -4.95351595e-16,  1.78148798e-15],
        [-2.39216631e-15,  2.00000000e+00,  1.02427464e-15,
         -2.35943190e-15,  4.50984317e-16],
        [ 5.40507868e-16,  1.02427464e-15,  2.00000000e+00,
         -3.80803391e-15, -7.00163695e-16],
        [-4.95351595e-16, -2.35943190e-15, -3.80803391e-15,
          2.00000000e+00,  3.87423301e-15],
        [ 1.78148798e-15,  4.50984317e-16, -7.00163695e-16,
          3.87423301e-15,  2.00000000e+00]]),
 'solution_params':      lower_bound  upper_bound         value
 x_0         -inf          inf -2.681040e-22
 x_1         -inf          inf -7.696449e-23
 x_2         -inf          inf  1.875756e-22
 x_3         -inf          inf -1.077635e-22
 x_4         -inf          inf -7.346785e-23,
 'n_derivative_evaluations': 'Not reported by nag_pybobyqa',
 'n_iterations': 'Not reported by nag_pybobyqa'}

Not differentiable, least squares structure

Use nag_dfols or tao_pounders. To use nag_dfols you need to install it via:

pip install DFO-LS

To use tao_pounders you need to install petsc4py via:

conda install petsc4py

Note that petsc4py is only available on Linux.

Both optimizers will only work if your criterion function returns a dictionary that contains the entry root_contributions. This needs to be a numpy array or pandas.Series that contains the residuals of the least squares problem.
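
For illustration, a hypothetical least squares criterion for a linear model could look as follows; the parameter names slope and intercept and the synthetic data are made up, only the return format matters:

def linear_residuals(params):
    """Hypothetical least squares criterion in the required format."""
    x = np.arange(5)  # illustrative regressors
    y = 2 * x + 1  # illustrative observed outcomes
    predicted = params.loc["slope", "value"] * x + params.loc["intercept", "value"]
    residuals = y - predicted  # numpy array of residuals
    return {
        "value": (residuals ** 2).sum(),
        "root_contributions": residuals,
    }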

nag_dfols performs better for noisy criterion functions. tao_pounders performs better for deterministic but very nonlinear criterion functions.

[6]:
minimize(
    criterion=sphere,
    params=start_params,
    algorithm="nag_dfols",
)
[6]:
{'solution_criterion': 9.590576455224451e-28,
 'n_criterion_evaluations': 9,
 'message': 'Success: Objective is sufficiently small',
 'success': True,
 'reached_convergence_criterion': None,
 'solution_x': array([-1.33226763e-15,  1.99840144e-14, -1.90958360e-14, -6.21724894e-15,
        -1.24344979e-14]),
 'solution_params':      lower_bound  upper_bound         value
 x_0         -inf          inf -1.332268e-15
 x_1         -inf          inf  1.998401e-14
 x_2         -inf          inf -1.909584e-14
 x_3         -inf          inf -6.217249e-15
 x_4         -inf          inf -1.243450e-14,
 'solution_derivative': 'Not reported by nag_dfols',
 'solution_hessian': 'Not reported by nag_dfols',
 'n_derivative_evaluations': 'Not reported by nag_dfols',
 'n_iterations': 'Not reported by nag_dfols'}
[7]:
minimize(
    criterion=sphere,
    params=start_params,
    algorithm="tao_pounders",
)
[7]:
{'solution_x': array([1.34645209e-14, 1.39723623e-14, 5.99670193e-15, 2.22044605e-16,
        2.22044605e-16]),
 'solution_criterion': 4.125792718474871e-28,
 'solution_derivative': None,
 'solution_hessian': None,
 'n_criterion_evaluations': 50,
 'n_derivative_evaluations': None,
 'n_iterations': None,
 'success': True,
 'reached_convergence_criterion': 'step size small',
 'message': 'step size small',
 'solution_criterion_values': array([1.34645209e-14, 1.39723623e-14, 5.99670193e-15, 2.22044605e-16,
        2.22044605e-16]),
 'gradient_norm': 2.031204745582008e-26,
 'criterion_norm': 0.0,
 'convergence_code': 6,
 'solution_params':      lower_bound  upper_bound         value
 x_0         -inf          inf  1.346452e-14
 x_1         -inf          inf  1.397236e-14
 x_2         -inf          inf  5.996702e-15
 x_3         -inf          inf  2.220446e-16
 x_4         -inf          inf  2.220446e-16}