Download the notebook here
[1]:
import numpy as np
import pandas as pd
from estimagic import minimize
Which optimizer to use#
This is a very short guide to selecting a suitable optimization algorithm based on a minimum of information. We are working on a longer version that contains more background information and can be found here.
However, we will keep this short guide for very impatient people who feel lucky.
To select an optimizer, you need to answer two questions:
Is your criterion function differentiable?
Does your problem have a nonlinear least squares structure (i.e. do you sum squared residuals at the end of your criterion function)?
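The answers to these two questions map directly onto the recommendations in the rest of this guide. As a purely illustrative summary (the helper function below is ours, not part of estimagic):

```python
def pick_algorithm(differentiable, least_squares):
    """Map the two questions of this guide to the recommended algorithms.

    This is an illustrative decision rule, not an estimagic function.
    """
    if differentiable:
        # gradient-based optimizers scale best when derivatives exist
        return "scipy_lbfgsb"
    if least_squares:
        # derivative-free least squares optimizers exploit the residual structure
        return "nag_dfols or tao_pounders"
    # derivative-free optimizer for scalar criterion functions
    return "nag_pybobyqa"
```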
Define some inputs#
Again, we use versions of the sphere function to illustrate how to select these algorithms in practice.
[2]:
def sphere(params):
    """Spherical criterion function.

    The unique local and global optimum of this function is at
    the zero vector. It is differentiable, convex and extremely
    well behaved in any possible sense.

    Args:
        params (pandas.DataFrame): DataFrame with the columns
            "value", "lower_bound", "upper_bound" and potentially more.

    Returns:
        dict: A dictionary with the entries "value" and "root_contributions".

    """
    out = {
        "value": (params["value"] ** 2).sum(),
        "root_contributions": params["value"],
    }
    return out
def sphere_gradient(params):
    """Gradient of spherical criterion function."""
    return params["value"] * 2


start_params = pd.DataFrame(
    data=np.arange(5) + 1,
    columns=["value"],
    index=[f"x_{i}" for i in range(5)],
)
start_params
[2]:
|     | value |
| --- | ----- |
| x_0 | 1     |
| x_1 | 2     |
| x_2 | 3     |
| x_3 | 4     |
| x_4 | 5     |
Differentiable criterion function#
Use scipy_lbfgsb as optimizer and provide the closed-form derivative if you can. If you do not provide a derivative, estimagic will calculate it numerically. However, this is less precise and slower.
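To see why providing a closed-form derivative helps, consider what a numerical derivative costs: central differences need two extra criterion evaluations per parameter and introduce rounding error. A minimal sketch of this trade-off (the helper names are ours and this simplified loop is not estimagic's internal implementation):

```python
import numpy as np


def sphere_value(x):
    # scalar criterion value of the sphere function on a plain array
    return (x ** 2).sum()


def sphere_gradient_closed_form(x):
    # exact gradient of the sphere function
    return 2 * x


def finite_difference_gradient(func, x, eps=1e-6):
    # central differences: 2 * len(x) criterion evaluations per gradient
    grad = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        grad[i] = (func(x + step) - func(x - step)) / (2 * eps)
    return grad


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
# the numerical gradient matches the closed form only up to floating point error
max_err = np.abs(
    finite_difference_gradient(sphere_value, x) - sphere_gradient_closed_form(x)
).max()
```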
[3]:
minimize(
    criterion=sphere,
    params=start_params,
    algorithm="scipy_lbfgsb",
    derivative=sphere_gradient,
)
[3]:
{'solution_x': array([ 1.11022302e-16, 2.22044605e-16, 0.00000000e+00, 4.44089210e-16,
-8.88178420e-16]),
'solution_criterion': 1.0477058897466563e-30,
'solution_derivative': array([ 2.22044605e-16, 4.44089210e-16, 0.00000000e+00, 8.88178420e-16,
-1.77635684e-15]),
'solution_hessian': None,
'n_criterion_evaluations': 3,
'n_derivative_evaluations': None,
'n_iterations': 2,
'success': True,
'reached_convergence_criterion': None,
'message': b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL',
'solution_params': lower_bound upper_bound value
x_0 -inf inf 1.110223e-16
x_1 -inf inf 2.220446e-16
x_2 -inf inf 0.000000e+00
x_3 -inf inf 4.440892e-16
x_4 -inf inf -8.881784e-16}
Note that this solves a 5-dimensional problem with just 3 criterion evaluations. For higher dimensions it will need more, but it scales very well to dozens and hundreds of parameters.
If you are worried about being stuck in a local optimum, start the optimization several times from random start values and take the best solution of all runs. This will still be much faster than using a global optimizer.
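The multistart idea can be sketched as follows. To keep the snippet self-contained, scipy.optimize.minimize is used here as a stand-in solver; the same loop works with estimagic's minimize:

```python
import numpy as np
from scipy.optimize import minimize as scipy_minimize  # stand-in solver for this sketch


def shifted_sphere(x):
    # a smooth criterion whose unique optimum is at (1, ..., 1)
    return ((x - 1) ** 2).sum()


rng = np.random.default_rng(seed=0)

# run the local optimizer from several random start values
results = [
    scipy_minimize(
        shifted_sphere,
        x0=rng.uniform(-10, 10, size=5),
        method="L-BFGS-B",
    )
    for _ in range(5)
]

# keep the best solution of all runs
best = min(results, key=lambda res: res.fun)
```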
Not differentiable, only scalar output#
Use nag_pybobyqa. Note that for this you need to install the PyBOBYQA package if you do not already have it:
pip install Py-BOBYQA
Then you select the algorithm as follows:
[4]:
minimize(criterion=sphere, params=start_params, algorithm="nag_pybobyqa")
[4]:
{'solution_criterion': 1.2999837387099211e-43,
'n_criterion_evaluations': 33,
'message': 'Success: rho has reached rhoend',
'success': True,
'reached_convergence_criterion': None,
'solution_x': array([-2.68103994e-22, -7.69644900e-23, 1.87575580e-22, -1.07763474e-22,
-7.34678507e-23]),
'solution_derivative': array([9.99200776e-15, 2.22044606e-14, 1.42108543e-14, 2.15526949e-22,
1.24344980e-14]),
'solution_hessian': array([[ 2.00000000e+00, -2.39216631e-15, 5.40507868e-16,
-4.95351595e-16, 1.78148798e-15],
[-2.39216631e-15, 2.00000000e+00, 1.02427464e-15,
-2.35943190e-15, 4.50984317e-16],
[ 5.40507868e-16, 1.02427464e-15, 2.00000000e+00,
-3.80803391e-15, -7.00163695e-16],
[-4.95351595e-16, -2.35943190e-15, -3.80803391e-15,
2.00000000e+00, 3.87423301e-15],
[ 1.78148798e-15, 4.50984317e-16, -7.00163695e-16,
3.87423301e-15, 2.00000000e+00]]),
'solution_params': lower_bound upper_bound value
x_0 -inf inf -2.681040e-22
x_1 -inf inf -7.696449e-23
x_2 -inf inf 1.875756e-22
x_3 -inf inf -1.077635e-22
x_4 -inf inf -7.346785e-23,
'n_derivative_evaluations': 'Not reported by nag_pybobyqa',
'n_iterations': 'Not reported by nag_pybobyqa'}
Not differentiable, least squares structure#
Use nag_dfols or tao_pounders.
To use nag_dfols you need to install it via:
pip install DFO-LS
To use tao_pounders you need to install petsc4py via:
conda install petsc4py
Note that petsc4py is only available on Linux.
Both optimizers will only work if your criterion function returns a dictionary that contains the entry root_contributions. This needs to be a numpy array or pandas.Series that contains the residuals of the least squares problem.
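The invariant these optimizers rely on can be checked directly: the "value" entry must equal the sum of squared "root_contributions". A minimal check with the sphere function from above:

```python
import numpy as np
import pandas as pd


def sphere(params):
    # least squares structure: "value" is the sum of squared "root_contributions"
    return {
        "value": (params["value"] ** 2).sum(),
        "root_contributions": params["value"],
    }


params = pd.DataFrame({"value": [1.0, 2.0, 3.0]})
out = sphere(params)

# recompute the criterion value from the residuals
recomputed = (np.asarray(out["root_contributions"]) ** 2).sum()
```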
nag_dfols performs better for noisy criterion functions; tao_pounders performs better for deterministic but very nonlinear criterion functions.
[6]:
minimize(
    criterion=sphere,
    params=start_params,
    algorithm="nag_dfols",
)
[6]:
{'solution_criterion': 9.590576455224451e-28,
'n_criterion_evaluations': 9,
'message': 'Success: Objective is sufficiently small',
'success': True,
'reached_convergence_criterion': None,
'solution_x': array([-1.33226763e-15, 1.99840144e-14, -1.90958360e-14, -6.21724894e-15,
-1.24344979e-14]),
'solution_params': lower_bound upper_bound value
x_0 -inf inf -1.332268e-15
x_1 -inf inf 1.998401e-14
x_2 -inf inf -1.909584e-14
x_3 -inf inf -6.217249e-15
x_4 -inf inf -1.243450e-14,
'solution_derivative': 'Not reported by nag_dfols',
'solution_hessian': 'Not reported by nag_dfols',
'n_derivative_evaluations': 'Not reported by nag_dfols',
'n_iterations': 'Not reported by nag_dfols'}
[7]:
minimize(
    criterion=sphere,
    params=start_params,
    algorithm="tao_pounders",
)
[7]:
{'solution_x': array([1.34645209e-14, 1.39723623e-14, 5.99670193e-15, 2.22044605e-16,
2.22044605e-16]),
'solution_criterion': 4.125792718474871e-28,
'solution_derivative': None,
'solution_hessian': None,
'n_criterion_evaluations': 50,
'n_derivative_evaluations': None,
'n_iterations': None,
'success': True,
'reached_convergence_criterion': 'step size small',
'message': 'step size small',
'solution_criterion_values': array([1.34645209e-14, 1.39723623e-14, 5.99670193e-15, 2.22044605e-16,
2.22044605e-16]),
'gradient_norm': 2.031204745582008e-26,
'criterion_norm': 0.0,
'convergence_code': 6,
'solution_params': lower_bound upper_bound value
x_0 -inf inf 1.346452e-14
x_1 -inf inf 1.397236e-14
x_2 -inf inf 5.996702e-15
x_3 -inf inf 2.220446e-16
x_4 -inf inf 2.220446e-16}