Download the notebook here
[26]:
import numpy as np
import pandas as pd
from estimagic import minimize
from estimagic.logging.read_log import read_optimization_iteration
First Optimization with estimagic#
This tutorial shows how to do an optimization with estimagic. It uses a very simple criterion function in order to focus on the mechanics of doing an optimization. More details on the topics covered here can be found in the how to guides.
Setting up criterion function and derivatives#
Criterion functions in estimagic take a DataFrame with parameters as their first argument and return a dictionary that contains the output of the criterion function.
The output dictionary must contain the entry “value”, which is a scalar. It can also contain an arbitrary number of additional entries. Entries with special meaning are “contributions” and “root_contributions”, which are used by specialized optimizers (e.g. nonlinear least squares optimizers use the “root_contributions”). All other entries are simply stored in a log file. If none of the optional entries are needed, the criterion function can also simply return a scalar.
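For example, if you only need the scalar output, the criterion can skip the dictionary entirely. A minimal sketch (the function name sphere_scalar is ours, purely for illustration):

def sphere_scalar(params):
    """Scalar-only version of the sphere function; no optional entries."""
    return (params["value"] ** 2).sum()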
[27]:
def sphere(params):
    """Spherical criterion function.

    The unique local and global optimum of this function is at
    the zero vector. It is differentiable, convex and extremely
    well behaved in any possible sense.

    Args:
        params (pandas.DataFrame): DataFrame with the columns
            "value", "lower_bound", "upper_bound" and potentially more.

    Returns:
        dict: A dictionary with the entries "value" and "root_contributions".

    """
    out = {
        "value": (params["value"] ** 2).sum(),
        "root_contributions": params["value"],
    }
    return out


def sphere_gradient(params):
    """Gradient of the spherical criterion function."""
    # The gradient of sum(x ** 2) is 2 * x, returned element-wise as a Series.
    return params["value"] * 2
Setting up start parameters#
The start parameters must contain the column “value” but can also contain an arbitrary number of other columns. Columns with special meaning are “lower_bound”, “upper_bound”, “name” and “group”. The bounds are used during optimization; “name” and “group” are used in the dashboard.
They can have an arbitrary index or even a MultiIndex, which is very helpful for organizing the parameters of a complex optimization problem.
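As a sketch (the category and name labels are ours, purely for illustration), start parameters with a MultiIndex could look like this:

multi_params = pd.DataFrame(
    data=[1.0, 2.0, 3.0],
    columns=["value"],
    index=pd.MultiIndex.from_tuples(
        [("utility", "a"), ("utility", "b"), ("cost", "c")],
        names=["category", "name"],
    ),
)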
[28]:
start_params = pd.DataFrame(
    data=np.arange(5) + 1,
    columns=["value"],
    index=[f"x_{i}" for i in range(5)],
)
start_params
[28]:
|  | value |
|---|---|
| x_0 | 1 |
| x_1 | 2 |
| x_2 | 3 |
| x_3 | 4 |
| x_4 | 5 |
Running a simple optimization#
Estimagic’s minimize function works similarly to scipy’s minimize function. A big difference, however, is that estimagic does not have a default optimization algorithm. This is on purpose, because the choice of algorithm should always depend on the problem one wants to solve.
Another difference is that estimagic also has a maximize function that works exactly like minimize, but does a maximization.
The output of minimize is a dictionary that contains the solution parameters and criterion values as well as other information.
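As a quick illustration of maximize (this snippet is ours, not one of the original cells, and it assumes the same call signature as minimize), maximizing the negated sphere function finds the same optimum:

from estimagic import maximize

def neg_sphere(params):
    """Negative sphere function; its unique maximum is the zero vector."""
    return -(params["value"] ** 2).sum()

res_max = maximize(
    criterion=neg_sphere,
    params=start_params,
    algorithm="scipy_lbfgsb",
)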
[29]:
res = minimize(
    criterion=sphere,
    params=start_params,
    algorithm="scipy_lbfgsb",
    derivative=sphere_gradient,
)
res
[29]:
{'solution_x': array([ 1.11022302e-16, 2.22044605e-16, 0.00000000e+00, 4.44089210e-16,
-8.88178420e-16]),
'solution_criterion': 1.0477058897466563e-30,
'solution_derivative': array([ 2.22044605e-16, 4.44089210e-16, 0.00000000e+00, 8.88178420e-16,
-1.77635684e-15]),
'solution_hessian': None,
'n_criterion_evaluations': 3,
'n_derivative_evaluations': None,
'n_iterations': 2,
'success': True,
'reached_convergence_criterion': None,
'message': b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL',
'solution_params': lower_bound upper_bound value
x_0 -inf inf 1.110223e-16
x_1 -inf inf 2.220446e-16
x_2 -inf inf 0.000000e+00
x_3 -inf inf 4.440892e-16
x_4 -inf inf -8.881784e-16}
[19]:
res["solution_params"].round(6)
[19]:
|  | lower_bound | upper_bound | value |
|---|---|---|---|
| x_0 | -inf | inf | 0.0 |
| x_1 | -inf | inf | 0.0 |
| x_2 | -inf | inf | 0.0 |
| x_3 | -inf | inf | 0.0 |
| x_4 | -inf | inf | -0.0 |
Running an optimization with a least squares optimizer#
Using a least squares optimizer in estimagic is exactly the same as using any other optimizer. That is the goal and result of allowing the output of the criterion function to be a dictionary: because sphere returns “root_contributions”, a least squares optimizer such as tao_pounders can use them directly.
[20]:
res = minimize(
    criterion=sphere,
    params=start_params,
    algorithm="tao_pounders",
    derivative=sphere_gradient,
)
res["solution_params"].round(2)
[20]:
|  | lower_bound | upper_bound | value |
|---|---|---|---|
| x_0 | -inf | inf | 0.0 |
| x_1 | -inf | inf | 0.0 |
| x_2 | -inf | inf | 0.0 |
| x_3 | -inf | inf | 0.0 |
| x_4 | -inf | inf | 0.0 |
Available optimizers#
Which optimizers are available depends on the optional packages you have installed. For an overview, check out the list of supported algorithms in the documentation.
Adding bounds#
Bounds are simply added as additional columns in the start parameters. If a parameter has no bound, use np.inf for the upper bound and -np.inf for the lower bound.
[21]:
params_with_bounds = start_params.copy()
params_with_bounds["lower_bound"] = [0, 1, 0, -1, 0]
params_with_bounds["upper_bound"] = [np.inf] * 5

res = minimize(
    criterion=sphere,
    params=params_with_bounds,
    algorithm="scipy_lbfgsb",
    derivative=sphere_gradient,
)
res["solution_params"].round(6)
[21]:
|  | lower_bound | upper_bound | value |
|---|---|---|---|
| x_0 | 0.0 | inf | 0.0 |
| x_1 | 1.0 | inf | 1.0 |
| x_2 | 0.0 | inf | 0.0 |
| x_3 | -1.0 | inf | 0.0 |
| x_4 | 0.0 | inf | 0.0 |
Fixing parameters via constraints#
Fixing parameters is very handy in complex optimizations. It is very simple in estimagic:
[22]:
constraints = [{"loc": ["x_0", "x_3"], "type": "fixed", "value": [1, 4]}]

res = minimize(
    criterion=sphere,
    params=start_params,
    algorithm="tao_pounders",
    derivative=sphere_gradient,
    constraints=constraints,
)
res["solution_params"].round(2)
[22]:
|  | lower_bound | upper_bound | value |
|---|---|---|---|
| x_0 | -inf | inf | 1.0 |
| x_1 | -inf | inf | 0.0 |
| x_2 | -inf | inf | 0.0 |
| x_3 | -inf | inf | 4.0 |
| x_4 | -inf | inf | 0.0 |
As you probably suspect, the estimagic constraint syntax is much more general than this example. For details, see the how-to guide on specifying constraints.
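For a flavor of what else is possible, here is a hedged sketch of two other constraint types (syntax only, not run here); see that guide for the full list and exact semantics:

other_constraints = [
    # force a set of parameters to be weakly increasing
    {"loc": ["x_1", "x_2"], "type": "increasing"},
    # force a set of parameters to be positive and sum to one
    {"loc": ["x_3", "x_4"], "type": "probability"},
]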
Using and reading persistent logging#
In fact, we have already been using a persistent log the whole time. It is stored under “logging.db” in our working directory. If you want to store it in a different place, you can do that:
[23]:
res = minimize(
    criterion=sphere,
    params=start_params,
    algorithm="scipy_lbfgsb",
    derivative=sphere_gradient,
    logging="my_log.db",
)
[24]:
# the second argument works like an index to a list, i.e.
# -1 gives the last entry
read_optimization_iteration("my_log.db", -1)
[24]:
{'rowid': 12,
'timestamp': datetime.datetime(2021, 1, 12, 17, 33, 36, 376057),
'exceptions': None,
'valid': True,
'hash': None,
'value': 1.0477058897466563e-30,
'root_contributions': x_0 1.110223e-16
x_1 2.220446e-16
x_2 0.000000e+00
x_3 4.440892e-16
x_4 -8.881784e-16
Name: value, dtype: float64,
'params': lower_bound upper_bound value group name
x_0 -inf inf 1.110223e-16 All Parameters x_0
x_1 -inf inf 2.220446e-16 All Parameters x_1
x_2 -inf inf 0.000000e+00 All Parameters x_2
x_3 -inf inf 4.440892e-16 All Parameters x_3
x_4 -inf inf -8.881784e-16 All Parameters x_4}
The persistent log file is synchronized instantly whenever the optimizer tries a new parameter vector. This is very handy if an optimization has to be aborted and you want to extract its current status. It is also used by the estimagic dashboard.
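Since the second argument works like a list index, you can also read the very first evaluation by passing 0, even while an optimization is still running (a sketch, reusing the my_log.db file from above):

first_iteration = read_optimization_iteration("my_log.db", 0)
first_iteration["value"]  # criterion value at the start parameters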
Passing algorithm specific options to minimize#
Most algorithms have a few optional arguments. Examples are convergence criteria or tuning parameters. We standardize the names of these options as much as possible, but not all algorithms support all options. You can find an overview of the supported arguments in the documentation.
[25]:
algo_options = {
    "convergence.relative_criterion_tolerance": 1e-9,
    "stopping.max_iterations": 100_000,
}

res = minimize(
    criterion=sphere,
    params=start_params,
    algorithm="scipy_lbfgsb",
    derivative=sphere_gradient,
    algo_options=algo_options,
)
res["solution_params"].round(6)
[25]:
|  | lower_bound | upper_bound | value |
|---|---|---|---|
| x_0 | -inf | inf | 0.0 |
| x_1 | -inf | inf | 0.0 |
| x_2 | -inf | inf | 0.0 |
| x_3 | -inf | inf | 0.0 |
| x_4 | -inf | inf | -0.0 |