
[26]:
import numpy as np
import pandas as pd

from estimagic import minimize
from estimagic.logging.read_log import read_optimization_iteration

First Optimization with estimagic

This tutorial shows how to do an optimization with estimagic. It uses a very simple criterion function to focus on the mechanics of doing an optimization. A more interesting example can be found in the ordered logit example. More details on the topics covered here can be found in the how-to guides.

Setting up criterion function and derivatives

Criterion functions in estimagic take a DataFrame with parameters as their first argument and return a dictionary that contains the output of the criterion function.

The output dictionary must contain the entry “value”, which is a scalar. It can also contain an arbitrary number of additional entries. Entries with special meaning are “contributions” and “root_contributions”, which are used by specialized optimizers (e.g. nonlinear least squares optimizers use the “root_contributions”). All other entries are simply stored in a log file. If none of the optional entries are needed, the criterion function can also simply return a scalar (a minimal sketch follows the example below).

[27]:
def sphere(params):
    """Spherical criterion function.

    The unique local and global optimum of this function is at
    the zero vector. It is differentiable, convex and extremely
    well behaved in any possible sense.

    Args:
        params (pandas.DataFrame): DataFrame with the columns
            "value", "lower_bound", "upper_bound" and potentially more.

    Returns:
        dict: A dictionary with the entries "value" and "root_contributions".

    """
    out = {
        "value": (params["value"] ** 2).sum(),
        "root_contributions": params["value"],
    }
    return out


def sphere_gradient(params):
    """Gradient of spherical criterion function"""
    return params["value"] * 2
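
As noted above, if none of the optional entries are needed, the criterion can simply return a scalar. A minimal sketch (the name sphere_scalar is not part of the original notebook):

def sphere_scalar(params):
    """Spherical criterion that only returns the scalar value."""
    return (params["value"] ** 2).sum()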

Setting up start parameters

The start parameters must contain the column “value” but can also contain an arbitrary number of other columns. Columns with special meaning are “lower_bound”, “upper_bound”, “name” and “group”. The bounds are used during optimization; “name” and “group” are used in the dashboard.

The parameters can have an arbitrary index or even a MultiIndex, which is very helpful for organizing the parameters of a complex optimization problem (a small MultiIndex sketch follows the example below).

[28]:
start_params = pd.DataFrame(
    data=np.arange(5) + 1,
    columns=["value"],
    index=[f"x_{i}" for i in range(5)],
)
start_params
[28]:
     value
x_0      1
x_1      2
x_2      3
x_3      4
x_4      5
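
To illustrate the MultiIndex case mentioned above, here is a small sketch with made-up parameter labels (not part of the original notebook):

# Start parameters grouped by a hypothetical "category" level
multi_index = pd.MultiIndex.from_tuples(
    [("utility", "x_0"), ("utility", "x_1"), ("cost", "x_2")],
    names=["category", "name"],
)
multi_start_params = pd.DataFrame(
    data=[1.0, 2.0, 3.0],
    columns=["value"],
    index=multi_index,
)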

Running a simple optimization

Estimagic’s minimize function works similarly to scipy’s minimize function. A big difference, however, is that estimagic does not have a default optimization algorithm. This is on purpose, because the choice of algorithm should always depend on the problem one wants to solve.

Another difference is that estimagic also has a maximize function that works exactly like minimize but does a maximization (a small sketch of maximize follows the example below).

The output of minimize is a dictionary that contains the solution parameters and criterion value as well as other information.

[29]:
res = minimize(
    criterion=sphere,
    params=start_params,
    algorithm="scipy_lbfgsb",
    derivative=sphere_gradient,
)
res
[29]:
{'solution_x': array([ 1.11022302e-16,  2.22044605e-16,  0.00000000e+00,  4.44089210e-16,
        -8.88178420e-16]),
 'solution_criterion': 1.0477058897466563e-30,
 'solution_derivative': array([ 2.22044605e-16,  4.44089210e-16,  0.00000000e+00,  8.88178420e-16,
        -1.77635684e-15]),
 'solution_hessian': None,
 'n_criterion_evaluations': 3,
 'n_derivative_evaluations': None,
 'n_iterations': 2,
 'success': True,
 'reached_convergence_criterion': None,
 'message': b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL',
 'solution_params':      lower_bound  upper_bound         value
 x_0         -inf          inf  1.110223e-16
 x_1         -inf          inf  2.220446e-16
 x_2         -inf          inf  0.000000e+00
 x_3         -inf          inf  4.440892e-16
 x_4         -inf          inf -8.881784e-16}
[19]:
res["solution_params"].round(6)
[19]:
     lower_bound  upper_bound  value
x_0         -inf          inf    0.0
x_1         -inf          inf    0.0
x_2         -inf          inf    0.0
x_3         -inf          inf    0.0
x_4         -inf          inf   -0.0
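
To illustrate the maximize counterpart mentioned above, here is a minimal sketch that maximizes the negative of the sphere function (neg_sphere is a hypothetical helper, not part of the original notebook):

from estimagic import maximize


def neg_sphere(params):
    """Negative sphere function; its unique maximum is the zero vector."""
    return -(params["value"] ** 2).sum()


res_max = maximize(
    criterion=neg_sphere,
    params=start_params,
    algorithm="scipy_lbfgsb",
)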

Running an optimization with a least squares optimizer

Using a least squares optimizer in estimagic works exactly the same as using any other optimizer. That is the point of allowing the criterion function to return a dictionary: its “root_contributions” entry gives least squares optimizers such as tao_pounders the residuals they need.

[20]:
res = minimize(
    criterion=sphere,
    params=start_params,
    algorithm="tao_pounders",
    derivative=sphere_gradient,
)
res["solution_params"].round(2)
[20]:
     lower_bound  upper_bound  value
x_0         -inf          inf    0.0
x_1         -inf          inf    0.0
x_2         -inf          inf    0.0
x_3         -inf          inf    0.0
x_4         -inf          inf    0.0

Adding bounds

Bounds are simply added as additional columns in the start parameters. If a parameter has no bound, use np.inf for upper bounds and -np.inf for lower bounds.

[21]:
params_with_bounds = start_params.copy()
params_with_bounds["lower_bound"] = [0, 1, 0, -1, 0]
params_with_bounds["upper_bound"] = [np.inf] * 5

res = minimize(
    criterion=sphere,
    params=params_with_bounds,
    algorithm="scipy_lbfgsb",
    derivative=sphere_gradient,
)
res["solution_params"].round(6)
[21]:
     lower_bound  upper_bound  value
x_0          0.0          inf    0.0
x_1          1.0          inf    1.0
x_2          0.0          inf    0.0
x_3         -1.0          inf    0.0
x_4          0.0          inf    0.0

Fixing parameters via constraints

Fixing parameters is very handy in complex optimizations and very simple in estimagic:

[22]:
constraints = [{"loc": ["x_0", "x_3"], "type": "fixed", "value": [1, 4]}]
res = minimize(
    criterion=sphere,
    params=start_params,
    algorithm="tao_pounders",
    derivative=sphere_gradient,
    constraints=constraints,
)
res["solution_params"].round(2)
[22]:
     lower_bound  upper_bound  value
x_0         -inf          inf    1.0
x_1         -inf          inf    0.0
x_2         -inf          inf    0.0
x_3         -inf          inf    4.0
x_4         -inf          inf    0.0

As you probably suspect, the estimagic constraint syntax is much more general than what we just did. For details, see the how-to guide on how to specify constraints.
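
For illustration only (this particular constraint is not part of the original notebook), a different constraint type could look like the following sketch, which forces a subset of parameters to be weakly increasing:

# Hypothetical example of an "increasing" constraint on x_1, x_2 and x_3.
# No derivative is passed, so estimagic falls back to numerical differentiation.
increasing_constraint = [{"loc": ["x_1", "x_2", "x_3"], "type": "increasing"}]

res = minimize(
    criterion=sphere,
    params=start_params,
    algorithm="scipy_lbfgsb",
    constraints=increasing_constraint,
)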

Using and reading persistent logging

In fact, we have already been using a persistent log the whole time. It is stored under “logging.db” in our working directory. If you want to store it somewhere else, pass a different path to the logging argument:

[23]:
res = minimize(
    criterion=sphere,
    params=start_params,
    algorithm="scipy_lbfgsb",
    derivative=sphere_gradient,
    logging="my_log.db",
)
[24]:
# the second argument works like an index to a list, i.e.
# -1 gives the last entry
read_optimization_iteration("my_log.db", -1)
[24]:
{'rowid': 12,
 'timestamp': datetime.datetime(2021, 1, 12, 17, 33, 36, 376057),
 'exceptions': None,
 'valid': True,
 'hash': None,
 'value': 1.0477058897466563e-30,
 'root_contributions': x_0    1.110223e-16
 x_1    2.220446e-16
 x_2    0.000000e+00
 x_3    4.440892e-16
 x_4   -8.881784e-16
 Name: value, dtype: float64,
 'params':      lower_bound  upper_bound         value           group name
 x_0         -inf          inf  1.110223e-16  All Parameters  x_0
 x_1         -inf          inf  2.220446e-16  All Parameters  x_1
 x_2         -inf          inf  0.000000e+00  All Parameters  x_2
 x_3         -inf          inf  4.440892e-16  All Parameters  x_3
 x_4         -inf          inf -8.881784e-16  All Parameters  x_4}

The persistent log file is always instantly synchronized when the optimizer tries a new parameter vector. This is very handy if an optimization has to be aborted and you want to extract the current status. It is also used by the estimagic dashboard.
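
Because every evaluation is logged, earlier iterations can be read the same way; for example, entry 0 should contain the very first evaluation (a usage sketch, not part of the original notebook):

# Read the first logged iteration and look at its parameter values
first_iteration = read_optimization_iteration("my_log.db", 0)
first_iteration["params"]["value"]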

Passing algorithm specific options to minimize

Most algorithms have a few optional arguments. Examples are convergence criteria or tuning parameters. We standardize the names of these options as much as possible, but not all algorithms support all options. You can find an overview of the supported options in the documentation of the optimization algorithms.

[25]:
algo_options = {
    "convergence.relative_criterion_tolerance": 1e-9,
    "stopping.max_iterations": 100_000,
}

res = minimize(
    criterion=sphere,
    params=start_params,
    algorithm="scipy_lbfgsb",
    derivative=sphere_gradient,
    algo_options=algo_options,
)
res["solution_params"].round(6)
[25]:
     lower_bound  upper_bound  value
x_0         -inf          inf    0.0
x_1         -inf          inf    0.0
x_2         -inf          inf    0.0
x_3         -inf          inf    0.0
x_4         -inf          inf   -0.0