How to calculate first derivatives#

In this guide, we show you how to compute first derivatives with estimagic, while introducing some core concepts.

import estimagic as em
import numpy as np
import pandas as pd

Introduction#

As in the getting started section, let’s look at the sphere function \(f(x) = x^\top x\).

def sphere_scalar(params):
    return params**2

The derivative of \(f\) is given by \(f'(x) = 2 x\). With numerical derivatives, we have to specify the value of \(x\) at which we want to compute the derivative. Let’s first consider two scalar points \(x = 0\) and \(x=1\). We have \(f'(0) = 0\) and \(f'(1) = 2\).
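To make this concrete, here is a minimal sketch (independent of estimagic) of the central finite-difference rule that numerical differentiation is built on. The helper `central_difference` and the step size `h` are illustrative choices, not part of estimagic's API:

```python
def sphere_scalar(params):
    return params**2


def central_difference(func, x, h=1e-6):
    # symmetric difference quotient: (f(x + h) - f(x - h)) / (2h)
    return (func(x + h) - func(x - h)) / (2 * h)


print(central_difference(sphere_scalar, 0.0))  # ≈ 0
print(central_difference(sphere_scalar, 1.0))  # ≈ 2
```

estimagic does considerably more than this (step selection, extrapolation, error estimates), but the basic idea is the same.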

To compute the derivative using estimagic, we simply pass the function sphere_scalar and the params to the function first_derivative:

fd = em.first_derivative(func=sphere_scalar, params=0)
fd["derivative"]
array(0.)
fd = em.first_derivative(func=sphere_scalar, params=1)
fd["derivative"]
array(2.)

Notice that the output of first_derivative is a dictionary containing the derivative under the key “derivative”. We discuss the output in more detail below.

Gradient and Jacobian#

The scalar case from above extends directly to the multivariate case. Let’s consider two cases:

Gradient

\(f_1: \mathbb{R}^N \to \mathbb{R}\)

Jacobian

\(f_2: \mathbb{R}^N \to \mathbb{R}^M\)

The first derivative of \(f_1\) is usually referred to as the gradient, while the first derivative of \(f_2\) is usually called the Jacobian.

Gradient#

Let’s again use the sphere function, but this time with a vector input. The gradient is a 1-dimensional vector of shape (N,).

def sphere(params):
    return params @ params
fd = em.first_derivative(sphere, params=np.arange(4))
fd["derivative"]
array([0., 2., 4., 6.])
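As a sanity check independent of estimagic, we can build the gradient coordinate by coordinate from central differences and compare it with the analytic gradient \(2x\). The helper `numerical_gradient` and the step size are illustrative assumptions:

```python
import numpy as np


def sphere(params):
    return params @ params


def numerical_gradient(func, x, h=1e-6):
    # perturb one coordinate at a time and apply the central difference rule
    grad = np.empty_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        grad[i] = (func(x + e) - func(x - e)) / (2 * h)
    return grad


x = np.arange(4, dtype=float)
print(numerical_gradient(sphere, x))  # ≈ [0., 2., 4., 6.], i.e. 2x
```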

Jacobian#

As an example, let’s now use the function \(f(x) = (x^\top x) \begin{pmatrix}0\\1\\2\end{pmatrix}\), with \(f: \mathbb{R}^N \to \mathbb{R}^3\). The Jacobian is a 2-dimensional object of shape (M, N), where M is the output dimension.

def sphere_multivariate(params):
    return (params @ params) * np.arange(3)
fd = em.first_derivative(sphere_multivariate, params=np.arange(4))
fd["derivative"]
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  2.,  4.,  6.],
       [ 0.,  4.,  8., 12.]])
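We can verify this result analytically: the Jacobian of \(f(x) = (x^\top x)\, v\) is the outer product of \(v\) with the gradient \(2x\). A quick check in plain NumPy:

```python
import numpy as np

x = np.arange(4, dtype=float)
v = np.arange(3, dtype=float)

# row m of the Jacobian is v[m] * 2x, i.e. the outer product of v and 2x
analytic_jacobian = np.outer(v, 2 * x)
print(analytic_jacobian)
```

This reproduces the (3, 4) array returned by first_derivative above.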

The output of first_derivative#

As we have already seen in the introduction, the output of first_derivative is a dictionary. This dictionary always contains an entry “derivative”, which is the numerical derivative. Depending on the values of certain arguments, it may contain several additional entries.

return_func_value

If the argument return_func_value is True, the output dictionary will contain an additional entry under the key “func_value” denoting the function value evaluated at the params vector.

return_info

If the argument return_info is True, the output dictionary will contain one or two additional entries. In this case it will always contain the entry “func_evals”, which is a data frame of all internally executed function evaluations. If n_steps is larger than 1, it will also contain “derivative_candidates”, a data frame of the derivative estimates used in the Richardson extrapolation.

For an explanation of the argument n_steps and the Richardson method, please see the API Reference and the Richardson Extrapolation explanation in the documentation.
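To give a flavor of the idea (this is an illustration, not estimagic's actual implementation): a forward difference has an error of order \(h\), so combining estimates at step sizes \(h\) and \(h/2\) cancels the leading error term:

```python
def f(x):
    return x**2


def forward_diff(func, x, h):
    # one-sided difference quotient, error of order h
    return (func(x + h) - func(x)) / h


x, h = 1.0, 1e-3
d1 = forward_diff(f, x, h)       # ≈ 2 + h
d2 = forward_diff(f, x, h / 2)   # ≈ 2 + h/2
# Richardson step: eliminate the O(h) error term
richardson = 2 * d2 - d1
print(d1, richardson)  # richardson is far closer to f'(1) = 2
```

estimagic generalizes this by combining n_steps step sizes and tracking error estimates for each candidate.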

The objects returned when return_info is True are rarely of direct use and can be safely ignored. However, they are required by the plotting function derivative_plot, as explained below. For a better understanding, we print each of these additional objects once:

fd = em.first_derivative(
    sphere_scalar, params=0, n_steps=2, return_func_value=True, return_info=True
)
assert fd["func_value"] == sphere_scalar(0)
fd["func_evals"]
                                      step          eval
sign step_number dim_x dim_f
 1   0           0     0      1.490116e-09  2.220446e-18
     1           0     0      2.980232e-09  8.881784e-18
-1   0           0     0      1.490116e-09  2.220446e-18
     1           0     0      2.980232e-09  8.881784e-18
fd["derivative_candidates"]
                                       der           err
method   num_term dim_x dim_f
forward  1        0     0     4.470348e-09  8.467417e-08
backward 1        0     0    -4.470348e-09  8.467417e-08
central  1        0     0     0.000000e+00  0.000000e+00

The params argument#

Above we used a numpy.ndarray as the params argument. In estimagic, params can be arbitrary pytrees. Examples are (nested) dictionaries of numbers, arrays, and pandas objects. Let’s look at a few cases.

pandas#

params = pd.DataFrame(
    [["time_pref", "delta", 0.9], ["time_pref", "beta", 0.6], ["price", "price", 2]],
    columns=["category", "name", "value"],
).set_index(["category", "name"])

params
                 value
category  name
time_pref delta    0.9
          beta     0.6
price     price    2.0
def sphere_pandas(params):
    return params["value"] @ params["value"]
fd = em.first_derivative(sphere_pandas, params)
fd["derivative"]
category   name 
time_pref  delta    1.8
           beta     1.2
price      price    4.0
dtype: float64
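Note that the index of the params DataFrame is preserved in the result. We can verify the values analytically: the gradient of `params["value"] @ params["value"]` is simply `2 * params["value"]`, which pandas computes with the index intact:

```python
import pandas as pd

params = pd.DataFrame(
    [["time_pref", "delta", 0.9], ["time_pref", "beta", 0.6], ["price", "price", 2]],
    columns=["category", "name", "value"],
).set_index(["category", "name"])

# analytic gradient of value @ value is 2 * value, with the MultiIndex preserved
analytic = 2 * params["value"]
print(analytic)  # 1.8, 1.2, 4.0 — matching the numerical result above
```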

nested dicts#

params = {"a": 0, "b": 1, "c": pd.Series([2, 3, 4])}

params
{'a': 0,
 'b': 1,
 'c': 0    2
 1    3
 2    4
 dtype: int64}
def dict_sphere(params):
    return params["a"] ** 2 + params["b"] ** 2 + (params["c"] ** 2).sum()
fd = em.first_derivative(
    func=dict_sphere,
    params=params,
)

fd["derivative"]
{'a': array(0.),
 'b': array(2.),
 'c': 0    4.0
 1    6.0
 2    8.0
 dtype: float64}
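Here too the result can be checked against the analytic gradient, applied leaf by leaf: the derivative of a sum of squares is \(2x\) for every entry. A sketch in plain NumPy/pandas (the dict comprehension is an illustration, not how estimagic maps over pytrees internally):

```python
import numpy as np
import pandas as pd

params = {"a": 0, "b": 1, "c": pd.Series([2, 3, 4])}

# the gradient of a sum of squares is 2x, applied to each leaf of the tree
analytic = {key: 2 * np.asarray(value, dtype=float) for key, value in params.items()}
print(analytic)  # {'a': 0.0, 'b': 2.0, 'c': [4.0, 6.0, 8.0]}
```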

Description of the output#

The output of first_derivative when using a general pytree is straightforward. Nevertheless, this explanation requires some pytree terminology; please refer to the JAX documentation on pytrees.

The output tree of first_derivative has the same structure as the params tree. This is equivalent to the numpy case, where the gradient is a vector of shape (len(params),). If, however, the params tree contains non-scalar entries, like numpy.ndarray’s, pandas.Series’, or pandas.DataFrame’s, the output is not expanded but a block is created instead. In the above example, the entry params["c"] is a pandas.Series with 3 entries. Thus, the first derivative output contains the corresponding 3x1 block of the gradient at the position ["c"].
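The block structure can be made visible by inspecting the shape of each leaf: scalar leaves yield scalar blocks, while array-like leaves yield blocks of matching length. The helper below is purely illustrative:

```python
import numpy as np
import pandas as pd

params = {"a": 0, "b": 1, "c": pd.Series([2, 3, 4])}

# scalar leaves have shape (), array-like leaves keep their own shape;
# the derivative tree mirrors these shapes block by block
shapes = {key: np.shape(np.asarray(value)) for key, value in params.items()}
print(shapes)  # {'a': (), 'b': (), 'c': (3,)}
```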

Multiprocessing#

For slow-to-evaluate functions, one can speed up the computation by running the function evaluations in parallel. This is easily done by setting the n_cores argument. For example, to evaluate the function on 2 cores, we simply write:

fd = em.first_derivative(sphere_scalar, params=0, n_cores=2)
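Conceptually, the speed-up comes from evaluating the perturbed points concurrently rather than one after another. A rough sketch with the standard library (an illustration of the idea, not estimagic's actual implementation):

```python
from concurrent.futures import ThreadPoolExecutor


def sphere_scalar(params):
    return params**2


def parallel_evaluations(func, points, n_cores=2):
    # evaluate func at all perturbed points concurrently
    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        return list(pool.map(func, points))


h = 1e-6
f_plus, f_minus = parallel_evaluations(sphere_scalar, [1.0 + h, 1.0 - h])
print((f_plus - f_minus) / (2 * h))  # ≈ f'(1) = 2
```

Because each function evaluation is independent, this parallelization is worthwhile whenever a single evaluation is expensive relative to the process or thread overhead.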