How to calculate second derivatives#

In this guide, we show you how to compute second derivatives with estimagic, while introducing some core concepts.

import estimagic as em
import numpy as np
import pandas as pd

Introduction#

Instead of the sphere function, let’s now look at an ellipse $\(f(x) = x^\top W x,\)\( with a weighting matrix \)W$.

The second derivative of \(f\) is given by \(f''(x) = W + W^\top\). With numerical derivatives, we have to specify the value of \(x\) at which we want to compute the derivative. Note that in this case the second derivative should be independent of the value of \(x\).

def ellipse_scalar(params):
    weight = 1
    return weight * params**2

Let’s first consider two scalar points \(x = 0\) and \(x=1\). Since the second derivative here is constant, we have \(f''(0) = f''(1) = 2\).

To compute the derivative using estimagic, we simply pass the function ellipse_scalar and params to the function second_derivative:

sd = em.second_derivative(func=ellipse_scalar, params=0)
sd["derivative"]
array(2.)
sd = em.second_derivative(func=ellipse_scalar, params=1)
sd["derivative"]
array(1.99999625)

Notice that the output of second_derivative is a dictionary containing the derivative under the key “derivative”. We discuss the ouput in more detail below.

Hessian and Batch-Hessian#

The scalar case from above extends directly to the multivariate case. Let’s consider two cases:

Hessian

\(f_1: \mathbb{R}^N \to \mathbb{R}\)

Batch-Hessian

\(f_2: \mathbb{R}^N \to \mathbb{R}^M\)

The second derivative of \(f_1\) is usually referred to as the Hessian, while the second derivative of \(f_2\) is usually called a Batch-Hessian.

Hessian#

Let’s again use the ellipse function, but this time with a vector input. The hessian is a 2-dimensional object of shape (N, N).

def ellipse(params):
    weight = np.arange(len(params) ** 2).reshape(len(params), len(params))
    return params @ weight @ params
sd = em.second_derivative(ellipse, params=np.arange(4))
sd["derivative"].round(2)
array([[ 0.,  5., 10., 15.],
       [ 5., 10., 15., 20.],
       [10., 15., 20., 25.],
       [15., 20., 25., 30.]])

Batch-Hessian#

As an example let’s now use the function $\(f(x) = (x^\top x) \begin{pmatrix}1\\2\\3 \end{pmatrix},\)\( with \)f: \mathbb{R}^N \to \mathbb{R}^3$. The Batch-Hessian is now a 3-dimensional object of shape (M, N, N), where M is the output dimension.

def ellipse_multivariate(params):
    weight = np.arange(len(params) ** 2).reshape(len(params), len(params))
    return (params @ weight @ params) * np.arange(3)
sd = em.second_derivative(ellipse_multivariate, params=np.arange(4))
sd["derivative"].round(2)
array([[[ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.]],

       [[ 0.,  5., 10., 15.],
        [ 5., 10., 15., 20.],
        [10., 15., 20., 25.],
        [15., 20., 25., 30.]],

       [[ 0., 10., 20., 30.],
        [10., 20., 30., 40.],
        [20., 30., 40., 50.],
        [30., 40., 50., 60.]]])

The output of second_derivative#

As we have already seen in the introduction, the output of first_derivative is a dictionary. This dictionary always contains an entry “derivative” which is the numerical derivative. Besides this entry, several additional entries may be found, conditional on the state of certain arguments.

return_func_value

If the argument return_func_value is True, the output dictionary will contain an additional entry under the key “func_value” denoting the function value evaluated at the params vector.

sd = em.second_derivative(
    ellipse_scalar, params=0, return_func_value=True, return_info=True
)
assert sd["func_value"] == ellipse_scalar(0)

The params argument#

Above we used a numpy.ndarray as the params argument. In estimagic, params can be arbitrary pytrees. Examples are (nested) dictionaries of numbers, arrays and pandas objects. Lets look at a few cases.

pandas#

params = pd.DataFrame(
    [["time_pref", "delta", 0.9], ["time_pref", "beta", 0.6], ["price", "price", 2]],
    columns=["category", "name", "value"],
).set_index(["category", "name"])

params
value
category name
time_pref delta 0.9
beta 0.6
price price 2.0
def ellipse_pandas(params):
    weight = np.arange(len(params) ** 2).reshape(len(params), len(params))
    return params["value"] @ weight @ params["value"]
sd = em.second_derivative(ellipse_pandas, params)
sd["derivative"]
category time_pref price
name delta beta price
category name
time_pref delta 0.000000 4.000009 7.999982
beta 4.000009 7.999659 11.999973
price price 7.999982 11.999973 15.999964

nested dicts#

params = {"a": 0, "b": 1, "c": pd.Series([2, 3, 4])}

params
{'a': 0,
 'b': 1,
 'c': 0    2
 1    3
 2    4
 dtype: int64}
def dict_sphere(params):
    return params["a"] ** 2 + params["b"] ** 2 + (params["c"] ** 2).sum()
sd = em.second_derivative(
    func=dict_sphere,
    params=params,
)

sd["derivative"]
{'a': {'a': array(2.00072215),
  'b': array(0.),
  'c': 0    0.0
  1    0.0
  2    0.0
  dtype: float64},
 'b': {'a': array(0.),
  'b': array(1.9999955),
  'c': 0    0.0
  1    0.0
  2    0.0
  dtype: float64},
 'c': {'a': 0    0.0
  1    0.0
  2    0.0
  dtype: float64,
  'b': 0    0.0
  1    0.0
  2    0.0
  dtype: float64,
  'c':           0         1         2
  0  1.999995  0.000004 -0.000003
  1  0.000004  2.000001  0.000000
  2 -0.000003  0.000000  1.999997}}

Description of the output#

The output of second_derivative when using a general pytrees looks more complex but is easy once we remember that the second derivative is equivalent to applying the first derivative twice. This explanation requires terminolgy of pytrees. Please refer to the JAX documentation of pytrees.

The output tree is a product of the params tree with itself. This is equivalent to the numpy case, where the hessian is a matrix of shape (len(params), len(params)). If, however, the params tree contains non-scalar entries like numpy.ndarray’s, pandas.Series’, or pandas.DataFrame’s, the output is not expanded but a block is created instead. In the above example, the entry params["c"] is a 3-dimensional pandas.Series. Thus, the second derivative output contains the corresponding 3x3-block of the hessian at the position ["c"]["c"]:

sd["derivative"]["c"]["c"].round(3)
0 1 2
0 2.0 0.0 -0.0
1 0.0 2.0 0.0
2 -0.0 0.0 2.0

Multiprocessing#

For slow-to-evaluate functions, one may increase computation speed by running the function evaluations in parallel. This can be easily done by setting the n_cores argument. For example, if we wish to evaluate the function on 2 cores we simply write

sd = em.second_derivative(ellipse_scalar, params=0, n_cores=2)