How to calculate first derivatives#
In this guide, we show you how to compute first derivatives with estimagic - while introducing some core concepts.
import estimagic as em
import numpy as np
import pandas as pd
Introduction#
As in the getting started section, let’s look at the sphere function \(f(x) = x^\top x\).
def sphere_scalar(params):
    return params**2
The derivative of \(f\) is given by \(f'(x) = 2 x\). With numerical derivatives, we have to specify the value of \(x\) at which we want to compute the derivative. Let’s first consider two scalar points \(x = 0\) and \(x=1\). We have \(f'(0) = 0\) and \(f'(1) = 2\).
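To build intuition for what a numerical derivative is, here is a minimal sketch using a plain central difference quotient (the step size is an ad-hoc choice here; estimagic selects it automatically):

```python
def sphere_scalar(params):
    return params**2

def central_difference(func, x, h=1e-6):
    # Symmetric difference quotient: error is O(h**2) for smooth functions.
    return (func(x + h) - func(x - h)) / (2 * h)

central_difference(sphere_scalar, 0)  # 0.0: the quotient is exact here
central_difference(sphere_scalar, 1)  # close to the analytic value 2
```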
To compute the derivative using estimagic, we simply pass the function sphere_scalar and the params to the function first_derivative:
fd = em.first_derivative(func=sphere_scalar, params=0)
fd["derivative"]
array(0.)
fd = em.first_derivative(func=sphere_scalar, params=1)
fd["derivative"]
array(2.)
Notice that the output of first_derivative is a dictionary containing the derivative under the key “derivative”. We discuss the output in more detail below.
Gradient and Jacobian#
The scalar case from above extends directly to the multivariate case. Let’s consider two cases:
| Case | Function |
|---|---|
| Gradient | \(f_1: \mathbb{R}^N \to \mathbb{R}\) |
| Jacobian | \(f_2: \mathbb{R}^N \to \mathbb{R}^M\) |
The first derivative of \(f_1\) is usually referred to as the gradient, while the first derivative of \(f_2\) is usually called the Jacobian.
Gradient#
Let’s again use the sphere function, but this time with a vector input. The gradient is a 1-dimensional vector of shape (N,).
def sphere(params):
    return params @ params
fd = em.first_derivative(sphere, params=np.arange(4))
fd["derivative"]
array([0., 2., 4., 6.])
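The same result can be obtained with a numpy-only sketch that perturbs one coordinate at a time; this illustrates the mechanics, not estimagic’s actual implementation:

```python
import numpy as np

def sphere(params):
    return params @ params

def gradient_central(func, x, h=1e-6):
    # Each entry of the gradient is a scalar central difference along
    # one coordinate direction.
    grad = np.empty_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        grad[i] = (func(x + e) - func(x - e)) / (2 * h)
    return grad

x = np.arange(4, dtype=float)
gradient_central(sphere, x)  # close to the analytic gradient 2 * x
```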
Jacobian#
As an example, let’s now use the function \(f(x) = (x^\top x) \begin{pmatrix}0\\1\\2 \end{pmatrix}\), with \(f: \mathbb{R}^N \to \mathbb{R}^3\). The Jacobian is a 2-dimensional object of shape (M, N), where M is the output dimension.
def sphere_multivariate(params):
    return (params @ params) * np.arange(3)
fd = em.first_derivative(sphere_multivariate, params=np.arange(4))
fd["derivative"]
array([[ 0., 0., 0., 0.],
[ 0., 2., 4., 6.],
[ 0., 4., 8., 12.]])
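Extending the scalar sketch to the multivariate case, each column of the Jacobian is a central difference with respect to one input coordinate. A minimal illustration (again not estimagic’s internal implementation):

```python
import numpy as np

def sphere_multivariate(params):
    return (params @ params) * np.arange(3)

def jacobian_central(func, x, h=1e-6):
    # Column n holds the central difference w.r.t. the n-th input coordinate.
    cols = []
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        cols.append((func(x + e) - func(x - e)) / (2 * h))
    return np.column_stack(cols)

x = np.arange(4, dtype=float)
jac = jacobian_central(sphere_multivariate, x)
jac.shape  # (3, 4): one row per output, one column per input
```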
The output of first_derivative#
As we have already seen in the introduction, the output of first_derivative is a dictionary. This dictionary always contains the entry “derivative”, which is the numerical derivative. Depending on certain arguments, the dictionary may contain several additional entries.
return_func_value
If the argument return_func_value is True, the output dictionary will contain an additional entry under the key “func_value”: the function value evaluated at the params vector.
return_info
If the argument return_info is True, the output dictionary will contain one or two additional entries. It will always contain the entry “func_evals”, a data frame of all internally executed function evaluations. If n_steps is larger than 1, it will also contain “derivative_candidates”, a data frame of the derivative estimates used in the Richardson extrapolation.
For an explanation of the argument n_steps and the Richardson method, please see the API Reference and the Richardson Extrapolation explanation in the documentation.
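The core idea of Richardson extrapolation can be sketched in a few lines: a forward difference has an error of order h, so combining the estimates at step sizes h and h/2 cancels the leading error term. This is a conceptual illustration only, not estimagic’s implementation:

```python
import numpy as np

def forward_diff(func, x, h):
    return (func(x + h) - func(x)) / h

def richardson(func, x, h):
    # forward_diff has error c*h + O(h**2); the combination below
    # cancels the c*h term, leaving an O(h**2) error.
    return 2 * forward_diff(func, x, h / 2) - forward_diff(func, x, h)

x, h = 1.0, 1e-3
plain = forward_diff(np.exp, x, h)
improved = richardson(np.exp, x, h)
# improved is much closer to the true derivative exp(1) than plain
```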
The objects returned when return_info is True are rarely of direct use and can be safely ignored. However, they are required by the plotting function derivative_plot, as explained below. For a better understanding, we print each of these additional objects once:
fd = em.first_derivative(
    sphere_scalar, params=0, n_steps=2, return_func_value=True, return_info=True
)
assert fd["func_value"] == sphere_scalar(0)
fd["func_evals"]
| sign | step_number | dim_x | dim_f | step | eval |
|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 1.490116e-09 | 2.220446e-18 |
| 1 | 1 | 0 | 0 | 2.980232e-09 | 8.881784e-18 |
| -1 | 0 | 0 | 0 | 1.490116e-09 | 2.220446e-18 |
| -1 | 1 | 0 | 0 | 2.980232e-09 | 8.881784e-18 |
fd["derivative_candidates"]
| method | num_term | dim_x | dim_f | der | err |
|---|---|---|---|---|---|
| forward | 1 | 0 | 0 | 4.470348e-09 | 8.467417e-08 |
| backward | 1 | 0 | 0 | -4.470348e-09 | 8.467417e-08 |
| central | 1 | 0 | 0 | 0.000000e+00 | 0.000000e+00 |
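The pattern in this table can be reproduced with plain difference quotients: at \(x = 0\), the forward and backward estimates for the sphere function carry an error of order h, while the central estimate is exact. A minimal sketch (the step size is an ad-hoc choice, not estimagic’s internal one):

```python
def sphere_scalar(params):
    return params**2

h = 1.49e-08  # roughly the square root of machine epsilon

forward = (sphere_scalar(0 + h) - sphere_scalar(0)) / h            # about h
backward = (sphere_scalar(0) - sphere_scalar(0 - h)) / h           # about -h
central = (sphere_scalar(0 + h) - sphere_scalar(0 - h)) / (2 * h)  # exactly 0.0
```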
The params argument#
Above we used a numpy.ndarray as the params argument. In estimagic, params can be arbitrary pytrees. Examples are (nested) dictionaries of numbers, arrays, and pandas objects. Let’s look at a few cases.
pandas#
params = pd.DataFrame(
    [["time_pref", "delta", 0.9], ["time_pref", "beta", 0.6], ["price", "price", 2]],
    columns=["category", "name", "value"],
).set_index(["category", "name"])
params
| category | name | value |
|---|---|---|
| time_pref | delta | 0.9 |
| time_pref | beta | 0.6 |
| price | price | 2.0 |
def sphere_pandas(params):
    return params["value"] @ params["value"]
fd = em.first_derivative(sphere_pandas, params)
fd["derivative"]
category name
time_pref delta 1.8
beta 1.2
price price 4.0
dtype: float64
nested dicts#
params = {"a": 0, "b": 1, "c": pd.Series([2, 3, 4])}
params
{'a': 0,
'b': 1,
'c': 0 2
1 3
2 4
dtype: int64}
def dict_sphere(params):
    return params["a"] ** 2 + params["b"] ** 2 + (params["c"] ** 2).sum()
fd = em.first_derivative(
    func=dict_sphere,
    params=params,
)
fd["derivative"]
{'a': array(0.),
'b': array(2.),
'c': 0 4.0
1 6.0
2 8.0
dtype: float64}
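The numbers above can be cross-checked without estimagic: a leaf-by-leaf central difference reproduces the same pytree structure. A minimal sketch, where dict_gradient is a hypothetical helper written only for this illustration and hard-coded to this particular params layout:

```python
import pandas as pd

def dict_sphere(params):
    return params["a"] ** 2 + params["b"] ** 2 + (params["c"] ** 2).sum()

def dict_gradient(func, params, h=1e-6):
    # Perturb one entry at a time; the result mirrors the params structure.
    grad = {}
    for key in ("a", "b"):
        up = {**params, key: params[key] + h}
        down = {**params, key: params[key] - h}
        grad[key] = (func(up) - func(down)) / (2 * h)
    series = params["c"].astype(float)
    entries = []
    for i in range(len(series)):
        up_s, down_s = series.copy(), series.copy()
        up_s.iloc[i] += h
        down_s.iloc[i] -= h
        diff = func({**params, "c": up_s}) - func({**params, "c": down_s})
        entries.append(diff / (2 * h))
    grad["c"] = pd.Series(entries, index=series.index)
    return grad

params = {"a": 0, "b": 1, "c": pd.Series([2, 3, 4])}
dict_gradient(dict_sphere, params)  # {'a': 0.0, 'b': ~2.0, 'c': ~[4.0, 6.0, 8.0]}
```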
Description of the output#
The output of first_derivative when using a general pytree is straightforward. Nevertheless, the explanation requires pytree terminology; please refer to the JAX documentation on pytrees.
The output tree of first_derivative has the same structure as the params tree. This is equivalent to the numpy case, where the gradient is a vector of shape (len(params),). If, however, the params tree contains non-scalar entries, like numpy.ndarray’s, pandas.Series’, or pandas.DataFrame’s, the output is not expanded; instead, a block is created. In the above example, the entry params["c"] is a pandas.Series with 3 entries. Thus, the first derivative output contains the corresponding 3x1 block of the gradient at the position ["c"].
Multiprocessing#
For slow-to-evaluate functions, one may increase computation speed by running the function evaluations in parallel. This can easily be done by setting the n_cores argument. For example, if we wish to evaluate the function on 2 cores, we simply write
fd = em.first_derivative(sphere_scalar, params=0, n_cores=2)
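Conceptually, the speed-up is possible because the perturbed function evaluations that make up a difference quotient are independent of each other. A minimal thread-based sketch of this idea (for illustration only; estimagic manages the parallelization internally via n_cores):

```python
from concurrent.futures import ThreadPoolExecutor

def sphere_scalar(params):
    return params**2

h = 1e-6
points = [0 + h, 0 - h]  # the two evaluations needed for a central difference

# The evaluations are independent, so they can run concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    evals = list(pool.map(sphere_scalar, points))

derivative = (evals[0] - evals[1]) / (2 * h)  # central difference at x = 0
```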