Download the notebook here
How to calculate first derivatives#
In this guide we discuss how to compute first derivatives with estimagic. In the introduction we cover some core concepts. Afterwards we extend the scalar case to gradients and Jacobians. Then we present a useful visual debugging feature, and lastly we talk about multiprocessing. For even more details on arguments and outputs, see the API Reference.
[1]:
from functools import partial
import numpy as np
import pandas as pd
from estimagic.differentiation.derivatives import first_derivative
from estimagic.visualization.derivative_plot import derivative_plot
Introduction#
Let's start by differentiating a simple function, say a parabola. We want to differentiate the function \(f(x) = x^2\), which we write in Python as
[2]:
def f(x):
    return x**2
The derivative of \(f\) is given by \(f'(x) = 2x\). As we are computing numerical derivatives, we have to specify the value of \(x\) at which to compute the derivative. We will consider the two points \(x = 0\) and \(x = 1\), which give \(f'(0) = 0\) and \(f'(1) = 2\).
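Numerical first derivatives are built from finite differences. As a minimal hand-rolled sketch (not estimagic's actual implementation), the central difference formula \((f(x+h) - f(x-h)) / 2h\) already recovers \(f'(1) = 2\) up to numerical precision:

```python
def f(x):
    return x**2


def central_difference(func, x, h=1e-6):
    # Symmetric difference quotient; its error is of order h**2.
    return (func(x + h) - func(x - h)) / (2 * h)


central_difference(f, 1.0)  # close to 2.0
```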
To compute the derivative using estimagic, we simply pass the function f and the params vector at which we want to evaluate the derivative to the function first_derivative:
[3]:
params = 0
derivative = first_derivative(f, params)
derivative["derivative"]
[3]:
array([-1.11022302e-16])
[4]:
params = 1
derivative = first_derivative(f, params)
derivative["derivative"]
[4]:
array([2.])
Notice that the output of first_derivative is a dictionary containing the derivative under the key “derivative”.
Gradient and Jacobian#
The scalar case from above extends seamlessly to the multivariate case. Let us therefore consider two cases: (1) \(f_1: \mathbb{R}^N \to \mathbb{R}\) and (2) \(f_2: \mathbb{R}^N \to \mathbb{R}^M\). The first derivative of \(f_1\) is usually referred to as the gradient while the first derivative of \(f_2\) is usually called the Jacobian.
Let's choose the simple functions \(f_1(x) = \sum_{n = 1}^N x_n^2\) and \(f_2(x) = Ax\), where \(A\) is an (\(M \times N\)) matrix of ones except for the very first entry \(A_{1, 1} = 20\). We know that the gradient of \(f_1\) is given by \(\nabla f_1 (x) = 2 x^\top\) and the Jacobian of \(f_2\) by \(J(f_2)(x) = A\). Let's see if we can reproduce these results using the first_derivative function.
[5]:
def f1(x):
    return np.sum(x**2)


def f2(x, M):
    A = np.ones((M, len(x)))
    A[0, 0] = 20
    return A @ x


N, M = 3, 2
params = np.ones(N)
[6]:
derivative = first_derivative(f1, params)
derivative["derivative"]
[6]:
array([2., 2., 2.])
[7]:
derivative = first_derivative(partial(f2, M=M), params)
derivative["derivative"]
[7]:
array([[20., 1., 1.],
[ 1., 1., 1.]])
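As a sanity check on the gradient result, a coordinate-wise central difference (a hand-rolled sketch, not how estimagic computes it internally) reproduces \(\nabla f_1(x) = 2x^\top\):

```python
import numpy as np


def f1(x):
    return np.sum(x**2)


def numerical_gradient(func, x, h=1e-6):
    # Perturb one coordinate at a time with a central difference.
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (func(x + e) - func(x - e)) / (2 * h)
    return grad


numerical_gradient(f1, np.ones(3))  # close to array([2., 2., 2.])
```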
The output of first_derivative#
As we have already seen in the introduction, the output of first_derivative is a dictionary. This dictionary always contains the entry “derivative”, which is the numerical derivative. Depending on certain arguments, several additional entries may be present.
``return_func_value``
If the argument return_func_value is True, the output dictionary contains an additional entry under the key “func_value”: the function value evaluated at the params vector.
``return_info``
If the argument return_info is True, the output dictionary contains one or two additional entries. It always contains the entry “func_evals”, a data frame of all internally executed function evaluations. If n_steps is larger than 1, it also contains “derivative_candidates”, a data frame of the derivative estimates used in the Richardson extrapolation.
For an explanation of the argument n_steps and the Richardson method, please see the API Reference and the Richardson Extrapolation explanation in the documentation.
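To give a rough idea of the method (a sketch of the general technique, not estimagic's implementation), Richardson extrapolation combines central-difference estimates at step sizes \(h\) and \(h/2\) to cancel the leading \(h^2\) error term:

```python
def central_difference(func, x, h):
    return (func(x + h) - func(x - h)) / (2 * h)


def richardson(func, x, h=1e-3):
    # A central difference has error of order h**2, so halving the step
    # shrinks the error by a factor of 4; the weighted combination below
    # cancels that leading error term.
    d_h = central_difference(func, x, h)
    d_half = central_difference(func, x, h / 2)
    return (4 * d_half - d_h) / 3


richardson(lambda x: x**3, 1.0)  # close to 3.0
```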
The objects returned when return_info is True are rarely of direct use and can be safely ignored. However, they are required by the plotting function derivative_plot, as explained below. For a better understanding we print each of these additional objects once:
[8]:
derivative = first_derivative(f, 0, return_func_value=True, n_steps=2)
[9]:
derivative["func_value"] == f(0)
[9]:
True
[10]:
derivative["func_evals"]
[10]:
| sign | step_number | dim_x | dim_f | step | eval |
|------|-------------|-------|-------|--------------|--------------|
| 1 | 0 | 0 | 0 | 1.490116e-09 | 2.220446e-18 |
| 1 | 1 | 0 | 0 | 2.980232e-09 | 8.881784e-18 |
| -1 | 0 | 0 | 0 | 1.490116e-09 | 2.220446e-18 |
| -1 | 1 | 0 | 0 | 2.980232e-09 | 8.881784e-18 |
[11]:
derivative["derivative_candidates"]
[11]:
| method | num_term | dim_x | dim_f | der | err |
|----------|----------|-------|-------|---------------|--------------|
| forward | 1 | 0 | 0 | 4.470348e-09 | 8.467416e-08 |
| backward | 1 | 0 | 0 | -4.470348e-09 | 8.467415e-08 |
| central | 1 | 0 | 0 | 3.700743e-17 | 1.292522e-15 |
The params argument#
Above we used a numpy.ndarray as the params argument. estimagic is designed to work very well with named params, especially pandas.Series and pandas.DataFrame. If the function to evaluate uses a data frame, the application is the same. Consider the function h below and the corresponding params argument:
[12]:
def h(params):
    a = params.loc[("time_pref", "delta"), "value"] ** 2
    b = params.loc[("time_pref", "beta"), "value"] ** 2
    return a + b
[13]:
params = pd.DataFrame(
    [["time_pref", "delta", 0.9], ["time_pref", "beta", 0.6], ["price", "price", 2]],
    columns=["category", "name", "value"],
).set_index(["category", "name"])
Nothing changes when we compute the derivative; however, the output is now a named pandas.Series, which makes it easier to match each derivative value with a specific parameter.
[14]:
derivative = first_derivative(h, params)
[15]:
derivative["derivative"]
[15]:
category name
time_pref delta 1.8
beta 1.2
price price 0.0
dtype: float64
Plotting#
In many cases one wants to analyze the estimated numerical derivative more closely. This can be helpful, for example, when debugging an optimization process that is stuck. The derivative estimates are drawn as first-order Taylor approximations.
The function derivative_plot works on the dictionary returned by first_derivative, as long as return_func_value and return_info are set to True and n_steps is larger than 1.
The figure visualizes the function evaluations, the best estimate of the derivative, and the forward, central and backward derivative estimates. Forward and backward estimates come with bands that are calculated by applying the standard (forward/backward) formula at the smallest and largest possible steps. These bands are not confidence intervals; they merely give a rough indication of where the true derivative may lie.
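To see where such bands come from, here is a hedged sketch of the idea (not the library's code): apply the one-sided forward formula at the smallest and largest steps of a step grid, and treat the two resulting estimates as the edges of the band.

```python
def forward_difference(func, x, h):
    return (func(x + h) - func(x)) / h


def forward_band(func, x, steps):
    # Forward estimates at the smallest and largest step delimit the band.
    low = forward_difference(func, x, min(steps))
    high = forward_difference(func, x, max(steps))
    return sorted([low, high])


# For f(x) = x**2 at x = 1, the forward estimate is 2 + h, so the band
# shrinks toward the true derivative 2 as the steps get smaller.
lower, upper = forward_band(lambda x: x**2, 1.0, [1e-8, 1e-6, 1e-4, 1e-2])
```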
[16]:
derivative = first_derivative(f, 0, n_steps=4, return_func_value=True)
[17]:
fig = derivative_plot(derivative)
Multivariate functions#
For multivariate input or output functions the plot generalizes seamlessly:
[18]:
def g(x):
    y1 = x[0] ** 3 + x[1]
    y2 = x[2] ** 2 - x[0]
    return np.array([y1, y2])


params = np.zeros(3)
[19]:
derivative = first_derivative(g, params, n_steps=4, return_func_value=True)
[20]:
fig = derivative_plot(derivative)
Multiprocessing#
For slow-to-evaluate functions, one can increase computation speed by running the function evaluations in parallel. This is easily done by setting the n_cores argument. For example, to evaluate the function on 2 cores we simply write
[21]:
derivative = first_derivative(f, params=0, n_cores=2)
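Conceptually, this parallelization works because the function evaluations at the perturbed points are independent of each other. A standard-library sketch of the idea (threads here for portability; estimagic itself distributes the evaluations across processes according to n_cores):

```python
from concurrent.futures import ThreadPoolExecutor


def f(x):
    return x**2


# The evaluations at x + h and x - h needed for a central difference
# are independent, so they can run concurrently on separate workers.
h = 1e-6
points = [1.0 + h, 1.0 - h]
with ThreadPoolExecutor(max_workers=2) as pool:
    evals = list(pool.map(f, points))

derivative = (evals[0] - evals[1]) / (2 * h)  # close to 2.0
```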